vk-url-scraper

class vk_url_scraper.DateTimeEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

default(o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
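
The same hook can make datetime values serializable. The following is a minimal sketch that assumes, from the class name alone, that DateTimeEncoder exists to serialize the "datetime" field of scrape results. IsoDateTimeEncoder is a hypothetical stand-in written for illustration, not the library's implementation, and the ISO 8601 output format is an assumption.

```python
import json
from datetime import datetime


class IsoDateTimeEncoder(json.JSONEncoder):
    """Hypothetical stand-in for illustration; not the library's DateTimeEncoder."""

    def default(self, o):
        # Convert datetime objects to ISO 8601 strings (assumed format).
        if isinstance(o, datetime):
            return o.isoformat()
        # Let the base class raise TypeError for anything else.
        return super().default(o)


# Made-up sample data shaped like part of a scrape result.
post = {"id": "wall123123_1231", "datetime": datetime(2022, 5, 17, 9, 30)}
encoded = json.dumps(post, cls=IsoDateTimeEncoder)
print(encoded)
```

Passing the encoder class through the cls argument of json.dumps keeps the conversion in one place instead of pre-processing every result dict.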

class vk_url_scraper.VkScraper(username, password, token=None, session_file='vk_config.v2.json', captcha_handler=<function captcha_handler>)

Bases: object

Class that authenticates with VK and scrapes URLs.

All scrape* methods return a list of dicts, each shaped like:

{
    "id": "wall_id",
    "text": "text in this post",
    "datetime": datetime of post,
    "attachments": {
        # only present values will appear, can be empty dict
        "photo": [list of urls with max quality],
        "video": [list of urls with max quality],
        "link": [list of urls with max quality],
    },
    "payload": {"more": "original JSON response as dict which you can parse for more data"}
}
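
Because only the attachment types that are present appear, code that reads a result should not assume any key under "attachments" exists. A minimal sketch of defensive access follows; collect_media_urls and the sample data are hypothetical and written only for illustration.

```python
from datetime import datetime


def collect_media_urls(results):
    """Gather photo and video URLs from a list of result dicts (hypothetical helper)."""
    urls = []
    for result in results:
        # "attachments" may be an empty dict, and any of its keys may be missing.
        attachments = result.get("attachments", {})
        for kind in ("photo", "video"):
            urls.extend(attachments.get(kind, []))
    return urls


# Made-up results that follow the documented shape.
sample_results = [
    {
        "id": "wall123123_1231",
        "text": "text in this post",
        "datetime": datetime(2022, 5, 17),
        "attachments": {
            "photo": ["https://example.com/a.jpg"],
            "link": ["https://example.com/article"],
        },
        "payload": {},
    },
    {
        "id": "video123123_1231",
        "text": "",
        "datetime": datetime(2022, 5, 18),
        "attachments": {},
        "payload": {},
    },
]

media_urls = collect_media_urls(sample_results)
print(media_urls)
```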

download_media(results, destination='./output/')

Receives a list of dicts as returned by any of the scrape* methods and downloads every URL of type photo or video into the destination folder.

Parameters:
  • results (List[dict]) – list with valid dictionary results (see class definition)

  • destination (str) – the directory to save the downloaded files to. Defaults to ./output/

Return type:

a list of filenames for the downloaded files
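
A minimal sketch of chaining scrape and download_media is shown here. It uses only the two method signatures documented on this page. OfflineScraper is a hypothetical stand-in so the example runs without credentials or network access, and the file names it returns are made up; scrape_and_download is likewise a hypothetical helper.

```python
class OfflineScraper:
    """Hypothetical stand-in exposing the two documented methods; no network access."""

    def scrape(self, url):
        # Made-up result following the documented payload shape.
        return [
            {
                "id": "wall123123_1231",
                "attachments": {"photo": ["https://example.com/a.jpg"]},
            }
        ]

    def download_media(self, results, destination="./output/"):
        # Pretend one file was saved per photo/video URL; the names are invented.
        names = []
        for result in results:
            for kind in ("photo", "video"):
                for index, _url in enumerate(result["attachments"].get(kind, [])):
                    names.append(f"{destination}{result['id']}_{index}.jpg")
        return names


def scrape_and_download(scraper, url, destination="./output/"):
    """Scrape a URL, then download the photo and video attachments it yields."""
    results = scraper.scrape(url)
    filenames = scraper.download_media(results, destination)
    return results, filenames


results, filenames = scrape_and_download(
    OfflineScraper(), "https://vk.com/wall123123_1231"
)
print(filenames)
```

With the real library, the first argument would instead be a VkScraper(username, password) instance; taking the scraper as a parameter keeps the helper testable offline.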

scrape(url)

Scrapes a URL for the inner links it may contain, such as wall, video, photo, …

Parameters:

url (str) – The URL to parse and analyze content from, typically obtained through vk.com's share feature or copy-pasted from the browser

Return type:

a list of dict as specified in the class documentation.

scrape_photo_ids(photo_ids)

Receives a list of photo ids like photo123123_1231; see the API docs.

Parameters:

photo_ids (List[str]) – list with valid photo ids like “photo123123_1231”

Return type:

a list of dict as specified in the class documentation.

scrape_photos(url)

Scrapes a URL for the data of every photo it references.

Parameters:

url (str) – The URL to parse - should contain something like “…photo1212_3434…”

Return type:

a list of dict as specified in the class documentation.

scrape_video_ids(video_ids)

Receives a list of video ids like video123123_1231; see the API docs.

Parameters:

video_ids (List[str]) – list with valid video ids like “video123123_1231”

Return type:

a list of dict as specified in the class documentation.

scrape_videos(url)

Scrapes a URL for the data of every video it references.

Parameters:

url (str) – The URL to parse - should contain something like “…video1212_3434…”

Return type:

a list of dict as specified in the class documentation.

scrape_wall_ids(wall_ids, copy_history_depth=2)

Receives a list of wall ids like wall123123_1231; see the API docs.

Parameters:
  • wall_ids (List[str]) – list with valid wall ids like “wall123123_1231”

  • copy_history_depth (int) – see api docs

Return type:

a list of dict as specified in the class documentation.

scrape_walls(url)

Scrapes a URL for the data of every wall post it references.

Parameters:

url (str) – The URL to parse - should contain something like “…wall1212_3434…”

Return type:

a list of dict as specified in the class documentation.

PHOTO_PATTERN = re.compile('(photo.{0,1}\\d+_\\d+)')
VIDEO_PATTERN = re.compile('(video.{0,1}\\d+_\\d+(?:_\\w+)?)')
WALL_PATTERN = re.compile('(wall.{0,1}\\d+_\\d+)')
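
To see what these patterns capture, the sketch below recompiles them outside the class and runs them over made-up URLs. The `.{0,1}` part permits one optional character between the prefix and the digits (for example the `-` in `wall-123_456`), and the video pattern additionally accepts an optional `_suffix`. extract_ids is a hypothetical helper and does not claim to reproduce how the scrape* methods use the patterns internally.

```python
import re

# Patterns copied from the class attributes listed above.
PHOTO_PATTERN = re.compile(r"(photo.{0,1}\d+_\d+)")
VIDEO_PATTERN = re.compile(r"(video.{0,1}\d+_\d+(?:_\w+)?)")
WALL_PATTERN = re.compile(r"(wall.{0,1}\d+_\d+)")


def extract_ids(url):
    """Return every photo, video, and wall id found in a URL (hypothetical helper)."""
    return {
        "photo": PHOTO_PATTERN.findall(url),
        "video": VIDEO_PATTERN.findall(url),
        "wall": WALL_PATTERN.findall(url),
    }


# Made-up URLs for illustration.
found = extract_ids("https://vk.com/wall-123_456?z=photo-123_789")
print(found)

with_suffix = extract_ids("https://vk.com/video42_7_abc123")
print(with_suffix["video"])
```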

vk_url_scraper.suppress_stdout()

Team

vk-url-scraper is developed and maintained by the Bellingcat Tech Team. To learn more about who specifically contributed to this codebase, see our contributors page.

License

vk-url-scraper is licensed under the MIT license. A full copy of the license can be found on GitHub.
