vk-url-scraper
- class vk_url_scraper.DateTimeEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases: JSONEncoder
- default(o)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:

```python
def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
```
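The docstring above is inherited from the standard library's json.JSONEncoder. Judging by its name, DateTimeEncoder presumably overrides default so that datetime values (such as the "datetime" field of scrape results) can be serialized. The following is a minimal sketch under that assumption, emitting ISO 8601 strings; the library's actual output format is not documented here and may differ.

```python
import json
from datetime import date, datetime


class DateTimeEncoder(json.JSONEncoder):
    """Sketch of a datetime-aware encoder (assumed behavior: ISO 8601 output)."""

    def default(self, o):
        # datetime is a subclass of date, so this branch covers both types
        if isinstance(o, (datetime, date)):
            return o.isoformat()
        # Anything else: let the base class raise TypeError
        return super().default(o)


post = {"id": "wall1_2", "datetime": datetime(2022, 3, 1, 12, 30)}
print(json.dumps(post, cls=DateTimeEncoder))
# {"id": "wall1_2", "datetime": "2022-03-01T12:30:00"}
```

Passing the class via cls= keeps json.dumps itself unchanged; only types the stock encoder cannot handle reach default.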
- class vk_url_scraper.VkScraper(username, password, token=None, session_file='vk_config.v2.json', captcha_handler=<function captcha_handler>)
Bases: object
VkScraper class that allows you to authenticate and scrape URLs.
All scrape* methods return a list of dicts, each shaped like this:

```python
{
    "id": "wall_id",
    "text": "text in this post",
    "datetime": ...,  # datetime of the post
    "attachments": {  # only present values appear; can be an empty dict
        "photo": [...],  # list of URLs with max quality
        "video": [...],  # list of URLs with max quality
        "link": [...],   # list of URLs with max quality
    },
    "payload": {"more": "original JSON response as dict which you can parse for more data"},
}
```
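Because "attachments" only contains the kinds actually present (and may be empty), code that reads these results should not assume every key exists. A minimal sketch, using a hypothetical attachment_urls helper and hand-written sample data rather than real scraper output:

```python
from typing import Any, Dict, List


def attachment_urls(results: List[Dict[str, Any]], kind: str) -> List[str]:
    """Collect every URL of one attachment kind ("photo", "video" or "link").

    Hypothetical helper: "attachments" may be an empty dict, so missing
    kinds are skipped via .get() instead of raising KeyError.
    """
    urls: List[str] = []
    for result in results:
        urls.extend(result.get("attachments", {}).get(kind, []))
    return urls


# Hand-written example results shaped like the payload above (not real data)
results = [
    {"id": "wall1_1", "text": "a", "attachments": {"photo": ["https://example.com/a.jpg"]}},
    {"id": "wall1_2", "text": "b", "attachments": {}},
]
print(attachment_urls(results, "photo"))  # ['https://example.com/a.jpg']
print(attachment_urls(results, "video"))  # []
```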
- download_media(results, destination='./output/')
Receives a list of dicts as returned by any of the scrape* methods and downloads every photo or video URL they contain into the destination folder.
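Only photo and video attachments are downloaded, so link attachments are skipped. The sketch below shows that selection step as a hypothetical plan_downloads helper that maps each URL to a target path without touching the network. The filename scheme (post id, kind and index, plus the extension found in the URL path) is an assumption for illustration, not the library's naming.

```python
import os
from typing import Any, Dict, List, Tuple
from urllib.parse import urlparse


def plan_downloads(results: List[Dict[str, Any]], destination: str = "./output/") -> List[Tuple[str, str]]:
    """Return (url, target_path) pairs for photo and video attachments only."""
    plan: List[Tuple[str, str]] = []
    for result in results:
        attachments = result.get("attachments", {})
        for kind in ("photo", "video"):  # "link" attachments are never downloaded
            for i, url in enumerate(attachments.get(kind, [])):
                # Keep the URL path's file extension if there is one (assumed naming scheme)
                ext = os.path.splitext(urlparse(url).path)[1]
                filename = f"{result['id']}_{kind}_{i}{ext}"
                plan.append((url, os.path.join(destination, filename)))
    return plan


results = [{
    "id": "wall1_2",
    "attachments": {
        "photo": ["https://example.com/p/a.jpg?size=x"],
        "video": ["https://example.com/video/9"],
        "link": ["https://example.com/article"],
    },
}]
for url, path in plan_downloads(results):
    print(url, "->", path)
```

Separating the plan from the actual HTTP requests makes the filtering rule easy to check in isolation.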
- scrape(url)
Scrapes a URL for the various kinds of inner links it may contain, such as wall posts, videos, photos, …
- Parameters:
url (str) – The URL to parse and analyze content from, typically obtained via vk.com's share feature or copy-pasted from the browser
- Return type:
a list of dict as specified in the class documentation.
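One plausible way to route an arbitrary URL is to run each of the class's regular expressions (WALL_PATTERN, PHOTO_PATTERN, VIDEO_PATTERN) over it and group the matches by kind. The hypothetical find_ids helper below illustrates that idea; it is not the library's actual dispatch code.

```python
import re
from typing import Dict, List

# Patterns copied from VkScraper.WALL_PATTERN, PHOTO_PATTERN and VIDEO_PATTERN
PATTERNS = {
    "wall": re.compile(r"(wall.{0,1}\d+_\d+)"),
    "photo": re.compile(r"(photo.{0,1}\d+_\d+)"),
    "video": re.compile(r"(video.{0,1}\d+_\d+(?:_\w+)?)"),
}


def find_ids(url: str) -> Dict[str, List[str]]:
    """Group the wall/photo/video ids found in a URL by kind (hypothetical helper)."""
    found: Dict[str, List[str]] = {}
    for kind, pattern in PATTERNS.items():
        # dict.fromkeys drops repeated ids while keeping first-seen order
        ids = list(dict.fromkeys(pattern.findall(url)))
        if ids:
            found[kind] = ids
    return found


print(find_ids("https://vk.com/wall-1_2?z=photo-1_3%2Fwall-1_2"))
# {'wall': ['wall-1_2'], 'photo': ['photo-1_3']}
```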
- scrape_photo_ids(photo_ids)
Receives a list of photo ids such as photo123123_1231 (see the VK API docs).
- Parameters:
photo_ids (List[str]) – list with valid photo ids like “photo123123_1231”
- Return type:
a list of dict as specified in the class documentation.
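The ids refer to the VK API. Assuming they end up in VK's photos.getById method, whose photos parameter takes comma-separated owner_id_photo_id pairs, the "photo" prefix has to be dropped first. The to_api_photo_param helper below is hypothetical and only illustrates that conversion; it is not part of vk-url-scraper.

```python
from typing import List


def to_api_photo_param(photo_ids: List[str]) -> str:
    """Turn ["photo123123_1231", ...] into "123123_1231,..." (hypothetical helper)."""
    prefix = "photo"
    stripped = []
    for pid in photo_ids:
        if not pid.startswith(prefix):
            raise ValueError(f"not a photo id: {pid!r}")
        # Negative owner ids (communities) keep their minus sign
        stripped.append(pid[len(prefix):])
    return ",".join(stripped)


print(to_api_photo_param(["photo123123_1231", "photo-1_2"]))  # 123123_1231,-1_2
```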
- scrape_photos(url)
Scrapes data for every photo referenced in a URL.
- Parameters:
url (str) – The URL to parse - should contain something like “…photo1212_3434…”
- Return type:
a list of dict as specified in the class documentation.
- scrape_video_ids(video_ids)
Receives a list of video ids such as video123123_1231 (see the VK API docs).
- Parameters:
video_ids (List[str]) – list with valid video ids like “video123123_1231”
- Return type:
a list of dict as specified in the class documentation.
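VIDEO_PATTERN accepts an optional trailing _suffix after the owner and video numbers; in VK ids of the form video<owner_id>_<video_id>_<access_key>, that suffix is an access key. The hypothetical parse_video_id helper below splits an id into those parts, assuming that layout. It uses -? for the optional sign where the library's pattern uses the looser .{0,1}.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical, stricter variant of VIDEO_PATTERN, anchored to a whole id
VIDEO_ID = re.compile(r"^video(-?\d+)_(\d+)(?:_(\w+))?$")


class VideoId(NamedTuple):
    owner_id: int
    video_id: int
    access_key: Optional[str]  # None when the id has no suffix


def parse_video_id(vid: str) -> VideoId:
    m = VIDEO_ID.match(vid)
    if m is None:
        raise ValueError(f"not a video id: {vid!r}")
    owner, video, key = m.groups()
    return VideoId(int(owner), int(video), key)


print(parse_video_id("video-123_789_abcdef"))
# VideoId(owner_id=-123, video_id=789, access_key='abcdef')
```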
- scrape_videos(url)
Scrapes data for every video referenced in a URL.
- Parameters:
url (str) – The URL to parse - should contain something like “…video1212_3434…”
- Return type:
a list of dict as specified in the class documentation.
- scrape_wall_ids(wall_ids, copy_history_depth=2)
Receives a list of wall ids such as wall123123_1231 (see the VK API docs).
- scrape_walls(url)
Scrapes data for every wall post referenced in a URL.
- Parameters:
url (str) – The URL to parse - should contain something like “…wall1212_3434…”
- Return type:
a list of dict as specified in the class documentation.
- PHOTO_PATTERN = re.compile('(photo.{0,1}\\d+_\\d+)')
- VIDEO_PATTERN = re.compile('(video.{0,1}\\d+_\\d+(?:_\\w+)?)')
- WALL_PATTERN = re.compile('(wall.{0,1}\\d+_\\d+)')
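The .{0,1} in each pattern allows one optional character between the prefix and the numeric owner id, which is where VK puts the minus sign of community (negative) owner ids. A quick check of what each pattern extracts from a made-up URL:

```python
import re

# The three class attributes, written as raw strings
PHOTO_PATTERN = re.compile(r"(photo.{0,1}\d+_\d+)")
VIDEO_PATTERN = re.compile(r"(video.{0,1}\d+_\d+(?:_\w+)?)")
WALL_PATTERN = re.compile(r"(wall.{0,1}\d+_\d+)")

# Made-up URL mixing a community wall post, a video with a suffix, and a photo
url = "https://vk.com/wall-123_456?z=video-123_789_abcdef%2Fphoto123_1"
print(WALL_PATTERN.findall(url))   # ['wall-123_456']
print(VIDEO_PATTERN.findall(url))  # ['video-123_789_abcdef']
print(PHOTO_PATTERN.findall(url))  # ['photo123_1']
```

Because . matches any character, the patterns are permissive: a string such as photos123_4 also matches PHOTO_PATTERN, so extracted ids may need further validation.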
Team
vk-url-scraper is developed and maintained by the Bellingcat Tech Team. To learn more about who specifically contributed to this codebase, see our contributors page.
License
vk-url-scraper is licensed under the MIT license. A full copy of the license can be found on GitHub.