Asynchronous scraper to download YouTube community posts

Project description

YoutubeCommunityScraper | yp-dl

yp-dl is an asynchronous scraper that downloads YouTube Community posts in JSON format.

Motivation

YouTube stops retrieving a channel's older community posts once 200 posts have been loaded. Older posts can then only be viewed if you already have their link or ID.

Installation

pip install yp-dl

⚠️ Notice: If you are on version 0.9.11 or earlier, pip install --upgrade yp-dl will not work, since I've raised the lxml dependency above the version those releases specify (the old lxml version suddenly refused to build). Please uninstall yp-dl (pip uninstall yp-dl) and then install it again to get the latest version.

Features

  • Asynchronous support
  • For every post it retrieves:
    • post_link
    • time_since
    • utc_timestamp at download
    • video_link
    • image_links
    • text_content
    • poll_content
  • Update support: existing JSON files can be updated when new posts are made
  • Progress visualization during download
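
As a rough illustration of working with the downloaded data, the sketch below filters posts by the fields listed above. It assumes each JSON file holds a list of objects keyed by those field names. The file layout and the value types (for example, image_links being a list and poll_content being null when a post has no poll) are assumptions, not confirmed behaviour of yp-dl, and the sample records are made up.

```python
import json

# Hypothetical sample imitating a yp-dl output file; the real layout may differ.
SAMPLE = """
[
  {"post_link": "https://www.youtube.com/post/EXAMPLE_ID_1",
   "time_since": "2 days ago",
   "video_link": null,
   "image_links": ["https://example.com/a.jpg"],
   "text_content": "New artwork",
   "poll_content": null},
  {"post_link": "https://www.youtube.com/post/EXAMPLE_ID_2",
   "time_since": "1 week ago",
   "video_link": null,
   "image_links": [],
   "text_content": "Which topic next?",
   "poll_content": ["Option A", "Option B"]},
  {"post_link": "https://www.youtube.com/post/EXAMPLE_ID_3",
   "time_since": "1 month ago",
   "video_link": "https://www.youtube.com/watch?v=EXAMPLE",
   "image_links": [],
   "text_content": "New video is out",
   "poll_content": null}
]
"""


def posts_with_images(posts):
    """Return the posts that carry at least one image link."""
    return [p for p in posts if p.get("image_links")]


def posts_with_polls(posts):
    """Return the posts whose poll_content is present and non-empty."""
    return [p for p in posts if p.get("poll_content")]


posts = json.loads(SAMPLE)
for post in posts_with_polls(posts):
    print(post["post_link"], post["text_content"])
```

With a real file, json.load on the opened file would replace json.loads(SAMPLE), provided the layout matches the assumptions above.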

Usage

usage: yp-dl [-h] [-f FOLDER_PATH] [-r] [-u] [-v] [-o] [-d] link [link ...]

An asynchronous scraper that downloads youtube posts from youtube channels in json format.

positional arguments:
  link                  Provide any number of links. 
                        Link example: https://www.youtube.com/@3blue1brown

options:
  -h, --help            show this help message and exit
  -f FOLDER_PATH, --folder-path FOLDER_PATH
                        Provide the path of the folder you wish to store/update your json files. 
                        If it's in the current working directory (CWD), just type the folder 
                        name. If none is provided, everything will be stored/updated in the CWD.
  -r, --reverse         Reverses the order of the posts from oldest first to newest first. 
                        Be wary though, if you use this option with --update, your post order 
                        will be messed up.
  -u, --update          Appends the existing json file(s) with the new posts.
  -v, --verbose         Gives more details about what's going on when the program runs.
  -o, --overwrite-cookie
                        Overwrites the SOCS cookie in the cookies.txt file with a Default SOCS 
                        cookie within the project. Use if having problems retrieving posts.
  -d, --delete-cookie   Removes the cookie file to generate it again. Use if your SOCS key 
                        has expired (lifetime is 2 years).
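
If yp-dl is driven from a script, the flags above can be assembled programmatically. The following is a minimal sketch: build_yp_dl_command is a hypothetical helper, not part of yp-dl, and it only builds the argument list from the options shown in the help text. Refusing to combine --reverse with --update is a design choice of this sketch, based on the warning in the help text.

```python
import shlex


def build_yp_dl_command(links, folder_path=None, update=False,
                        reverse=False, verbose=False):
    """Assemble an argv list for yp-dl (hypothetical helper, not part of yp-dl)."""
    if not links:
        raise ValueError("at least one channel link is required")
    if update and reverse:
        # The help text warns that this combination messes up the post order.
        raise ValueError("--reverse should not be combined with --update")
    cmd = ["yp-dl"]
    if folder_path:
        cmd += ["-f", folder_path]
    if reverse:
        cmd.append("-r")
    if update:
        cmd.append("-u")
    if verbose:
        cmd.append("-v")
    cmd += list(links)
    return cmd


cmd = build_yp_dl_command(
    ["https://www.youtube.com/@3blue1brown"],
    folder_path="posts",
    update=True,
)
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
# To actually run it (requires yp-dl to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```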

Download files

Source Distribution

yp_dl-0.9.16.tar.gz (6.8 kB)

Uploaded Source

Built Distribution

yp_dl-0.9.16-py3-none-any.whl (8.4 kB)

Uploaded Python 3

File details

Details for the file yp_dl-0.9.16.tar.gz.

File metadata

  • Download URL: yp_dl-0.9.16.tar.gz
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for yp_dl-0.9.16.tar.gz
Algorithm     Hash digest
SHA256        d70ad2f1b9057b2e0c11368b3c3069ccd668698c5087b75ef494cf6b6bd1e0e1
MD5           5bf99b5bba813ab6b41cde249fe310fd
BLAKE2b-256   6be6755fab42438a53bc1aa66520b2fde34ba52ef1e94d3939306b6b7dd66ea7
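
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library. The helper names and the chunk size are illustrative choices, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
import hmac

# SHA256 digest listed above for yp_dl-0.9.16.tar.gz.
EXPECTED_SHA256 = "d70ad2f1b9057b2e0c11368b3c3069ccd668698c5087b75ef494cf6b6bd1e0e1"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Example (assumes the archive sits in the current directory):
#   print(matches_expected("yp_dl-0.9.16.tar.gz"))
```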

Provenance

The following attestation bundles were made for yp_dl-0.9.16.tar.gz:

Publisher: python-publish.yml on NothingNaN/YoutubeCommunityScraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yp_dl-0.9.16-py3-none-any.whl.

File metadata

  • Download URL: yp_dl-0.9.16-py3-none-any.whl
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for yp_dl-0.9.16-py3-none-any.whl
Algorithm     Hash digest
SHA256        681c21d29a52eeb73f5ec2d596e0b634c5fa33fe15b980dd500c532ba18a1eec
MD5           9d926f5bb48becfb818d491071318f82
BLAKE2b-256   39e9b79d2ca120d7a8b308859126fdf6cffb0283dba9acc9be6355214d66c4c3

Provenance

The following attestation bundles were made for yp_dl-0.9.16-py3-none-any.whl:

Publisher: python-publish.yml on NothingNaN/YoutubeCommunityScraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
