Skip to main content

Video Crawler

Project description

vidcrawler

Crawls major videos sites like YouTube/Rumble/Bitchute/Brighteon for video content and outputs a json feed of all the videos that were found.

Platform Unit Tests

Actions Status Actions Status Actions Status

API

Command line

vidcrawler --input_crawl_json "fetch_list.json" --output_json "out_list.json"

Python

from vidcrawler import crawl_video_sites

crawl_list = [
    ["channel name", "source", "channel_id"]
]
output = crawl_video_sites(crawl_list)

"source" and "channel_id" are used to generate the video-platform-specific urls to fetch data. The "channel name" is echo'd back in the generated json feeds, but doesn't not affect the fetching process in any way.

Testing

Install vidcrawler and then the command vidcralwer_test will become available.

$ pip install vidcrawler
$ vidcrawler_test

Example input fetch_list.json

[
    [
        "Health Ranger Report",
        "brighteon",
        "hrreport"
    ],
    [
        "Sydney Watson",
        "youtube",
        "UCSFy-1JrpZf0tFlRZfo-Rvw"
    ],
    [
        "Computing Forever",
        "bitchute",
        "hybm74uihjkf"
    ],
    [
        "ThePeteSantilliShow",
        "rumble",
        "ThePeteSantilliShow"
    ],
    [
        "Macroaggressions",
        "odysee",
        "Macroaggressions"
    ]
]

Example Output:

[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "date_lastupdated": "2022-05-17T05:17:18.540084",
    "channel_url": "https://www.youtube.com/channel/UCXIJgqnII2ZOINSWNOGFThA",
    "source": "youtube.com",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "description": "",
    "img_src": "https://i3.ytimg.com/vi/bwqBudCzDrQ/hqdefault.jpg",
    "iframe_src": "https://youtube.com/embed/bwqBudCzDrQ",
    "views": 1429
  },
  {
      ...
  }
]

Releases

  • 1.0.4: Fixes local_now() to be local timezone aware.
  • 1.0.3: Bump
  • 1.0.2: Updates testing
  • 1.0.1: improves command line
  • 1.0.0: Initial release

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vidcrawler-1.0.6.tar.gz (27.5 kB view details)

Uploaded Source

Built Distribution

vidcrawler-1.0.6-py2.py3-none-any.whl (37.7 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file vidcrawler-1.0.6.tar.gz.

File metadata

  • Download URL: vidcrawler-1.0.6.tar.gz
  • Upload date:
  • Size: 27.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.0

File hashes

Hashes for vidcrawler-1.0.6.tar.gz
Algorithm Hash digest
SHA256 5b28da498182751daa39325c9ba2622f4f7a27a4dc3e5a8dbc3464ac60af3810
MD5 997c2a986cd3926d4790dfcade0c864d
BLAKE2b-256 6c6b77343ce1289cd09597134701724a327fcb42ee8c195fbc7cb83a375b8b1a

See more details on using hashes here.

File details

Details for the file vidcrawler-1.0.6-py2.py3-none-any.whl.

File metadata

  • Download URL: vidcrawler-1.0.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 37.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.9.0 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.0

File hashes

Hashes for vidcrawler-1.0.6-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 f0ed08bb057215f16855beb3079667f99c19851e92a20b7d5ffbf2575bd75537
MD5 16ce8806e35b97772925b2ef5613c79f
BLAKE2b-256 22b2fb44c27a04ebcfbd7e04bb13859d438f7ec2cf5b835aeaa7951b50aa6275

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page