
Video Crawler


vidcrawler

Crawls major video sites such as YouTube, Rumble, Bitchute, Brighteon, and Odysee for video content and outputs a JSON feed of all the videos that were found.

API

Command line

vidcrawler --input_crawl_json "fetch_list.json" --output_json "out_list.json"

Python

import json
from vidcrawler import crawl_video_sites

crawl_list = [
    ["channel name", "source", "channel_id"]
]
output = crawl_video_sites(crawl_list)
print(json.dumps(output))

"source" and "channel_id" are used to generate the platform-specific URLs from which data is fetched. The "channel name" is echoed back in the generated JSON feed but does not affect the fetching process in any way.

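The command line and the Python call can be tied together with a small file-to-file wrapper. The following is a minimal sketch, not part of vidcrawler: `run_crawl` is a hypothetical helper, and it assumes that the function passed as `crawl_fn` behaves like `crawl_video_sites` (takes the crawl list, returns a JSON-serializable list).

```python
import json
from pathlib import Path
from typing import Callable, List


def run_crawl(
    input_path: str,
    output_path: str,
    crawl_fn: Callable[[List[List[str]]], list],
) -> int:
    """Read a crawl list from JSON, crawl it, and write the feed as JSON.

    crawl_fn is assumed to behave like vidcrawler.crawl_video_sites.
    Returns the number of items written to output_path.
    """
    # Each entry is expected to be ["channel name", "source", "channel_id"].
    crawl_list = json.loads(Path(input_path).read_text(encoding="utf-8"))
    videos = crawl_fn(crawl_list)
    Path(output_path).write_text(json.dumps(videos, indent=2), encoding="utf-8")
    return len(videos)


# Hypothetical usage (requires vidcrawler and network access):
#   from vidcrawler import crawl_video_sites
#   run_crawl("fetch_list.json", "out_list.json", crawl_video_sites)
```

Passing the crawl function in as an argument keeps the wrapper testable with a stub, so the file handling can be checked without hitting any video site.
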
Testing

Install vidcrawler and the vidcrawler_test command becomes available.

$ pip install vidcrawler
$ vidcrawler_test

Example input fetch_list.json

[
    [
        "Health Ranger Report",
        "brighteon",
        "hrreport"
    ],
    [
        "Sydney Watson",
        "youtube",
        "UCSFy-1JrpZf0tFlRZfo-Rvw"
    ],
    [
        "Computing Forever",
        "bitchute",
        "hybm74uihjkf"
    ],
    [
        "ThePeteSantilliShow",
        "rumble",
        "ThePeteSantilliShow"
    ],
    [
        "Macroaggressions",
        "odysee",
        "Macroaggressions"
    ]
]
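
Since each entry must be a three-element list, it can help to check a fetch list before handing it to the crawler. This is a minimal sketch under stated assumptions: `validate_fetch_list` is a hypothetical helper, and the rule that all three fields are non-empty strings is inferred from the example above rather than from vidcrawler itself.

```python
from typing import Any, List


def validate_fetch_list(entries: Any) -> List[str]:
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    if not isinstance(entries, list):
        return ["top-level value must be a list"]
    problems = []
    for i, entry in enumerate(entries):
        # Expected shape: [channel name, source, channel_id]
        if not isinstance(entry, list) or len(entry) != 3:
            problems.append(f"entry {i}: expected [channel name, source, channel_id]")
            continue
        # Assumption: every field is a non-empty string.
        if not all(isinstance(field, str) and field for field in entry):
            problems.append(f"entry {i}: all three fields must be non-empty strings")
    return problems
```

The helper only checks structure; it does not verify that a source name is one vidcrawler supports or that a channel id exists.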

Example Output:

[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "date_lastupdated": "2022-05-17T05:17:18.540084",
    "channel_url": "https://www.youtube.com/channel/UCXIJgqnII2ZOINSWNOGFThA",
    "source": "youtube.com",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "description": "",
    "img_src": "https://i3.ytimg.com/vi/bwqBudCzDrQ/hqdefault.jpg",
    "iframe_src": "https://youtube.com/embed/bwqBudCzDrQ",
    "views": 1429
  },
  {
      ...
  }
]
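
A feed in this shape is easy to post-process. The sketch below groups items by their "source" field and orders each group newest first by "date_published". `newest_by_source` is a hypothetical helper, and it assumes every item carries both fields with an ISO 8601 timestamp, as in the example output above.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


def newest_by_source(videos: List[dict]) -> Dict[str, List[dict]]:
    """Group feed items by "source" and sort each group newest first."""
    grouped = defaultdict(list)
    for video in videos:
        grouped[video["source"]].append(video)
    for items in grouped.values():
        # "date_published" looks like "2022-05-17T05:00:11+00:00" in the example.
        items.sort(
            key=lambda v: datetime.fromisoformat(v["date_published"]),
            reverse=True,
        )
    return dict(grouped)
```

The sort uses "date_published" rather than "date_lastupdated" because the example shows the latter without a timezone offset, and mixing offset-aware and naive timestamps in one comparison raises a TypeError.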

Releases

  • 1.0.6: Fixes Odysee scraper by including image/webp thumbnail format.
  • 1.0.4: Fixes local_now() to be local timezone aware.
  • 1.0.3: Bump
  • 1.0.2: Updates testing
  • 1.0.1: Improves command line
  • 1.0.0: Initial release
