Video Crawler

Project description

vidcrawler

Crawls major video sites such as YouTube, Rumble, BitChute, Brighteon, and Odysee for video content and outputs a JSON feed of all the videos that were found.

API

Command line

vidcrawler --input_crawl_json "fetch_list.json" --output_json "out_list.json"

Python

import json
from vidcrawler import crawl_video_sites

# Each entry is ["channel name", "source", "channel_id"].
crawl_list = [
    ["channel name", "source", "channel_id"]
]
output = crawl_video_sites(crawl_list)
print(json.dumps(output))

"source" and "channel_id" are used to generate the video-platform-specific urls to fetch data. The "channel name" is echo'd back in the generated json feeds, but doesn't not affect the fetching process in any way.

Testing

Install vidcrawler and the command vidcrawler_test will become available.

$ pip install vidcrawler
$ vidcrawler_test

Example input fetch_list.json

[
    [
        "Health Ranger Report",
        "brighteon",
        "hrreport"
    ],
    [
        "Sydney Watson",
        "youtube",
        "UCSFy-1JrpZf0tFlRZfo-Rvw"
    ],
    [
        "Computing Forever",
        "bitchute",
        "hybm74uihjkf"
    ],
    [
        "ThePeteSantilliShow",
        "rumble",
        "ThePeteSantilliShow"
    ],
    [
        "Macroaggressions",
        "odysee",
        "Macroaggressions"
    ]
]

Example Output:

[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "date_lastupdated": "2022-05-17T05:17:18.540084",
    "channel_url": "https://www.youtube.com/channel/UCXIJgqnII2ZOINSWNOGFThA",
    "source": "youtube.com",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "description": "",
    "img_src": "https://i3.ytimg.com/vi/bwqBudCzDrQ/hqdefault.jpg",
    "iframe_src": "https://youtube.com/embed/bwqBudCzDrQ",
    "views": 1429
  },
  {
      ...
  }
]
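The output is an ordinary list of dicts, so the feed can be post-processed with the standard library alone. A sketch that finds the most-viewed entry and totals watch time, using field names taken from the example output above (the sample data here is made up for illustration):

```python
import json

# Stand-in for the crawler's JSON output, using the documented field names.
feed_json = """[
  {"title": "Clip A", "views": 1429, "duration": 254},
  {"title": "Clip B", "views": 8000, "duration": 61}
]"""

feed = json.loads(feed_json)
most_viewed = max(feed, key=lambda item: item["views"])
total_seconds = sum(item["duration"] for item in feed)
print(most_viewed["title"], total_seconds)
```

The same pattern works on a real out_list.json produced by the command-line invocation shown earlier.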

Releases

  • 1.0.9: Fixes the Rumble crawler; minor fixes and linting fixes.
  • 1.0.8: Readme correction.
  • 1.0.7: Fixes the Odysee scraper by including the image/webp thumbnail format.
  • 1.0.4: Fixes local_now() to be local-timezone aware.
  • 1.0.3: Version bump.
  • 1.0.2: Updates testing.
  • 1.0.1: Improves the command line.
  • 1.0.0: Initial release.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vidcrawler-1.0.9.tar.gz (27.4 kB, Source)

Built Distribution

vidcrawler-1.0.9-py2.py3-none-any.whl (36.7 kB, Python 2/Python 3)

File details

Details for the file vidcrawler-1.0.9.tar.gz.

File metadata

  • Download URL: vidcrawler-1.0.9.tar.gz
  • Upload date:
  • Size: 27.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for vidcrawler-1.0.9.tar.gz
Algorithm Hash digest
SHA256 66aa349e63f4549b2a26f6d9589eb6018934446450ba90ea5e877f337d204271
MD5 845ddd5d61aa67d0d1eba49e62edc46e
BLAKE2b-256 ac0dffbf66b17901d50ed85efa4cfdcdbdfbdc08e8c41fa0f6de5a23c1691507


File details

Details for the file vidcrawler-1.0.9-py2.py3-none-any.whl.

File metadata

  • Download URL: vidcrawler-1.0.9-py2.py3-none-any.whl
  • Upload date:
  • Size: 36.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.6

File hashes

Hashes for vidcrawler-1.0.9-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 9d3b7ab5d7b0bed2813dc84c1950bd5edc38c614d7b8491ac02e521e659d6f7f
MD5 9e5cbd185c6444f1c2cd5f3507e74b74
BLAKE2b-256 81ed0130c484381a057151208db334857f0b4441b4588602043debdd519ec8e2

