Video Crawler

vidcrawler

Crawls major video sites like YouTube/Rumble/Bitchute/Brighteon for video content and outputs a JSON feed of all the videos that were found.

Platform Unit Tests

MacOS_Tests Win_Tests Ubuntu_Tests

Scraper Tests

Scaper_Youtube Scaper_Brighteon

Note that Bitchute doesn't like the GitHub runner's IP and will fail with a 403 Forbidden.

API

Command line

vidcrawler --input_crawl_json "fetch_list.json" --output_json "out_list.json"
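The same command can be driven from a script. The following is a minimal sketch, not part of vidcrawler: `write_fetch_list`, `build_command` and `run_crawl` are hypothetical helper names, and the sketch assumes that the CLI writes its JSON feed to the path given by --output_json and exits non-zero on failure. Only the two flags shown above are used.

```python
import json
import subprocess
from pathlib import Path


def write_fetch_list(entries, path):
    """Write a list of [channel_name, source, channel_id] entries as JSON."""
    Path(path).write_text(json.dumps(entries, indent=4), encoding="utf-8")


def build_command(input_json, output_json):
    """Build the argv for the vidcrawler command shown above."""
    return [
        "vidcrawler",
        "--input_crawl_json", str(input_json),
        "--output_json", str(output_json),
    ]


def run_crawl(input_json, output_json):
    """Run the crawl and load the resulting feed.

    Assumption: the feed is written to output_json and a failed run
    returns a non-zero exit code (check=True then raises).
    """
    subprocess.run(build_command(input_json, output_json), check=True)
    return json.loads(Path(output_json).read_text(encoding="utf-8"))
```

Calling `run_crawl("fetch_list.json", "out_list.json")` requires vidcrawler to be installed and on the PATH.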

Python

import json
from vidcrawler import crawl_video_sites
crawl_list = [
    [
        "Computing Forever",  # Can be whatever you want.
        "bitchute",  # Must be "youtube", "rumble", "bitchute" (and others).
        "hybm74uihjkf"  # The channel id on the service.
    ]
]
output = crawl_video_sites(crawl_list)
print(json.dumps(output))

"source" and "channel_id" are used to generate the video-platform-specific URLs to fetch data. The "channel name" is echoed back in the generated JSON feeds but does not affect the fetching process in any way.

Testing

Install vidcrawler and then the command vidcrawler_test will become available.

> pip install vidcrawler
> vidcrawler_test

youtube-pull-channel

This new command pulls down a channel and all of its files as mp3s. Great for transcribing and putting into an LLM.

Example input fetch_list.json

[
    [
        "Health Ranger Report",
        "brighteon",
        "hrreport"
    ],
    [
        "Sydney Watson",
        "youtube",
        "UCSFy-1JrpZf0tFlRZfo-Rvw"
    ],
    [
        "Computing Forever",
        "bitchute",
        "hybm74uihjkf"
    ],
    [
        "ThePeteSantilliShow",
        "rumble",
        "ThePeteSantilliShow"
    ],
    [
        "Macroaggressions",
        "odysee",
        "Macroaggressions"
    ]
]
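Because each entry is positional, a malformed fetch list is easy to produce by hand. A small sketch of a shape check that could be run before crawling: `parse_fetch_list` is a hypothetical helper, not a vidcrawler function, and it only checks the three-string layout shown above, not whether the source name is one the crawler supports.

```python
import json


def parse_fetch_list(text):
    """Parse fetch-list JSON and check each entry is [name, source, channel_id]."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("fetch list must be a JSON array")
    for index, entry in enumerate(data):
        is_triple = isinstance(entry, list) and len(entry) == 3
        if not is_triple or not all(isinstance(field, str) for field in entry):
            raise ValueError(f"entry {index} must be a list of three strings")
    return data


sample = '[["Macroaggressions", "odysee", "Macroaggressions"]]'
entries = parse_fetch_list(sample)
```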

Example Output:

[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "date_lastupdated": "2022-05-17T05:17:18.540084",
    "channel_url": "https://www.youtube.com/channel/UCXIJgqnII2ZOINSWNOGFThA",
    "source": "youtube.com",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "description": "",
    "img_src": "https://i3.ytimg.com/vi/bwqBudCzDrQ/hqdefault.jpg",
    "iframe_src": "https://youtube.com/embed/bwqBudCzDrQ",
    "views": 1429
  },
  {
     "channel_name": "ThePeteSantilliShow",
     "title": "..."
  }
]
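A feed in this shape can be post-processed with the standard library alone. The sketch below sorts entries newest first and totals their durations. The sample entries are invented for illustration and only reuse field names from the example output. It assumes "date_published" is an ISO 8601 string with a UTC offset, as in the example, and that "duration" is in seconds. Entries missing a field are tolerated rather than rejected.

```python
from datetime import datetime, timezone

# Illustrative data only; field names follow the example output above.
feed = [
    {"title": "older", "date_published": "2022-05-16T05:00:11+00:00", "duration": 100},
    {"title": "newer", "date_published": "2022-05-17T05:00:11+00:00", "duration": 254},
    {"title": "undated"},
]


def published_at(item):
    """Return the publish time, or the earliest possible time if it is missing."""
    raw = item.get("date_published")
    if not raw:
        return datetime.min.replace(tzinfo=timezone.utc)
    return datetime.fromisoformat(raw)


# Newest first; entries without a date sort to the end.
ordered = sorted(feed, key=published_at, reverse=True)

# Total duration of all entries that report one.
total_seconds = sum(item.get("duration", 0) for item in feed)
```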

Releases

  • 1.0.39: More pinned deps problems fixed.
  • 1.0.38: One of the scrapers has a pinned dependency, install it with [full]
  • 1.0.37: Misc fixes.
  • 1.0.36: Fixed youtube, rumble and brighteon parsers. Bitchute is still broken and now has rate limits.
  • 1.0.35: Added update_yt_dlp() to allow the client to update the downloader.
  • 1.0.34: Upgraded open-webdriver to version 1.5.0 to avoid yt-dlp urllib incompatibility.
  • 1.0.28: youtube_pull now takes in --channel-name and --output, like the other pullers
  • 1.0.27: Fixed polluting path space from multiple added static-ffmpeg
  • 1.0.24: Added rumble-pull-channel
  • 1.0.21: Misc fixes
  • 1.0.16: Make the library downloading more robust.
  • 1.0.15: Improve cleaning filepaths for brighteon_bot
  • 1.0.13: New brighteon-pull-channel command
  • 1.0.11: Improves youtube-pull-channel
  • 1.0.10: Adds youtube-pull-channel which pulls all files down as mp3s for a channel.
  • 1.0.9: Fixes crawler for rumble and minor fixes + linting fixes.
  • 1.0.8: Readme correction.
  • 1.0.7: Fixes Odysee scraper by including image/webp thumbnail format.
  • 1.0.4: Fixes local_now() to be local timezone aware.
  • 1.0.3: Bump
  • 1.0.2: Updates testing
  • 1.0.1: improves command line
  • 1.0.0: Initial release
