Video Crawler

vidcrawler

Crawls major video sites such as YouTube, Rumble, Bitchute, and Brighteon for video content and outputs a JSON feed of all the videos that were found.

Scraper Tests

Note that Bitchute rejects the GitHub runner's IP address, so the Bitchute scraper test fails there with a 403 Forbidden.

API

Command line

vidcrawler --input_crawl_json "fetch_list.json" --output_json "out_list.json"
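
The same run can be scripted from Python. The sketch below writes a fetch list to disk and builds the command shown above. The helper names (`write_fetch_list`, `build_command`, `run_crawl`) are hypothetical and not part of vidcrawler; only the two flags from the command above are assumed to exist.

```python
import json
import shutil
import subprocess
from pathlib import Path


def write_fetch_list(path, entries):
    """Write [channel name, source, channel id] entries to a JSON file."""
    Path(path).write_text(json.dumps(entries, indent=4), encoding="utf-8")


def build_command(input_json, output_json):
    """Return the argument list for the vidcrawler command shown above."""
    return [
        "vidcrawler",
        "--input_crawl_json", str(input_json),
        "--output_json", str(output_json),
    ]


def run_crawl(entries, input_json="fetch_list.json", output_json="out_list.json"):
    """Write the fetch list, then run vidcrawler if it is installed."""
    write_fetch_list(input_json, entries)
    cmd = build_command(input_json, output_json)
    # Fail early with a clear message if the executable is not on PATH.
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("vidcrawler is not on PATH; try: pip install vidcrawler")
    subprocess.run(cmd, check=True)
```

Calling `run_crawl([["Computing Forever", "bitchute", "hybm74uihjkf"]])` should leave the feed in `out_list.json`, assuming the crawl itself succeeds.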

Python

import json
from vidcrawler import crawl_video_sites
crawl_list = [
    [
        "Computing Forever",  # Can be whatever you want.
        "bitchute",  # Must be "youtube", "rumble", "bitchute" (and others).
        "hybm74uihjkf"  # The channel id on the service.
    ]
]
output = crawl_video_sites(crawl_list)
print(json.dumps(output))

Each entry is a list of three strings: the channel name, the source, and the channel id. The source and channel id are used to generate the platform-specific URLs to fetch data from. The channel name is echoed back in the generated JSON feed but does not affect the fetching process in any way.

Testing

Install vidcrawler; the vidcrawler_test command then becomes available.

> pip install vidcrawler
> vidcrawler_test

youtube-pull-channel

This command pulls down a channel and all of its files as mp3s. Great for transcribing and feeding into an LLM.

Example input fetch_list.json

[
    [
        "Health Ranger Report",
        "brighteon",
        "hrreport"
    ],
    [
        "Sydney Watson",
        "youtube",
        "UCSFy-1JrpZf0tFlRZfo-Rvw"
    ],
    [
        "Computing Forever",
        "bitchute",
        "hybm74uihjkf"
    ],
    [
        "ThePeteSantilliShow",
        "rumble",
        "ThePeteSantilliShow"
    ],
    [
        "Macroaggressions",
        "odysee",
        "Macroaggressions"
    ]
]
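
Each entry above is a positional triple, so it is easy to swap the source and the channel id by accident. A minimal sketch for building the list from named fields follows. `ChannelEntry`, `README_SOURCES`, and `to_crawl_list` are hypothetical names, not part of vidcrawler, and the source set contains only the services that appear in this README, so it may be incomplete.

```python
import json
from typing import List, NamedTuple

# Sources that appear in this README; vidcrawler may support others.
README_SOURCES = {"youtube", "rumble", "bitchute", "brighteon", "odysee"}


class ChannelEntry(NamedTuple):
    channel_name: str  # Display label only; echoed back in the feed.
    source: str        # Video platform, e.g. "youtube".
    channel_id: str    # Channel id on that platform.


def to_crawl_list(entries: List[ChannelEntry]) -> List[List[str]]:
    """Convert entries to [channel name, source, channel id] lists."""
    crawl_list = []
    for entry in entries:
        if entry.source not in README_SOURCES:
            raise ValueError(f"unrecognized source: {entry.source!r}")
        crawl_list.append([entry.channel_name, entry.source, entry.channel_id])
    return crawl_list


entries = [
    ChannelEntry("Health Ranger Report", "brighteon", "hrreport"),
    ChannelEntry("Macroaggressions", "odysee", "Macroaggressions"),
]
# JSON text in the same shape as fetch_list.json.
fetch_list_json = json.dumps(to_crawl_list(entries), indent=4)
```

If vidcrawler supports a source that is missing from `README_SOURCES`, extend the set rather than bypassing the check.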

Example Output:

[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "date_lastupdated": "2022-05-17T05:17:18.540084",
    "channel_url": "https://www.youtube.com/channel/UCXIJgqnII2ZOINSWNOGFThA",
    "source": "youtube.com",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "description": "",
    "img_src": "https://i3.ytimg.com/vi/bwqBudCzDrQ/hqdefault.jpg",
    "iframe_src": "https://youtube.com/embed/bwqBudCzDrQ",
    "views": 1429
  },
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "..."
  }
]
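
The feed is a flat list of dicts, so post-processing needs only the standard library. The sketch below assumes every item carries the `channel_name`, `date_published`, and `duration` keys shown above, that `date_published` is an ISO 8601 string with a UTC offset, and that `duration` is in seconds. `newest_first` and `total_duration_by_channel` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import datetime


def newest_first(feed):
    """Return feed items sorted by date_published, most recent first."""
    return sorted(
        feed,
        key=lambda item: datetime.fromisoformat(item["date_published"]),
        reverse=True,
    )


def total_duration_by_channel(feed):
    """Sum the duration field per channel_name (assumed to be seconds)."""
    totals = defaultdict(int)
    for item in feed:
        totals[item["channel_name"]] += item["duration"]
    return dict(totals)
```

Both helpers take the list returned by `crawl_video_sites` or loaded from `out_list.json` with `json.load`.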

Releases

  • 1.0.16: Make the library downloading more robust.
  • 1.0.15: Improve cleaning filepaths for brighteon_bot
  • 1.0.13: New brighteon-pull-channel command
  • 1.0.11: Improves youtube-pull-channel
  • 1.0.10: Adds youtube-pull-channel which pulls all files down as mp3s for a channel.
  • 1.0.9: Fixes crawler for rumble and minor fixes + linting fixes.
  • 1.0.8: Readme correction.
  • 1.0.7: Fixes Odysee scraper by including image/webp thumbnail format.
  • 1.0.4: Fixes local_now() to be local timezone aware.
  • 1.0.3: Bump
  • 1.0.2: Updates testing
  • 1.0.1: improves command line
  • 1.0.0: Initial release
