Video Crawler
vidcrawler
Crawls major video sites such as YouTube, Rumble, BitChute and Brighteon for video content and outputs a JSON feed of all the videos that were found.
Platform Unit Tests
Scraper Tests
Note that BitChute rejects requests from the GitHub runner's IP address, so the scraper tests fail there with a 403 Forbidden.
API
Command line
vidcrawler --input_crawl_json "fetch_list.json" --output_json "out_list.json"
Python
import json
from vidcrawler import crawl_video_sites
crawl_list = [
    [
        "Computing Forever",  # Can be whatever you want.
        "bitchute",  # Must be "youtube", "rumble", "bitchute" (and others).
        "hybm74uihjkf",  # The channel id on the service.
    ]
]
output = crawl_video_sites(crawl_list)
print(json.dumps(output))
"source" and "channel_id" are used to generate the video-platform-specific URLs to fetch data. The "channel name" is echoed back in the generated JSON feeds, but does not affect the fetching process in any way.
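Because each entry is a plain three-item list, a malformed fetch list is easy to produce by accident. The following is a minimal sketch of a shape check that could be run before crawling. `check_crawl_list` and `README_SOURCES` are hypothetical names, not part of vidcrawler, and the set of sources is only what appears in this README; the library may accept others.

```python
from typing import List, Sequence

# Assumption: these are only the sources named in this README.
# vidcrawler itself may support more.
README_SOURCES = {"youtube", "rumble", "bitchute", "brighteon", "odysee"}


def check_crawl_list(crawl_list: Sequence[Sequence[str]]) -> List[str]:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for i, entry in enumerate(crawl_list):
        # Each entry must be exactly [channel_name, source, channel_id],
        # all non-empty strings.
        if len(entry) != 3 or not all(isinstance(x, str) and x for x in entry):
            problems.append(f"entry {i}: expected [channel_name, source, channel_id]")
            continue
        _name, source, _channel_id = entry
        if source not in README_SOURCES:
            problems.append(f"entry {i}: source {source!r} is not one listed in this README")
    return problems


print(check_crawl_list([["Computing Forever", "bitchute", "hybm74uihjkf"]]))
```

An unknown source is reported rather than rejected outright, since the list above is not authoritative.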
Testing
Install vidcrawler and then the command vidcrawler_test will become available.
> pip install vidcrawler
> vidcrawler_test
youtube-pull-channel
This new command downloads a channel and all of its files as mp3s. Great for transcribing and feeding into an LLM.
Example input fetch_list.json
[
  [
    "Health Ranger Report",
    "brighteon",
    "hrreport"
  ],
  [
    "Sydney Watson",
    "youtube",
    "UCSFy-1JrpZf0tFlRZfo-Rvw"
  ],
  [
    "Computing Forever",
    "bitchute",
    "hybm74uihjkf"
  ],
  [
    "ThePeteSantilliShow",
    "rumble",
    "ThePeteSantilliShow"
  ],
  [
    "Macroaggressions",
    "odysee",
    "Macroaggressions"
  ]
]
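A fetch list in this format can also be generated from Python instead of being written by hand. This is a minimal sketch using only the standard library. `write_fetch_list` and `build_command` are hypothetical helpers, not part of vidcrawler; the two flags are the ones shown in the command line section.

```python
import json
import os
import tempfile
from typing import List


def write_fetch_list(entries: List[List[str]], path: str) -> None:
    """Write [channel_name, source, channel_id] triples as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)


def build_command(input_path: str, output_path: str) -> List[str]:
    """Build the argv for the vidcrawler CLI using the flags from this README."""
    return [
        "vidcrawler",
        "--input_crawl_json", input_path,
        "--output_json", output_path,
    ]


entries = [
    ["Health Ranger Report", "brighteon", "hrreport"],
    ["Macroaggressions", "odysee", "Macroaggressions"],
]

# Use a temporary directory so the sketch does not touch the working directory.
workdir = tempfile.mkdtemp()
fetch_path = os.path.join(workdir, "fetch_list.json")
write_fetch_list(entries, fetch_path)

command = build_command(fetch_path, os.path.join(workdir, "out_list.json"))
print(" ".join(command))
```

With vidcrawler installed, `command` can be passed to `subprocess.run(command, check=True)` to perform the crawl.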
Example Output:
[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "date_lastupdated": "2022-05-17T05:17:18.540084",
    "channel_url": "https://www.youtube.com/channel/UCXIJgqnII2ZOINSWNOGFThA",
    "source": "youtube.com",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "description": "",
    "img_src": "https://i3.ytimg.com/vi/bwqBudCzDrQ/hqdefault.jpg",
    "iframe_src": "https://youtube.com/embed/bwqBudCzDrQ",
    "views": 1429
  },
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "..."
  }
]
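The feed is an ordinary JSON array, so it can be post-processed with the standard library. This is a minimal sketch that sorts entries newest first and prints a short listing. `newest_first` and `format_duration` are hypothetical helpers, the inline sample copies a subset of the fields shown above, and treating "duration" as a number of seconds is an assumption.

```python
import json
from datetime import datetime

# A subset of the fields from the example output above.
feed_json = """
[
  {
    "channel_name": "ThePeteSantilliShow",
    "title": "The damage this caused is now being totaled up",
    "date_published": "2022-05-17T05:00:11+00:00",
    "url": "https://www.youtube.com/watch?v=bwqBudCzDrQ",
    "duration": 254,
    "views": 1429
  }
]
"""


def newest_first(videos):
    """Sort feed entries by their ISO 8601 date_published, newest first."""
    return sorted(
        videos,
        key=lambda v: datetime.fromisoformat(v["date_published"]),
        reverse=True,
    )


def format_duration(seconds):
    """Render a duration as M:SS (assumes the feed reports seconds)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


videos = newest_first(json.loads(feed_json))
for v in videos:
    print(f'{v["date_published"]}  {format_duration(v["duration"])}  {v["title"]}')
```

To process a real crawl, replace `feed_json` with the contents of the file passed to `--output_json`.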
Releases
- 1.0.16: Make the library downloading more robust.
- 1.0.15: Improve cleaning filepaths for brighteon_bot
- 1.0.13: New brighteon-pull-channel command
- 1.0.11: Improves youtube-pull-channel
- 1.0.10: Adds youtube-pull-channel which pulls all files down as mp3s for a channel.
- 1.0.9: Fixes crawler for rumble and minor fixes + linting fixes.
- 1.0.8: Readme correction.
- 1.0.7: Fixes Odysee scraper by including image/webp thumbnail format.
- 1.0.4: Fixes local_now() to be local timezone aware.
- 1.0.3: Bump
- 1.0.2: Updates testing
- 1.0.1: improves command line
- 1.0.0: Initial release