A package to crawl a YouTube channel
Crawl YouTube Channel
This Python package provides tools to crawl and extract data from YouTube channels.
Features
- Crawl an entire YouTube channel for video information.
- Extract metadata, comments, transcripts, audio, and video for each video.
- Provides a base class to easily implement your own video processing and storage logic.
- Includes a Sqlite3YouTubeVideoProcessor for storing data in a local SQLite database.
- Provides data classes for easy access to crawled data.
Prerequisites
- Python 3.10+
- Google Cloud YouTube API Key
Installation
- Install the package:
  pip install crawl-youtube-channel
- Set up your environment:
  Create a .env file in your project root and add your Google Cloud YouTube API key:
  GOOGLE_CLOUD_YOUTUBE_API_KEY=your_api_key
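This page does not say whether the package reads the .env file on its own, so it can help to confirm that the key is visible to your process before crawling. A minimal standard-library sketch, assuming simple KEY=value lines; `load_env_file` is a hypothetical helper and not part of this package:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines and export them to os.environ.

    Hypothetical helper for illustration; not part of crawl-youtube-channel.
    """
    loaded: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        loaded[key] = value
        # Keep any value that is already set in the real environment.
        os.environ.setdefault(key, value)
    return loaded
```

Calling `load_env_file()` at startup and then checking `os.environ.get("GOOGLE_CLOUD_YOUTUBE_API_KEY")` gives an early, readable failure if the key is missing.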
Usage
To use the crawler, you need to implement the YouTubeVideoProcessorBase abstract class. This class defines how to check for existing videos and how to process new ones.
Here is a basic skeleton for a custom processor:
import asyncio

from crawl_youtube_channel import YouTubeVideoProcessorBase, YouTubeVideo


class MyVideoProcessor(YouTubeVideoProcessorBase):
    async def check_video(self, video_id: str) -> bool:
        # Implement logic to check if the video has already been processed.
        # Return True if it exists, False otherwise.
        ...

    async def process_video(self, v: YouTubeVideo) -> None:
        # Implement logic to save or process the video data.
        # For example, save it to a database, a file, or another service.
        ...


async def main():
    # Initialize your custom processor
    processor = MyVideoProcessor()
    # Start crawling the channel
    await processor.process_channel(channel_url='https://www.youtube.com/@YourFavoriteChannel/videos')


if __name__ == '__main__':
    asyncio.run(main())
For a concrete implementation example, see the Sqlite3YouTubeVideoProcessor class in the source code, which stores video data in a SQLite database.
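To give a feel for the kind of storage logic such a processor might wrap, here is a minimal sketch using only the standard-library sqlite3 module. The `VideoStore` class, its table name, and its columns are assumptions made for illustration; they are not the schema used by Sqlite3YouTubeVideoProcessor.

```python
import sqlite3


class VideoStore:
    """Tiny SQLite-backed store. Hypothetical; not the package's schema."""

    def __init__(self, db_path: str = "videos.db") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS videos ("
            "video_id TEXT PRIMARY KEY, title TEXT NOT NULL)"
        )
        self.conn.commit()

    def exists(self, video_id: str) -> bool:
        """Return True if a row for this video ID is already stored."""
        row = self.conn.execute(
            "SELECT 1 FROM videos WHERE video_id = ?", (video_id,)
        ).fetchone()
        return row is not None

    def save(self, video_id: str, title: str) -> None:
        """Insert the video, overwriting any earlier row with the same ID."""
        self.conn.execute(
            "INSERT OR REPLACE INTO videos (video_id, title) VALUES (?, ?)",
            (video_id, title),
        )
        self.conn.commit()
```

In a custom processor, check_video could return `store.exists(video_id)` and process_video could call `store.save(...)` with whichever fields of the YouTubeVideo object you want to keep.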
Data Models
The following data classes are used to structure the crawled data:
- YouTubeVideo: The main container for all video-related data.
- YouTubeThumbnail: Basic information about a video thumbnail.
- YouTubeData: Contains detailed information about a video, including:
  - Meta: Video metadata (title, description, tags, etc.).
  - Comment: A YouTube comment, including replies.
  - Transcript: The video's transcript.
  - audio: The audio file in M4A format (as bytes).
  - video: The video file in MP4 format (as bytes).
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file crawl_youtube_channel-0.0.1.tar.gz.
File metadata
- Download URL: crawl_youtube_channel-0.0.1.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f3089157e057d2039d75e887942258159cb8148e80a4c84ed5bc92681358320 |
| MD5 | c4cca9f774a289e6903dda744527fa2e |
| BLAKE2b-256 | 030517061614e6b8f16e6406eb0489e4c0b92af3645261d6d519f3057338cee3 |
File details
Details for the file crawl_youtube_channel-0.0.1-py3-none-any.whl.
File metadata
- Download URL: crawl_youtube_channel-0.0.1-py3-none-any.whl
- Upload date:
- Size: 13.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | abe18946c31aa067300dad9fe7418f4bf4ec81e37969daa4faaf0e5c8e9b5628 |
| MD5 | 2d89d4aa170b53920b2a829b5ad8d2f0 |
| BLAKE2b-256 | 840ba873ca77d09279820f5549fa78e4f966fab58036080c887d8d9f4957ad99 |