
A package to crawl a YouTube channel

Project description

Crawl YouTube Channel

This Python package provides tools to crawl and extract data from YouTube channels.

Features

  • Crawl an entire YouTube channel for video information.
  • Extract metadata, comments, transcripts, audio, and video for each video.
  • Provides a base class to easily implement your own video processing and storage logic.
  • Includes a Sqlite3YouTubeVideoProcessor for storing data in a local SQLite database.
  • Provides data classes for easy access to crawled data.

Prerequisites

  • Python 3.10+
  • Google Cloud YouTube API Key

Installation

  1. Install the package:

    pip install crawl-youtube-channel
    
  2. Set up your environment:

    Create a .env file in your project root and add your Google Cloud YouTube API key:

    GOOGLE_CLOUD_YOUTUBE_API_KEY=your_api_key
    
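The crawler is expected to read GOOGLE_CLOUD_YOUTUBE_API_KEY from the environment. If you would rather not add a dependency such as python-dotenv, a minimal hand-rolled loader can do the job — this is a sketch, not part of the package, and the load_env helper below is a name chosen for illustration:

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value lines; blanks and # comments are skipped."""
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Never overwrite variables that are already set in the real environment.
        os.environ.setdefault(key.strip(), value.strip())

load_env()
api_key = os.environ.get("GOOGLE_CLOUD_YOUTUBE_API_KEY")
```

python-dotenv's load_dotenv() does the same thing with more edge cases handled; either way, the key ends up in os.environ where the package can find it.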

Usage

To use the crawler, you need to implement the YouTubeVideoProcessorBase abstract class. This class defines how to check for existing videos and how to process new ones.

Here is a basic skeleton for a custom processor:

import asyncio
from crawl_youtube_channel import YouTubeVideoProcessorBase, YouTubeVideo

class MyVideoProcessor(YouTubeVideoProcessorBase):
    async def check_video(self, video_id: str) -> bool:
        # Implement logic to check if the video has already been processed.
        # Return True if it exists, False otherwise.
        ...

    async def process_video(self, video: YouTubeVideo) -> None:
        # Implement logic to save or process the video data.
        # For example, save it to a database, a file, or another service.
        ...

async def main():
    # Initialize your custom processor
    processor = MyVideoProcessor()

    # Start crawling the channel
    await processor.process_channel(channel_url='https://www.youtube.com/@YourFavoriteChannel/videos')

if __name__ == '__main__':
    asyncio.run(main())

For a concrete implementation example, see the Sqlite3YouTubeVideoProcessor class in the source code, which stores video data in a SQLite database.
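To make the check-then-process pattern concrete without depending on the package, here is a standalone synchronous sketch of a SQLite-backed store. It is not the package's Sqlite3YouTubeVideoProcessor — the table layout and method shapes below are assumptions chosen to illustrate the idea:

```python
import sqlite3

class VideoStore:
    """Illustrative SQLite store: skip videos that were already processed."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS videos (video_id TEXT PRIMARY KEY, title TEXT)"
        )

    def check_video(self, video_id: str) -> bool:
        # True if this video was already processed.
        row = self.conn.execute(
            "SELECT 1 FROM videos WHERE video_id = ?", (video_id,)
        ).fetchone()
        return row is not None

    def process_video(self, video_id: str, title: str) -> None:
        # INSERT OR IGNORE keys on video_id, so reprocessing is a no-op.
        self.conn.execute(
            "INSERT OR IGNORE INTO videos (video_id, title) VALUES (?, ?)",
            (video_id, title),
        )
        self.conn.commit()
```

The real processor's methods are async and receive YouTubeVideo objects rather than bare strings, but the core idea — key on video_id and make process_video idempotent — carries over.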

Data Models

The following data classes are used to structure the crawled data:

  • YouTubeVideo: The main container for all video-related data.
  • YouTubeThumbnail: Basic information about a video thumbnail.
  • YouTubeData: Contains detailed information about a video, including:
    • Meta: Video metadata (title, description, tags, etc.).
    • Comment: A YouTube comment, including replies.
    • Transcript: The video's transcript.
    • audio: The audio file in M4A format (as bytes).
    • video: The video file in MP4 format (as bytes).
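To give a feel for how these classes nest, here is a rough sketch with plain dataclasses. The field names and defaults are illustrative guesses, not the package's exact definitions:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Meta:
    title: str
    description: str = ""
    tags: list[str] = field(default_factory=list)

@dataclass
class Comment:
    author: str
    text: str
    replies: list["Comment"] = field(default_factory=list)  # nested replies

@dataclass
class YouTubeThumbnail:
    url: str
    width: int
    height: int

@dataclass
class YouTubeData:
    meta: Meta
    comments: list[Comment] = field(default_factory=list)
    transcript: str = ""
    audio: bytes = b""  # M4A bytes
    video: bytes = b""  # MP4 bytes

@dataclass
class YouTubeVideo:
    video_id: str
    thumbnail: Optional[YouTubeThumbnail] = None
    data: Optional[YouTubeData] = None
```

Consult the package source for the authoritative definitions; the point here is only the nesting, with YouTubeVideo as the outer container.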

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files

Download the file for your platform.

Source Distribution

crawl_youtube_channel-0.0.1a2.tar.gz (12.2 kB)

Uploaded Source

Built Distribution


crawl_youtube_channel-0.0.1a2-py3-none-any.whl (13.9 kB)

Uploaded Python 3

File details

Details for the file crawl_youtube_channel-0.0.1a2.tar.gz.

File metadata

  • Download URL: crawl_youtube_channel-0.0.1a2.tar.gz
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for crawl_youtube_channel-0.0.1a2.tar.gz
Algorithm Hash digest
SHA256 f4758ca254c9ef03f33abbca0f5fbb7153dc191c88f4c8177d38b6123321da17
MD5 38816431eb1eae7388d2eb8f66bd39bf
BLAKE2b-256 19f4289a7da9ce45d4d301de25ec9a66350b5587a6ae3f0cf252ced3f8c3b0ca


File details

Details for the file crawl_youtube_channel-0.0.1a2-py3-none-any.whl.


File hashes

Hashes for crawl_youtube_channel-0.0.1a2-py3-none-any.whl
Algorithm Hash digest
SHA256 8824550725f9e713deb07979c16a50e073157980b14ec80e4de27c85e52b5b4e
MD5 6b2db9ed73d211aafc1146515d95c5df
BLAKE2b-256 8524eca50177bb1bd69060cb5f3281dcc741e11b114d3059e2b0ca62cafe3c76

