YouTube loader for LangChain using yt-dlp

langchain-yt-dlp

langchain-yt-dlp is a Python package that extends LangChain with a YouTube integration built on yt-dlp. It addresses a limitation in the existing LangChain YoutubeLoader: the original implementation relied on pytube and can no longer fetch YouTube metadata because of changes on YouTube's side. langchain-yt-dlp uses the yt-dlp library instead, which gives a more reliable YouTube document loader.


Key Features

  • Retrieve metadata (e.g., title, description, author, view count, publish date) using the yt-dlp library.
  • Maintain compatibility with LangChain's existing loader interface.
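As a rough illustration of the metadata feature, the sketch below shows one way yt-dlp's info dictionary could be mapped to the fields listed above. It is not the package's actual implementation: the names `info_to_metadata` and `fetch_metadata`, the chosen output keys, and the fallback values are assumptions made for this example. The yt-dlp calls used (`YoutubeDL`, `extract_info(..., download=False)`) and the info fields (`title`, `description`, `uploader`, `view_count`, `upload_date`) are part of yt-dlp itself.

```python
from typing import Any, Dict


def info_to_metadata(info: Dict[str, Any]) -> Dict[str, Any]:
    """Map a yt-dlp info dict to a flat metadata dict (illustrative field choice)."""
    upload_date = info.get("upload_date")  # yt-dlp reports dates as "YYYYMMDD"
    publish_date = None
    if isinstance(upload_date, str) and len(upload_date) == 8 and upload_date.isdigit():
        # Reformat "YYYYMMDD" as "YYYY-MM-DD"
        publish_date = f"{upload_date[:4]}-{upload_date[4:6]}-{upload_date[6:]}"
    return {
        "title": info.get("title", "Unknown"),
        "description": info.get("description", "Unknown"),
        "author": info.get("uploader", "Unknown"),
        "view_count": info.get("view_count", 0),
        "publish_date": publish_date,
    }


def fetch_metadata(video_id: str) -> Dict[str, Any]:
    """Fetch metadata over the network without downloading the video."""
    import yt_dlp  # imported lazily so the mapping above works without yt-dlp

    url = f"https://www.youtube.com/watch?v={video_id}"
    with yt_dlp.YoutubeDL({"quiet": True, "skip_download": True}) as ydl:
        info = ydl.extract_info(url, download=False)
    return info_to_metadata(info)
```

Keeping the mapping separate from the network call means the field handling can be checked against a hand-written dictionary, without contacting YouTube.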

Installation

To install the package, use:

pip install langchain-yt-dlp

Ensure you have the following dependencies installed:

  • langchain
  • yt-dlp

Install them with:

pip install langchain yt-dlp

Usage

Here’s how you can use YoutubeLoaderDL from langchain-yt-dlp:

Basic Example

Loading From a YouTube URL

from langchain_yt_dlp.youtube_loader import YoutubeLoaderDL

# Initialize using a YouTube URL
loader = YoutubeLoaderDL.from_youtube_url(
    youtube_url="https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    add_video_info=True
)

documents = loader.load()
print(documents)

Parameters

YoutubeLoaderDL Constructor

Parameter       Type  Default  Description
video_id        str   None     The YouTube video ID to fetch data for.
add_video_info  bool  False    Whether to fetch additional metadata.
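Since the constructor takes a video ID while from_youtube_url takes a full URL, it can help to see how an ID might be pulled out of a URL. The helper below is a minimal sketch using only the standard library. `extract_video_id` is a hypothetical name, the set of accepted hosts and path shapes is an assumption, and the package's own parsing may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(youtube_url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL shapes, or None."""
    parsed = urlparse(youtube_url)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # Path-based links such as /embed/<id> or /shorts/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) == 2 and parts[0] in {"embed", "shorts", "v"}:
                candidate = parts[1]
    # Assumption: video IDs are 11 characters long, as they are in practice today
    if candidate and len(candidate) == 11:
        return candidate
    return None
```

An ID obtained this way could then be passed as video_id to the YoutubeLoaderDL constructor described in the table above.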

Testing

To run the tests:

  1. Clone the repository:

    git clone https://github.com/aqib0770/langchain-yt-dlp
    cd langchain-yt-dlp
    
  2. Install development dependencies:

    pip install -r requirements.txt
    
  3. Run the tests:

    pytest tests/test_youtube_loader.py
    

Contributing

Contributions are welcome! If you have ideas for new features or spot a bug, feel free to:

  • Open an issue on GitHub.
  • Submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.


Acknowledgements

  • LangChain for providing the base integration framework.
  • yt-dlp for enabling enhanced YouTube metadata extraction.

Download files

Source Distribution

langchain_yt_dlp-0.0.8.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

langchain_yt_dlp-0.0.8-py3-none-any.whl (4.8 kB)

Uploaded Python 3

File details

Details for the file langchain_yt_dlp-0.0.8.tar.gz.

File metadata

  • Download URL: langchain_yt_dlp-0.0.8.tar.gz
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.10.16

File hashes

Hashes for langchain_yt_dlp-0.0.8.tar.gz
Algorithm Hash digest
SHA256 10f77ad8ca86dcaf9d94a118eed26999e63071b543f6a765da10daa773001e43
MD5 72a965edd7640d6a75e54e8b92436f1d
BLAKE2b-256 0c90f09dde067ea4c836a3f4af83307310cc2624e8690f5ce3d2f78e6d525d21

File details

Details for the file langchain_yt_dlp-0.0.8-py3-none-any.whl.

File hashes

Hashes for langchain_yt_dlp-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 8226fd95edbf3cc70607640b50c3e09f407a113a085f25c0f7575ed9602f4098
MD5 5edb0ffaff0c2d558f3a01af726d052b
BLAKE2b-256 12c1e4721f6381e16dd69f1f119fcefa27666276d90d24ac9e8b4494cf550c27
