Download audio from a YouTube video and use the Gemini LLM for cleaner and smarter transcripts
Project description
minyt
minyt (WIP) is a Python package that simplifies the process of downloading YouTube audio and generating high-quality transcripts using Google’s Gemini AI. It intelligently splits long audio files at natural silence points and processes chunks in parallel for optimal performance.
Features
- YouTube Audio Download: Extract audio from any YouTube video using yt-dlp
- Smart Audio Splitting: Automatically detect silence and split audio at natural break points
- AI-Powered Transcription: Use Google’s Gemini 2.0 Flash for accurate, context-aware transcriptions
- Parallel Processing: Process multiple audio chunks concurrently for faster results
- Customizable: Configure chunk sizes, silence detection, and transcription prompts
- Clean Output: Generate well-formatted transcripts ready for analysis
Quick Start
Installation
pip install minyt
Prerequisites
- FFmpeg: Required for audio processing
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# Windows
# Download from https://ffmpeg.org/download.html
- Google Gemini API Key: Get your API key from Google AI Studio
export GEMINI_API_KEY="your-api-key-here"
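Both prerequisites can be checked before any download starts. The following is a minimal sketch, not part of minyt: the helper name `missing_prerequisites` is hypothetical, and it only assumes that FFmpeg is expected on the `PATH` and that the key is read from the `GEMINI_API_KEY` environment variable as shown above.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # Treat an unset or blank key as missing.
    if not env.get("GEMINI_API_KEY", "").strip():
        problems.append("GEMINI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(f"Missing prerequisite: {problem}")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.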
Basic Usage
import asyncio
from pathlib import Path
from minyt.core import *
# Download audio from a YouTube video
video_id = "dQw4w9WgXcQ" # Replace with your video ID
audio_file = download_audio(video_id, Path("_audio"))
# Detect silence and find optimal split points
_, silence_data = detect_silence(audio_file)
silence_ends = parse_silence_ends(silence_data)
total_duration = get_audio_duration(audio_file)
split_points = find_split_points(silence_ends, total_duration, chunk_len=600)
# Split audio into manageable chunks
chunks = split_audio(audio_file, split_points, dest_dir="_audio_chunks")
# Transcribe all chunks using Gemini AI
async def main():
    transcript = await transcribe_audio(
        chunks_dir="_audio_chunks",
        dest_file="_transcripts/transcript.txt",
        prompt="Please transcribe this audio file verbatim, maintaining speaker clarity and context."
    )
    print(f"Transcript saved to: _transcripts/transcript.txt")

asyncio.run(main())
Detailed Usage
Step 1: Download YouTube Audio
from minyt.core import download_audio
from pathlib import Path
# Download audio from a YouTube video
video_id = "your-video-id-here"
audio_file = download_audio(video_id, Path("downloads"))
print(f"Audio downloaded to: {audio_file}")
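`download_audio` is shown taking a video ID rather than a full URL. If you start from a link, a small helper can pull the ID out first. This is a sketch under assumptions: `extract_video_id` is a hypothetical helper (not part of minyt), and it only handles the common `watch?v=`, `youtu.be/`, `embed/`, `shorts/`, and `live/` URL shapes with an explicit scheme.

```python
from urllib.parse import parse_qs, urlparse


def extract_video_id(url_or_id: str) -> str:
    """Return the video ID from a YouTube URL, or the input if it has no host."""
    parsed = urlparse(url_or_id)
    host = (parsed.hostname or "").lower()
    if not host:
        # No scheme/host: assume the caller already passed a bare ID.
        return url_or_id

    video_id = ""
    parts = parsed.path.strip("/").split("/")
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parts[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # Standard watch links carry the ID in the "v" query parameter.
            video_id = parse_qs(parsed.query).get("v", [""])[0]
        elif len(parts) >= 2 and parts[0] in {"embed", "shorts", "live"}:
            video_id = parts[1]

    if not video_id:
        raise ValueError(f"Could not find a video ID in {url_or_id!r}")
    return video_id
```

The returned string can then be passed as `video_id` in the call above.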
Step 2: Process Audio with Smart Splitting
from minyt.core import detect_silence, parse_silence_ends, get_audio_duration, find_split_points, split_audio
# Detect silence in the audio file
_, silence_data = detect_silence(audio_file)
# Parse silence end points
silence_ends = parse_silence_ends(silence_data)
# Find optimal split points (aiming for 10-minute chunks)
total_duration = get_audio_duration(audio_file)
split_points = find_split_points(silence_ends, total_duration, chunk_len=600)
# Split audio into chunks
chunks = split_audio(audio_file, split_points, dest_dir="audio_chunks")
print(f"Created {len(chunks)} audio chunks")
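To make the idea of splitting at natural break points concrete, here is one way split points could be chosen from a list of silence end times. This is an illustrative sketch, not minyt's actual `find_split_points` implementation: the function name `pick_split_points` and the "nearest silence to each multiple of `chunk_len`" rule are assumptions.

```python
from bisect import bisect_left


def pick_split_points(silence_ends, total_duration, chunk_len=600):
    """For each multiple of chunk_len, pick the closest silence end (in seconds)."""
    ends = sorted(silence_ends)
    points = []
    target = chunk_len
    while target < total_duration:
        # Index of the first silence end at or after the target time.
        i = bisect_left(ends, target)
        # The nearest silence is either just before or just after the target.
        candidates = ends[max(i - 1, 0): i + 1]
        if candidates:
            best = min(candidates, key=lambda t: abs(t - target))
            # Skip a candidate that would not advance past the previous split.
            if not points or best > points[-1]:
                points.append(best)
        target += chunk_len
    return points


if __name__ == "__main__":
    ends = [95.0, 290.5, 310.2, 580.0, 905.3]
    print(pick_split_points(ends, total_duration=1000.0, chunk_len=300))
    # prints [290.5, 580.0, 905.3]
```

Cutting at a silence rather than at the exact chunk boundary keeps words and sentences from being split across two chunks, at the cost of slightly uneven chunk lengths.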
Step 3: Transcribe with Gemini AI
import asyncio
from minyt.core import transcribe_audio
async def transcribe_video():
    transcript = await transcribe_audio(
        chunks_dir="audio_chunks",
        dest_file="transcripts/final_transcript.txt",
        model="gemini-2.0-flash-001",  # Default model
        max_concurrent=3,  # Process 3 chunks simultaneously
        prompt="Please transcribe this audio accurately, preserving speaker names and technical terms."
    )
    return transcript
# Run transcription
transcript = asyncio.run(transcribe_video())
print("Transcription completed!")
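The `max_concurrent` setting caps how many chunks are in flight at once. A common way to get that behaviour with `asyncio` is a semaphore, sketched below. This illustrates the general pattern only and is not taken from minyt's source: `run_limited` and `fake_transcribe` are hypothetical names, and the fake worker merely sleeps in place of a real API call.

```python
import asyncio


async def run_limited(items, worker, max_concurrent=3):
    """Run worker(item) for every item, at most max_concurrent at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(item):
        # Only max_concurrent coroutines can hold the semaphore at once.
        async with semaphore:
            return await worker(item)

    # gather preserves input order, so results line up with items.
    return await asyncio.gather(*(guarded(item) for item in items))


async def fake_transcribe(chunk_name):
    """Stand-in for a network call that transcribes one chunk."""
    await asyncio.sleep(0.01)
    return f"text for {chunk_name}"


if __name__ == "__main__":
    chunks = [f"chunk_{i:03d}.mp3" for i in range(5)]
    results = asyncio.run(run_limited(chunks, fake_transcribe, max_concurrent=3))
    print("\n".join(results))
```

Because results come back in chunk order, they can be joined directly into a single transcript. Raising the limit speeds things up only until the API's rate limits become the bottleneck.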
Configuration
Environment Variables
# Required
export GEMINI_API_KEY="your-gemini-api-key"
# Optional: Configure logging level
export LOG_LEVEL="INFO"
Customization Options
# Custom silence detection (adjust sensitivity)
_, silence_data = detect_silence(audio_file) # Defaults: -30 dB threshold, 0.5 s minimum silence duration
# Custom chunk size (in seconds)
split_points = find_split_points(silence_ends, total_duration, chunk_len=300) # 5-minute chunks
# Custom transcription settings (await must run inside an async function)
transcript = await transcribe_audio(
    chunks_dir="chunks",
    dest_file="output.txt",
    model="gemini-2.0-flash-001",  # Any supported Gemini model name
    max_concurrent=5,  # More parallel processing
    prompt="Custom transcription instructions here..."
)
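The -30 dB threshold and 0.5 s duration mentioned above match the `noise` and `d` options of FFmpeg's `silencedetect` filter, which reports each silence on stderr. Assuming the silence data is that kind of log text (an assumption, since minyt's internals are not shown here), the end times could be extracted as in this sketch. The name `silence_ends_from_log` is hypothetical and the sample values are made up for illustration.

```python
import re

# silencedetect reports lines such as:
#   [silencedetect @ 0x...] silence_end: 12.31 | silence_duration: 0.79
SILENCE_END = re.compile(r"silence_end:\s*(\d+(?:\.\d+)?)")


def silence_ends_from_log(log_text):
    """Return every silence_end timestamp (in seconds) found in the log text."""
    return [float(match.group(1)) for match in SILENCE_END.finditer(log_text)]


SAMPLE_LOG = """\
[silencedetect @ 0x55d1c8] silence_start: 11.52
[silencedetect @ 0x55d1c8] silence_end: 12.31 | silence_duration: 0.79
[silencedetect @ 0x55d1c8] silence_start: 47.9
[silencedetect @ 0x55d1c8] silence_end: 48.65 | silence_duration: 0.75
"""

if __name__ == "__main__":
    print(silence_ends_from_log(SAMPLE_LOG))
    # prints [12.31, 48.65]
```

A higher threshold (for example -25 dB) treats more of the audio as silence and yields more candidate split points; a longer minimum duration yields fewer.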
Development
Install in Development Mode
# Clone the repository
git clone https://github.com/franckalbinet/minyt.git
cd minyt
# Install in development mode
pip install -e .
# Make changes in the nbs/ directory
# ...
# Compile changes to apply to minyt package
nbdev_prepare
Dependencies
- fastcore: Core utilities
- google-genai: Google Gemini AI client
- yt-dlp: YouTube video downloader
- ffmpeg-python: Audio processing
- tqdm: Progress bars
- rich: Enhanced console output
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- yt-dlp for YouTube video downloading
- Google Gemini for AI-powered transcription
- FFmpeg for audio processing capabilities
Support
If you encounter any issues or have questions:
- Check the documentation
- Open an issue
- Contact the maintainer: franckalbinet@gmail.com
File details
Details for the file minyt-0.0.3.tar.gz.
File metadata
- Download URL: minyt-0.0.3.tar.gz
- Upload date:
- Size: 12.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab61c76e80aa8f0de887158eaec70b9797a07c5e1752d36a4b36b2d541db5b6f |
| MD5 | 364a2b45a87f10ea9cd819f7ea6ed3bc |
| BLAKE2b-256 | 3277b62b16072ebcefbd61ff2485b9b833a72b8b58e5f95ddd5933b17d929402 |
File details
Details for the file minyt-0.0.3-py3-none-any.whl.
File metadata
- Download URL: minyt-0.0.3-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4eedee94977035810dde23ecc35e9de4da3a7ff7aafa813848d036ffc59618a2 |
| MD5 | 8689f7f1d84df7e72e6e6509d3ccc61e |
| BLAKE2b-256 | dec86a02017b6a6e0e62d7d597bc6737ddaea753f3d6a739ab8242ca92dede38 |