Multi-agent AI system for video transcription using OpenAI Whisper, ChromaDB, and AutoGen
Video Transcription Agent
Transform your videos into accurate transcriptions with an intelligent multi-agent system that combines the power of OpenAI Whisper, persistent memory with ChromaDB, and advanced agent coordination.
Key Features
Multi-Agent Architecture
- Intelligent Coordination: Specialized agents work together seamlessly
- Video Analysis: Extract metadata, duration, and quality information
- Audio Transcription: High-accuracy speech-to-text using OpenAI Whisper
- Text Processing: Automatic punctuation, formatting, and error correction
- Output Formatting: Multiple formats (SRT, VTT, TXT, JSON, CSV, DOCX)
- Quality Assurance: Built-in validation and quality assessment
Persistent Memory with ChromaDB
- Semantic Search: Find content using natural language queries
- Session History: Keep track of all transcriptions across sessions
- Metadata Storage: Rich information about videos and processing
- Trend Analysis: Usage statistics and pattern recognition
Interactive CLI Experience
- Intuitive Commands: Simple, English-based command interface
- Auto-completion: Tab completion for commands and file paths
- Progress Indicators: Real-time progress bars and status updates
- Rich Interface: Beautiful colors and formatting with Rich library
- Error Handling: Graceful error management and recovery
Advanced Capabilities
- Multiple Video Formats: MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V
- Language Detection: Automatic or manual language selection
- Precise Timestamps: Word-level timing accuracy
- Batch Processing: Handle multiple videos simultaneously
- Quality Optimization: Automatic recommendations for better results
Quick Start
Installation
# Install from PyPI
pip install ai-agent-video-transcription
# Or install from source
git clone https://github.com/lopand-solutions/video-transcription-agent.git
cd video-transcription-agent
pip install -e .
Prerequisites
- Python 3.9+
- FFmpeg (for audio/video processing)
# Install FFmpeg
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# Windows (using Chocolatey)
choco install ffmpeg
Basic Usage
Interactive CLI (Recommended)
# Start the interactive CLI
vt-cli
# Or use the full command
video-transcription-agent
Available Commands:
Main Commands:
• discovery - Find videos in current directory
• select <ID> - Select a video from discovery results
• transcribe - Transcribe video file (always in Spanish)
• analyze - Analyze video metadata
• search - Search in saved transcriptions
• history - Show transcription history
• status - Show system status
• help - Show all commands
• exit - Exit the program
Command Line Usage
# Transcribe a video file
vt-cli transcribe path/to/video.mp4
# Analyze video metadata
vt-cli analyze path/to/video.mp4
# Search in transcriptions
vt-cli search "artificial intelligence"
# Show system status
vt-cli status
Programmatic Usage
import asyncio
from ai_agent_video_transcription import CoordinatorAgent

async def transcribe_video():
    # Configuration
    config = {
        'transcriber': {
            'model_size': 'base',  # tiny, base, small, medium, large
            'language': 'es'       # Force Spanish transcription
        },
        'formatter': {
            'output_directory': './output'
        },
        'output_formats': ['txt', 'srt']
    }
    # Create coordinator
    coordinator = CoordinatorAgent(config)
    # Execute transcription
    result = await coordinator.execute({
        'video_path': 'path/to/video.mp4',
        'output_name': 'my_transcription'
    })
    return result

# Run transcription
result = asyncio.run(transcribe_video())
print(f"Transcription completed: {result['final_outputs']}")
Documentation
Configuration
Environment Variables
# Optional: Set OpenAI API key for advanced features
export OPENAI_API_KEY="your_api_key_here"
# Optional: Custom ChromaDB directory
export CHROMADB_PERSIST_DIRECTORY="./my_chroma_db"
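For scripts that wrap the package, the two variables above can be read with the standard library. This is a minimal sketch, not the package's own loading code: the helper name load_env_settings and the "./chroma_db" fallback are assumptions made for illustration.

```python
import os


def load_env_settings(environ=None):
    """Collect the two optional settings from the environment."""
    env = os.environ if environ is None else environ
    return {
        # None means "not set"; the key is described as optional.
        "openai_api_key": env.get("OPENAI_API_KEY"),
        # Hypothetical fallback path, not the package's documented default.
        "chromadb_persist_directory": env.get(
            "CHROMADB_PERSIST_DIRECTORY", "./chroma_db"
        ),
    }


settings = load_env_settings()
print(settings["chromadb_persist_directory"])
```

Passing a plain dict instead of os.environ makes the helper easy to exercise without touching the real environment.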
Configuration File
Create a config.yaml file for advanced configuration:
transcriber:
  model_size: "base"  # tiny, base, small, medium, large, large-v2, large-v3
  language: "es"      # Force Spanish, or null for auto-detection
  temperature: 0.0
formatter:
  output_directory: "./output"
  supported_formats: ["txt", "srt", "vtt", "json"]
memory:
  persist_sessions: true
  session_timeout_minutes: 60
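One way to work with these settings from Python is to keep the values shown above as defaults and layer overrides on top, rejecting model sizes that are not in the documented list. The sketch below is an illustration under that assumption; merge_config, DEFAULTS, and VALID_MODEL_SIZES are hypothetical names, and the package may validate its configuration differently.

```python
import copy

# Model sizes listed in the config.yaml comment.
VALID_MODEL_SIZES = {"tiny", "base", "small", "medium", "large", "large-v2", "large-v3"}

# Defaults mirror the example config.yaml.
DEFAULTS = {
    "transcriber": {"model_size": "base", "language": "es", "temperature": 0.0},
    "formatter": {
        "output_directory": "./output",
        "supported_formats": ["txt", "srt", "vtt", "json"],
    },
    "memory": {"persist_sessions": True, "session_timeout_minutes": 60},
}


def merge_config(overrides):
    """Return a copy of DEFAULTS with per-section overrides applied."""
    config = copy.deepcopy(DEFAULTS)
    for section, values in overrides.items():
        if section not in config:
            raise KeyError(f"unknown config section: {section}")
        config[section].update(values)
    size = config["transcriber"]["model_size"]
    if size not in VALID_MODEL_SIZES:
        raise ValueError(f"unsupported model_size: {size}")
    return config


# language=None corresponds to "null" in YAML, i.e. auto-detection.
config = merge_config({"transcriber": {"model_size": "small", "language": None}})
print(config["transcriber"])
```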
Supported Formats
Input Video Formats
- MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V
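The discovery command finds videos in the current directory. A rough equivalent can be sketched with pathlib by filtering on the extensions listed above. The function name find_videos is hypothetical, and the real command may apply additional checks such as probing the file with FFmpeg.

```python
from pathlib import Path

# Extensions taken from the supported input formats list.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm", ".m4v"}


def find_videos(directory):
    """Return video files directly inside `directory`, sorted by path."""
    root = Path(directory)
    return sorted(
        p for p in root.iterdir()
        # Compare suffixes case-insensitively so "CLIP.MP4" also matches.
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )


for video_id, path in enumerate(find_videos("."), start=1):
    print(video_id, path.name)
```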
Output Formats
- TXT: Plain text transcription
- SRT: SubRip subtitle format
- VTT: WebVTT subtitle format
- JSON: Structured data with timestamps
- CSV: Comma-separated values
- DOCX: Microsoft Word document
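To make the SRT format concrete, the sketch below turns timed segments into SubRip text. It assumes each segment is a dict with start, end (in seconds), and text keys, which is the shape of the segments returned by OpenAI Whisper. The function names are hypothetical, and the package's Formatter Agent may produce its files differently.

```python
def format_srt_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hola a todos."},
    {"start": 2.5, "end": 5.0, "text": " Bienvenidos al video."},
]
print(segments_to_srt(segments))
```

WebVTT uses a very similar cue layout, with a "WEBVTT" header line and a period instead of a comma before the milliseconds.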
Whisper Models
| Model | Parameters | Speed | Accuracy | Use Case |
|---|---|---|---|---|
| tiny | 39M | Fastest | Basic | Quick testing |
| base | 74M | Fast | Good | Recommended |
| small | 244M | Medium | Better | Balanced |
| medium | 769M | Slow | High | Quality focus |
| large | 1550M | Slowest | Highest | Best quality |
Architecture
Video Transcription Agent
├── Multi-Agent System
│   ├── Coordinator Agent (orchestrates workflow)
│   ├── Analyzer Agent (video metadata extraction)
│   ├── Transcriber Agent (Whisper-based transcription)
│   ├── Processor Agent (text enhancement)
│   └── Formatter Agent (output generation)
├── Memory System (ChromaDB)
│   ├── Persistent storage
│   ├── Semantic search
│   └── Session management
└── CLI Interface
    ├── Interactive commands
    ├── Progress indicators
    └── Error handling
Advanced Usage
Batch Processing
# Process multiple videos
# (run inside an async function; coordinator is created as in Programmatic Usage)
videos = ['video1.mp4', 'video2.mp4', 'video3.mp4']
results = await coordinator.execute_batch(videos, 'batch_output')
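Because await only works inside a coroutine, a batch call needs an async entry point driven by asyncio.run. The sketch below shows that pattern with a stand-in worker. fake_transcribe and run_batch are hypothetical names, the concurrency limit is an assumption, and nothing here reflects how execute_batch is implemented internally.

```python
import asyncio


async def fake_transcribe(video_path):
    """Stand-in for a real transcription call; it only simulates work."""
    await asyncio.sleep(0.01)
    return {"video_path": video_path, "status": "done"}


async def run_batch(video_paths, worker, max_concurrent=2):
    """Run `worker` over all paths, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path):
        async with semaphore:
            try:
                return await worker(path)
            except Exception as exc:
                # Record the failure so one bad file does not abort the batch.
                return {"video_path": path, "status": "failed", "error": str(exc)}

    # gather returns results in the same order as the input paths.
    return await asyncio.gather(*(guarded(p) for p in video_paths))


results = asyncio.run(
    run_batch(["video1.mp4", "video2.mp4", "video3.mp4"], fake_transcribe)
)
for item in results:
    print(item["video_path"], item["status"])
```

Limiting concurrency matters for transcription workloads, since each loaded Whisper model can consume a large share of RAM or GPU memory.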
Custom Memory Queries
# Search in stored transcriptions
memory = MemoryManager()
results = memory.search_transcriptions("machine learning", limit=10)
Quality Analysis
# Analyze video quality before transcription
# (run inside an async function, since await is not valid at module level)
analyzer = AnalyzerAgent()
metadata = await analyzer.execute({'video_path': 'video.mp4'})
print(f"Quality Score: {metadata['quality_score']}")
Troubleshooting
Common Issues
FFmpeg not found:
# Verify installation
ffmpeg -version
# Install if missing
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu
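The same check can be done from Python before starting a long job. This is a small sketch using only the standard library; the helper name ffmpeg_version is hypothetical and is not part of the package.

```python
import shutil
import subprocess


def ffmpeg_version():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


version = ffmpeg_version()
if version is None:
    print("FFmpeg not found; install it before transcribing.")
else:
    print(version)
```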
Memory issues with large models:
- Use smaller models: tiny, base, small
- Process videos in smaller segments
- Increase system RAM or use GPU acceleration
Transcription in wrong language:
- Set the language explicitly in the config: language: "es"
- Check audio quality and clarity
- Try different Whisper models
Performance Optimization
- GPU Acceleration: Install CUDA-compatible PyTorch for faster processing
- Model Caching: Models are cached automatically after first use
- Batch Processing: Process multiple videos in sequence for efficiency
Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- OpenAI Whisper - Speech recognition models
- ChromaDB - Vector database for semantic search
- AutoGen - Multi-agent framework
- Rich - Beautiful terminal interfaces
- FFmpeg - Multimedia processing
Support
- Email: contact@lopandsolutions.com
- Issues: GitHub Issues
- Documentation: Project Wiki
Transform your videos into accurate transcriptions with AI!
Made with ❤️ by Lopand Solutions
File details
Details for the file ai_agent_video_transcription-1.1.3.tar.gz.
File metadata
- Download URL: ai_agent_video_transcription-1.1.3.tar.gz
- Upload date:
- Size: 60.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d74a38a01c225af9bcd9fae7951cbeef15f3236bda7a7d305f59d9f8bc09f3d9 |
| MD5 | f27bd52c35cd37c9dc618d3a4293d110 |
| BLAKE2b-256 | 236da0609e8f10874a74745d8e066ed9170edd9c6009d771b7633c42033cb51f |
File details
Details for the file ai_agent_video_transcription-1.1.3-py3-none-any.whl.
File metadata
- Download URL: ai_agent_video_transcription-1.1.3-py3-none-any.whl
- Upload date:
- Size: 59.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 51d41c59c01ce99c2078d1b746800597d346c2da54bed0661f215b7fc7f31a62 |
| MD5 | 6c213f2f164c5a6e03da80e8fa76bbf3 |
| BLAKE2b-256 | 0656d1f78cec2b375f5d7f6126991417a8510495d4753accf2d26b5a5c73788b |