Multi-agent AI system for video transcription using OpenAI Whisper, ChromaDB, and AutoGen

🎬 Video Transcription Agent

Transcribe your videos with a multi-agent system that combines OpenAI Whisper speech recognition, persistent memory in ChromaDB, and AutoGen-based agent coordination.

✨ Key Features

🤖 Multi-Agent Architecture

  • Intelligent Coordination: Specialized agents work together seamlessly
  • Video Analysis: Extract metadata, duration, and quality information
  • Audio Transcription: High-accuracy speech-to-text using OpenAI Whisper
  • Text Processing: Automatic punctuation, formatting, and error correction
  • Output Formatting: Multiple formats (SRT, VTT, TXT, JSON, CSV, DOCX)
  • Quality Assurance: Built-in validation and quality assessment

🧠 Persistent Memory with ChromaDB

  • Semantic Search: Find content using natural language queries
  • Session History: Keep track of all transcriptions across sessions
  • Metadata Storage: Rich information about videos and processing
  • Trend Analysis: Usage statistics and pattern recognition
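
To make "semantic search" concrete: a vector store such as ChromaDB keeps one embedding per stored item and ranks items by similarity to the query embedding. The sketch below illustrates only that ranking step, using word-count vectors as a stand-in for learned embeddings. It does not use ChromaDB, and `embed`, `cosine`, and `search` are hypothetical helpers, not part of this package.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use learned vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, documents: dict[str, str], limit: int = 3) -> list[tuple[str, float]]:
    """Return up to `limit` (doc_id, score) pairs, best match first."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]

# Made-up transcripts, for illustration only.
transcripts = {
    "lecture.mp4": "today we cover machine learning and neural networks",
    "cooking.mp4": "add the onions and stir for five minutes",
}
print(search("machine learning basics", transcripts, limit=1))
```

With learned embeddings the same ranking also matches paraphrases that share no words with the query, which word counts cannot do.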

💬 Interactive CLI Experience

  • Intuitive Commands: Simple, English-based command interface
  • Auto-completion: Tab completion for commands and file paths
  • Progress Indicators: Real-time progress bars and status updates
  • Rich Interface: Beautiful colors and formatting with Rich library
  • Error Handling: Graceful error management and recovery

🎯 Advanced Capabilities

  • Multiple Video Formats: MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V
  • Language Detection: Automatic or manual language selection
  • Precise Timestamps: Word-level timing accuracy
  • Batch Processing: Handle multiple videos simultaneously
  • Quality Optimization: Automatic recommendations for better results

🚀 Quick Start

Installation

# Install from PyPI
pip install ai-agent-video-transcription

# Or install from source
git clone https://github.com/lopand-solutions/video-transcription-agent.git
cd video-transcription-agent
pip install -e .

Prerequisites

  • Python 3.9+
  • FFmpeg (for audio/video processing)

# Install FFmpeg
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows (using Chocolatey)
choco install ffmpeg
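
Both prerequisites can be checked from Python before installing or running anything. This is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper, not something the package provides.

```python
import shutil
import sys

def check_prerequisites(min_python=(3, 9)) -> dict:
    """Report whether the interpreter and FFmpeg meet the stated prerequisites."""
    return {
        "python_ok": sys.version_info >= min_python,
        "ffmpeg_path": shutil.which("ffmpeg"),  # None if ffmpeg is not on PATH
    }

status = check_prerequisites()
if not status["python_ok"]:
    print("Python 3.9+ is required")
if status["ffmpeg_path"] is None:
    print("FFmpeg was not found on PATH")
```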

Basic Usage

Interactive CLI (Recommended)

# Start the interactive CLI
vt-cli

# Or use the full command
video-transcription-agent

Available Commands:

🚀 Main Commands:

  • discovery     - Find videos in current directory
  • select <ID>   - Select a video from discovery results
  • transcribe    - Transcribe video file (always in Spanish)
  • analyze       - Analyze video metadata
  • search        - Search in saved transcriptions
  • history       - Show transcription history
  • status        - Show system status
  • help          - Show all commands
  • exit          - Exit the program
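
The discovery and select commands work as a pair: discovery lists the video files in the current directory, and select picks one by its ID. The sketch below shows one way such a listing could be built, filtering by the input extensions this README lists. `discover_videos` is a hypothetical function, and the actual CLI may number or order results differently.

```python
from pathlib import Path

# Extensions taken from the supported input formats listed in this README.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm", ".m4v"}

def discover_videos(directory: str = ".") -> list[tuple[int, Path]]:
    """Return (ID, path) pairs for video files directly inside `directory`."""
    files = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    return list(enumerate(files, start=1))  # IDs start at 1

for video_id, path in discover_videos("."):
    print(f"{video_id}: {path.name}")
```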

Command Line Usage

# Transcribe a video file
vt-cli transcribe path/to/video.mp4

# Analyze video metadata
vt-cli analyze path/to/video.mp4

# Search in transcriptions
vt-cli search "artificial intelligence"

# Show system status
vt-cli status

Programmatic Usage

import asyncio
from ai_agent_video_transcription import CoordinatorAgent

async def transcribe_video():
    # Configuration
    config = {
        'transcriber': {
            'model_size': 'base',  # tiny, base, small, medium, large
            'language': 'es'       # Force Spanish transcription
        },
        'formatter': {
            'output_directory': './output'
        },
        'output_formats': ['txt', 'srt']
    }
    
    # Create coordinator
    coordinator = CoordinatorAgent(config)
    
    # Execute transcription
    result = await coordinator.execute({
        'video_path': 'path/to/video.mp4',
        'output_name': 'my_transcription'
    })
    
    return result

# Run transcription
result = asyncio.run(transcribe_video())
print(f"Transcription completed: {result['final_outputs']}")

📖 Documentation

Configuration

Environment Variables

# Optional: Set OpenAI API key for advanced features
export OPENAI_API_KEY="your_api_key_here"

# Optional: Custom ChromaDB directory
export CHROMADB_PERSIST_DIRECTORY="./my_chroma_db"
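
Because both variables are optional, code that reads them needs a fallback for each. The following sketch shows that pattern with `os.environ.get`. `load_env_settings` is a hypothetical helper, and the `./chroma_db` fallback is an assumption for illustration, not the package's documented default.

```python
import os

def load_env_settings(environ=os.environ) -> dict:
    """Read the two optional variables; the fallback values are assumptions."""
    return {
        # None means no key is configured.
        "openai_api_key": environ.get("OPENAI_API_KEY"),
        # "./chroma_db" is a hypothetical default, not the package's documented one.
        "chromadb_dir": environ.get("CHROMADB_PERSIST_DIRECTORY", "./chroma_db"),
    }

settings = load_env_settings()
print(settings["chromadb_dir"])
```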

Configuration File

Create a config.yaml file for advanced configuration:

transcriber:
  model_size: "base"  # tiny, base, small, medium, large, large-v2, large-v3
  language: "es"      # Force Spanish, or null for auto-detection
  temperature: 0.0

formatter:
  output_directory: "./output"
  supported_formats: ["txt", "srt", "vtt", "json"]

memory:
  persist_sessions: true
  session_timeout_minutes: 60
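
Once the file is parsed into a dictionary, a few checks can catch typos before a long transcription starts. This is a sketch only: `validate_config` is a hypothetical helper, the allowed model names are copied from the comment in the file above, and the 0.0 to 1.0 temperature range and the fallback values are assumptions.

```python
ALLOWED_MODEL_SIZES = {"tiny", "base", "small", "medium", "large", "large-v2", "large-v3"}

def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    transcriber = config.get("transcriber", {})
    if transcriber.get("model_size", "base") not in ALLOWED_MODEL_SIZES:
        problems.append(f"unknown model_size: {transcriber['model_size']!r}")
    language = transcriber.get("language")
    if language is not None and not isinstance(language, str):
        problems.append("language must be a string code such as 'es', or null")
    temperature = transcriber.get("temperature", 0.0)
    if not isinstance(temperature, (int, float)) or not 0.0 <= temperature <= 1.0:
        problems.append("temperature must be a number between 0.0 and 1.0")
    timeout = config.get("memory", {}).get("session_timeout_minutes", 60)
    if not isinstance(timeout, int) or timeout <= 0:
        problems.append("session_timeout_minutes must be a positive integer")
    return problems

# Same values as the config.yaml example, written as a Python dict.
example = {
    "transcriber": {"model_size": "base", "language": "es", "temperature": 0.0},
    "formatter": {"output_directory": "./output"},
    "memory": {"persist_sessions": True, "session_timeout_minutes": 60},
}
print(validate_config(example))
```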

Supported Formats

Input Video Formats

  • MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V

Output Formats

  • TXT: Plain text transcription
  • SRT: SubRip subtitle format
  • VTT: WebVTT subtitle format
  • JSON: Structured data with timestamps
  • CSV: Comma-separated values
  • DOCX: Microsoft Word document
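
As an illustration of what the subtitle formats contain, the sketch below renders timed segments as SRT. It assumes each segment is a dict with `start` and `end` in seconds plus `text`, which is the shape of Whisper's segment output. `srt_timestamp` and `to_srt` are hypothetical names, not the Formatter Agent's actual code.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render segments with 'start', 'end' (seconds) and 'text' keys as SRT."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # blank line between cues

# Made-up segments, for illustration only.
print(to_srt([
    {"start": 0.0, "end": 2.5, "text": " Hola a todos."},
    {"start": 2.5, "end": 5.0, "text": " Bienvenidos al curso."},
]))
```

WebVTT differs mainly in starting with a `WEBVTT` header line and using a period instead of a comma before the milliseconds.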

Whisper Models

Model    Parameters   Speed     Accuracy   Use Case
tiny     39M          Fastest   Basic      Quick testing
base     74M          Fast      Good       Recommended
small    244M         Medium    Better     Balanced
medium   769M         Slow      High       Quality focus
large    1550M        Slowest   Highest    Best quality
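
The sizes in the table are parameter counts, so a rough lower bound on memory follows from simple arithmetic: parameters times bytes per parameter. The sketch below computes the size of the weights alone. `weight_size_mb` is a hypothetical helper, and real memory use is higher because activations and audio buffers are not counted.

```python
# Parameter counts in millions, from the table above.
PARAMS_MILLIONS = {"tiny": 39, "base": 74, "small": 244, "medium": 769, "large": 1550}

def weight_size_mb(model: str, bytes_per_param: int = 4) -> int:
    """Approximate size of the weights alone, in MB (1 MB = 10**6 bytes)."""
    # millions of parameters * bytes per parameter = millions of bytes = MB
    return PARAMS_MILLIONS[model] * bytes_per_param

for name in PARAMS_MILLIONS:
    fp32 = weight_size_mb(name)      # 4 bytes per parameter
    fp16 = weight_size_mb(name, 2)   # 2 bytes per parameter
    print(f"{name:>6}: ~{fp32} MB in fp32, ~{fp16} MB in fp16")
```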

๐Ÿ—๏ธ Architecture

Video Transcription Agent
├── 🤖 Multi-Agent System
│   ├── Coordinator Agent (orchestrates workflow)
│   ├── Analyzer Agent (video metadata extraction)
│   ├── Transcriber Agent (Whisper-based transcription)
│   ├── Processor Agent (text enhancement)
│   └── Formatter Agent (output generation)
├── 🧠 Memory System (ChromaDB)
│   ├── Persistent storage
│   ├── Semantic search
│   └── Session management
└── 💬 CLI Interface
    ├── Interactive commands
    ├── Progress indicators
    └── Error handling
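
The coordinator pattern in the diagram can be pictured as a pipeline in which each agent receives the accumulated context and adds its own result. The sketch below uses stub coroutines in place of the real agents. All function names and context keys are hypothetical except `video_path` and `final_outputs`, which appear in the Programmatic Usage example, and the package's actual orchestration may differ.

```python
import asyncio

async def analyze(ctx: dict) -> dict:
    return {**ctx, "metadata": {"path": ctx["video_path"]}}     # stub analyzer

async def transcribe(ctx: dict) -> dict:
    return {**ctx, "raw_text": "hola mundo"}                     # stub transcriber

async def process(ctx: dict) -> dict:
    return {**ctx, "text": ctx["raw_text"].capitalize() + "."}   # stub text processor

async def format_output(ctx: dict) -> dict:
    return {**ctx, "final_outputs": {"txt": ctx["text"]}}        # stub formatter

async def coordinate(task: dict) -> dict:
    """Run each stage in order, passing the accumulated context along."""
    ctx = dict(task)
    for stage in (analyze, transcribe, process, format_output):
        ctx = await stage(ctx)
    return ctx

result = asyncio.run(coordinate({"video_path": "path/to/video.mp4"}))
print(result["final_outputs"])
```

Keeping every stage behind the same "context in, context out" signature is what lets a coordinator reorder, skip, or retry stages without the stages knowing about each other.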

🔧 Advanced Usage

Batch Processing

# Process multiple videos.
# Run inside an async function, with `coordinator` created as in Programmatic Usage.
videos = ['video1.mp4', 'video2.mp4', 'video3.mp4']
results = await coordinator.execute_batch(videos, 'batch_output')
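
When a batch is long, it helps if one bad file does not abort the rest and if only a few videos are processed at once. The sketch below shows that pattern with `asyncio.Semaphore` and `asyncio.gather`. `transcribe_one` is a hypothetical stand-in for a real transcription call, and this is not how `execute_batch` is known to work internally.

```python
import asyncio

async def transcribe_one(path: str) -> dict:
    """Hypothetical stand-in for a single-video transcription call."""
    if not path.endswith(".mp4"):  # arbitrary rule, only to trigger a failure
        raise ValueError(f"unsupported file: {path}")
    await asyncio.sleep(0)  # placeholder for real work
    return {"video_path": path, "status": "ok"}

async def run_batch(paths: list[str], max_concurrent: int = 2) -> list[dict]:
    """Process paths with bounded concurrency; failures become result entries."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(path: str) -> dict:
        async with semaphore:
            try:
                return await transcribe_one(path)
            except Exception as exc:
                return {"video_path": path, "status": "error", "error": str(exc)}

    # gather preserves input order in its result list
    return await asyncio.gather(*(guarded(p) for p in paths))

results = asyncio.run(run_batch(["video1.mp4", "notes.txt", "video3.mp4"]))
print([r["status"] for r in results])
```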

Custom Memory Queries

# Search in stored transcriptions
# (MemoryManager must be imported from the package first)
memory = MemoryManager()
results = memory.search_transcriptions("machine learning", limit=10)

Quality Analysis

# Analyze video quality before transcription.
# Run inside an async function; AnalyzerAgent must be imported from the package first.
analyzer = AnalyzerAgent()
metadata = await analyzer.execute({'video_path': 'video.mp4'})
print(f"Quality Score: {metadata['quality_score']}")

๐Ÿ› Troubleshooting

Common Issues

FFmpeg not found:

# Verify installation
ffmpeg -version

# Install if missing
brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Ubuntu

Memory issues with large models:

  • Use smaller models: tiny, base, small
  • Process videos in smaller segments
  • Increase system RAM or use GPU acceleration
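
For the "smaller segments" suggestion, the first step is deciding where to cut. The sketch below computes overlapping time windows for a video of known duration. `segment_ranges` is a hypothetical helper, and the 10-minute window and 5-second overlap are arbitrary example values.

```python
def segment_ranges(duration: float, chunk: float = 600.0, overlap: float = 5.0) -> list[tuple[float, float]]:
    """Split [0, duration] into (start, end) windows of at most `chunk` seconds.

    Consecutive windows overlap by `overlap` seconds so that words at a
    boundary appear whole in at least one window.
    """
    if chunk <= overlap:
        raise ValueError("chunk must be longer than overlap")
    ranges = []
    start = 0.0
    while start < duration:
        end = min(start + chunk, duration)
        ranges.append((start, end))
        if end >= duration:
            break
        start = end - overlap  # step back so the next window overlaps this one
    return ranges

for start, end in segment_ranges(1500.0):
    print(f"{start:7.1f}s -> {end:7.1f}s")
```

Each window can then be extracted separately, for example with FFmpeg's `-ss` (start) and `-t` (duration) options, and transcribed on its own. Merging the overlapping text afterwards is left out of this sketch.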

Transcription in wrong language:

  • Set the language explicitly in the config, for example language: "es"
  • Check audio quality and clarity
  • Try different Whisper models

Performance Optimization

  • GPU Acceleration: Install CUDA-compatible PyTorch for faster processing
  • Model Caching: Models are cached automatically after first use
  • Batch Processing: Process multiple videos in sequence for efficiency

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • OpenAI Whisper - Speech recognition models
  • ChromaDB - Vector database for semantic search
  • AutoGen - Multi-agent framework
  • Rich - Beautiful terminal interfaces
  • FFmpeg - Multimedia processing

📞 Support


🎬 Transform your videos into accurate transcriptions with AI! 🚀

Made with ❤️ by Lopand Solutions
