
🎬 Video Transcription Agent


Multi-agent AI system for video transcription using OpenAI Whisper, ChromaDB, and AutoGen

Transform your videos into accurate transcriptions with an intelligent multi-agent system that combines the power of OpenAI Whisper, persistent memory with ChromaDB, and advanced agent coordination.

✨ Key Features

🤖 Multi-Agent Architecture

  • Intelligent Coordination: Specialized agents work together seamlessly
  • Video Analysis: Extract metadata, duration, and quality information
  • Audio Transcription: High-accuracy speech-to-text using OpenAI Whisper
  • Text Processing: Automatic punctuation, formatting, and error correction
  • Output Formatting: Multiple formats (SRT, VTT, TXT, JSON, CSV, DOCX)
  • Quality Assurance: Built-in validation and quality assessment

🧠 Persistent Memory with ChromaDB

  • Semantic Search: Find content using natural language queries
  • Session History: Keep track of all transcriptions across sessions
  • Metadata Storage: Rich information about videos and processing
  • Trend Analysis: Usage statistics and pattern recognition

💬 Interactive CLI Experience

  • Intuitive Commands: Simple, English-based command interface
  • Auto-completion: Tab completion for commands and file paths
  • Progress Indicators: Real-time progress bars and status updates
  • Rich Interface: Beautiful colors and formatting with Rich library
  • Error Handling: Graceful error management and recovery

🎯 Advanced Capabilities

  • Multiple Video Formats: MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V
  • Language Detection: Automatic or manual language selection
  • Precise Timestamps: Word-level timing accuracy
  • Batch Processing: Handle multiple videos simultaneously
  • Quality Optimization: Automatic recommendations for better results
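
Checking whether an input file is one of the supported formats can be sketched with a small helper. The function name is illustrative, not part of the package API; the extension set mirrors the list above:

```python
from pathlib import Path

# Extensions mirroring the supported input formats listed above
SUPPORTED_EXTENSIONS = {
    ".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm", ".m4v",
}

def is_supported_video(path: str) -> bool:
    """Return True if the file extension is a supported video format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```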

🚀 Quick Start

Installation

```bash
# Install from PyPI
pip install ai-agent-video-transcription

# Or install from source
git clone https://github.com/lopand-solutions/video-transcription-agent.git
cd video-transcription-agent
pip install -e .
```

Prerequisites

  • Python 3.9+
  • FFmpeg (for audio/video processing)

```bash
# Install FFmpeg
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows (using Chocolatey)
choco install ffmpeg
```

Basic Usage

Interactive CLI (Recommended)

```bash
# Start the interactive CLI
vt-cli

# Or use the full command
video-transcription-agent
```

Available Commands:

🚀 Main Commands:

  • discovery   - Find videos in the current directory
  • select <ID> - Select a video from the discovery results
  • transcribe  - Transcribe the video file (always in Spanish)
  • analyze     - Analyze video metadata
  • search      - Search saved transcriptions
  • history     - Show transcription history
  • status      - Show system status
  • help        - Show all commands
  • exit        - Exit the program

Command Line Usage

```bash
# Transcribe a video file
vt-cli transcribe path/to/video.mp4

# Analyze video metadata
vt-cli analyze path/to/video.mp4

# Search in transcriptions
vt-cli search "artificial intelligence"

# Show system status
vt-cli status
```

Programmatic Usage

```python
import asyncio
from ai_agent_video_transcription import CoordinatorAgent

async def transcribe_video():
    # Configuration
    config = {
        'transcriber': {
            'model_size': 'base',  # tiny, base, small, medium, large
            'language': 'es'       # Force Spanish transcription
        },
        'formatter': {
            'output_directory': './output'
        },
        'output_formats': ['txt', 'srt']
    }

    # Create coordinator
    coordinator = CoordinatorAgent(config)

    # Execute transcription
    result = await coordinator.execute({
        'video_path': 'path/to/video.mp4',
        'output_name': 'my_transcription'
    })

    return result

# Run transcription
result = asyncio.run(transcribe_video())
print(f"Transcription completed: {result['final_outputs']}")
```

📖 Documentation

Configuration

Environment Variables

```bash
# Optional: Set OpenAI API key for advanced features
export OPENAI_API_KEY="your_api_key_here"

# Optional: Custom ChromaDB directory
export CHROMADB_PERSIST_DIRECTORY="./my_chroma_db"
```
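
On the Python side, these variables would typically be read with the stdlib `os.environ`. The fallback path below is an illustrative default, not necessarily the one the package uses:

```python
import os

# API key is optional; advanced features can simply check for its presence
openai_key = os.environ.get("OPENAI_API_KEY")

# Fall back to a local directory when no custom path is set (illustrative default)
chroma_dir = os.environ.get("CHROMADB_PERSIST_DIRECTORY", "./chroma_db")
```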

Configuration File

Create a config.yaml file for advanced configuration:

```yaml
transcriber:
  model_size: "base"  # tiny, base, small, medium, large, large-v2, large-v3
  language: "es"      # Force Spanish, or null for auto-detection
  temperature: 0.0

formatter:
  output_directory: "./output"
  supported_formats: ["txt", "srt", "vtt", "json"]

memory:
  persist_sessions: true
  session_timeout_minutes: 60
```
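
Loading such a file and overlaying it on defaults can be sketched as a shallow per-section merge. This is an illustrative helper, not the package's loader; in practice PyYAML's `yaml.safe_load` would supply the `user` dict:

```python
# Defaults mirroring the config.yaml example above
DEFAULTS = {
    "transcriber": {"model_size": "base", "language": "es", "temperature": 0.0},
    "formatter": {"output_directory": "./output"},
    "memory": {"persist_sessions": True, "session_timeout_minutes": 60},
}

def merge_config(defaults: dict, overrides: dict) -> dict:
    """Overlay user-supplied sections on the defaults, key by key."""
    merged = {k: dict(v) for k, v in defaults.items()}
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged

# e.g. the dict produced by yaml.safe_load() on a user's config.yaml
user = {"transcriber": {"model_size": "small"}}
config = merge_config(DEFAULTS, user)
```

Keys the user does not set keep their default values, so a config file only needs to mention what it changes.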

Supported Formats

Input Video Formats

  • MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V

Output Formats

  • TXT: Plain text transcription
  • SRT: SubRip subtitle format
  • VTT: WebVTT subtitle format
  • JSON: Structured data with timestamps
  • CSV: Comma-separated values
  • DOCX: Microsoft Word document

Whisper Models

| Model  | Size  | Speed   | Accuracy | Use Case      |
|--------|-------|---------|----------|---------------|
| tiny   | 39M   | Fastest | Basic    | Quick testing |
| base   | 74M   | Fast    | Good     | Recommended   |
| small  | 244M  | Medium  | Better   | Balanced      |
| medium | 769M  | Slow    | High     | Quality focus |
| large  | 1550M | Slowest | Highest  | Best quality  |

๐Ÿ—๏ธ Architecture

Video Transcription Agent
├── 🤖 Multi-Agent System
│   ├── Coordinator Agent (orchestrates workflow)
│   ├── Analyzer Agent (video metadata extraction)
│   ├── Transcriber Agent (Whisper-based transcription)
│   ├── Processor Agent (text enhancement)
│   └── Formatter Agent (output generation)
├── 🧠 Memory System (ChromaDB)
│   ├── Persistent storage
│   ├── Semantic search
│   └── Session management
└── 💬 CLI Interface
    ├── Interactive commands
    ├── Progress indicators
    └── Error handling
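
The workflow in the tree above amounts to the coordinator awaiting each specialized agent in turn. A minimal sketch with stub agents (the classes and payload keys here are illustrative, not the package's real agents):

```python
import asyncio

class StubAgent:
    """Illustrative stand-in for one pipeline stage."""
    def __init__(self, name: str):
        self.name = name

    async def execute(self, payload: dict) -> dict:
        # A real agent would transform the payload (analyze, transcribe, ...)
        payload.setdefault("stages", []).append(self.name)
        return payload

async def run_pipeline(video_path: str) -> dict:
    # The coordinator runs the specialized agents in order
    pipeline = [StubAgent(n) for n in ("analyzer", "transcriber", "processor", "formatter")]
    payload = {"video_path": video_path}
    for agent in pipeline:
        payload = await agent.execute(payload)
    return payload

result = asyncio.run(run_pipeline("video.mp4"))
```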

🔧 Advanced Usage

Batch Processing

```python
# Process multiple videos (inside an async function, with a coordinator as above)
videos = ['video1.mp4', 'video2.mp4', 'video3.mp4']
results = await coordinator.execute_batch(videos, 'batch_output')
```

Custom Memory Queries

```python
# Search stored transcriptions
memory = MemoryManager()
results = memory.search_transcriptions("machine learning", limit=10)
```

Quality Analysis

```python
# Analyze video quality before transcription (inside an async function)
analyzer = AnalyzerAgent()
metadata = await analyzer.execute({'video_path': 'video.mp4'})
print(f"Quality Score: {metadata['quality_score']}")
```

๐Ÿ› Troubleshooting

Common Issues

FFmpeg not found:

```bash
# Verify installation
ffmpeg -version

# Install if missing
brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Ubuntu
```

Memory issues with large models:

  • Use smaller models: tiny, base, small
  • Process videos in smaller segments
  • Increase system RAM or use GPU acceleration

Transcription in wrong language:

  • Set language explicitly: language: "es" in config
  • Check audio quality and clarity
  • Try different Whisper models

Performance Optimization

  • GPU Acceleration: Install CUDA-compatible PyTorch for faster processing
  • Model Caching: Models are cached automatically after first use
  • Batch Processing: Process multiple videos in sequence for efficiency
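
Device selection for GPU acceleration typically follows this pattern, falling back to CPU when PyTorch or CUDA is unavailable (an illustrative helper, not part of the package API):

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable PyTorch is available, else "cpu"."""
    try:
        import torch  # optional dependency; only needed for the GPU check
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

device = pick_device()
```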

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI Whisper - Speech recognition models
  • ChromaDB - Vector database for semantic search
  • AutoGen - Multi-agent framework
  • Rich - Beautiful terminal interfaces
  • FFmpeg - Multimedia processing


🎬 Transform your videos into accurate transcriptions with AI! 🚀

Made with ❤️ by Lopand Solutions
