
🎬 Video Transcription Agent


Multi-agent AI system for video transcription using OpenAI Whisper, ChromaDB, and AutoGen

Transform your videos into accurate transcriptions with an intelligent multi-agent system that combines the power of OpenAI Whisper, persistent memory with ChromaDB, and advanced agent coordination.

✨ Key Features

🤖 Multi-Agent Architecture

  • Intelligent Coordination: Specialized agents work together seamlessly
  • Video Analysis: Extract metadata, duration, and quality information
  • Audio Transcription: High-accuracy speech-to-text using OpenAI Whisper
  • Text Processing: Automatic punctuation, formatting, and error correction
  • Output Formatting: Multiple formats (SRT, VTT, TXT, JSON, CSV, DOCX)
  • Quality Assurance: Built-in validation and quality assessment

🧠 Persistent Memory with ChromaDB

  • Semantic Search: Find content using natural language queries
  • Session History: Keep track of all transcriptions across sessions
  • Metadata Storage: Rich information about videos and processing
  • Trend Analysis: Usage statistics and pattern recognition

💬 Interactive CLI Experience

  • Intuitive Commands: Simple, English-based command interface
  • Auto-completion: Tab completion for commands and file paths
  • Progress Indicators: Real-time progress bars and status updates
  • Rich Interface: Beautiful colors and formatting with Rich library
  • Error Handling: Graceful error management and recovery

🎯 Advanced Capabilities

  • Multiple Video Formats: MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V
  • Language Detection: Automatic or manual language selection
  • Precise Timestamps: Word-level timing accuracy
  • Batch Processing: Handle multiple videos simultaneously
  • Quality Optimization: Automatic recommendations for better results
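
Checking whether an input file is one of the supported formats can be sketched with a small helper. The function name is illustrative, not part of the package API; the extension set mirrors the list above:

```python
from pathlib import Path

# Extensions mirroring the supported input formats listed above
SUPPORTED_EXTENSIONS = {
    ".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".webm", ".m4v",
}

def is_supported_video(path: str) -> bool:
    """Return True if the file extension is a supported video format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```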

🚀 Quick Start

Installation

```bash
# Install from PyPI
pip install ai-agent-video-transcription

# Or install from source
git clone https://github.com/lopand-solutions/video-transcription-agent.git
cd video-transcription-agent
pip install -e .
```

Prerequisites

  • Python 3.9+
  • FFmpeg (for audio/video processing)

```bash
# Install FFmpeg
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg

# Windows (using Chocolatey)
choco install ffmpeg
```

Basic Usage

Interactive CLI (Recommended)

```bash
# Start the interactive CLI
vt-cli

# Or use the full command
video-transcription-agent
```

Available Commands:

🚀 Main Commands:

  • discovery   - Find videos in the current directory
  • select <ID> - Select a video from the discovery results
  • transcribe  - Transcribe the video file (always in Spanish)
  • analyze     - Analyze video metadata
  • search      - Search saved transcriptions
  • history     - Show transcription history
  • status      - Show system status
  • help        - Show all commands
  • exit        - Exit the program

Command Line Usage

```bash
# Transcribe a video file
vt-cli transcribe path/to/video.mp4

# Analyze video metadata
vt-cli analyze path/to/video.mp4

# Search in transcriptions
vt-cli search "artificial intelligence"

# Show system status
vt-cli status
```

Programmatic Usage

```python
import asyncio
from ai_agent_video_transcription import CoordinatorAgent

async def transcribe_video():
    # Configuration
    config = {
        'transcriber': {
            'model_size': 'base',  # tiny, base, small, medium, large
            'language': 'es'       # Force Spanish transcription
        },
        'formatter': {
            'output_directory': './output'
        },
        'output_formats': ['txt', 'srt']
    }

    # Create coordinator
    coordinator = CoordinatorAgent(config)

    # Execute transcription
    result = await coordinator.execute({
        'video_path': 'path/to/video.mp4',
        'output_name': 'my_transcription'
    })

    return result

# Run transcription
result = asyncio.run(transcribe_video())
print(f"Transcription completed: {result['final_outputs']}")
```

📖 Documentation

Configuration

Environment Variables

```bash
# Optional: Set OpenAI API key for advanced features
export OPENAI_API_KEY="your_api_key_here"

# Optional: Custom ChromaDB directory
export CHROMADB_PERSIST_DIRECTORY="./my_chroma_db"
```
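
On the Python side, these variables would typically be read with the stdlib `os.environ`. The fallback path below is an illustrative default, not necessarily the one the package uses:

```python
import os

# API key is optional; advanced features can simply check for its presence
openai_key = os.environ.get("OPENAI_API_KEY")

# Fall back to a local directory when no custom path is set (illustrative default)
chroma_dir = os.environ.get("CHROMADB_PERSIST_DIRECTORY", "./chroma_db")
```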

Configuration File

Create a config.yaml file for advanced configuration:

```yaml
transcriber:
  model_size: "base"  # tiny, base, small, medium, large, large-v2, large-v3
  language: "es"      # Force Spanish, or null for auto-detection
  temperature: 0.0

formatter:
  output_directory: "./output"
  supported_formats: ["txt", "srt", "vtt", "json"]

memory:
  persist_sessions: true
  session_timeout_minutes: 60
```
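
Loading such a file and overlaying it on defaults can be sketched as a shallow per-section merge. This is an illustrative helper, not the package's loader; in practice PyYAML's `yaml.safe_load` would supply the `user` dict:

```python
# Defaults mirroring the config.yaml example above
DEFAULTS = {
    "transcriber": {"model_size": "base", "language": "es", "temperature": 0.0},
    "formatter": {"output_directory": "./output"},
    "memory": {"persist_sessions": True, "session_timeout_minutes": 60},
}

def merge_config(defaults: dict, overrides: dict) -> dict:
    """Overlay user-supplied sections on the defaults, key by key."""
    merged = {k: dict(v) for k, v in defaults.items()}
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged

# e.g. the dict produced by yaml.safe_load() on a user's config.yaml
user = {"transcriber": {"model_size": "small"}}
config = merge_config(DEFAULTS, user)
```

Keys the user does not set keep their default values, so a config file only needs to mention what it changes.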

Supported Formats

Input Video Formats

  • MP4, AVI, MOV, MKV, FLV, WMV, WebM, M4V

Output Formats

  • TXT: Plain text transcription
  • SRT: SubRip subtitle format
  • VTT: WebVTT subtitle format
  • JSON: Structured data with timestamps
  • CSV: Comma-separated values
  • DOCX: Microsoft Word document

Whisper Models

| Model  | Size  | Speed   | Accuracy | Use Case      |
|--------|-------|---------|----------|---------------|
| tiny   | 39M   | Fastest | Basic    | Quick testing |
| base   | 74M   | Fast    | Good     | Recommended   |
| small  | 244M  | Medium  | Better   | Balanced      |
| medium | 769M  | Slow    | High     | Quality focus |
| large  | 1550M | Slowest | Highest  | Best quality  |

๐Ÿ—๏ธ Architecture

Video Transcription Agent
├── 🤖 Multi-Agent System
│   ├── Coordinator Agent (orchestrates workflow)
│   ├── Analyzer Agent (video metadata extraction)
│   ├── Transcriber Agent (Whisper-based transcription)
│   ├── Processor Agent (text enhancement)
│   └── Formatter Agent (output generation)
├── 🧠 Memory System (ChromaDB)
│   ├── Persistent storage
│   ├── Semantic search
│   └── Session management
└── 💬 CLI Interface
    ├── Interactive commands
    ├── Progress indicators
    └── Error handling
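
The workflow in the tree above amounts to the coordinator awaiting each specialized agent in turn. A minimal sketch with stub agents (the classes and payload keys here are illustrative, not the package's real agents):

```python
import asyncio

class StubAgent:
    """Illustrative stand-in for one pipeline stage."""
    def __init__(self, name: str):
        self.name = name

    async def execute(self, payload: dict) -> dict:
        # A real agent would transform the payload (analyze, transcribe, ...)
        payload.setdefault("stages", []).append(self.name)
        return payload

async def run_pipeline(video_path: str) -> dict:
    # The coordinator runs the specialized agents in order
    pipeline = [StubAgent(n) for n in ("analyzer", "transcriber", "processor", "formatter")]
    payload = {"video_path": video_path}
    for agent in pipeline:
        payload = await agent.execute(payload)
    return payload

result = asyncio.run(run_pipeline("video.mp4"))
```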

🔧 Advanced Usage

Batch Processing

```python
# Process multiple videos (inside an async function, with a coordinator as above)
videos = ['video1.mp4', 'video2.mp4', 'video3.mp4']
results = await coordinator.execute_batch(videos, 'batch_output')
```

Custom Memory Queries

```python
# Search stored transcriptions
memory = MemoryManager()
results = memory.search_transcriptions("machine learning", limit=10)
```

Quality Analysis

```python
# Analyze video quality before transcription (inside an async function)
analyzer = AnalyzerAgent()
metadata = await analyzer.execute({'video_path': 'video.mp4'})
print(f"Quality Score: {metadata['quality_score']}")
```

๐Ÿ› Troubleshooting

Common Issues

FFmpeg not found:

```bash
# Verify installation
ffmpeg -version

# Install if missing
brew install ffmpeg  # macOS
sudo apt install ffmpeg  # Ubuntu
```

Memory issues with large models:

  • Use smaller models: tiny, base, small
  • Process videos in smaller segments
  • Increase system RAM or use GPU acceleration

Transcription in wrong language:

  • Set language explicitly: language: "es" in config
  • Check audio quality and clarity
  • Try different Whisper models

Performance Optimization

  • GPU Acceleration: Install CUDA-compatible PyTorch for faster processing
  • Model Caching: Models are cached automatically after first use
  • Batch Processing: Process multiple videos in sequence for efficiency
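
Device selection for GPU acceleration typically follows this pattern, falling back to CPU when PyTorch or CUDA is unavailable (an illustrative helper, not part of the package API):

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-capable PyTorch is available, else "cpu"."""
    try:
        import torch  # optional dependency; only needed for the GPU check
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

device = pick_device()
```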

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI Whisper - Speech recognition models
  • ChromaDB - Vector database for semantic search
  • AutoGen - Multi-agent framework
  • Rich - Beautiful terminal interfaces
  • FFmpeg - Multimedia processing


🎬 Transform your videos into accurate transcriptions with AI! 🚀

Made with ❤️ by Lopand Solutions
