VideoInstruct

VideoInstruct is a tool that automatically generates step-by-step documentation from instructional videos. It uses AI models to transcribe the video, interpret its visual content, and produce a Markdown guide.

Features

  • Automatic video transcription extraction
  • AI-powered video interpretation
  • Step-by-step documentation generation
  • Automated documentation quality evaluation with conversation memory
  • Interactive Q&A workflow between AI agents
  • User feedback integration for documentation refinement
  • Configurable escalation to human users
  • Screenshot generation and annotation
  • PDF export capabilities
  • Enhanced workflow visibility with real-time status updates
  • Transparent model information display for each agent

Workflow Information

When running VideoInstruct, you'll see detailed information about:

  1. Current AI models powering each agent:
    • DocGenerator model and provider
    • VideoInterpreter model (Google Gemini)
    • DocEvaluator model and provider
  2. Step-by-step workflow breakdown:
    • Video transcription extraction
    • Detailed video interpretation
    • Documentation generation
    • Documentation review and evaluation
    • Quality assessment with feedback
    • User interaction points
  3. Progress tracking:
    • Documentation versions
    • Evaluation results
    • Screenshot processing status
    • PDF generation status

Project Structure

VideoInstruct/
├── data/                  # Place your video files here
├── examples/              # Example usage scripts
│   └── example_usage.py   # Basic example with repository structure
├── output/                # Generated documentation output
├── videoinstruct/         # Main package
│   ├── agents/            # AI agent modules
│   │   ├── DocGenerator.py      # Documentation generation agent
│   │   ├── DocEvaluator.py      # Documentation evaluation agent
│   │   ├── VideoInterpreter.py  # Video interpretation agent
│   │   └── ScreenshotAgent.py   # Screenshot generation agent
│   ├── prompts/           # System prompts for agents
│   ├── tools/             # Utility tools
│   │   ├── image_annotator.py   # Image annotation tools
│   │   └── video_screenshot.py  # Video screenshot tools
│   ├── utils/             # Utility functions
│   │   ├── transcription.py     # Video transcription utilities
│   │   └── md2pdf.py            # Markdown to PDF conversion
│   ├── cli.py             # Command-line interface
│   ├── configs.py         # Configuration classes
│   ├── prompt_loader.py   # Prompt loading utilities
│   └── videoinstructor.py # Main orchestration class
├── .env                   # Environment variables (API keys)
├── MANIFEST.in            # Package manifest file
├── pyproject.toml         # Python project configuration
├── requirements.txt       # Package dependencies
├── setup.py               # Package setup file
└── README.md              # This file

Requirements

  • Python 3.8+
  • OpenAI API key (for DocGenerator)
  • Google Gemini API key (for VideoInterpreter)
  • DeepSeek API key (for DocEvaluator)
  • FFmpeg (for video processing)
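
Before a first run, the requirements above can be checked with a short script. This is a minimal sketch using only the standard library; `check_requirements` is a hypothetical helper, not part of VideoInstruct, and the environment variable names are the ones listed in the Installation section below.

```python
import os
import shutil
import sys

# Key names as listed in this README's .env example.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "DEEPSEEK_API_KEY")


def check_requirements(env=None, min_python=(3, 8)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d+ is required" % min_python)
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"environment variable {key} is not set")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("MISSING:", problem)
```

If you swap in a different provider for an agent, adjust `REQUIRED_KEYS` accordingly.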

Installation

From PyPI

pip install videoinstruct

From Source

  1. Clone the repository:

    git clone https://github.com/yourusername/VideoInstruct.git
    cd VideoInstruct
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Set up your environment variables in .env:

    OPENAI_API_KEY=your_openai_api_key
    GEMINI_API_KEY=your_gemini_api_key
    DEEPSEEK_API_KEY=your_deepseek_api_key
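
    This README does not state whether VideoInstruct reads the .env file on its own. If it does not in your setup, the keys can be loaded into the process environment before the package is used. The sketch below is a hypothetical stdlib-only loader (`load_env_file` is not part of VideoInstruct) that handles simple KEY=VALUE lines; the python-dotenv package is a common alternative.

```python
import os


def load_env_file(path=".env", environ=None):
    """Copy KEY=VALUE pairs from a .env file into environ without overwriting existing values."""
    environ = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("\"'")  # drop optional surrounding quotes
            environ.setdefault(key, value)
            loaded[key] = environ[key]
    return loaded
```

    Values already present in the environment take precedence, so an exported key is not replaced by the file.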
    

Usage

Basic Usage

from videoinstruct import VideoInstructor

# Initialize VideoInstructor with your video
instructor = VideoInstructor(video_path="path/to/your/video.mp4")

# Generate documentation
documentation_path = instructor.generate_documentation()

When you run documentation generation, you'll see status output like this:

==================================================
STARTING DOCUMENTATION GENERATION
==================================================
Generating documentation for video: your_video.mp4
----------------------------------------------------------------------------------------------------
Here are the current models empowering the agents:
DocGenerator:  openai gpt-4
VideoInterpreter:  google gemini-2.0-flash
DocEvaluator:  deepseek deepseek-reasoner
----------------------------------------------------------------------------------------------------

Workflow:
1. Video transcription will be extracted
2. VideoInterpreter will provide a detailed description
3. DocGenerator will create step-by-step documentation
4. Generated documentation will be shown to you before evaluation
5. DocEvaluator will assess documentation quality
   - Will provide feedback on each evaluation round
   - Will escalate to user after 3 rejections
6. You'll be asked for feedback at certain intervals
----------------------------------------------------------------------------------------------------
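
Since input videos live in the data/ directory, the basic call can be wrapped to process several files in one go. The following is a sketch under assumptions: `find_videos`, `document_all`, and the suffix list are hypothetical and not part of VideoInstruct. The actual generation step is passed in as a callable so the helper itself does not depend on the package.

```python
from pathlib import Path

# Assumption: adjust to the container formats you actually use.
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv"}


def find_videos(data_dir="data"):
    """Return the video files directly inside data_dir, sorted by path."""
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )


def document_all(data_dir, generate):
    """Call generate(video_path) for each video and collect results by file name."""
    results = {}
    for video in find_videos(data_dir):
        results[video.name] = generate(str(video))
    return results
```

With VideoInstruct installed, `generate` could be `lambda path: VideoInstructor(video_path=path).generate_documentation()`. Note that the workflow asks for user feedback, so each video will still prompt interactively.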

Using as a Python Package

You can use VideoInstruct as a Python package in your own projects:

import os

from videoinstruct import VideoInstructor, VideoInstructorConfig
from videoinstruct.agents.DocGenerator import DocGeneratorConfig
from videoinstruct.agents.VideoInterpreter import VideoInterpreterConfig
from videoinstruct.agents.DocEvaluator import DocEvaluatorConfig

# Read the API keys from the environment (export them or load your .env first)
openai_api_key = os.environ["OPENAI_API_KEY"]
gemini_api_key = os.environ["GEMINI_API_KEY"]
deepseek_api_key = os.environ["DEEPSEEK_API_KEY"]

# Configure the VideoInstructor
config = VideoInstructorConfig(
    # DocGenerator configuration
    doc_generator_config=DocGeneratorConfig(
        api_key=openai_api_key,
        model_provider="openai",
        model="o3-mini",
        temperature=0.7,
        max_output_tokens=4000,
    ),

    # VideoInterpreter configuration
    video_interpreter_config=VideoInterpreterConfig(
        api_key=gemini_api_key,
        model="gemini-2.0-flash",  # You can change this to any supported Gemini model
        temperature=0.7,
    ),

    # DocEvaluator configuration
    doc_evaluator_config=DocEvaluatorConfig(
        api_key=deepseek_api_key,
        model_provider="deepseek",
        model="deepseek-reasoner",
        temperature=0.2,
        max_rejection_count=3,  # Number of rejections before escalating to user
    ),

    # VideoInstructor configuration
    user_feedback_interval=3,  # Get user feedback every 3 iterations
    max_iterations=15,
    output_dir="output",
    temp_dir="temp",
)

# Path to the video file; replace with your own video file name
video_path = "test.mp4"

# Initialize VideoInstructor
instructor = VideoInstructor(
    video_path=video_path,
    config=config,
)

# Generate documentation
documentation = instructor.generate_documentation()

Workflow

VideoInstruct follows this workflow:

  1. Transcription: Extract text from the video
  2. Initial Description: Get a detailed visual description from VideoInterpreter
  3. Documentation Generation: DocGenerator creates initial documentation
  4. User Preview: Generated documentation is shown to the user before evaluation
  5. Documentation Evaluation: DocEvaluator assesses documentation quality
    • Provides feedback on each evaluation round
    • Maintains conversation memory for context-aware evaluation
    • Escalates to human user after a configurable number of rejections
  6. Refinement: Documentation is refined based on evaluator feedback
  7. User Feedback: User provides final approval or additional feedback
  8. Output: Final documentation is saved as markdown and optionally as PDF
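
The evaluation and refinement steps above can be pictured as a loop with a rejection counter. The sketch below is a simplified model, not VideoInstruct's actual implementation: `run_review_loop` and its four callables are hypothetical stand-ins for the agents and the user prompt, and the periodic `user_feedback_interval` check is left out for brevity. The parameter names `max_rejection_count` and `max_iterations` mirror the configuration options shown earlier.

```python
def run_review_loop(generate, evaluate, refine, ask_user,
                    max_rejection_count=3, max_iterations=15):
    """Simplified model of the generate / evaluate / refine cycle.

    generate() -> str
    evaluate(doc) -> (approved, feedback)
    refine(doc, feedback) -> str
    ask_user(doc, reason) -> (approved, feedback)
    Returns the final document and the number of iterations used.
    """
    doc = generate()
    rejections = 0
    for iteration in range(1, max_iterations + 1):
        approved, feedback = evaluate(doc)
        if approved:
            rejections = 0
            reason = "final approval"
        else:
            rejections += 1
            reason = None
            if rejections >= max_rejection_count:
                # Too many rejections in a row: hand the decision to the user.
                reason = "escalation"
                rejections = 0
        if reason is not None:
            user_ok, user_feedback = ask_user(doc, reason)
            if user_ok:
                return doc, iteration
            # User feedback, when given, takes priority over the evaluator's.
            feedback = user_feedback or feedback
        doc = refine(doc, feedback)
    return doc, max_iterations
```

Passing the agents in as plain callables keeps the control flow testable without any API calls, for example with lambdas that append a marker to the draft on each refinement.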

Development

To contribute to VideoInstruct:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature-name
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin feature-name
  5. Submit a pull request

License

MIT License
