A tool that automatically generates step-by-step documentation from instructional videos

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

VideoInstruct

VideoInstruct is a tool that automatically generates step-by-step documentation from instructional videos. It uses AI to extract transcriptions, interpret video content, and create comprehensive markdown guides.

Features

Automatic video transcription extraction
AI-powered video interpretation
Step-by-step documentation generation
Automated documentation quality evaluation with conversation memory
Interactive Q&A workflow between AI agents
User feedback integration for documentation refinement
Configurable escalation to human users
Screenshot generation and annotation
PDF export capabilities
Enhanced workflow visibility with real-time status updates
Transparent model information display for each agent

Workflow Information

When running VideoInstruct, you'll see detailed information about:

Current AI models powering each agent:
- DocGenerator model and provider
- VideoInterpreter model (Google Gemini)
- DocEvaluator model and provider
Step-by-step workflow breakdown:
- Video transcription extraction
- Detailed video interpretation
- Documentation generation
- Documentation review and evaluation
- Quality assessment with feedback
- User interaction points
Progress tracking:
- Documentation versions
- Evaluation results
- Screenshot processing status
- PDF generation status

Project Structure

VideoInstruct/
├── data/                  # Place your video files here
├── examples/              # Example usage scripts
│   └── example_usage.py   # Basic example with repository structure
├── output/                # Generated documentation output
├── videoinstruct/         # Main package
│   ├── agents/            # AI agent modules
│   │   ├── DocGenerator.py      # Documentation generation agent
│   │   ├── DocEvaluator.py      # Documentation evaluation agent
│   │   ├── VideoInterpreter.py  # Video interpretation agent
│   │   └── ScreenshotAgent.py   # Screenshot generation agent
│   ├── prompts/           # System prompts for agents
│   ├── tools/             # Utility tools
│   │   ├── image_annotator.py   # Image annotation tools
│   │   └── video_screenshot.py  # Video screenshot tools
│   ├── utils/             # Utility functions
│   │   ├── transcription.py     # Video transcription utilities
│   │   └── md2pdf.py            # Markdown to PDF conversion
│   ├── cli.py             # Command-line interface
│   ├── configs.py         # Configuration classes
│   ├── prompt_loader.py   # Prompt loading utilities
│   └── videoinstructor.py # Main orchestration class
├── .env                   # Environment variables (API keys)
├── MANIFEST.in            # Package manifest file
├── pyproject.toml         # Python project configuration
├── requirements.txt       # Package dependencies
├── setup.py               # Package setup file
└── README.md              # This file
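
The tree above says video files go in data/. As a minimal sketch of how a script might discover them, the helper below lists video files in that directory using only the standard library. The function name `find_videos` and the set of file suffixes are assumptions made for illustration; the source does not state which container formats VideoInstruct accepts.

```python
from pathlib import Path

# Assumed list of video suffixes; adjust to whatever your videos actually use.
VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv", ".avi", ".webm"}


def find_videos(data_dir="data"):
    """Return the video files directly inside data_dir, sorted by path."""
    root = Path(data_dir)
    if not root.is_dir():
        # A missing directory simply yields no videos.
        return []
    return sorted(
        path
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in VIDEO_SUFFIXES
    )
```

Each returned path could then be passed as the `video_path` argument shown in the Usage section.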

Requirements

Python 3.8+
OpenAI API key (for DocGenerator)
Google Gemini API key (for VideoInterpreter)
DeepSeek API key (for DocEvaluator)
FFmpeg (for video processing)
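
A quick way to confirm the Python and FFmpeg requirements before a long run is a small preflight check. The sketch below is not part of VideoInstruct; `check_requirements` is a hypothetical helper that relies only on `sys.version_info` and `shutil.which` from the standard library.

```python
import shutil
import sys


def check_requirements(min_python=(3, 8), which=shutil.which):
    """Return a list of problems; an empty list means the checks passed.

    `which` is injectable so the check can be exercised without FFmpeg installed.
    """
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required, found {found}")
    if which("ffmpeg") is None:
        problems.append("FFmpeg was not found on PATH")
    return problems
```

Calling `check_requirements()` with no arguments inspects the current interpreter and PATH; printing the returned list before starting gives an early warning instead of a failure partway through video processing.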

Installation

From PyPI

pip install videoinstruct

From Source

Clone the repository:

git clone https://github.com/yourusername/VideoInstruct.git
cd VideoInstruct

Install dependencies:
```
pip install -r requirements.txt
```

Set up your environment variables in .env:

OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key
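
The source does not say whether VideoInstruct loads the .env file on its own. If you need to load it yourself, the following stdlib-only sketch parses simple KEY=VALUE lines and reports which of the three keys are missing. The helper names (`parse_env_text`, `missing_keys`, `load_env_file`) are hypothetical; the `load_dotenv` function from the python-dotenv package is a common alternative.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "DEEPSEEK_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def load_env_file(path=".env"):
    """Copy parsed values into os.environ without overwriting existing ones."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_text(handle.read())
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Using `setdefault` means a key already exported in the shell takes precedence over the file, which is a design choice of this sketch rather than documented package behaviour.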

Usage

Basic Usage

from videoinstruct import VideoInstructor, VideoInstructorConfig

# Initialize VideoInstructor with your video
instructor = VideoInstructor(video_path="path/to/your/video.mp4")

# Generate documentation
documentation_path = instructor.generate_documentation()

When you run the documentation generation, you'll see informative output like this:

==================================================
STARTING DOCUMENTATION GENERATION
==================================================
Generating documentation for video: your_video.mp4
----------------------------------------------------------------------------------------------------
Here are the current models empowering the agents:
DocGenerator:  openai gpt-4
VideoInterpreter:  google gemini-2.0-flash
DocEvaluator:  deepseek deepseek-reasoner
----------------------------------------------------------------------------------------------------

Workflow:
1. Video transcription will be extracted
2. VideoInterpreter will provide a detailed description
3. DocGenerator will create step-by-step documentation
4. Generated documentation will be shown to you before evaluation
5. DocEvaluator will assess documentation quality
   - Will provide feedback on each evaluation round
   - Will escalate to user after 3 rejections
6. You'll be asked for feedback at certain intervals
----------------------------------------------------------------------------------------------------

Using as a Python Package

You can use VideoInstruct as a Python package in your own projects:

from videoinstruct import VideoInstructor, VideoInstructorConfig
from videoinstruct.agents.DocGenerator import DocGeneratorConfig
from videoinstruct.agents.VideoInterpreter import VideoInterpreterConfig
from videoinstruct.agents.DocEvaluator import DocEvaluatorConfig
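import os

# Read the API keys from the environment. This assumes the variables listed
# in the .env section are already exported or loaded into this process.
openai_api_key = os.environ["OPENAI_API_KEY"]
gemini_api_key = os.environ["GEMINI_API_KEY"]
deepseek_api_key = os.environ["DEEPSEEK_API_KEY"]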

# Configure the VideoInstructor
config = VideoInstructorConfig(
   # DocGenerator configuration
   doc_generator_config=DocGeneratorConfig(
      api_key=openai_api_key,
      model_provider="openai",
      model="o3-mini",
      temperature=0.7,
      max_output_tokens=4000
   ),

   # VideoInterpreter configuration
   video_interpreter_config=VideoInterpreterConfig(
      api_key=gemini_api_key,
      model="gemini-2.0-flash",  # You can change this to any supported Gemini model
      temperature=0.7
   ),

   # DocEvaluator configuration
   doc_evaluator_config=DocEvaluatorConfig(
      api_key=deepseek_api_key,
      model_provider="deepseek",
      model="deepseek-reasoner",
      temperature=0.2,
      max_rejection_count=3  # Number of rejections before escalating to user
   ),

   # VideoInstructor configuration
   max_iterations=15,
   output_dir="output",
   temp_dir="temp"
)

# Path to the video file; replace with the path to your own video
video_path = "test.mp4"

# Initialize VideoInstructor
instructor = VideoInstructor(
   video_path=video_path,
   config=config
)

# Generate documentation
documentation = instructor.generate_documentation()

Workflow

VideoInstruct follows this workflow:

1. Transcription: Extract text from the video
2. Initial Description: Get a detailed visual description from VideoInterpreter
3. Documentation Generation: DocGenerator creates initial documentation
4. User Preview: Generated documentation is shown to the user before evaluation
5. Documentation Evaluation: DocEvaluator assesses documentation quality
   - Provides feedback on each evaluation round
   - Maintains conversation memory for context-aware evaluation
   - Escalates to human user after a configurable number of rejections
6. Refinement: Documentation is refined based on evaluator feedback
7. User Feedback: User provides final approval or additional feedback
8. Output: Final documentation is saved as markdown and optionally as PDF
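
The evaluation and refinement steps above amount to a loop with two limits: a total iteration budget and a rejection count that triggers escalation to the user. The sketch below illustrates that control flow under stated assumptions; it is not VideoInstruct's actual implementation. `run_review_loop` and its three callables (`generate`, `evaluate`, `ask_user`) are hypothetical stand-ins for the agents, and only the parameter names `max_iterations` and `max_rejection_count` come from the configuration shown in this README.

```python
def run_review_loop(generate, evaluate, ask_user,
                    max_iterations=15, max_rejection_count=3):
    """Refine a draft until the user approves it or the budget runs out.

    generate(feedback) -> str        produces a draft (feedback is None at first)
    evaluate(draft) -> (bool, str)   automated verdict and feedback
    ask_user(draft) -> (bool, str)   human verdict and feedback
    Returns (last_draft, approved).
    """
    feedback = None
    rejections = 0
    draft = ""
    for _ in range(max_iterations):
        draft = generate(feedback)
        approved, feedback = evaluate(draft)
        if approved:
            # The evaluator is satisfied; the user has the final say.
            approved, feedback = ask_user(draft)
            if approved:
                return draft, True
            rejections = 0
            continue
        rejections += 1
        if rejections >= max_rejection_count:
            # Too many automated rejections in a row: escalate to the user.
            approved, feedback = ask_user(draft)
            if approved:
                return draft, True
            rejections = 0
    return draft, False
```

Passing the agents in as plain callables keeps the loop easy to test with stubs, for example an evaluator that always rejects, to confirm that escalation happens on the third rejection.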

Development

To contribute to VideoInstruct:

1. Fork the repository
2. Create a feature branch: git checkout -b feature-name
3. Commit your changes: git commit -am 'Add some feature'
4. Push to the branch: git push origin feature-name
5. Submit a pull request

License

MIT License

Configuration Options

Each VideoInstruct component is configured through its own configuration class. The sections below list the options for each one:

Main Configuration (VideoInstructorConfig)

The main configuration class that orchestrates all components:

config = VideoInstructorConfig(
    max_iterations=10,          # Maximum refinement iterations
    output_dir="output",        # Output directory for documentation
    temp_dir="temp",           # Temporary file directory
    generate_pdf_for_all_versions=True  # Generate PDFs for all versions
)

DocGenerator Configuration

Controls how documentation is generated:

doc_generator_config = DocGeneratorConfig(
    model_provider="openai",    # AI provider (openai, anthropic, etc.)
    model="o3-mini",           # Model to use
    temperature=0.7,           # Creativity vs consistency (0-1)
    max_output_tokens=4000,    # Max response length
    stream=False,              # Stream responses
    response_format={"type": "json_object"}  # Response format
)

VideoInterpreter Configuration

Controls video analysis settings:

video_interpreter_config = VideoInterpreterConfig(
    model="gemini-2.0-flash",  # Gemini model for video analysis
    temperature=0.7,           # Analysis randomness
    max_output_tokens=None,    # Max response length
    top_k=None,               # Top-k sampling
    top_p=None                # Nucleus sampling
)
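
For readers unfamiliar with the `top_k` and `top_p` options, the sketch below shows what these two sampling filters generally do to a token probability distribution. It is a generic illustration in plain Python, not how Gemini or VideoInstruct implements them; the function names and the toy distribution are made up for the example.

```python
def filter_top_k(probs, k):
    """Keep the k most likely tokens and renormalise their probabilities."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = dict(ranked[:k])
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}


def filter_top_p(probs, p):
    """Keep the smallest set of most likely tokens whose total mass reaches p."""
    kept = {}
    cumulative = 0.0
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    for token, prob in ranked:
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}


# Toy distribution over three candidate tokens.
toy = {"click": 0.5, "press": 0.3, "tap": 0.2}
top_two = filter_top_k(toy, 2)      # drops "tap"
nucleus = filter_top_p(toy, 0.7)    # keeps "click" and "press"
```

Lower values of either parameter narrow the set of candidate tokens, which tends to make output more predictable.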

DocEvaluator Configuration

Controls documentation quality assessment:

doc_evaluator_config = DocEvaluatorConfig(
    model_provider="deepseek",  # AI provider
    model="deepseek-reasoner", # Model for evaluation
    temperature=0.2,           # Low temp for consistent evaluation
    max_rejection_count=3      # Max rejections before user escalation
)

Screenshot Agent Configuration

Controls screenshot generation and analysis:

screenshot_agent_config = ScreenshotAgentConfig(
    model="gemini-2.0-flash",  # Model for image analysis
    temperature=0.2,           # Low temp for consistent analysis
    max_output_tokens=None     # Max response length
)

Environment Variables

The following environment variables can be set in your .env file:

# Required API Keys
OPENAI_API_KEY=your_openai_api_key
GEMINI_API_KEY=your_gemini_api_key
DEEPSEEK_API_KEY=your_deepseek_api_key

# Optional Configuration
VIDEOINSTRUCT_OUTPUT_DIR=custom_output_dir
VIDEOINSTRUCT_TEMP_DIR=custom_temp_dir
VIDEOINSTRUCT_MAX_ITERATIONS=15
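
The source does not state whether the package reads the optional variables itself. If you want to apply them explicitly, a sketch along these lines turns them into keyword arguments. `settings_from_env` is a hypothetical helper, and the fallback values ("output", "temp", 15) are assumptions taken from the examples in this README rather than confirmed defaults.

```python
import os


def settings_from_env(environ=None):
    """Build keyword arguments from the optional variables, with fallbacks."""
    env = os.environ if environ is None else environ
    return {
        "output_dir": env.get("VIDEOINSTRUCT_OUTPUT_DIR", "output"),
        "temp_dir": env.get("VIDEOINSTRUCT_TEMP_DIR", "temp"),
        # Environment values are strings, so the count is converted explicitly.
        "max_iterations": int(env.get("VIDEOINSTRUCT_MAX_ITERATIONS", "15")),
    }
```

Because the keys match the `output_dir`, `temp_dir`, and `max_iterations` parameters shown earlier, the result could be unpacked into the main configuration, for example `VideoInstructorConfig(**settings_from_env())`.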

Release history

- 0.1.9: Mar 5, 2025
- 0.1.7: Mar 5, 2025
- 0.1.6: Mar 4, 2025 (this version)
- 0.1.5: Mar 4, 2025
- 0.1.4: Mar 4, 2025
- 0.1.3: Mar 4, 2025
- 0.1.2: Mar 2, 2025
- 0.1.1: Mar 2, 2025
- 0.1.0: Mar 2, 2025

Download files

Source Distribution

videoinstruct-0.1.6.tar.gz (37.3 kB)

Uploaded Mar 4, 2025 Source

Built Distribution

videoinstruct-0.1.6-py3-none-any.whl (40.3 kB)

Uploaded Mar 4, 2025 Python 3

File details

Details for the file videoinstruct-0.1.6.tar.gz.

File metadata

Download URL: videoinstruct-0.1.6.tar.gz
Upload date: Mar 4, 2025
Size: 37.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for videoinstruct-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`392d18b261def6851a5704cbf86d965140a3fd582fdbd86d4a29f30234589083`
MD5	`c6c0937cf5bd47502e1a7043815b8ea9`
BLAKE2b-256	`03a6048a69e1380e07c50a0c32f5ce37c72f7967c31e9691ac21d70caece85db`

File details

Details for the file videoinstruct-0.1.6-py3-none-any.whl.

File metadata

Download URL: videoinstruct-0.1.6-py3-none-any.whl
Upload date: Mar 4, 2025
Size: 40.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for videoinstruct-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9cfe580586f70e54e861655f6c776c726bd34c077d046b0a5f8264a33e61dd14`
MD5	`5a1c4abf19097c6f3f15d233603bb821`
BLAKE2b-256	`e5c7d66d8d88f975309998b128aa4a902ac9e338e5bb14dd5bce22ed6e2de127`
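
The published digests can be used to check a downloaded file before installing it. The sketch below computes a SHA256 digest with the standard `hashlib` module and compares it with an expected value; the helper names `sha256_of_file` and `matches_published_hash` are illustrative and not part of VideoInstruct.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the wheel listed above, the call would be `matches_published_hash("videoinstruct-0.1.6-py3-none-any.whl", "9cfe580586f70e54e861655f6c776c726bd34c077d046b0a5f8264a33e61dd14")`, assuming the file sits in the current directory.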
