VideoInstruct
VideoInstruct is a tool that automatically generates step-by-step documentation from instructional videos. It uses AI to extract transcriptions, interpret video content, and create comprehensive markdown guides.
Features
- Automatic video transcription extraction
- AI-powered video interpretation
- Step-by-step documentation generation
- Automated documentation quality evaluation with conversation memory
- Interactive Q&A workflow between AI agents
- User feedback integration for documentation refinement
- Configurable escalation to human users
- Screenshot generation and annotation
- PDF export capabilities
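Screenshot generation, one of the features above, comes down to pulling individual frames out of the video at chosen timestamps. The README does not show how video_screenshot.py does this, so the following is only a minimal sketch that assumes FFmpeg is on PATH. `build_frame_command` and `extract_frame` are hypothetical helpers, not the package's functions; the FFmpeg options (`-ss`, `-i`, `-frames:v`, `-y`) are standard.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_frame_command(video: Path, seconds: float, output: Path) -> list[str]:
    """Hypothetical helper: FFmpeg command that saves one frame at `seconds`."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output image if it already exists
        "-ss", f"{seconds:.3f}",  # seek to the timestamp (in seconds) before decoding
        "-i", str(video),
        "-frames:v", "1",         # write exactly one video frame
        str(output),
    ]


def extract_frame(video: Path, seconds: float, output: Path) -> Path:
    """Hypothetical helper: run FFmpeg and return the path of the saved screenshot."""
    output.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_frame_command(video, seconds, output),
                   check=True, capture_output=True)
    return output
```

For example, `extract_frame(Path("data/demo.mp4"), 12.5, Path("temp/step_01.png"))` would save the frame shown 12.5 seconds into a hypothetical `demo.mp4`. Placing `-ss` before `-i` makes FFmpeg seek in the input, which is fast for long videos.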
Project Structure
VideoInstruct/
├── data/ # Place your video files here
├── examples/ # Example usage scripts
│ ├── example_usage.py # Basic example with repository structure
│ └── package_usage.py # Example using the installed package
├── output/ # Generated documentation output
├── videoinstruct/ # Main package
│ ├── agents/ # AI agent modules
│ │ ├── DocGenerator.py # Documentation generation agent
│ │ ├── DocEvaluator.py # Documentation evaluation agent
│ │ ├── VideoInterpreter.py # Video interpretation agent
│ │ └── ScreenshotAgent.py # Screenshot generation agent
│ ├── prompts/ # System prompts for agents
│ ├── tools/ # Utility tools
│ │ ├── image_annotator.py # Image annotation tools
│ │ └── video_screenshot.py # Video screenshot tools
│ ├── utils/ # Utility functions
│ │ ├── transcription.py # Video transcription utilities
│ │ └── md2pdf.py # Markdown to PDF conversion
│ ├── cli.py # Command-line interface
│ ├── configs.py # Configuration classes
│ ├── prompt_loader.py # Prompt loading utilities
│ └── videoinstructor.py # Main orchestration class
├── .env # Environment variables (API keys)
├── MANIFEST.in # Package manifest file
├── pyproject.toml # Python project configuration
├── requirements.txt # Package dependencies
├── setup.py # Package setup file
└── README.md # This file
Requirements
- Python 3.8+
- OpenAI API key (for DocGenerator)
- Google Gemini API key (for VideoInterpreter)
- DeepSeek API key (for DocEvaluator)
- FFmpeg (for video processing)
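Before running anything, it can help to confirm these prerequisites are in place. This is a minimal sketch, not part of the package: `missing_prerequisites` is a hypothetical helper, and the environment-variable names are the ones used in the `.env` example under Installation.

```python
from __future__ import annotations

import os
import shutil

# Key names from the .env example; agent names from the requirements list above.
REQUIRED_KEYS = {
    "OPENAI_API_KEY": "DocGenerator",
    "GEMINI_API_KEY": "VideoInterpreter",
    "DEEPSEEK_API_KEY": "DocEvaluator",
}


def missing_prerequisites(env: dict[str, str] | None = None) -> list[str]:
    """Hypothetical check: describe every missing API key and a missing FFmpeg."""
    env = dict(os.environ) if env is None else env
    problems = [
        f"{key} is not set (needed by {agent})"
        for key, agent in REQUIRED_KEYS.items()
        if not env.get(key)
    ]
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling `missing_prerequisites()` with no argument checks the current process environment; an empty list means everything was found.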
Installation
From PyPI
pip install videoinstruct
From Source
- Clone the repository:
    git clone https://github.com/PouriaRouzrokh/VideoInstruct.git
    cd VideoInstruct
- Install the package in development mode:
    pip install -e .
- Create a .env file in the root directory with your API keys:
    OPENAI_API_KEY=your_openai_api_key
    GEMINI_API_KEY=your_gemini_api_key
    DEEPSEEK_API_KEY=your_deepseek_api_key
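The README does not say how the package reads this file (a library such as python-dotenv is a common choice). As an illustration only, here is a stdlib-only sketch of loading simple `KEY=value` lines like the ones above into the environment; `parse_env` and `load_env_file` are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: Path = Path(".env"), override: bool = False) -> dict[str, str]:
    """Copy parsed values into os.environ; existing variables win unless override=True."""
    values = parse_env(path.read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Leaving `override=False` lets keys already exported in the shell take precedence over the file, which is convenient when switching keys temporarily.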
Examples
The repository includes two example scripts to help you get started:
- example_usage.py: Demonstrates direct usage with the repository structure and hardcoded paths. This is useful if you're working directly with the repository without installing it as a package.
- package_usage.py: Shows how to use VideoInstruct after it has been installed as a package. This example demonstrates:
  - Using VideoInstruct as an imported Python package in your code
  - Using VideoInstruct from the command line
To run the examples:
# Run the basic example
python examples/example_usage.py
# Run the package usage example
python examples/package_usage.py
Using as a Python Package
You can use VideoInstruct as a Python package in your own projects:
from pathlib import Path
from videoinstruct import VideoInstructor, VideoInstructorConfig
from videoinstruct.agents.DocGenerator import DocGeneratorConfig
from videoinstruct.agents.VideoInterpreter import VideoInterpreterConfig
from videoinstruct.agents.DocEvaluator import DocEvaluatorConfig
# Create configuration
config = VideoInstructorConfig(
    doc_generator_config=DocGeneratorConfig(
        model="gpt-4o-mini",
        temperature=0.7,
        max_output_tokens=4000,
    ),
    video_interpreter_config=VideoInterpreterConfig(
        model="gemini-2.0-flash",
        temperature=0.7,
    ),
    doc_evaluator_config=DocEvaluatorConfig(
        model="deepseek/deepseek-reasoner",
        temperature=0.2,
        max_rejection_count=3,
    ),
    user_feedback_interval=3,
    max_iterations=15,
    output_dir="output",
    temp_dir="temp",
)
# Initialize VideoInstructor
instructor = VideoInstructor(config)
# Process a video
video_path = Path("path/to/your/video.mp4")
output_path = instructor.process_video(video_path)
print(f"Documentation generated successfully: {output_path}")
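The configuration above nests one settings object per agent inside a top-level object. If you keep these settings in a JSON or YAML file, a small loader can rebuild that nesting. The sketch below uses hypothetical stand-in dataclasses (named with a `Sketch` suffix so they are not mistaken for the package's own config classes); the field names and values mirror the example, while the positivity check is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentSketchConfig:
    """Hypothetical per-agent settings mirroring the fields used in the example."""
    model: str
    temperature: float = 0.7
    max_output_tokens: int | None = None    # set for DocGenerator in the example
    max_rejection_count: int | None = None  # set for DocEvaluator in the example


@dataclass
class InstructorSketchConfig:
    """Hypothetical top-level settings mirroring VideoInstructorConfig's arguments."""
    doc_generator: AgentSketchConfig
    video_interpreter: AgentSketchConfig
    doc_evaluator: AgentSketchConfig
    user_feedback_interval: int = 3
    max_iterations: int = 15
    output_dir: str = "output"
    temp_dir: str = "temp"

    def __post_init__(self) -> None:
        # Assumed sanity check; the real package may validate differently.
        if self.max_iterations < 1 or self.user_feedback_interval < 1:
            raise ValueError("max_iterations and user_feedback_interval must be positive")

    @classmethod
    def from_dict(cls, data: dict) -> "InstructorSketchConfig":
        """Build the nested config from a plain dict, e.g. one loaded with json.load."""
        return cls(
            doc_generator=AgentSketchConfig(**data["doc_generator"]),
            video_interpreter=AgentSketchConfig(**data["video_interpreter"]),
            doc_evaluator=AgentSketchConfig(**data["doc_evaluator"]),
            # Remaining scalar settings (intervals, directories) pass straight through.
            **{k: v for k, v in data.items() if not isinstance(v, dict)},
        )
```

Keeping the settings in a file this way lets you switch models without editing code; the values would still need to be passed to the real `VideoInstructorConfig`.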
Using the Command Line Interface
VideoInstruct comes with a command-line interface:
# Basic usage
videoinstruct path/to/your/video.mp4
# With custom options
videoinstruct path/to/your/video.mp4 \
--output-dir custom_output \
--temp-dir custom_temp \
--max-iterations 10 \
--user-feedback-interval 2 \
--doc-generator-model "gpt-4o" \
--video-interpreter-model "gemini-2.0-pro" \
--doc-evaluator-model "deepseek/deepseek-reasoner"
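The README does not show cli.py, but the flags above map naturally onto the Python configuration. Here is a hypothetical argparse sketch that accepts the same options; the default values are copied from the Python example and are assumptions about the real CLI's defaults.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented videoinstruct flags."""
    parser = argparse.ArgumentParser(
        prog="videoinstruct",
        description="Generate step-by-step documentation from a video.",
    )
    parser.add_argument("video_path", help="Path to the input video file")
    parser.add_argument("--output-dir", default="output")
    parser.add_argument("--temp-dir", default="temp")
    parser.add_argument("--max-iterations", type=int, default=15)
    parser.add_argument("--user-feedback-interval", type=int, default=3)
    parser.add_argument("--doc-generator-model", default="gpt-4o-mini")
    parser.add_argument("--video-interpreter-model", default="gemini-2.0-flash")
    parser.add_argument("--doc-evaluator-model", default="deepseek/deepseek-reasoner")
    return parser
```

argparse turns each dashed flag into an underscored attribute, so `--max-iterations 10` becomes `args.max_iterations == 10`, which lines up with the `max_iterations` keyword in the Python example.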
Workflow
VideoInstruct follows this workflow:
- Transcription: Extract text from the video
- Initial Description: Get a detailed visual description from VideoInterpreter
- Documentation Generation: DocGenerator creates initial documentation
- User Preview: Generated documentation is shown to the user before evaluation
- Documentation Evaluation: DocEvaluator assesses documentation quality
  - Provides feedback on each evaluation round
  - Maintains conversation memory for context-aware evaluation
  - Escalates to the human user after a configurable number of rejections
- Refinement: Documentation is refined based on evaluator feedback
- User Feedback: User provides final approval or additional feedback
- Output: Final documentation is saved as markdown and optionally as PDF
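The README does not show how videoinstructor.py implements the evaluate/refine cycle. One way the evaluation, escalation, and user-feedback steps could fit together is sketched below, with the three agents replaced by plain callables. `refine_documentation`, `Evaluation`, and the callback signatures are hypothetical. The sketch omits transcription, the initial description, the user preview, and PDF export.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Evaluation:
    """Hypothetical evaluator verdict: approve/reject plus written feedback."""
    approved: bool
    feedback: str


def refine_documentation(
    generate: Callable[[str, List[str]], str],
    evaluate: Callable[[str, List[str]], Evaluation],
    ask_user: Callable[[str], Optional[str]],
    context: str,
    max_iterations: int = 15,
    max_rejection_count: int = 3,
    user_feedback_interval: int = 3,
) -> str:
    """Loop generate -> evaluate -> refine and return the final markdown draft.

    generate(context, feedback) drafts from the transcript/description plus all
    feedback so far; evaluate(draft, memory) also sees its own earlier comments;
    ask_user(draft) returns None to approve, or a string of feedback.
    """
    if user_feedback_interval < 1 or max_rejection_count < 1:
        raise ValueError("user_feedback_interval and max_rejection_count must be positive")
    feedback: List[str] = []  # everything the generator should address
    memory: List[str] = []    # evaluator's conversation memory
    rejections = 0
    draft = generate(context, feedback)
    for iteration in range(1, max_iterations + 1):
        verdict = evaluate(draft, memory)
        memory.append(verdict.feedback)
        needs_user = verdict.approved  # approved drafts go to the user for sign-off
        if not verdict.approved:
            feedback.append(verdict.feedback)
            rejections += 1
            if rejections >= max_rejection_count:
                needs_user = True  # escalate to a human after repeated rejections
                rejections = 0
        if iteration % user_feedback_interval == 0:
            needs_user = True  # periodic user check-in
        if needs_user:
            reply = ask_user(draft)
            if reply is None:
                return draft  # user approved the current draft
            feedback.append(reply)
        draft = generate(context, feedback)
    return draft  # iteration budget exhausted: return the latest draft
```

Keeping the evaluator's memory separate from the generator's feedback list means the evaluator can compare each round against what it asked for before, while the generator sees both evaluator and user requests.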
Development
To contribute to VideoInstruct:
- Fork the repository
- Create a feature branch: git checkout -b feature-name
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin feature-name
- Submit a pull request
License
Download files
File details
Details for the file videoinstruct-0.1.2.tar.gz.
File metadata
- Download URL: videoinstruct-0.1.2.tar.gz
- Upload date:
- Size: 35.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3abb875e4cdad85de7fba4ae03618980765914e31c3758431e5f2e8082717cd6 |
| MD5 | 71d7fee734f924b04a27089c39aae385 |
| BLAKE2b-256 | 2893c797d1ad1165cf60e73899025995ba7d842ccd957abb0d49bbbec57ac6f3 |
File details
Details for the file videoinstruct-0.1.2-py3-none-any.whl.
File metadata
- Download URL: videoinstruct-0.1.2-py3-none-any.whl
- Upload date:
- Size: 40.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64d9405201d7d5eca183df5985c6c54eff76a8912232611ac6d1513bcc96a198 |
| MD5 | 7e0f714a79a4d398150e3ca8f9cbf13c |
| BLAKE2b-256 | 9dd4f701abb895a5b7ca7a09944194d5f78abfe5d75c2ea0055e3dada24818bf |