A simple tool that converts video, audio, subtitle, and video-URL (especially YouTube) content into written Markdown files. It can rewrite spoken expression into written prose or translate the content into a target language using an LLM.

🎬 Wenbi: Intelligent Media-to-Text and Text-to-Text Processing

Transform your audio and video content into polished, academic-quality written documents with AI precision!

Wenbi is a CLI tool and web application for media-to-text and text-to-text processing. Whether you're a researcher, student, content creator, or professional, Wenbi turns raw audio/video content and existing text documents into formatted, academic-style documents.

✨ Why Wenbi?

🎯 From Speech to Scholarship: Convert lectures, interviews, podcasts, and presentations into publication-ready academic texts

🌍 Universal Language Bridge: Seamlessly translate and adapt content across languages while maintaining academic integrity

📝 Intelligent Rewriting: Transform casual speech patterns into formal, written expression with perfect grammar and flow

⏱️ Time-Stamped Precision: Maintain full traceability with timestamp citations linking back to original audio/video sources

🧠 LLM-Powered Excellence: Harness the power of multiple AI models (OpenAI GPT, Google Gemini, Ollama) for superior results

🚀 Core Features

📹 Multimedia Processing Powerhouse

Universal Input Support: Seamlessly handle videos (MP4, AVI, MOV, MKV), audio files (MP3, FLAC, AAC), YouTube URLs, and subtitle files (VTT, SRT, ASS)
Advanced Transcription: Powered by OpenAI Whisper with configurable model sizes (large-v3-turbo recommended)
Time-Stamped Output: NEW! The --cite-timestamps option keeps output traceable by adding Markdown headers that show exact time ranges

🧠 AI-Powered Text Transformation

Intelligent Rewriting: Transform casual spoken language into polished written prose
Academic Excellence: Elevate content to publication-quality academic standards with proper citations and formal structure
Smart Translation: Contextually accurate translations that preserve meaning and academic integrity
Multi-LLM Support: Choose from OpenAI GPT-4, Google Gemini, or local Ollama models

🔧 Professional Workflow Tools

Batch Processing: Process entire directories of media files with wenbi-batch
Flexible Configuration: YAML-based configurations for complex, repeatable workflows
Document Processing: Handle DOCX documents and various text formats
Web Interface: Beautiful Gradio GUI for non-technical users
Multi-language Intelligence: Automatic language detection and cross-lingual processing

💼 Real-World Use Cases

🎓 Academic Research

# Transform lecture recordings into formatted academic notes with timestamps
wenbi lecture_recording.mp4 --llm gemini/gemini-2.0-flash --cite-timestamps --output-dir ./course_notes

# Convert research interview to academic paper format
wenbi academic interview.mp3 --llm openai/gpt-4o --lang English

📚 Content Creation

# Turn podcast episodes into blog posts
wenbi rewrite podcast_episode.mp3 --llm ollama/qwen3 --lang English --chunk-length 6

# Process YouTube educational content for documentation
wenbi "https://youtube.com/watch?v=example" --llm gemini/gemini-1.5-flash --cite-timestamps

🌐 International Collaboration

# Translate conference presentations with academic precision
wenbi translate conference_talk.mp4 --llm gemini/gemini-2.0-flash --lang French --cite-timestamps

# Process multilingual research materials
wenbi research_video.mp4 --multi-language --translate-lang English --rewrite-lang Chinese

⚡ Quick Start

Prerequisites

Python 3.10+
For commercial LLMs: API keys (OPENAI_API_KEY, GOOGLE_API_KEY)
For local LLMs: Ollama installation

Installation

Wenbi can be installed using multiple package managers:

📦 Install with pip (recommended)

# Install from PyPI
pip install wenbi

# Quick test - process a subtitle file with timestamps
wenbi your_subtitle.vtt --cite-timestamps --llm gemini/gemini-1.5-flash

⚡ Install with uv (fastest)

# Install with uv for fastest installation
uv pip install wenbi

# Quick test
wenbi your_content.mp4 --cite-timestamps --llm gemini/gemini-1.5-flash

🔧 Development installation with Rye

# Clone the repository for development
git clone https://github.com/areopagusworkshop/wenbi.git
cd wenbi

# Install dependencies with Rye
rye sync

# Activate the virtual environment
rye shell

# Quick test - process a subtitle file with timestamps
wenbi your_subtitle.vtt --cite-timestamps --llm gemini/gemini-1.5-flash

🎯 NEW: Timestamp Citation Feature

The --cite-timestamps option transforms your output with precise time-stamped sections:

Input: Regular VTT/SRT subtitle file
Output: Markdown with timestamp headers

### **00:00:00 - 00:00:23**

This introductory section discusses the fundamental concepts of the topic, establishing the theoretical framework that will guide our understanding throughout the presentation.

### **00:00:23 - 00:00:45**

The speaker then transitions to examining the practical applications, demonstrating how these theoretical principles manifest in real-world scenarios.

Perfect for: Academic note-taking, research documentation, content verification, and creating citeable references to audio/video sources!
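
The header format above can be illustrated with a short Python sketch. This is not Wenbi's implementation: `parse_vtt`, `to_timestamped_markdown`, and the `cues_per_section` grouping are hypothetical names and choices made only to show how subtitle cues could map onto `### **HH:MM:SS - HH:MM:SS**` sections.

```python
import re

# Matches a WebVTT cue timing line such as "00:00:00.000 --> 00:00:10.000".
CUE_RE = re.compile(
    r"(\d{2}:\d{2}:\d{2})\.\d{3}\s+-->\s+(\d{2}:\d{2}:\d{2})\.\d{3}"
)


def parse_vtt(text):
    """Return (start, end, caption) tuples from simple WebVTT text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, line in enumerate(lines):
            match = CUE_RE.match(line)
            if match:
                caption = " ".join(lines[i + 1:])
                if caption:
                    cues.append((match.group(1), match.group(2), caption))
                break
    return cues


def to_timestamped_markdown(cues, cues_per_section=2):
    """Group cues into sections, each under a timestamp header (assumed grouping)."""
    sections = []
    for i in range(0, len(cues), cues_per_section):
        group = cues[i:i + cues_per_section]
        start, end = group[0][0], group[-1][1]
        body = " ".join(caption for _, _, caption in group)
        sections.append(f"### **{start} - {end}**\n\n{body}")
    return "\n\n".join(sections)


SAMPLE_VTT = """WEBVTT

00:00:00.000 --> 00:00:10.000
Welcome to the lecture.

00:00:10.000 --> 00:00:23.000
Today we cover the basics.

00:00:23.000 --> 00:00:45.000
Now for the applications.
"""

markdown = to_timestamped_markdown(parse_vtt(SAMPLE_VTT))
print(markdown)
```

In the real tool the section text would come from the LLM rewrite rather than from joined captions; the sketch only shows how the time range of each section follows from its first and last cue.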

Usage

CLI (Command Line Interface)

Wenbi provides a powerful CLI for various tasks. The main entry point is wenbi.

Main Command

Process a single input file (video, audio, URL, or text file) to generate Markdown and CSV outputs.

wenbi <input_file_or_url> [options]

# Example: Process a video file
wenbi my_video.mp4 --output-dir ./output --lang English

# Example: Process a YouTube URL
wenbi "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --llm gemini/gemini-1.5-flash --lang Chinese

# Example: Process a VTT subtitle file
wenbi subtitles.vtt --output-dir ./output --lang English

# Example: Process a DOCX file for academic rewriting (requires --llm)
wenbi document.docx --llm ollama/qwen3 --lang English

Common Options:

-c, --config <path>: Path to a YAML configuration file.
-o, --output-dir <path>: Directory to save output files.
--llm <model_identifier>: Specify the LLM model to use (e.g., ollama/qwen3, gemini/gemini-1.5-flash, openai/gpt-4o).
--cite-timestamps: NEW! Include precise timestamp headers in output markdown (format: ### **HH:MM:SS - HH:MM:SS**)
-s, --transcribe-lang <language>: Language for transcription (e.g., Chinese, English).
-l, --lang <language>: Target language for translation/rewriting (default: Chinese).
-m, --multi-language: Enable multi-language processing.
-cl, --chunk-length <int>: Number of sentences per paragraph (default: 8).
-mt, --max-tokens <int>: Maximum tokens for LLM output (default: 130000).
-to, --timeout <int>: LLM request timeout in seconds (default: 3600).
-tm, --temperature <float>: LLM temperature parameter (default: 0.1).
-tsm, --transcribe-model <model_size>: Whisper model size for transcription (e.g., large-v3-turbo).
-ow, --output_wav <filename>: Filename for saving the segmented WAV (optional).
-st, --start_time <HH:MM:SS>: Start time for extraction from media.
-et, --end_time <HH:MM:SS>: End time for extraction from media.
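
To make the --chunk-length option concrete, here is a minimal Python sketch of grouping sentences into paragraphs of a fixed size. It assumes a naive punctuation-based sentence splitter; `split_sentences` and `chunk_paragraphs` are hypothetical helpers, and Wenbi's own segmentation may differ.

```python
import re


def split_sentences(text):
    """Naive splitter: break after ., !, ? plus whitespace, or after CJK stops."""
    parts = re.split(r"(?<=[.!?])\s+|(?<=[。！？])", text.strip())
    return [part for part in parts if part]


def chunk_paragraphs(sentences, chunk_length=8):
    """Join every `chunk_length` sentences into one paragraph (default 8)."""
    return [
        " ".join(sentences[i:i + chunk_length])
        for i in range(0, len(sentences), chunk_length)
    ]


transcript = " ".join(f"Sentence {n}." for n in range(1, 21))
paragraphs = chunk_paragraphs(split_sentences(transcript), chunk_length=8)

for paragraph in paragraphs:
    print(paragraph, end="\n\n")
```

With 20 sentences and a chunk length of 8, the sketch yields paragraphs of 8, 8, and 4 sentences. A smaller value gives shorter paragraphs, which also means smaller pieces of text per LLM request.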

Subcommands

Wenbi provides specific subcommands for different processing tasks:

# Rewrite text (oral → written)
wenbi rewrite <input_file> --llm ollama/qwen3 --lang Chinese

# Translate text to target language
wenbi translate <input_file> --llm gemini/gemini-1.5-flash --lang French

# Academic rewriting for scholarly style
wenbi academic <input_file> --llm openai/gpt-4o --lang English

# NEW: Combine speech with presentation slides
wenbi ppt <speech_input> <slides_file> --llm ollama/qwen3 --lang English
# (abbreviated: wenbi p <speech_input> <slides_file>)

PPT Subcommand: The new ppt subcommand intelligently combines speech with presentation slides:

Accepts any speech format: video, audio, URL, or markdown file
Accepts any slides format: PDF, PPTX, or markdown file
Skips redundant processing: Uses markdown files directly if provided (no re-transcription/conversion)
Transcribes and rewrites media files using the full rewrite subcommand
Converts PDF/PPTX slides to markdown using marker-pdf
Uses LLM-based alignment to find where each slide appears in the speech
Inserts slides before matching speech sections for seamless integration
Perfect for lectures, conferences, and educational content
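
The "insert slides before matching speech sections" step can be sketched in Python. Wenbi uses LLM-based alignment; the sketch below swaps in a plain word-overlap score as a stand-in so that it runs without any model. `words`, `best_section`, and `merge_slides` are hypothetical names, not part of Wenbi's API.

```python
import re


def words(text):
    """Lowercased set of alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_section(slide_text, sections):
    """Index of the speech section sharing the most words with the slide."""
    slide_words = words(slide_text)
    scores = [len(slide_words & words(section)) for section in sections]
    return max(range(len(sections)), key=lambda i: scores[i])


def merge_slides(sections, slides):
    """Place each slide immediately before its best-matching speech section."""
    placed = {}
    for slide in slides:
        placed.setdefault(best_section(slide, sections), []).append(slide)
    merged = []
    for index, section in enumerate(sections):
        merged.extend(placed.get(index, []))
        merged.append(section)
    return merged


sections = [
    "Welcome everyone, today we introduce neural networks.",
    "Training uses gradient descent to minimize the loss.",
    "Finally, we evaluate accuracy on held-out data.",
]
slides = [
    "## Gradient Descent\n- loss\n- learning rate",
    "## Evaluation\n- accuracy\n- held-out data",
]

merged = merge_slides(sections, slides)
print("\n\n".join(merged))
```

Word overlap is only a placeholder for the alignment step; the merging logic (collect slides per section, then emit slides ahead of the section text) is the part that illustrates the described behavior.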

Examples:

# Merge lecture recording with presentation slides
wenbi ppt lecture.mp4 presentation.pdf \
  --llm gemini/gemini-1.5-flash \
  --lang English \
  --cite-timestamps \
  --output-dir ./lecture_notes

# Use existing markdown files (no reprocessing)
wenbi ppt speech.md slides.md \
  --llm ollama/qwen3 \
  --output-dir ./output

# Mix media and markdown (transcribe video, use slides markdown)
wenbi ppt lecture.mp4 slides.md \
  --lang English \
  --output-dir ./notes

Subcommands share common options with the main command.
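
When calling the CLI from a script, it can help to assemble the argument list in one place. The following Python sketch builds a command from the subcommand syntax and common options documented above; `build_wenbi_command` is a hypothetical helper, and only flags listed in this README are used.

```python
import shlex


def build_wenbi_command(input_path, subcommand=None, llm=None, lang=None,
                        output_dir=None, cite_timestamps=False):
    """Return an argv list for the wenbi CLI (subcommand first, then input)."""
    command = ["wenbi"]
    if subcommand:
        command.append(subcommand)
    command.append(input_path)
    if llm:
        command += ["--llm", llm]
    if lang:
        command += ["--lang", lang]
    if output_dir:
        command += ["--output-dir", output_dir]
    if cite_timestamps:
        command.append("--cite-timestamps")
    return command


command = build_wenbi_command(
    "talk.mp4",
    subcommand="translate",
    llm="gemini/gemini-1.5-flash",
    lang="French",
    cite_timestamps=True,
)

# shlex.join produces a shell-safe string, quoting arguments such as URLs.
print(shlex.join(command))
print(shlex.join(build_wenbi_command("https://www.youtube.com/watch?v=dQw4w9WgXcQ")))

# To actually run it (requires wenbi to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems entirely, which matters for YouTube URLs because `?` is a glob character in some shells.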

🎥 Video Slides Extraction (NEW!)

The PPT subcommand now supports extracting slides directly from video recordings:

# Extract slides from video with automatic detection
wenbi ppt lecture_video.mp4 --video-slides --cite-timestamps

# Extract with custom time range
wenbi ppt lecture_video.mp4 --video-slides \
  --slides-start-time 00:15:00 \
  --slides-end-time 00:45:00 \
  --cite-timestamps

# Manual ROI override for slide area
wenbi ppt lecture_video.mp4 --video-slides --manual-roi --cite-timestamps

Video Slides Features:

Automatic Slide Detection: AI-powered region of interest (ROI) detection
Scene Change Detection: Identifies slide transitions using PySceneDetect
OCR Processing: Extracts text content from slides using marker-pdf
Timestamp Integration: Precise timing with HH:MM:SS format
Combined Output: Embeds slide images with transcribed speech content
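
As a rough illustration of how a custom time range and HH:MM:SS timestamps might interact, the Python sketch below filters detected slide transitions to a window like the one given by --slides-start-time and --slides-end-time. The transition times are made-up sample data, and `hms_to_seconds`, `seconds_to_hms`, and `transitions_in_range` are hypothetical helpers, not Wenbi functions.

```python
def hms_to_seconds(value):
    """Convert 'HH:MM:SS' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total):
    """Convert seconds (fractions are truncated) to 'HH:MM:SS'."""
    total = int(total)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def transitions_in_range(transitions, start="00:00:00", end=None):
    """Keep transition times (in seconds) inside [start, end], formatted as HH:MM:SS."""
    lower = hms_to_seconds(start)
    upper = hms_to_seconds(end) if end else float("inf")
    return [seconds_to_hms(t) for t in transitions if lower <= t <= upper]


# Sample transition times in seconds, as a scene detector might report them.
detected = [120.0, 905.5, 1800.0, 2699.9, 3000.0]
kept = transitions_in_range(detected, start="00:15:00", end="00:45:00")
print(kept)
```

Only the transitions between 15 and 45 minutes survive, each rendered in the same HH:MM:SS form used by the timestamp headers.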

For detailed PPT subcommand documentation, see VIDEO_SLIDES_USAGE.md.

Batch Processing

Process multiple media files in a directory using wenbi-batch.

wenbi-batch <input_directory> [options]

# Example: Process all media files in 'my_media_folder'
wenbi-batch my_media_folder --output-dir ./batch_output --translate-lang English

# Example: Process with a config file and combine markdown outputs
wenbi-batch my_media_folder -c config/batch-config.yml --md combined_output.md

Batch Options:

-c, --config <path>: Path to a YAML configuration file for batch processing.
--output-dir <path>: Output directory for batch results.
--rewrite-llm <model_id>: LLM for rewriting.
--translate-llm <model_id>: LLM for translation.
--transcribe-lang <language>: Language for transcription.
--translate-lang <language>: Target language for translation (default: Chinese).
--rewrite-lang <language>: Target language for rewriting (default: Chinese).
--multi-language: Enable multi-language processing.
--chunk-length <int>: Number of sentences per chunk.
--max-tokens <int>: Maximum tokens for LLM.
--timeout <int>: LLM timeout in seconds.
--temperature <float>: LLM temperature.
--md [path]: Write a combined Markdown file. If no path is given, the input folder name is used.
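
The --md behavior can be pictured with a small Python sketch that concatenates per-file Markdown outputs and falls back to the input folder name when no path is given. Where the combined file is written and the per-file heading are assumptions made for illustration; `combine_markdown` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def combine_markdown(output_dir, input_dir, md_path=None):
    """Concatenate per-file Markdown outputs into one combined file."""
    output_dir = Path(output_dir)
    # With no explicit path, name the combined file after the input folder.
    target = Path(md_path) if md_path else output_dir / (Path(input_dir).name + ".md")
    parts = []
    for md_file in sorted(output_dir.glob("*.md")):
        if md_file.resolve() == target.resolve():
            continue  # never fold an earlier combined file into itself
        body = md_file.read_text(encoding="utf-8").strip()
        parts.append("# " + md_file.stem + "\n\n" + body + "\n")
    target.write_text("\n".join(parts), encoding="utf-8")
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "batch_output"
    out.mkdir()
    (out / "talk1.md").write_text("First talk.", encoding="utf-8")
    (out / "talk2.md").write_text("Second talk.", encoding="utf-8")
    combined = combine_markdown(out, "my_media_folder")
    combined_name = combined.name
    combined_text = combined.read_text(encoding="utf-8")

print(combined_name)
print(combined_text)
```

Sorting the files keeps the combined document in a stable order across runs.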

Configuration Files (YAML)

Wenbi supports YAML configuration files for both single input and batch processing. This allows for more complex and reusable configurations.

Example single-input.yaml:

input: "path/to/your/video.mp4"
output_dir: "./my_output"
llm: "gemini/gemini-1.5-flash"
lang: "English"
chunk_length: 10

Example multiple-inputs.yaml (for wenbi main command):

inputs:
  - input: "path/to/video1.mp4"
    segments:
      - start_time: "00:00:10"
        end_time: "00:00:30"
        title: "Introduction"
      - start_time: "00:01:00"
        end_time: "00:01:30"
        title: "Key Points"
  - input: "path/to/audio.mp3"
    llm: "ollama/qwen3"
    lang: "Chinese"
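
One way to read the multiple-inputs structure is as a list of jobs, one per segment, with per-input settings shared across that input's segments. The Python sketch below uses a dict that mirrors what a YAML loader would return for the example above; `expand_jobs` is a hypothetical helper, and how Wenbi itself interprets the file is an assumption here.

```python
# Dict mirroring the multiple-inputs.yaml example as a YAML loader would return it.
config = {
    "inputs": [
        {
            "input": "path/to/video1.mp4",
            "segments": [
                {"start_time": "00:00:10", "end_time": "00:00:30", "title": "Introduction"},
                {"start_time": "00:01:00", "end_time": "00:01:30", "title": "Key Points"},
            ],
        },
        {"input": "path/to/audio.mp3", "llm": "ollama/qwen3", "lang": "Chinese"},
    ]
}


def expand_jobs(config):
    """Flatten inputs and their segments into one job dict per unit of work."""
    jobs = []
    for entry in config["inputs"]:
        shared = {key: value for key, value in entry.items() if key != "segments"}
        # An input without segments becomes a single whole-file job.
        for segment in entry.get("segments") or [{}]:
            jobs.append({**shared, **segment})
    return jobs


jobs = expand_jobs(config)
for job in jobs:
    print(job)
```

The example config expands to three jobs: two timed segments of the video and one whole-file job for the audio with its own LLM and language.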

Example batch-folder-config.yml (for wenbi-batch):

output_dir: "./batch_results"
translate_llm: "gemini/gemini-1.5-flash"
translate_lang: "French"
chunk_length: 12

Gradio GUI

Launch the web-based Gradio interface for an interactive experience:

wenbi --gui

🐍 Programmatic Usage (Python API)

Wenbi can be used as a Python library for integration into your own applications:

from wenbi.main import process_input
from wenbi.model import rewrite, translate, academic
from wenbi.utils import transcribe, parse_subtitle

# Process a video file with timestamp citations
result = process_input(
    file_path="lecture.mp4",
    llm="gemini/gemini-1.5-flash",
    subcommand="academic",
    lang="English",
    cite_timestamps=True,
    output_dir="./output"
)

# Direct text processing
academic_text = academic(
    "input.vtt",
    output_dir="./output",
    llm="openai/gpt-4o",
    academic_lang="English",
    cite_timestamps=True
)

# Transcribe audio/video to VTT
vtt_file, csv_file = transcribe(
    "audio.mp3",
    language="English",
    output_dir="./output",
    model_size="large-v3-turbo"
)

# Translate existing text
translated = translate(
    "document.txt",
    output_dir="./output",
    translate_language="French",
    llm="gemini/gemini-2.0-flash",
    cite_timestamps=False
)

Key Functions:

process_input(): Main processing pipeline
transcribe(): Audio/video to text transcription
rewrite(): Oral to written text transformation
translate(): Language translation
academic(): Academic style transformation
parse_subtitle(): Process existing subtitle files

Supported Input Types

Wenbi focuses on media-to-text and text-to-text processing:

Video: .mp4, .avi, .mov, .mkv, .flv, .wmv, .m4v, .webm
Audio: .mp3, .flac, .aac, .ogg, .m4a, .opus
URLs: YouTube and other web URLs.
Subtitle Files: .vtt, .srt, .ass, .ssa, .sub, .smi
Text Files: .txt, .md, .markdown
Document Files: .docx
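
For scripts that pre-sort files before handing them to Wenbi, the list above translates directly into a lookup table. A minimal Python sketch, assuming that anything with an http or https scheme counts as a URL; `classify_input` and the category names are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extensions copied from the supported input types listed above.
EXTENSIONS = {
    "video": {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".m4v", ".webm"},
    "audio": {".mp3", ".flac", ".aac", ".ogg", ".m4a", ".opus"},
    "subtitle": {".vtt", ".srt", ".ass", ".ssa", ".sub", ".smi"},
    "text": {".txt", ".md", ".markdown"},
    "document": {".docx"},
}


def classify_input(source):
    """Return the input category for a path or URL, or 'unsupported'."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    suffix = Path(source).suffix.lower()
    for kind, suffixes in EXTENSIONS.items():
        if suffix in suffixes:
            return kind
    return "unsupported"


for source in ["lecture.MP4", "https://youtube.com/watch?v=example",
               "notes.markdown", "archive.zip"]:
    print(source, "->", classify_input(source))
```

The suffix is lowercased before lookup so that files such as `lecture.MP4` are still recognized.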

Output

Wenbi generates the following output files:

Markdown (.md): Contains the processed text (transcribed, translated, rewritten, or academic).
CSV (.csv): For transcribed content, provides a structured breakdown of segments and timestamps.
Comparison Markdown (_compare.md): For academic rewriting, a Markdown file showing the changes between the original and academic text (requires the redlines library).

LLM Integration

Wenbi uses dspy for LLM integration, allowing flexibility in choosing your preferred model. Ensure your environment variables are set for API keys if using commercial LLMs (e.g., OPENAI_API_KEY, GOOGLE_API_KEY).

To use Ollama models, ensure your Ollama server is running locally.
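
A quick pre-flight check can catch a missing API key before a long transcription starts. The Python sketch below maps the provider prefix of a model identifier to an environment variable. The mapping of `gemini/` to GOOGLE_API_KEY and `openai/` to OPENAI_API_KEY is inferred from the variables named above, and `missing_api_key` is a hypothetical helper rather than part of Wenbi.

```python
import os

# Assumed provider-to-variable mapping; ollama/ models run locally and need no key.
REQUIRED_ENV = {"openai": "OPENAI_API_KEY", "gemini": "GOOGLE_API_KEY"}


def missing_api_key(model_identifier, environ=None):
    """Return the name of the unset variable for this model, or None if all is well."""
    environ = os.environ if environ is None else environ
    provider = model_identifier.split("/", 1)[0]
    variable = REQUIRED_ENV.get(provider)
    if variable and not environ.get(variable):
        return variable
    return None


for model in ["openai/gpt-4o", "gemini/gemini-1.5-flash", "ollama/qwen3"]:
    missing = missing_api_key(model)
    status = f"set {missing} first" if missing else "ready"
    print(f"{model}: {status}")
```

The optional `environ` argument makes the check easy to test without touching the real environment.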

👥 Community & Contributing

Join the Wenbi Community! We're building the future of audio/video to academic text transformation.

🚀 Ways to Contribute

📝 Submit Issues: Found a bug or have a feature request? Open an issue
🔧 Code Contributions: Improve transcription accuracy, add new LLM integrations, or enhance the timestamp citation system
🌍 Translations: Help us support more languages for global accessibility
📚 Documentation: Improve guides, add examples, or create tutorials
⭐ Share: Star the project and share with researchers, educators, and content creators

💬 Get Help & Connect

GitHub Issues: Technical support and bug reports
Discussions: Share use cases, tips, and feature ideas
Documentation: Check our examples and configuration guides

🎯 Recent Updates (v0.140.81)

✨ NEW: Video Slides Extraction: Extract slides directly from lecture recordings with automatic detection
🔧 Enhanced PPT Integration: Improved slide alignment and speech combination algorithms
⚡ Performance Optimizations: Faster processing for large media files
🐛 Bug Fixes: Resolved timestamp formatting and transcription accuracy issues

🎯 Roadmap & Future Features

Real-time processing for live streams
Enhanced speaker identification and diarization
Academic citation format exports (APA, MLA, Chicago)
Integration with reference managers (Zotero, Mendeley)
REST API server for enterprise deployments
Advanced academic writing enhancement features
Multi-modal content analysis with video understanding
Collaborative editing and annotation features

📜 License

This project is licensed under the Apache-2.0 License - see the license.md file for details.

✨ Ready to transform your audio/video content into academic excellence?

Get started today:

git clone https://github.com/areopagusworkshop/wenbi.git
cd wenbi && rye sync && rye shell
wenbi your_content.mp4 --cite-timestamps --llm gemini/gemini-1.5-flash

🌟 Star this project if you find it useful and help us build the future of academic content creation!
