A simple tool that turns video, audio, subtitle files, and video URLs (especially YouTube) into written Markdown files, using an LLM to rewrite spoken language as written prose or to translate the content into a target language.

Project description

Wenbi: Intelligent Content Transformation

Wenbi is a versatile command-line interface (CLI) and web application designed to process various forms of media and text, transforming them into structured Markdown and CSV outputs. It leverages Large Language Models (LLMs) for advanced functionalities like transcription, translation, text rewriting, and academic rewriting.

Features

Multi-Input Support: Process video, audio, YouTube/web URLs, VTT, SRT, ASS, SSA, SUB, SMI, TXT, Markdown, DOCX, and PDF files.
Transcription: Convert spoken content from audio/video into text.
Translation: Translate transcribed or existing text into a target language.
Text Rewriting: Rewrite text, converting oral expressions to written form, with grammar correction and proofreading.
Academic Rewriting: Transform text into a formal academic style, preserving meaning and citations.
Batch Processing: Efficiently process multiple media files within a directory.
LLM Integration: Seamlessly integrate with various LLMs, including:
- Ollama (e.g., ollama/qwen3)
- Gemini (e.g., gemini/gemini-1.5-flash)
- OpenAI (e.g., openai/gpt-4o)
Configuration: Flexible configuration via command-line arguments or YAML files.
Gradio GUI: An intuitive web-based graphical user interface for easy interaction.
Multi-language Processing: Support for processing content in multiple languages.

Installation

Wenbi uses rye for dependency management. To install, ensure you have rye installed, then clone the repository and install dependencies:

git clone https://github.com/your-repo/wenbi.git # Replace with actual repo URL
cd wenbi
rye sync

Usage

CLI (Command Line Interface)

Wenbi provides a powerful CLI for various tasks. The main entry point is wenbi.

Main Command

Process a single input file (video, audio, URL, or text file) to generate Markdown and CSV outputs.

wenbi <input_file_or_url> [options]

# Example: Process a video file
wenbi my_video.mp4 --output-dir ./output --lang English

# Example: Process a YouTube URL
wenbi https://www.youtube.com/watch?v=dQw4w9WgXcQ --llm gemini/gemini-1.5-flash --lang Chinese

# Example: Process a VTT subtitle file
wenbi subtitles.vtt --output-dir ./output --lang English

# Example: Process a DOCX file for academic rewriting (requires --llm)
wenbi document.docx --llm ollama/qwen3 --lang English

# Example: Process a PDF file (requires --llm)
wenbi research_paper.pdf --llm ollama/qwen3 --lang English

Common Options:

-c, --config <path>: Path to a YAML configuration file.
-o, --output-dir <path>: Directory to save output files.
--llm <model_identifier>: Specify the LLM model to use (e.g., ollama/qwen3, gemini/gemini-1.5-flash, openai/gpt-4o).
-s, --transcribe-lang <language>: Language for transcription (e.g., Chinese, English).
-l, --lang <language>: Target language for translation/rewriting (default: Chinese).
-m, --multi-language: Enable multi-language processing.
-cl, --chunk-length <int>: Number of sentences per paragraph (default: 8).
-mt, --max-tokens <int>: Maximum tokens for LLM output (default: 130000).
-to, --timeout <int>: LLM request timeout in seconds (default: 3600).
-tm, --temperature <float>: LLM temperature parameter (default: 0.1).
-tsm, --transcribe-model <model_size>: Whisper model size for transcription (e.g., large-v3-turbo).
-ow, --output_wav <filename>: Filename for saving the segmented WAV (optional).
-st, --start_time <HH:MM:SS>: Start time for extraction from media.
-et, --end_time <HH:MM:SS>: End time for extraction from media.
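
When Wenbi is called from a script, it can help to assemble the command as a list rather than a single string. The sketch below is only an illustration: `build_wenbi_command` is a hypothetical helper, not part of Wenbi, and it uses only flag spellings copied from the option list above.

```python
import shlex


def build_wenbi_command(source, llm=None, lang=None, output_dir=None,
                        start_time=None, end_time=None):
    """Assemble an argument list for the wenbi CLI (hypothetical helper)."""
    cmd = ["wenbi", source]
    # Flag names are taken verbatim from the Common Options list.
    optional = [
        ("--llm", llm),
        ("--lang", lang),
        ("--output-dir", output_dir),
        ("--start_time", start_time),
        ("--end_time", end_time),
    ]
    for flag, value in optional:
        if value is not None:  # skip anything the caller did not set
            cmd += [flag, str(value)]
    return cmd


cmd = build_wenbi_command("my_video.mp4", llm="ollama/qwen3", lang="English",
                          start_time="00:00:10", end_time="00:00:30")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Wenbi is installed; passing a list avoids shell-quoting problems with paths that contain spaces.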

Subcommands

Wenbi also provides specific subcommands for rewrite, translate, and academic tasks.

# Rewrite text
wenbi rewrite <input_file> --llm ollama/qwen3 --lang Chinese

# Translate text
wenbi translate <input_file> --llm gemini/gemini-1.5-flash --lang French

# Academic rewriting
wenbi academic <input_file> --llm openai/gpt-4o --lang English

Subcommands share common options with the main command.

Batch Processing

Process multiple media files in a directory using wenbi-batch.

wenbi-batch <input_directory> [options]

# Example: Process all media files in 'my_media_folder'
wenbi-batch my_media_folder --output-dir ./batch_output --translate-lang English

# Example: Process with a config file and combine markdown outputs
wenbi-batch my_media_folder -c config/batch-config.yml --md combined_output.md

Batch Options:

-c, --config <path>: Path to a YAML configuration file for batch processing.
--output-dir <path>: Output directory for batch results.
--rewrite-llm <model_id>: LLM for rewriting.
--translate-llm <model_id>: LLM for translation.
--transcribe-lang <language>: Language for transcription.
--translate-lang <language>: Target language for translation (default: Chinese).
--rewrite-lang <language>: Target language for rewriting (default: Chinese).
--multi-language: Enable multi-language processing.
--chunk-length <int>: Number of sentences per chunk.
--max-tokens <int>: Maximum tokens for LLM.
--timeout <int>: LLM timeout in seconds.
--temperature <float>: LLM temperature.
--md [path]: Write a combined Markdown file. If no path is given, the input folder name is used.
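
The `--md [path]` option takes an optional value, which is a pattern that is easy to misread. The sketch below shows one way such a flag can be modelled with `argparse`; it is not Wenbi's actual parser. The function `resolve_md_path`, the program name `batch-sketch`, and the assumption that the default file is the folder name plus `.md` are all illustrative.

```python
import argparse
from pathlib import Path


def resolve_md_path(argv):
    """Return the combined Markdown path implied by argv, or None (sketch)."""
    parser = argparse.ArgumentParser(prog="batch-sketch")
    parser.add_argument("input_directory")
    # nargs="?" makes the value optional; const is used when --md appears alone.
    parser.add_argument("--md", nargs="?", const="", default=None)
    args = parser.parse_args(argv)

    if args.md is None:
        return None  # flag absent: no combined file requested
    if args.md == "":
        # Assumption: the default name is the input folder name plus ".md".
        return Path(args.input_directory).name + ".md"
    return args.md


print(resolve_md_path(["my_media_folder", "--md"]))
print(resolve_md_path(["my_media_folder", "--md", "combined_output.md"]))
```

The three cases (flag absent, flag alone, flag with a path) are kept separate so that a missing flag is never confused with a request for the default name.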

Configuration Files (YAML)

Wenbi supports YAML configuration files for both single input and batch processing. This allows for more complex and reusable configurations.

Example single-input.yaml:

input: "path/to/your/video.mp4"
output_dir: "./my_output"
llm: "gemini/gemini-1.5-flash"
lang: "English"
chunk_length: 10

Example multiple-inputs.yaml (for wenbi main command):

inputs:
  - input: "path/to/video1.mp4"
    segments:
      - start_time: "00:00:10"
        end_time: "00:00:30"
        title: "Introduction"
      - start_time: "00:01:00"
        end_time: "00:01:30"
        title: "Key Points"
  - input: "path/to/audio.mp3"
    llm: "ollama/qwen3"
    lang: "Chinese"
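
Segment times in a file like this are easy to mistype, so a quick check before a long run may save time. The following sketch works on a plain dictionary that mirrors the YAML above; `to_seconds` and `segment_durations` are hypothetical helpers, and how Wenbi itself validates segments is not documented here.

```python
def to_seconds(stamp):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError(f"invalid time stamp: {stamp!r}")
    return hours * 3600 + minutes * 60 + seconds


def segment_durations(entry):
    """Return (title, duration in seconds) pairs for one input entry."""
    durations = []
    for segment in entry.get("segments", []):
        start = to_seconds(segment["start_time"])
        end = to_seconds(segment["end_time"])
        if end <= start:
            raise ValueError(f"segment {segment.get('title')!r} ends before it starts")
        durations.append((segment.get("title"), end - start))
    return durations


# Mirrors the first entry of the multiple-inputs example above.
entry = {
    "input": "path/to/video1.mp4",
    "segments": [
        {"start_time": "00:00:10", "end_time": "00:00:30", "title": "Introduction"},
        {"start_time": "00:01:00", "end_time": "00:01:30", "title": "Key Points"},
    ],
}
print(segment_durations(entry))
```

With a YAML parser installed, the dictionary could be loaded from the file instead of being written inline.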

Example batch-folder-config.yml (for wenbi-batch):

output_dir: "./batch_results"
translate_llm: "gemini/gemini-1.5-flash"
translate_lang: "French"
chunk_length: 12
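
The example configurations set `chunk_length`, which the option list describes as the number of sentences per paragraph or chunk. As a rough illustration of what that setting means, the sketch below groups an already-split list of sentences into paragraphs. `chunk_sentences` is a hypothetical function; how Wenbi splits sentences and joins them internally is an assumption here.

```python
def chunk_sentences(sentences, chunk_length=8):
    """Group sentences into paragraphs of at most chunk_length sentences."""
    if chunk_length < 1:
        raise ValueError("chunk_length must be at least 1")
    return [
        " ".join(sentences[start:start + chunk_length])
        for start in range(0, len(sentences), chunk_length)
    ]


sentences = [f"Sentence {number}." for number in range(1, 21)]
paragraphs = chunk_sentences(sentences, chunk_length=8)
print(len(paragraphs))  # 20 sentences in groups of 8 give 3 paragraphs
```

A larger value produces fewer, longer paragraphs; the last paragraph simply holds whatever sentences remain.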

Gradio GUI

Launch the web-based Gradio interface for an interactive experience:

wenbi --gui

Supported Input Types

Video: .mp4, .avi, .mov, .mkv, .flv, .wmv, .m4v, .webm
Audio: .mp3, .flac, .aac, .ogg, .m4a, .opus
URLs: YouTube and other web URLs.
Subtitle Files: .vtt, .srt, .ass, .ssa, .sub, .smi
Text Files: .txt, .md, .markdown
Document Files: .docx, .pdf
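
To check in advance which category an input falls into, the extension lists above can be turned into a small lookup. This is a sketch built only from those lists: `classify_input` is a hypothetical helper, and Wenbi's own detection logic may differ, for example in how it treats URLs.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extension sets copied from the Supported Input Types list.
INPUT_TYPES = {
    "video": {".mp4", ".avi", ".mov", ".mkv", ".flv", ".wmv", ".m4v", ".webm"},
    "audio": {".mp3", ".flac", ".aac", ".ogg", ".m4a", ".opus"},
    "subtitle": {".vtt", ".srt", ".ass", ".ssa", ".sub", ".smi"},
    "text": {".txt", ".md", ".markdown"},
    "document": {".docx", ".pdf"},
}


def classify_input(source):
    """Return the input category for a path or URL, or "unsupported"."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    suffix = Path(source).suffix.lower()  # compare case-insensitively
    for kind, extensions in INPUT_TYPES.items():
        if suffix in extensions:
            return kind
    return "unsupported"


for example in ("my_video.mp4", "subtitles.vtt", "research_paper.pdf"):
    print(example, "->", classify_input(example))
```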

Output

Wenbi generates the following output files:

Markdown (.md): Contains the processed text (transcribed, translated, rewritten, or academic).
CSV (.csv): For transcribed content, provides a structured breakdown of segments and timestamps.
Comparison Markdown (_compare.md): For academic rewriting, a markdown file showing changes between original and academic text (requires redlines library).
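
To get a feel for what a comparison between the original and the academic text contains, the sketch below produces a line-level diff with Python's standard `difflib` module. This is a stand-in for illustration only: Wenbi's comparison file relies on the redlines library, whose output format is different, and `compare_texts` is a hypothetical name.

```python
import difflib


def compare_texts(original, rewritten):
    """Return a unified diff of two texts as a single string."""
    diff = difflib.unified_diff(
        original.splitlines(),
        rewritten.splitlines(),
        fromfile="original",
        tofile="academic",
        lineterm="",  # input lines carry no newline, so add none to headers
    )
    return "\n".join(diff)


original = "We gonna look at the data.\nThe results are good."
rewritten = "This study examines the data.\nThe results are good."
result = compare_texts(original, rewritten)
print(result)
```

Lines starting with `-` come from the original text and lines starting with `+` from the rewritten text; unchanged lines are shown as context.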

LLM Integration

Wenbi uses dspy for LLM integration, allowing flexibility in choosing your preferred model. Ensure your environment variables are set for API keys if using commercial LLMs (e.g., OPENAI_API_KEY, GOOGLE_API_KEY).

To use Ollama models, ensure your Ollama server is running locally.
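
A missing API key usually surfaces only after transcription has finished, so it can be worth checking the environment first. The sketch below derives the provider from the prefix of the model identifier. The mapping from provider to variable name is an assumption based on the examples above (the exact variable a given provider needs may differ, so check its documentation), and `missing_api_key` is a hypothetical helper.

```python
import os

# Assumed mapping, inferred from the example variable names above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "gemini": "GOOGLE_API_KEY",
    "ollama": None,  # local server, no API key expected
}


def missing_api_key(model_id, env=None):
    """Return the name of the unset key variable for model_id, or None."""
    env = os.environ if env is None else env
    provider = model_id.split("/", 1)[0]
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"unknown provider prefix: {provider!r}")
    key_name = PROVIDER_KEYS[provider]
    if key_name is None or env.get(key_name):
        return None
    return key_name


for model in ("ollama/qwen3", "openai/gpt-4o"):
    print(model, "-> missing:", missing_api_key(model))
```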

Contributing

Contributions are welcome! See CONTRIBUTING.md, if present, for contribution guidelines; otherwise, open an issue to discuss your proposed changes.

License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.

Project details

Release history

0.140.90: Feb 11, 2026
0.140.81: Jan 31, 2026
0.140.79: Aug 3, 2025
0.140.78: Jul 24, 2025
0.140.77: Jul 24, 2025
0.140.76: Jul 24, 2025
0.140.75: Jul 24, 2025
0.140.74: Jul 21, 2025
0.140.73: Jul 18, 2025
0.140.72: Jul 18, 2025 (this version)
0.140.71: Jul 9, 2025
0.140.69: May 29, 2025
0.140.68: May 18, 2025
0.140.67: May 5, 2025
0.140.66: Apr 23, 2025
0.140.65: Mar 2, 2025
0.140.64: Mar 2, 2025
0.140.63: Mar 1, 2025
0.140.62: Mar 1, 2025
0.140.61: Feb 28, 2025
0.14.6: Feb 27, 2025
0.14.5: Feb 27, 2025
0.14.4: Feb 27, 2025
0.14.3: Feb 26, 2025
0.14.2: Feb 26, 2025
0.14.1: Feb 26, 2025
0.14.0: Feb 26, 2025
0.13.0: Feb 26, 2025
0.12.0: Feb 26, 2025
0.11.0: Feb 26, 2025
0.10.1: Feb 26, 2025
0.1.0: Feb 26, 2025

Download files

Source Distribution

wenbi-0.140.72.tar.gz (42.9 kB view details)

Uploaded Jul 18, 2025 Source

Built Distribution

wenbi-0.140.72-py3-none-any.whl (26.9 kB view details)

Uploaded Jul 18, 2025 Python 3

File details

Details for the file wenbi-0.140.72.tar.gz.

File metadata

Download URL: wenbi-0.140.72.tar.gz
Upload date: Jul 18, 2025
Size: 42.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.13.1

File hashes

Hashes for wenbi-0.140.72.tar.gz
Algorithm	Hash digest
SHA256	`4c3d77676bb895c17ecdb758d902e85606dc8c751ba04908e470142b3b544d6b`
MD5	`89a246ea3ffb97538cbfba4cacb0c11c`
BLAKE2b-256	`d3f17138d60cbae64c36e6a093d44a87e94093a3e26f6673d1a19d70f3a95d30`
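
A downloaded archive can be checked against the SHA256 digest listed above before installation. The sketch below uses Python's standard `hashlib`; the helper names `sha256_of` and `matches_digest` are illustrative, and the file is assumed to have been downloaded to the current directory under its published name.

```python
import hashlib

# SHA256 digest published above for wenbi-0.140.72.tar.gz.
EXPECTED_SHA256 = "4c3d77676bb895c17ecdb758d902e85606dc8c751ba04908e470142b3b544d6b"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches_digest("wenbi-0.140.72.tar.gz", EXPECTED_SHA256)` returns `True` only when the file on disk is byte-for-byte the one that was published.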

File details

Details for the file wenbi-0.140.72-py3-none-any.whl.

File metadata

Download URL: wenbi-0.140.72-py3-none-any.whl
Upload date: Jul 18, 2025
Size: 26.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.13.1

File hashes

Hashes for wenbi-0.140.72-py3-none-any.whl
Algorithm	Hash digest
SHA256	`475ac56c0d4da65d44ea1e4934fa07e1b78a7781b570368e5928b833ffa9226e`
MD5	`6f6a2760f6797baeffc9b9e7d4f589a7`
BLAKE2b-256	`21fc1f33825ec65ba5d3fafc1f3fee712c5be14eaddf9b396b0ff0cac1a4f3ea`
