TingShuo 听说

Generate SRT/LRC subtitles from audio/video files using multiple speech-to-text engines, with optional LLM polishing.

TingShuo recursively scans directories for media files, transcribes them using your choice of STT engine, and outputs subtitle files in SRT or LRC format. Optionally, subtitles can be polished using an LLM (Ollama or an OpenAI-compatible API) or NLP sentence segmentation (nltk) to produce natural, complete sentences.

Features

4 STT Engines: faster-whisper, Vosk, OpenAI Whisper, whisper.cpp
2 Output Formats: SRT (SubRip) and LRC (lyrics)
Subtitle Translation: Translate subtitles to multiple target languages using NLLB or LLM
Multi-language UI: Interface supports English, Chinese, Japanese, Korean, French, German, Spanish, Italian, Portuguese, Russian
LLM Polishing: Merge fragmented subtitles into natural sentences via Ollama or OpenAI-compatible API
NLP Polishing: Sentence boundary detection via nltk (no LLM required)
CLI + GUI: Full command-line interface and tkinter graphical interface
Recursive Scanning: Process entire directory trees of media files
HuggingFace Mirror: Built-in support for an HF mirror (useful in mainland China)
Flexible Output: Save subtitles alongside source files or to a custom directory
Settings Persistence: UI language and preferences saved to ~/.config/tingshuo/settings.json

Installation

From PyPI

# Base install (no STT engine included)
pip install tingshuo

# With a specific engine (quote the extras so shells such as zsh do not expand the brackets):
pip install "tingshuo[faster-whisper]"   # Recommended
pip install "tingshuo[vosk]"
pip install "tingshuo[whisper]"
pip install "tingshuo[whisper-cpp]"

# With NLP polishing:
pip install "tingshuo[nlp]"

# Everything:
pip install "tingshuo[all]"

From Source

git clone https://github.com/cycleuser/TingShuo.git
cd TingShuo
pip install -e ".[faster-whisper,nlp]"

Prerequisites

Python 3.9+
ffmpeg must be installed and available on your PATH
- Linux: sudo apt install ffmpeg
- macOS: brew install ffmpeg
- Windows: Download from ffmpeg.org and add to PATH
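
The prerequisites above can be checked from Python before starting a long batch. This is a minimal sketch and not part of TingShuo; the function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 9), tool="ffmpeg"):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```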

Quick Start

CLI

Basic transcription (SRT):

tingshuo -i ./videos -e faster-whisper -f srt

Generate LRC files to a specific output directory:

tingshuo -i ./audio -e vosk -f lrc -o ./subtitles

With LLM polishing (Ollama):

tingshuo -i ./media --polish-llm --ollama-model qwen2.5

With LLM polishing (OpenAI-compatible API):

tingshuo -i ./media --polish-llm --api-url https://api.example.com --api-key sk-xxx --api-model gpt-4o-mini

With NLP polishing:

tingshuo -i ./media --polish-nlp -l en

Specify language and model:

tingshuo -i ./videos -e faster-whisper -m large-v3 -l zh

Use HuggingFace mirror (mainland China):

tingshuo -i ./videos -e faster-whisper --hf-mirror https://hf-mirror.com

Translate subtitles to multiple languages (NLLB):

tingshuo -i ./videos -e faster-whisper --translate --target-lang zh,ja,ko

Translate subtitles using LLM:

tingshuo -i ./videos -e faster-whisper --translate --target-lang zh --trans-backend llm --ollama-model qwen2.5

Download a model before transcription:

tingshuo --download -e faster-whisper -m large-v3
tingshuo --download -e faster-whisper -m large-v3 --hf-mirror https://hf-mirror.com

Download all models for an engine:

tingshuo --download-all -e faster-whisper

List installed Ollama models:

tingshuo --list-ollama-models
tingshuo --list-ollama-models --ollama-url http://192.168.1.100:11434

GUI

tingshuo --gui

The GUI provides:

Directory selection with browse buttons
Engine and model selection dropdowns
Language dropdown with common languages (auto-detect, zh, en, ja, ko, etc.) or type custom codes
Model download buttons (Download / Download All) with progress feedback
Format toggle (SRT/LRC)
Polishing options (None / LLM / NLP) with configuration panels
Translation panel: Enable translation, select target languages, choose backend (NLLB or LLM)
Ollama model dropdown with Refresh button to query installed models from the server
Menu bar: Help > Settings (UI language), Help > About (version info)
Multi-language interface: Settings allow switching between 10 UI languages
HuggingFace mirror toggle
Progress bar and real-time log output
Start/Stop controls

CLI Reference

usage: tingshuo [-h] [--version] [--gui] [-i DIR] [-o DIR] [-f {srt,lrc}]
                [--no-recursive] [-e ENGINE] [-m NAME] [-l CODE]
                [--hf-mirror URL] [--download] [--download-all]
                [--list-ollama-models] [--polish-llm | --polish-nlp]
                [--ollama-url URL] [--ollama-model NAME] [--api-url URL]
                [--api-key KEY] [--api-model NAME] [-v]
                [--translate] [--target-lang CODES]
                [--trans-backend {nllb,llm}] [--nllb-model NAME]

Input/Output

Argument	Description
`-i`, `--input DIR`	Input directory containing audio/video files (required for transcription)
`-o`, `--output DIR`	Output directory for subtitles (default: same as source)
`-f`, `--format {srt,lrc}`	Subtitle format (default: srt)
`--no-recursive`	Do not scan subdirectories

STT Engine

Argument	Description
`-e`, `--engine`	Engine: `faster-whisper`, `vosk`, `whisper`, `whisper-cpp` (default: faster-whisper)
`-m`, `--model NAME`	Model name or path (default: engine-specific, usually "base")
`-l`, `--language CODE`	Language code: zh, en, ja, etc. Use "auto" for auto-detection (default: auto)

HuggingFace Mirror

Argument	Description
`--hf-mirror URL`	HuggingFace mirror URL, e.g. `https://hf-mirror.com`

Model Management

Argument	Description
`--download`	Download the model specified by `-e` and `-m`, then exit
`--download-all`	Download all known models for the engine specified by `-e`, then exit
`--list-ollama-models`	List installed Ollama models from the server (uses `--ollama-url`), then exit

Subtitle Polishing

Argument	Description
`--polish-llm`	Polish with LLM (Ollama or OpenAI-compatible API)
`--polish-nlp`	Polish with NLP sentence segmentation (nltk)
`--ollama-url URL`	Ollama API URL (default: http://localhost:11434)
`--ollama-model NAME`	Ollama model name (default: qwen2.5)
`--api-url URL`	OpenAI-compatible API base URL
`--api-key KEY`	API key for OpenAI-compatible service
`--api-model NAME`	Model name for API

Other

Argument	Description
`--gui`	Launch graphical interface
`-v`, `--verbose`	Enable debug logging
`--version`	Show version and exit

Translation

Argument	Description
`--translate`	Enable subtitle translation to target language(s)
`--target-lang CODES`	Comma-separated target language codes, e.g. `zh,en,ja`
`--trans-backend {nllb,llm}`	Translation backend: `nllb` (Meta's NLLB-200 models) or `llm` (default: nllb)
`--nllb-model NAME`	NLLB model name (default: facebook/nllb-200-distilled-600M)

Supported Formats

Input (Audio/Video)

Audio: mp3, wav, flac, aac, ogg, wma, m4a, opus

Video: mp4, mkv, avi, mov, wmv, flv, webm, ts, m4v, mpg, mpeg
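
Recursive scanning over the extensions listed above could look roughly like the following. This is an illustrative sketch, not TingShuo's actual code; `is_media` and `find_media` are hypothetical names, and the `recursive` flag mirrors what `--no-recursive` is described as doing.

```python
from pathlib import Path

AUDIO_EXTS = {"mp3", "wav", "flac", "aac", "ogg", "wma", "m4a", "opus"}
VIDEO_EXTS = {"mp4", "mkv", "avi", "mov", "wmv", "flv", "webm", "ts", "m4v", "mpg", "mpeg"}
MEDIA_EXTS = AUDIO_EXTS | VIDEO_EXTS


def is_media(path):
    """True when the file extension (case-insensitive) is a supported format."""
    return Path(path).suffix.lower().lstrip(".") in MEDIA_EXTS


def find_media(root, recursive=True):
    """Return media files under root, sorted for a stable processing order."""
    root = Path(root)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and is_media(p))
```

Calling `find_media("./videos", recursive=False)` would then correspond to a scan with `--no-recursive`.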

Output

SRT (SubRip Text):

1
00:00:01,500 --> 00:00:04,200
This is the first subtitle line.

2
00:00:05,000 --> 00:00:08,300
This is the second subtitle line.
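
The SRT layout above (cue number, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) can be produced with a few lines of Python. This is a sketch under the assumption that segments arrive as `(start_seconds, end_seconds, text)` tuples; the function names are hypothetical and not taken from TingShuo.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render (start, end, text) tuples as SRT text, numbering cues from 1."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)
```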

LRC (Lyrics):

[ti:filename]
[re:TingShuo v0.1.0]

[00:01.50]This is the first subtitle line.
[00:05.00]This is the second subtitle line.
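
LRC carries only a start time per line, written as `[mm:ss.xx]` with hundredths of a second. The sketch below shows one way to render it, again assuming `(start_seconds, end_seconds, text)` tuples; the function names and the `tool` tag value are illustrative assumptions.

```python
def lrc_timestamp(seconds):
    """Format seconds as [mm:ss.xx]; minutes keep counting past 59."""
    total_cs = round(seconds * 100)  # centiseconds
    minutes, rem = divmod(total_cs, 6000)
    secs, cs = divmod(rem, 100)
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"


def to_lrc(segments, title, tool="TingShuo"):
    """Render segments as LRC with [ti:] and [re:] header tags."""
    lines = [f"[ti:{title}]", f"[re:{tool}]", ""]
    # End times are ignored: an LRC line lasts until the next one starts.
    lines += [f"{lrc_timestamp(start)}{text.strip()}" for start, _end, text in segments]
    return "\n".join(lines) + "\n"
```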

STT Engines

faster-whisper (Recommended)

CTranslate2-based Whisper implementation. Fast, supports GPU acceleration.

pip install faster-whisper

Models: tiny, base, small, medium, large-v2, large-v3

Vosk

Lightweight offline speech recognition. Lower accuracy but very fast on CPU.

pip install vosk

Models: Downloaded automatically by language, or specify a local path with -m /path/to/model.

OpenAI Whisper

The original Whisper model from OpenAI.

pip install openai-whisper

Models: tiny, base, small, medium, large

whisper.cpp

C++ implementation of Whisper via Python bindings. Very fast on CPU.

pip install pywhispercpp

Models: tiny, base, small, medium, large

Subtitle Polishing

LLM Polishing

Sends subtitle segments to an LLM to merge fragments into complete, natural sentences.

With Ollama (local):

1. Install and start Ollama
2. Pull a model: ollama pull qwen2.5
3. Run: tingshuo -i ./media --polish-llm --ollama-model qwen2.5

With Ollama (LAN):

tingshuo -i ./media --polish-llm --ollama-url http://192.168.1.100:11434 --ollama-model qwen2.5

With OpenAI-compatible API:

tingshuo -i ./media --polish-llm --api-url https://api.openai.com --api-key sk-xxx --api-model gpt-4o-mini
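
For orientation, a polishing call against Ollama's `/api/generate` endpoint might be assembled as below. This is a sketch, not TingShuo's implementation: the prompt wording and the function names are assumptions, and `polish` only works with an Ollama server running at the given URL.

```python
import json
import urllib.request


def build_polish_request(lines, model="qwen2.5", base_url="http://localhost:11434"):
    """Build a non-streaming POST request asking the model to merge fragments."""
    # Hypothetical prompt; the real tool's prompt is not documented here.
    prompt = (
        "Merge these subtitle fragments into complete, natural sentences. "
        "Return one sentence per line and do not add new content.\n\n"
        + "\n".join(lines)
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def polish(lines, **kwargs):
    """Send the request and return the polished lines from the 'response' field."""
    request = build_polish_request(lines, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        reply = json.loads(resp.read())["response"]
    return [line.strip() for line in reply.splitlines() if line.strip()]
```

Keeping request construction separate from the network call makes the payload easy to inspect without a server.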

NLP Polishing

Uses nltk sentence tokenization to detect sentence boundaries and merge fragments. No LLM or network access required.

pip install nltk
tingshuo -i ./media --polish-nlp -l en

Supports English, German, French, Spanish, Italian, Portuguese, and more via nltk. Chinese, Japanese, and Korean text is split on sentence-ending punctuation instead.
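
Punctuation-based splitting for Chinese, Japanese, and Korean text can be sketched with a regular expression. The code below is an assumption about the general approach, not TingShuo's code: it joins fragments, then splits after `。`, `！`, `？`, `!`, or `?`, keeping runs such as `？！` together. The name `merge_into_sentences` is hypothetical, and timestamps are left out for brevity.

```python
import re

# Split after a sentence-ending mark, unless another such mark follows.
_SENTENCE_END = re.compile(r"(?<=[。！？!?])(?![。！？!?])")


def merge_into_sentences(fragments, joiner=""):
    """Join subtitle fragments, then split the text at sentence boundaries."""
    text = joiner.join(f.strip() for f in fragments if f.strip())
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]
```

A real polisher would also need to carry each fragment's start and end time through the merge so the new sentences get sensible timings.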

Subtitle Translation

TingShuo can automatically translate generated subtitles to multiple target languages. Translated subtitles are saved as separate files with language codes (e.g., video.zh.srt, video.ja.srt).
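
The naming scheme described above (`video.zh.srt`, `video.ja.srt`) can be derived from the source path with `pathlib`. This is an illustrative sketch; `translated_path` is a hypothetical helper, and the `out_dir` parameter mirrors the documented `-o` option.

```python
from pathlib import Path


def translated_path(media_path, lang, fmt="srt", out_dir=None):
    """Return e.g. video.zh.srt next to the source file, or inside out_dir."""
    media_path = Path(media_path)
    name = f"{media_path.stem}.{lang}.{fmt}"
    parent = Path(out_dir) if out_dir is not None else media_path.parent
    return parent / name
```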

NLLB Translation (Recommended)

Uses Meta's NLLB-200 models (published under the facebook/ namespace on HuggingFace) for high-quality offline translation supporting 200+ languages.

# Install dependencies
pip install transformers sentencepiece

# Translate to Chinese and Japanese
tingshuo -i ./videos -e faster-whisper --translate --target-lang zh,ja

# Use a larger NLLB model for better quality
tingshuo -i ./videos --translate --target-lang zh --nllb-model facebook/nllb-200-distilled-1.3B

Available NLLB models: facebook/nllb-200-distilled-600M (default), facebook/nllb-200-distilled-1.3B, facebook/nllb-200-3.3B
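
NLLB models identify languages with FLORES-200 codes such as `zho_Hans` rather than two-letter codes, so a value like `zh,ja,ko` has to be mapped before translation. The sketch below shows one possible mapping; the table is a small hand-picked subset, mapping `zh` to Simplified Chinese is an assumption, and how TingShuo performs this step internally is not documented here.

```python
# Two-letter code -> FLORES-200 code used by NLLB tokenizers (subset only).
NLLB_CODES = {
    "zh": "zho_Hans",
    "en": "eng_Latn",
    "ja": "jpn_Jpan",
    "ko": "kor_Hang",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "es": "spa_Latn",
    "it": "ita_Latn",
    "pt": "por_Latn",
    "ru": "rus_Cyrl",
}


def parse_target_langs(value):
    """Split 'zh,ja,ko' into NLLB codes, rejecting unknown entries."""
    codes = []
    for item in value.split(","):
        short = item.strip().lower()
        if not short:
            continue  # tolerate stray commas and spaces
        if short not in NLLB_CODES:
            raise ValueError(f"unsupported target language: {short!r}")
        codes.append(NLLB_CODES[short])
    return codes
```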

LLM Translation

Uses Ollama or OpenAI-compatible API for translation.

# Translate using Ollama
tingshuo -i ./videos --translate --target-lang zh --trans-backend llm --ollama-model qwen2.5

# Translate using OpenAI API
tingshuo -i ./videos --translate --target-lang zh --trans-backend llm --api-url https://api.openai.com --api-key sk-xxx --api-model gpt-4o-mini

HuggingFace Mirror

For users in mainland China who have difficulty downloading models from HuggingFace:

tingshuo -i ./videos -e faster-whisper --hf-mirror https://hf-mirror.com

Or set the environment variable directly:

export HF_ENDPOINT=https://hf-mirror.com
tingshuo -i ./videos -e faster-whisper
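
When driving model downloads from your own Python script, the same environment variable can be set in-process. This is a minimal sketch with a hypothetical helper name. Libraries such as huggingface_hub generally read `HF_ENDPOINT` when they are first imported, so call the helper before importing them.

```python
import os


def use_hf_mirror(url="https://hf-mirror.com"):
    """Point Hugging Face downloads at a mirror for the current process."""
    # Strip a trailing slash so URL joins do not produce a double slash.
    os.environ["HF_ENDPOINT"] = url.rstrip("/")
    return os.environ["HF_ENDPOINT"]
```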

License

This project is licensed under the GNU General Public License v3.0. See LICENSE for details.

Release history

0.1.6 (Mar 9, 2026)
0.1.5 (Mar 9, 2026)
0.1.4 (Mar 7, 2026)
0.1.2 (Mar 7, 2026)
0.1.1 (Mar 6, 2026), this version
0.1.0 (Mar 6, 2026)

Download files

Source Distribution

tingshuo-0.1.1.tar.gz (45.8 kB)

Upload date: Mar 6, 2026
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

Hashes for tingshuo-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`4865b0e4bc41062c952196efc32c3944f2a45dde3c2c0a45d7cf91f53f104cd7`
MD5	`ca87ce642d277d267d4546b87de50b2a`
BLAKE2b-256	`87edbd4f035fab01b35d0a9c3298eaa77b5b59d6f72453761cd6dc33dca2b4f4`

Built Distribution

tingshuo-0.1.1-py3-none-any.whl (45.8 kB)

Upload date: Mar 6, 2026
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

Hashes for tingshuo-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64580bd589c7bba9316eaeda954220ddf8962b367ea9d58eb9211ff400f68b10`
MD5	`63ddac496566f1f3be5a6ffc4ba9fb7d`
BLAKE2b-256	`ee6acede7c4d0aa7a931a5e1568ac83fa830a36d4072b0e8be670fc8a5c373ca`