TRNS — Video Transcription & Summarization
Transcribe YouTube, Twitter/X, and local video files. Automatic translation to Russian, LLM summaries via OpenRouter. Works as a CLI tool or Telegram bot.
Tech Stack
| Component | Technology |
|---|---|
| Speech-to-text | faster-whisper |
| Video download | yt-dlp (YouTube, Twitter/X, 1000+ sites) |
| Subtitles | youtube-transcript-api |
| Translation | deep-translator (Google Translate) |
| LLM processing | OpenRouter.ai via OpenAI client |
| Telegram bot | Pyrogram (MTProto, up to 2 GB file downloads) |
| Webhook server | FastAPI + Uvicorn |
Quick Start
Install
pip install trns
# Also need FFmpeg:
# macOS: brew install ffmpeg
# Linux: sudo apt install ffmpeg
CLI
trns https://www.youtube.com/watch?v=VIDEO_ID
trns https://twitter.com/user/status/1234567890
trns /path/to/video.mp4
# With options
trns https://youtu.be/abc --whisper-model medium --debug
Telegram Bot
export BOT_TOKEN=... # from @BotFather
export TELEGRAM_API_ID=... # from my.telegram.org
export TELEGRAM_API_HASH=... # from my.telegram.org
export AUTH_KEY=secret123 # users authenticate with this once
python -m trns.bot.server
Then set up a webhook pointing to https://your-domain/webhook. See Setup Guide.
Configuration
TRNS uses a JSON config file (config.json). On first run, a default is created automatically. You can also pass everything via CLI flags — explicit CLI flags always win over config values, which in turn override parser defaults. If you don't pass a flag, the config value is used; if the config doesn't set it either, the built-in default applies.
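The precedence rule (CLI flag > config.json > built-in default) can be sketched as a simple merge. This is an illustrative reconstruction, not the actual trns code; the key names are real config keys, but resolve_settings and DEFAULTS are hypothetical:

```python
import argparse
import json

# Built-in defaults (a subset, for illustration).
DEFAULTS = {"interval": 30, "language": "en", "whisper_model": "auto"}

def resolve_settings(config: dict, cli_args: argparse.Namespace) -> dict:
    """CLI flags win over config.json values, which win over built-in defaults."""
    settings = dict(DEFAULTS)
    # Config file overrides built-in defaults.
    settings.update({k: v for k, v in config.items() if k in DEFAULTS})
    # If argparse options use default=None, only flags the user actually
    # passed are non-None, so only those override the config.
    settings.update({k: v for k, v in vars(cli_args).items()
                     if k in DEFAULTS and v is not None})
    return settings
```

The `default=None` trick is what lets the tool distinguish "flag not passed" from "flag passed with the default value".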
Configuration Reference
Every key in config.json maps to a CLI flag. Here's what each one does:
| Key | CLI Flag | Type | Default | Description |
|---|---|---|---|---|
| url | positional arg | string | "" | Video URL or local file path. Empty = must provide on command line. |
| method | --method | "auto" \| "subtitles" \| "whisper" | "auto" | auto: try YouTube captions first, fall back to Whisper. subtitles: captions only (fails if unavailable). whisper: always use speech-to-text. |
| interval | --interval | integer (seconds) | 30 | Chunk duration for live/chunked processing. Each chunk is this many seconds of audio. |
| language | --language | string (ISO 639-1) | "en" | Expected language of the video. Used for subtitle extraction and as a hint for Whisper. |
| whisper_model | --whisper-model | "auto" \| "tiny" \| "base" \| "small" \| "medium" \| "large" | "auto" | Whisper model size. auto (default) picks tiny for English, small for other languages. Explicit values override auto-selection. Larger = more accurate but slower and uses more RAM. |
| use_faster_whisper | --use-faster-whisper / --no-faster-whisper | boolean | true | Use the faster-whisper library (CTranslate2 backend). Use --no-faster-whisper to fall back to openai-whisper. |
| translation_output | --translation-output | "russian-only" \| "both" \| "original-only" | "russian-only" | What to print for transcription output. russian-only: only the Russian translation. both: original + Russian. original-only: no translation. |
| save_transcript | --save-transcript | string (file path) | null | If set, appends all output to this file. Relative paths resolve against TRNS_HOME / CWD. |
| overlap | --overlap | integer (seconds) | 2 | Overlap between audio chunks. Prevents words from being cut at chunk boundaries. |
| process_mode | --process-mode | "auto" \| "chunked" \| "full" | "auto" | auto: full for regular videos, chunked for live streams. full: download entire video, transcribe with progress bar. chunked: process in interval-second pieces (required for live). |
| lm_window_seconds | --lm-window-seconds | integer (seconds) | 120 | How much transcription context the LLM sees. It gets the last ceil(window_seconds / interval) chunks. |
| lm_interval | --lm-interval | integer (seconds) | 30 | How often the LLM processes accumulated text. Can differ from interval. |
| lm_output_mode | --lm-output-mode | "both" \| "transcriptions-only" \| "lm-only" | "both" | both: print transcriptions AND LLM summaries. transcriptions-only: skip LLM entirely. lm-only: only show LLM output. |
| lm_api_key_file | --lm-api-key-file | string (file path) | "api_key.txt" | File containing OpenRouter API key. |
| lm_prompt_file | --lm-prompt-file | string (file path) | "prompt.md" | Prompt template for Russian-language LLM processing. |
| lm_model | --lm-model | string | "google/gemma-3-27b-it:free" | OpenRouter model identifier. See openrouter.ai/models for options. Free models have the :free suffix. |
| debug | --debug | boolean | false | false (production): logs go to logs.txt, stdout shows only transcription/LLM output. true: verbose logs go to stderr, useful for troubleshooting. |
| context | --context | string | "" | Additional context passed to the LLM (e.g. "This is a Sberbank earnings call"). Helps the model produce better summaries. |
| allowed_user_ids | — | array of integers | [] | Telegram user IDs allowed to use the bot. Users can also authenticate at runtime via AUTH_KEY. |
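The lm_window_seconds rule above (the LLM sees the last ceil(window_seconds / interval) chunks) works out like this. A minimal sketch; lm_context_chunks is an illustrative name, not a trns function:

```python
import math

def lm_context_chunks(lm_window_seconds: int, interval: int) -> int:
    # The LLM receives the last ceil(window / interval) transcription chunks.
    return math.ceil(lm_window_seconds / interval)

# With the defaults (window 120 s, interval 30 s) the LLM sees the last 4 chunks.
```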
Example config.json
{
"url": "",
"method": "auto",
"interval": 30,
"language": "en",
"whisper_model": "medium",
"use_faster_whisper": true,
"translation_output": "russian-only",
"save_transcript": null,
"overlap": 2,
"process_mode": "full",
"lm_window_seconds": 120,
"lm_interval": 30,
"lm_output_mode": "both",
"lm_api_key_file": "api_key.txt",
"lm_prompt_file": "prompt.md",
"lm_model": "google/gemma-3-27b-it:free",
"debug": false,
"context": "",
"allowed_user_ids": []
}
Environment Variables
These are primarily for the Telegram bot server, not the CLI:
| Variable | Purpose | Fallback file / default |
|---|---|---|
| BOT_TOKEN | Telegram bot token | bot_key.txt |
| TELEGRAM_API_ID | Pyrogram MTProto API ID (from my.telegram.org) | — |
| TELEGRAM_API_HASH | Pyrogram MTProto API hash | — |
| AUTH_KEY | One-time auth key for new bot users | key.txt |
| OPENROUTER_API_KEY | OpenRouter API key (alternative to api_key.txt) | api_key.txt |
| HOST | Server bind address | 0.0.0.0 |
| PORT | Server port | 8000 |
| CONFIG_PATH | Path to config.json | config.json |
| METADATA_PATH | Path to metadata.json | metadata.json |
| TRNS_HOME | Base directory for resolving all relative paths | CWD |
File Layout
When you run TRNS, it expects these files relative to TRNS_HOME (defaults to your current working directory):
your-project/
├── config.json # Main configuration (auto-created on first run)
├── metadata.json # Localization strings + daily capacity counter
├── api_key.txt # OpenRouter API key(s), one per line
├── prompt.md # LLM prompt template (Russian output)
├── prompt_original.md # LLM prompt template (original language output)
├── bot_key.txt # Telegram bot token (alternative to env var)
├── key.txt # Auth key (alternative to env var)
└── logs.txt # Production logs (auto-created)
How It Works
Pipeline Flow
Video URL or file
│
├─ 1. Try YouTube auto-captions (if method=auto and it's YouTube)
│ └─ Success? Skip Whisper, go to step 3
│
├─ 2. Download audio → Whisper speech-to-text
│ ├─ Language auto-detection
│ ├─ Chunk overlap to prevent word loss
│ └─ Progress bar (full mode) or streaming (chunked mode)
│
├─ 3. Translate to Russian (if source ≠ Russian)
│ └─ Google Translate via deep-translator
│
└─ 4. LLM summarization (if lm_output_mode ≠ transcriptions-only)
├─ Sends last N seconds of transcription (lm_window_seconds)
├─ Bilingual mode: separate prompts for original + Russian
└─ Output: structured summary per prompt template
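Steps 1 and 2 of the flow above (try captions, fall back to Whisper) can be sketched as a small selection function. This is illustrative pseudostructure, not the actual pipeline code; the caption and transcription backends are passed in as callables so the logic stays self-contained:

```python
from typing import Callable, Optional

def get_transcript(
    url: str,
    method: str,
    is_youtube: bool,
    fetch_captions: Callable[[str], Optional[str]],
    whisper_transcribe: Callable[[str], str],
) -> str:
    """Step 1: try YouTube captions; step 2: fall back to speech-to-text."""
    if is_youtube and method in ("auto", "subtitles"):
        captions = fetch_captions(url)
        if captions is not None:
            return captions  # captions found: skip Whisper entirely
        if method == "subtitles":
            raise RuntimeError("No captions available and method=subtitles")
    elif method == "subtitles":
        raise RuntimeError("method=subtitles needs a YouTube video with captions")
    return whisper_transcribe(url)  # method=whisper, or auto fallback
```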
Processing Modes
| Mode | When | How |
|---|---|---|
| full | Regular videos | Downloads entire video first, transcribes with progress bar. Best quality. |
| chunked | Live streams | Processes audio in interval-second chunks. Real-time output. |
| auto | Default | Picks full for regular videos, chunked for live streams. |
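The auto row of the table resolves to one of the two concrete modes. A one-function sketch (pick_process_mode is an illustrative name):

```python
def pick_process_mode(configured: str, is_live: bool) -> str:
    """auto resolves to chunked for live streams and full for regular videos;
    explicit values are kept as-is."""
    if configured == "auto":
        return "chunked" if is_live else "full"
    return configured
```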
Whisper Models
| Model | Size | Speed | Quality | RAM |
|---|---|---|---|---|
| tiny | 39M | ~32x realtime | Basic | ~1 GB |
| base | 74M | ~16x realtime | OK | ~1 GB |
| small | 244M | ~6x realtime | Good | ~2 GB |
| medium | 769M | ~2x realtime | Very good | ~5 GB |
| large | 1550M | ~1x realtime | Best | ~10 GB |
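The whisper_model auto rule from the configuration reference (tiny for English, small for other languages) picks from this table. A sketch under that stated rule; the function name is illustrative:

```python
def pick_whisper_model(configured: str, language: str) -> str:
    """auto selects tiny for English and small for everything else;
    an explicit model name overrides auto-selection."""
    if configured == "auto":
        return "tiny" if language == "en" else "small"
    return configured
```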
API Key Setup
Put your OpenRouter API key in api_key.txt. The system tracks daily usage capacity in metadata.json and resets it at UTC midnight.
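A UTC-midnight reset amounts to comparing the stored date with today's UTC date. This is a hedged sketch of the idea only; the metadata keys ("date", "used") and the function are assumptions, not the actual metadata.json schema:

```python
from datetime import datetime, timezone

def capacity_remaining(metadata: dict, daily_limit: int) -> int:
    """Zero the usage counter when the UTC date has rolled over,
    then report the remaining capacity for today."""
    today = datetime.now(timezone.utc).date().isoformat()
    if metadata.get("date") != today:
        metadata["date"] = today
        metadata["used"] = 0  # new UTC day: reset the counter
    return daily_limit - metadata["used"]
```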
Architecture
┌─────────────────────────────────────────────────────┐
│ User Interface │
├────────────────────┬────────────────────────────────┤
│ CLI (trns cmd) │ Telegram Bot (Pyrogram+FastAPI)│
└─────────┬──────────┴──────────────┬─────────────────┘
└────────────┬────────────┘
│
┌─────────────▼──────────────┐
│ TranscriptionPipeline │
│ (orchestration + threads)│
└─────────────┬──────────────┘
│
┌─────────────────┼─────────────────┐
│ │ │
┌────▼─────┐ ┌──────▼──────┐ ┌──────▼──────┐
│ yt-dlp │ │ faster- │ │ OpenRouter │
│ audio │ │ whisper │ │ LLM │
│ download │ │ STT │ │ summaries │
└──────────┘ └─────────────┘ └─────────────┘
Threading (Telegram Bot)
The bot uses a queue-based architecture for thread-safe output:
- Webhook arrives → FastAPI handler → spawns background thread
- Background thread runs TranscriptionPipeline with an output_callback
- output_callback puts text into a queue.Queue
- Async loop drains the queue and sends messages to Telegram
No global state mutation — each pipeline instance is independent.
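The callback-to-queue handoff can be sketched as below. A minimal illustration of the pattern, not the bot's actual code: the real consumer is an async loop sending Telegram messages, and pipeline_fn stands in for TranscriptionPipeline:

```python
import queue
import threading

def run_pipeline_with_queue(pipeline_fn):
    """Run pipeline_fn in a background thread; its output_callback pushes
    text into a thread-safe queue that the caller drains."""
    q: "queue.Queue[str | None]" = queue.Queue()

    def worker():
        pipeline_fn(output_callback=q.put)
        q.put(None)  # sentinel: pipeline finished

    threading.Thread(target=worker, daemon=True).start()
    while (item := q.get()) is not None:
        yield item  # in the bot, this is where each message goes to Telegram
```

Because queue.Queue is thread-safe, the producer thread and the consumer never share mutable state directly, which is what keeps each pipeline instance independent.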
Development
git clone https://github.com/kakoyvostorg/trns.git
cd trns
pip install -e ".[dev]"
pytest # 117 tests
ruff format . # code formatting
Docker
docker build -f docker/Dockerfile -t trns .
docker-compose -f docker/docker-compose.yml up
Further Reading
- Setup Guide — installation, FFmpeg, webhook config
- Deployment Guide — production deployment (Yandex Cloud, VMs, Docker)
- Architecture — detailed system internals
- Руководство пользователя — Telegram bot user guide (Russian)
License
MIT — see LICENSE.