
TRNS — Video Transcription & Summarization

Transcribe YouTube, Twitter/X, and local video files. Automatic translation to Russian, LLM summaries via OpenRouter. Works as a CLI tool or Telegram bot.

Tech Stack

| Component | Technology |
|---|---|
| Speech-to-text | faster-whisper |
| Video download | yt-dlp (YouTube, Twitter/X, 1000+ sites) |
| Subtitles | youtube-transcript-api |
| Translation | deep-translator (Google Translate) |
| LLM processing | OpenRouter.ai via OpenAI client |
| Telegram bot | Pyrogram (MTProto, up to 2 GB file downloads) |
| Webhook server | FastAPI + Uvicorn |

Quick Start

Install

```shell
pip install trns

# FFmpeg is also required:
# macOS: brew install ffmpeg
# Linux: sudo apt install ffmpeg
```

CLI

```shell
trns https://www.youtube.com/watch?v=VIDEO_ID
trns https://twitter.com/user/status/1234567890
trns /path/to/video.mp4

# With options
trns https://youtu.be/abc --whisper-model medium --debug
```

Telegram Bot

```shell
export BOT_TOKEN=...            # from @BotFather
export TELEGRAM_API_ID=...      # from my.telegram.org
export TELEGRAM_API_HASH=...    # from my.telegram.org
export AUTH_KEY=secret123       # users authenticate with this once

python -m trns.bot.server
```

Then set up a webhook pointing to https://your-domain/webhook. See Setup Guide.


Configuration

TRNS uses a JSON config file (config.json); a default one is created automatically on first run. Every option can also be passed as a CLI flag. Precedence is simple: an explicit CLI flag wins over the config value, and the config value wins over the built-in parser default.
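That precedence (CLI flag > config.json > built-in default) can be sketched roughly as follows; `resolve_option` and the sample values are illustrative, not the actual TRNS internals:

```python
import json

# Built-in parser defaults (subset; values from the reference table below)
DEFAULTS = {"method": "auto", "interval": 30, "whisper_model": "auto"}

def resolve_option(key, cli_args, config, defaults=DEFAULTS):
    """CLI flag wins over config.json, which wins over the built-in default."""
    if cli_args.get(key) is not None:   # flag explicitly passed
        return cli_args[key]
    if key in config:                   # key set in config.json
        return config[key]
    return defaults[key]                # fall back to parser default

config = json.loads('{"whisper_model": "medium"}')
cli_args = {"method": "whisper", "interval": None, "whisper_model": None}

assert resolve_option("method", cli_args, config) == "whisper"        # CLI wins
assert resolve_option("whisper_model", cli_args, config) == "medium"  # config wins
assert resolve_option("interval", cli_args, config) == 30             # built-in default
```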

Configuration Reference

Almost every key in config.json maps to a CLI flag (allowed_user_ids is config-only). Here's what each one does:

| Key | CLI Flag | Type | Default | Description |
|---|---|---|---|---|
| url | (positional arg) | string | `""` | Video URL or local file path. Empty means it must be provided on the command line. |
| method | --method | `auto` \| `subtitles` \| `whisper` | `auto` | `auto`: try YouTube captions first, fall back to Whisper. `subtitles`: captions only (fails if unavailable). `whisper`: always use speech-to-text. |
| interval | --interval | integer (seconds) | 30 | Chunk duration for live/chunked processing. Each chunk is this many seconds of audio. |
| language | --language | string (ISO 639-1) | `en` | Expected language of the video. Used for subtitle extraction and as a hint for Whisper. |
| whisper_model | --whisper-model | `auto` \| `tiny` \| `base` \| `small` \| `medium` \| `large` | `auto` | Whisper model size. `auto` (default) picks `tiny` for English, `small` for other languages; explicit values override auto-selection. Larger = more accurate but slower and uses more RAM. |
| use_faster_whisper | --use-faster-whisper / --no-faster-whisper | boolean | true | Use the faster-whisper library (CTranslate2 backend). Pass --no-faster-whisper to fall back to openai-whisper. |
| translation_output | --translation-output | `russian-only` \| `both` \| `original-only` | `russian-only` | What to print for transcription output. `russian-only`: only the Russian translation. `both`: original + Russian. `original-only`: no translation. |
| save_transcript | --save-transcript | string (file path) or null | null | If set, appends all output to this file. Relative paths resolve against TRNS_HOME / CWD. |
| overlap | --overlap | integer (seconds) | 2 | Overlap between audio chunks. Prevents words from being cut at chunk boundaries. |
| process_mode | --process-mode | `auto` \| `chunked` \| `full` | `auto` | `auto`: `full` for regular videos, `chunked` for live streams. `full`: download the entire video, transcribe with a progress bar. `chunked`: process in interval-second pieces (required for live streams). |
| lm_window_seconds | --lm-window-seconds | integer (seconds) | 120 | How much transcription context the LLM sees: the last ceil(window_seconds / interval) chunks. |
| lm_interval | --lm-interval | integer (seconds) | 30 | How often the LLM processes accumulated text. Can differ from interval. |
| lm_output_mode | --lm-output-mode | `both` \| `transcriptions-only` \| `lm-only` | `both` | `both`: print transcriptions AND LLM summaries. `transcriptions-only`: skip the LLM entirely. `lm-only`: only show LLM output. |
| lm_api_key_file | --lm-api-key-file | string (file path) | `api_key.txt` | File containing the OpenRouter API key. |
| lm_prompt_file | --lm-prompt-file | string (file path) | `prompt.md` | Prompt template for Russian-language LLM processing. |
| lm_model | --lm-model | string | `google/gemma-3-27b-it:free` | OpenRouter model identifier. See openrouter.ai/models for options; free models have a `:free` suffix. |
| debug | --debug | boolean | false | false (production): logs go to logs.txt, stdout shows only transcription/LLM output. true: verbose logs go to stderr, useful for troubleshooting. |
| context | --context | string | `""` | Additional context passed to the LLM (e.g. "This is a Sberbank earnings call"). Helps the model produce better summaries. |
| allowed_user_ids | (config only) | array of integers | `[]` | Telegram user IDs allowed to use the bot. Users can also authenticate at runtime via AUTH_KEY. |
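For lm_window_seconds, the "last ceil(window_seconds / interval) chunks" rule works out as in this quick illustration (not the actual TRNS code):

```python
import math

def chunks_in_window(lm_window_seconds: int, interval: int) -> int:
    """Number of most-recent transcription chunks handed to the LLM."""
    return math.ceil(lm_window_seconds / interval)

# With the defaults (window = 120 s, interval = 30 s) the LLM sees 4 chunks.
assert chunks_in_window(120, 30) == 4
# A window that is not a multiple of the interval rounds up.
assert chunks_in_window(100, 30) == 4
assert chunks_in_window(90, 45) == 2
```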

Example config.json

```json
{
  "url": "",
  "method": "auto",
  "interval": 30,
  "language": "en",
  "whisper_model": "medium",
  "use_faster_whisper": true,
  "translation_output": "russian-only",
  "save_transcript": null,
  "overlap": 2,
  "process_mode": "full",
  "lm_window_seconds": 120,
  "lm_interval": 30,
  "lm_output_mode": "both",
  "lm_api_key_file": "api_key.txt",
  "lm_prompt_file": "prompt.md",
  "lm_model": "google/gemma-3-27b-it:free",
  "debug": false,
  "context": "",
  "allowed_user_ids": []
}
```

Environment Variables

These are primarily for the Telegram bot server, not the CLI:

| Variable | Purpose | Fallback / Default |
|---|---|---|
| BOT_TOKEN | Telegram bot token | bot_key.txt |
| TELEGRAM_API_ID | Pyrogram MTProto API ID (from my.telegram.org) | — |
| TELEGRAM_API_HASH | Pyrogram MTProto API hash | — |
| AUTH_KEY | One-time auth key for new bot users | key.txt |
| OPENROUTER_API_KEY | OpenRouter API key (alternative to api_key.txt) | api_key.txt |
| HOST | Server bind address | 0.0.0.0 |
| PORT | Server port | 8000 |
| CONFIG_PATH | Path to config.json | config.json |
| METADATA_PATH | Path to metadata.json | metadata.json |
| TRNS_HOME | Base directory for resolving all relative paths | CWD |

File Layout

When you run TRNS, it expects these files relative to TRNS_HOME (defaults to your current working directory):

```
your-project/
├── config.json          # Main configuration (auto-created on first run)
├── metadata.json        # Localization strings + daily capacity counter
├── api_key.txt          # OpenRouter API key(s), one per line
├── prompt.md            # LLM prompt template (Russian output)
├── prompt_original.md   # LLM prompt template (original language output)
├── bot_key.txt          # Telegram bot token (alternative to env var)
├── key.txt              # Auth key (alternative to env var)
└── logs.txt             # Production logs (auto-created)
```
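Relative paths in this layout resolve against TRNS_HOME, falling back to the current working directory. A minimal sketch of that rule (an assumption about the internals, not the actual resolver):

```python
import os
from pathlib import Path

def resolve_path(p: str) -> Path:
    """Absolute paths pass through; relative ones resolve under TRNS_HOME or CWD."""
    base = Path(os.environ.get("TRNS_HOME", os.getcwd()))
    path = Path(p)
    return path if path.is_absolute() else base / path

os.environ["TRNS_HOME"] = "/srv/trns"
assert str(resolve_path("config.json")) == "/srv/trns/config.json"
assert str(resolve_path("/etc/trns/config.json")) == "/etc/trns/config.json"
```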

How It Works

Pipeline Flow

```
Video URL or file
    │
    ├─ 1. Try YouTube auto-captions (if method=auto and it's YouTube)
    │     └─ Success? Skip Whisper, go to step 3
    │
    ├─ 2. Download audio → Whisper speech-to-text
    │     ├─ Language auto-detection
    │     ├─ Chunk overlap to prevent word loss
    │     └─ Progress bar (full mode) or streaming (chunked mode)
    │
    ├─ 3. Translate to Russian (if source ≠ Russian)
    │     └─ Google Translate via deep-translator
    │
    └─ 4. LLM summarization (if lm_output_mode ≠ transcriptions-only)
          ├─ Sends last N seconds of transcription (lm_window_seconds)
          ├─ Bilingual mode: separate prompts for original + Russian
          └─ Output: structured summary per prompt template
```
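The fallback logic in steps 1–2 boils down to a small decision function; this is an illustrative sketch, not the actual pipeline code:

```python
def pick_transcription_source(method: str, is_youtube: bool,
                              captions_available: bool) -> str:
    """Mirror of the flow above: captions when possible, Whisper otherwise."""
    if method == "subtitles":
        if not (is_youtube and captions_available):
            raise RuntimeError("subtitles requested but no captions available")
        return "captions"
    if method == "auto" and is_youtube and captions_available:
        return "captions"   # step 1 succeeded: skip Whisper
    return "whisper"        # step 2: download audio and run speech-to-text

assert pick_transcription_source("auto", True, True) == "captions"
assert pick_transcription_source("auto", True, False) == "whisper"
assert pick_transcription_source("whisper", True, True) == "whisper"
```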

Processing Modes

| Mode | When | How |
|---|---|---|
| full | Regular videos | Downloads the entire video first, transcribes with a progress bar. Best quality. |
| chunked | Live streams | Processes audio in interval-second chunks. Real-time output. |
| auto | Default | Picks full for regular videos, chunked for live streams. |
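The auto-resolution rule in one hypothetical helper (a sketch of the behavior described above, not the real code):

```python
def resolve_process_mode(mode: str, is_live: bool) -> str:
    """'auto' picks 'full' for regular videos and 'chunked' for live streams."""
    if mode == "auto":
        return "chunked" if is_live else "full"
    return mode  # an explicit 'full' or 'chunked' is respected as-is

assert resolve_process_mode("auto", is_live=False) == "full"
assert resolve_process_mode("auto", is_live=True) == "chunked"
assert resolve_process_mode("chunked", is_live=False) == "chunked"
```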

Whisper Models

| Model | Size | Speed | Quality | RAM |
|---|---|---|---|---|
| tiny | 39M | ~32x realtime | Basic | ~1 GB |
| base | 74M | ~16x realtime | OK | ~1 GB |
| small | 244M | ~6x realtime | Good | ~2 GB |
| medium | 769M | ~2x realtime | Very good | ~5 GB |
| large | 1550M | ~1x realtime | Best | ~10 GB |
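With whisper_model set to `auto`, the size is chosen from the language (per the configuration reference: tiny for English, small otherwise). A hedged sketch of that selection:

```python
def pick_whisper_model(whisper_model: str, language: str) -> str:
    """'auto' selects tiny for English and small for other languages."""
    if whisper_model != "auto":
        return whisper_model  # explicit choice overrides auto-selection
    return "tiny" if language == "en" else "small"

assert pick_whisper_model("auto", "en") == "tiny"
assert pick_whisper_model("auto", "ru") == "small"
assert pick_whisper_model("medium", "en") == "medium"
```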

API Key Setup

Put your OpenRouter API key in api_key.txt. The system tracks daily usage capacity in metadata.json and resets it at UTC midnight.
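The reset-at-UTC-midnight rule amounts to checking whether a UTC date boundary has been crossed; `should_reset_capacity` below is an illustrative stand-in, not the actual implementation:

```python
from datetime import datetime, timezone

def should_reset_capacity(last_reset: datetime, now: datetime) -> bool:
    """The daily counter resets once the UTC date has advanced."""
    return now.astimezone(timezone.utc).date() > last_reset.astimezone(timezone.utc).date()

last = datetime(2024, 5, 1, 23, 50, tzinfo=timezone.utc)
assert should_reset_capacity(last, datetime(2024, 5, 2, 0, 5, tzinfo=timezone.utc))
assert not should_reset_capacity(last, datetime(2024, 5, 1, 23, 59, tzinfo=timezone.utc))
```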


Architecture

```
┌─────────────────────────────────────────────────────┐
│                   User Interface                     │
├────────────────────┬────────────────────────────────┤
│   CLI (trns cmd)   │  Telegram Bot (Pyrogram+FastAPI)│
└─────────┬──────────┴──────────────┬─────────────────┘
          └────────────┬────────────┘
                       │
         ┌─────────────▼──────────────┐
         │   TranscriptionPipeline    │
         │   (orchestration + threads)│
         └─────────────┬──────────────┘
                       │
     ┌─────────────────┼─────────────────┐
     │                 │                 │
┌────▼─────┐   ┌──────▼──────┐   ┌──────▼──────┐
│  yt-dlp  │   │   faster-   │   │ OpenRouter  │
│  audio   │   │   whisper   │   │    LLM      │
│ download │   │   STT       │   │  summaries  │
└──────────┘   └─────────────┘   └─────────────┘
```

Threading (Telegram Bot)

The bot uses a queue-based architecture for thread-safe output:

1. Webhook arrives → FastAPI handler → spawns a background thread
2. The background thread runs TranscriptionPipeline with an output_callback
3. output_callback puts text into a queue.Queue
4. An async loop drains the queue and sends messages to Telegram

No global state mutation — each pipeline instance is independent.
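That flow can be sketched with stdlib primitives; `run_pipeline` here is a toy stand-in for the real TranscriptionPipeline:

```python
import queue
import threading

def run_pipeline(output_callback):
    """Toy pipeline: emits transcription chunks through the callback."""
    for text in ["chunk 1", "chunk 2", "chunk 3"]:
        output_callback(text)

out_queue: "queue.Queue[str]" = queue.Queue()

# Steps 2-3: a background thread runs the pipeline; the callback feeds the queue.
worker = threading.Thread(target=run_pipeline, args=(out_queue.put,))
worker.start()
worker.join()

# Step 4: the (normally async) sender drains the queue in order.
drained = []
while not out_queue.empty():
    drained.append(out_queue.get())
assert drained == ["chunk 1", "chunk 2", "chunk 3"]
```

Because queue.Queue is thread-safe, the producer thread and the consumer loop never share mutable state directly, which is what keeps each pipeline instance independent.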


Development

```shell
git clone https://github.com/kakoyvostorg/trns.git
cd trns
pip install -e ".[dev]"
pytest                    # 117 tests
ruff format .             # code formatting
```

Docker

```shell
docker build -f docker/Dockerfile -t trns .
docker-compose -f docker/docker-compose.yml up
```

License

MIT — see LICENSE.
