Transcribe YouTube, TikTok, and Twitch videos to text with speaker diarization

Trans - Video Transcription Tool

Quick command-line tool to transcribe videos and audio files to text.

Features

  • Local file support: Transcribe audio (mp3, wav, m4a, flac, etc.) and video (mp4, mkv, avi, etc.) files directly
  • Multi-platform support: YouTube, TikTok, and Twitch (VODs and clips)
  • Automatic source selection: Tries native captions first (YouTube), falls back to Whisper AI
  • Speaker diarization: Identify who said what (requires pyannote-audio)
  • Multiple output formats: TXT, SRT, VTT, JSON, or all formats at once
  • Whisper model selection: Choose from tiny, base, small, medium, or large models
  • Language support: Auto-detect or specify language (en, es, fr, etc.)
  • Clipboard integration: Automatically copy transcripts to clipboard (cross-platform)
  • Batch processing: Process multiple videos in one command; URLs are downloaded concurrently
  • Persistent config: Save your preferred model, format, and output directory
  • Cache management: Transcripts are cached with a TTL; inspect or clear them via trans cache
  • Clean filenames: Auto-generated names based on video titles
  • Quiet mode: Minimal output for scripting

Installation

Via pip (recommended)

# Basic installation
pip install trans-cli

# With speaker diarization support
pip install "trans-cli[diarize]"   # quoted so shells such as zsh do not expand the brackets

From source

git clone https://github.com/ree-see/trans.git
cd trans
pip install -e .

# With optional features
pip install -e ".[diarize]"

Requirements

  • Python 3.9+
  • FFmpeg (for audio extraction)

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Fedora
sudo dnf install ffmpeg

Usage

All transcription goes through the transcribe subcommand, invoked as trans transcribe:

Local Files

# Audio files
trans transcribe recording.mp3
trans transcribe interview.wav
trans transcribe ~/Downloads/podcast.m4a

# Video files (audio auto-extracted)
trans transcribe meeting.mp4
trans transcribe lecture.mkv

# With speaker identification
trans transcribe --diarize meeting.mp4

# Higher quality model
trans transcribe --model medium conference_talk.mp3

URLs

# YouTube (uses native captions when available)
trans transcribe "https://youtube.com/watch?v=..."

# Custom output name
trans transcribe -o my_video "https://tiktok.com/@user/video/123"

# Twitch VOD
trans transcribe "https://twitch.tv/videos/123456789"

# Twitch clip
trans transcribe "https://clips.twitch.tv/FunnyClipName"

# Copy to clipboard automatically
trans transcribe -c "https://youtube.com/watch?v=..."

Output Formats

# Plain text (default)
trans transcribe "URL"

# SRT subtitles
trans transcribe -f srt "URL"

# VTT subtitles
trans transcribe -f vtt "URL"

# JSON with metadata
trans transcribe -f json "URL"

# All formats at once
trans transcribe -f all "URL"

Whisper Models

trans transcribe -m tiny   "URL"    # Fastest, lower accuracy
trans transcribe -m base   "URL"    # Balanced (default)
trans transcribe -m small  "URL"    # Better accuracy
trans transcribe -m medium "URL"    # High accuracy
trans transcribe -m large  "URL"    # Best accuracy, slowest

Language Options

trans transcribe "URL"         # Auto-detect (default)
trans transcribe -l en "URL"   # English
trans transcribe -l es "URL"   # Spanish
trans transcribe -l fr "URL"   # French
trans transcribe -l ja "URL"   # Japanese

Speaker Diarization

Identify different speakers in the transcript:

trans transcribe --diarize "https://youtube.com/watch?v=..."

# Specify number of speakers (improves accuracy)
trans transcribe --diarize --num-speakers 2 "URL"

# With subtitles
trans transcribe -d -f srt "URL"

Output example (txt):

[Speaker 1]
Hello and welcome to the show.
Today we have a special guest.

[Speaker 2]
Thanks for having me!
I'm excited to be here.
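
The grouped layout above can be reproduced from per-segment speaker labels roughly as follows. This is a minimal sketch, not the tool's actual code: the (speaker, text) tuple shape and the helper name format_diarized are assumptions made for illustration.

```python
from itertools import groupby


def format_diarized(segments):
    """Render (speaker, text) pairs as blocks headed by [Speaker].

    Hypothetical helper: the segment shape is an assumption, not trans's API.
    """
    blocks = []
    # groupby merges consecutive segments that share the same speaker label.
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        lines = [text for _, text in group]
        blocks.append(f"[{speaker}]\n" + "\n".join(lines))
    # A blank line separates speaker turns, as in the example output.
    return "\n\n".join(blocks)


segments = [
    ("Speaker 1", "Hello and welcome to the show."),
    ("Speaker 1", "Today we have a special guest."),
    ("Speaker 2", "Thanks for having me!"),
    ("Speaker 2", "I'm excited to be here."),
]
print(format_diarized(segments))
```

Grouping only consecutive segments keeps the conversation order intact when a speaker returns later in the recording.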

Requirements:

pip install pyannote-audio
huggingface-cli login          # or: export HF_TOKEN=hf_your_token_here

Batch Processing

# Multiple inputs (URLs downloaded concurrently, transcribed with shared model)
trans transcribe "URL1" "URL2" "URL3"

# Mix local files and URLs
trans transcribe recording.mp3 "URL1" meeting.mp4

# Quiet batch
trans transcribe -q "URL1" "URL2" "URL3"
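
The "download concurrently, transcribe with a shared model" pattern can be sketched as below. This is only an illustration of the general technique under stated assumptions: fetch_audio, transcribe_audio, and transcribe_batch are hypothetical placeholders, not functions from trans.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_audio(url):
    """Placeholder download step (hypothetical); a real tool would fetch audio here."""
    return f"audio-for-{url}"


def transcribe_audio(model, audio):
    """Placeholder transcription step (hypothetical) using an already-loaded model."""
    return f"{model}:{audio}"


def transcribe_batch(urls, model="base", max_workers=4):
    # Downloads are I/O-bound, so a thread pool lets them overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        audio_files = list(pool.map(fetch_audio, urls))  # map preserves input order
    # The same model object is reused for every file instead of being reloaded.
    return [transcribe_audio(model, audio) for audio in audio_files]


print(transcribe_batch(["URL1", "URL2", "URL3"]))
```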

Advanced Options

# Save output to a specific directory
trans transcribe --output-dir ~/transcripts "URL"

# Keep downloaded audio file
trans transcribe -k "URL"

# Add timestamp to filename (prevents overwrites)
trans transcribe -t "URL"

# Skip cache lookup
trans transcribe --no-cache "URL"

# Always use Whisper (skip native caption check)
trans transcribe --force-whisper "URL"

# TikTok with cookies
trans transcribe --cookies cookies.txt "https://tiktok.com/@user/video/123"

Cache Management

Transcripts are cached automatically (default TTL: 30 days):

# Show cache stats
trans cache stats

# Clear all cached transcripts
trans cache clear
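
A TTL check of this kind usually comes down to comparing an entry's age with the configured number of days. The sketch below assumes entries carry a creation time in epoch seconds; is_fresh is a hypothetical name, not part of trans.

```python
import time

SECONDS_PER_DAY = 86_400


def is_fresh(cached_at, ttl_days=30, now=None):
    """Return True if an entry stored at `cached_at` (epoch seconds) is within its TTL.

    Hypothetical helper; the 30-day default mirrors the documented default TTL.
    """
    now = time.time() if now is None else now
    return (now - cached_at) < ttl_days * SECONDS_PER_DAY


# An entry written 45 days ago is stale at the default TTL but fresh at 60 days.
age_45_days = time.time() - 45 * SECONDS_PER_DAY
print(is_fresh(age_45_days), is_fresh(age_45_days, ttl_days=60))
```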

Persistent Configuration

Save your preferences so you don't have to repeat flags:

# Show current config (and config file path)
trans config show

# Set defaults
trans config set model small
trans config set format srt
trans config set output_dir ~/transcripts
trans config set clipboard true
trans config set quiet true
trans config set cache.ttl_days 60
trans config set diarization.hf_token hf_your_token_here

Config is stored in the OS-appropriate location:

  • macOS: ~/Library/Application Support/trans/config.toml
  • Linux: ~/.config/trans/config.toml
  • Windows: %APPDATA%\trans\config.toml
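
For scripts that need to locate the file, the three locations listed above can be resolved per platform along these lines. This is a sketch under assumptions: config_path is a hypothetical helper, and honoring XDG_CONFIG_HOME on Linux is an assumption that the source does not confirm. Running trans config show remains the authoritative way to find the path.

```python
import os
import sys


def config_path(platform=sys.platform, env=None):
    """Return the documented config location for a platform (hypothetical helper)."""
    env = os.environ if env is None else env
    if platform == "darwin":
        return "~/Library/Application Support/trans/config.toml"
    if platform.startswith("win"):
        # Fall back to the literal placeholder if APPDATA is not set.
        appdata = env.get("APPDATA", "%APPDATA%")
        return appdata + "\\trans\\config.toml"
    # Linux and other Unix-likes. Assumption: XDG_CONFIG_HOME overrides ~/.config.
    base = env.get("XDG_CONFIG_HOME", "~/.config")
    return base + "/trans/config.toml"


print(config_path())
```

On macOS and Linux, pass the result through os.path.expanduser before opening it, since the leading ~ is left unexpanded here.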

Examples

Quick transcription to clipboard

trans transcribe -c "https://youtube.com/watch?v=dQw4w9WgXcQ"

Create subtitles for video editing

trans transcribe -f srt -m small -o my_subtitles "https://youtube.com/watch?v=..."

Transcribe a foreign language video

trans transcribe -l es -f all "https://youtube.com/watch?v=..."

Batch research videos, quietly

trans transcribe -q \
  "https://youtube.com/watch?v=video1" \
  "https://youtube.com/watch?v=video2" \
  "https://youtube.com/watch?v=video3"

Set-and-forget config workflow

trans config set model small
trans config set output_dir ~/transcripts
trans config set clipboard true
# Now every transcription uses these defaults:
trans transcribe "URL"

File Naming

  • Default: Auto-generated from video/file title → How_to_Use_Python.txt
  • Custom: trans transcribe -o my_notes "URL" → my_notes.txt
  • With timestamp: trans transcribe -t "URL" → How_to_Use_Python_20260222_153045.txt
  • Custom directory: trans transcribe --output-dir ~/docs "URL" → ~/docs/How_to_Use_Python.txt
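
The naming scheme above can be approximated as follows. The exact sanitization rules used by trans are not documented here, so the regular expression is an assumption, and output_name is a hypothetical helper; only the resulting shapes (How_to_Use_Python.txt and the _YYYYMMDD_HHMMSS suffix) come from the examples above.

```python
import re
from datetime import datetime


def output_name(title, ext="txt", timestamp=None):
    """Build a filename from a title (hypothetical helper, assumed rules)."""
    # Replace each run of characters other than letters, digits, and "-" with "_".
    stem = re.sub(r"[^A-Za-z0-9-]+", "_", title).strip("_")
    if timestamp is not None:
        # Same layout as the documented example: 20260222_153045.
        stem += "_" + timestamp.strftime("%Y%m%d_%H%M%S")
    return f"{stem}.{ext}"


print(output_name("How to Use Python"))
print(output_name("How to Use Python", timestamp=datetime(2026, 2, 22, 15, 30, 45)))
```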

Supported Formats

Format   Extension   Use case
txt      .txt        Plain text, notes
srt      .srt        Video editing
vtt      .vtt        Web players
json     .json       Full metadata + timestamps
all      all above   Everything at once
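
The two subtitle formats differ mainly in their timestamp syntax: SRT writes HH:MM:SS,mmm with a comma, while WebVTT writes HH:MM:SS.mmm with a period. The sketch below shows that difference; format_timestamp and srt_cue are hypothetical helpers, not functions exported by trans.

```python
def format_timestamp(seconds, fmt="srt"):
    """Format a time offset in seconds for SRT (comma) or VTT (period)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def srt_cue(index, start, end, text):
    """Build one numbered SRT cue: index line, time range line, then the text."""
    return f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"


print(srt_cue(1, 0.0, 2.5, "Hello and welcome to the show."))
print(format_timestamp(3661.5, fmt="vtt"))
```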

Supported File Types

Audio: mp3, wav, m4a, flac, ogg, opus, aac, wma

Video (audio auto-extracted): mp4, mkv, avi, mov, webm, flv, wmv, m4v, mpeg, mpg
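
When preparing a mixed batch of inputs, a script can pre-sort them using the extension lists above. This is a sketch only: classify_input is a hypothetical helper, and how trans itself distinguishes URLs from local files is not documented here.

```python
from pathlib import Path
from urllib.parse import urlparse

# Extension lists copied from the supported file types above.
AUDIO_EXTS = {"mp3", "wav", "m4a", "flac", "ogg", "opus", "aac", "wma"}
VIDEO_EXTS = {"mp4", "mkv", "avi", "mov", "webm", "flv", "wmv", "m4v", "mpeg", "mpg"}


def classify_input(value):
    """Label an input as 'url', 'audio', 'video', or 'unsupported' (hypothetical helper)."""
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    # Compare extensions case-insensitively and without the leading dot.
    ext = Path(value).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unsupported"


for item in ["recording.mp3", "Lecture.MKV", "https://twitch.tv/videos/123456789"]:
    print(item, classify_input(item))
```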

Command Reference

trans [--version] [--help] COMMAND

Commands:
  transcribe   Transcribe video/audio URLs or local files
  cache        Manage the transcript cache
  config       Manage persistent configuration

trans transcribe [OPTIONS] INPUTS...

Options:
  -o, --output PATH        Output base path (no extension, single input only)
  --output-dir DIR         Directory for output files
  -m, --model MODEL        Whisper model: tiny, base, small, medium, large
  -l, --language LANG      Language code (e.g. en, es). Auto-detect if unset.
  -f, --format FORMAT      Output format: txt, srt, vtt, json, all
  -c, --clipboard          Copy transcript to clipboard
  -k, --keep-audio         Keep downloaded audio file
  -t, --timestamp          Add timestamp to output filename
  -q, --quiet              Minimal output (errors only)
  --cookies PATH           Path to cookies.txt for authenticated downloads
  --no-cache               Skip cache lookup
  --force-whisper          Skip native captions, always use Whisper
  -d, --diarize            Enable speaker diarization
  --num-speakers N         Number of speakers (helps diarization accuracy)

Twitch Notes

Twitch videos rarely have native captions, so Whisper is used automatically. For long VODs use -m tiny or -m base for speed.

TikTok Notes

TikTok aggressively blocks datacenter IPs. If you see "IP address is blocked":

  1. Use cookies: Export from your browser with a "Get cookies.txt" extension:
    trans transcribe --cookies cookies.txt "https://tiktok.com/@user/video/123"
    
  2. Run from a residential IP (home internet, not a VPS)
  3. Use a residential VPN

Troubleshooting

Error                                   Fix
No such file or directory: 'ffmpeg'     brew install ffmpeg or apt install ffmpeg
Whisper is slow                         Use a smaller model: -m tiny
Wrong language detected                 Specify explicitly: -l en
TikTok blocked                          See TikTok Notes above

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests (52 tests, all offline)
pytest test_trans.py

# Lint / format
ruff check trans/
black trans/

Credits

Built with:
