Transcribe YouTube, TikTok, and Twitch videos to text with speaker diarization
Trans - Video Transcription Tool
Quick command-line tool to transcribe videos and audio files to text.
Features
- Local file support: Transcribe audio (mp3, wav, m4a, flac, etc.) and video (mp4, mkv, avi, etc.) files directly
- Multi-platform support: YouTube, TikTok, and Twitch (VODs and clips)
- Automatic source selection: Tries native captions first (YouTube), falls back to Whisper AI
- Speaker diarization: Identify who said what (requires pyannote-audio)
- Multiple output formats: TXT, SRT, VTT, JSON, or all formats at once
- Whisper model selection: Choose from tiny, base, small, medium, or large models
- Language support: Auto-detect or specify language (en, es, fr, etc.)
- Clipboard integration: Automatically copy transcripts to clipboard (cross-platform)
- Batch processing: Process multiple videos in one command — URLs downloaded concurrently
- Persistent config: Save your preferred model, format, and output directory
- Cache management: Transcripts cached with TTL, inspect or clear via trans cache
- Clean filenames: Auto-generated names based on video titles
- Quiet mode: Minimal output for scripting
Installation
Via pip (recommended)
# Basic installation
pip install trans-cli
# With speaker diarization support
pip install "trans-cli[diarize]"
From source
git clone https://github.com/ree-see/trans.git
cd trans
pip install -e .
# With optional features
pip install -e ".[diarize]"
Requirements
- Python 3.9+
- FFmpeg (for audio extraction)
Install FFmpeg:
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# Fedora
sudo dnf install ffmpeg
Usage
All transcription goes through the transcribe subcommand, invoked as trans transcribe:
Local Files
# Audio files
trans transcribe recording.mp3
trans transcribe interview.wav
trans transcribe ~/Downloads/podcast.m4a
# Video files (audio auto-extracted)
trans transcribe meeting.mp4
trans transcribe lecture.mkv
# With speaker identification
trans transcribe --diarize meeting.mp4
# Higher quality model
trans transcribe --model medium conference_talk.mp3
URLs
# YouTube (uses native captions when available)
trans transcribe "https://youtube.com/watch?v=..."
# Custom output name
trans transcribe -o my_video "https://tiktok.com/@user/video/123"
# Twitch VOD
trans transcribe "https://twitch.tv/videos/123456789"
# Twitch clip
trans transcribe "https://clips.twitch.tv/FunnyClipName"
# Copy to clipboard automatically
trans transcribe -c "https://youtube.com/watch?v=..."
Output Formats
# Plain text (default)
trans transcribe "URL"
# SRT subtitles
trans transcribe -f srt "URL"
# VTT subtitles
trans transcribe -f vtt "URL"
# JSON with metadata
trans transcribe -f json "URL"
# All formats at once
trans transcribe -f all "URL"
Whisper Models
trans transcribe -m tiny "URL" # Fastest, lower accuracy
trans transcribe -m base "URL" # Balanced (default)
trans transcribe -m small "URL" # Better accuracy
trans transcribe -m medium "URL" # High accuracy
trans transcribe -m large "URL" # Best accuracy, slowest
Language Options
trans transcribe "URL" # Auto-detect (default)
trans transcribe -l en "URL" # English
trans transcribe -l es "URL" # Spanish
trans transcribe -l fr "URL" # French
trans transcribe -l ja "URL" # Japanese
Speaker Diarization
Identify different speakers in the transcript:
trans transcribe --diarize "https://youtube.com/watch?v=..."
# Specify number of speakers (improves accuracy)
trans transcribe --diarize --num-speakers 2 "URL"
# With subtitles
trans transcribe -d -f srt "URL"
Output example (txt):
[Speaker 1]
Hello and welcome to the show.
Today we have a special guest.
[Speaker 2]
Thanks for having me!
I'm excited to be here.
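The labelled text layout above can be reproduced from a list of speaker-tagged segments. The following is a minimal sketch, not the tool's actual implementation: the function name `format_speaker_text` and the `(speaker, text)` tuple shape are assumptions made for illustration.

```python
from itertools import groupby


def format_speaker_text(segments):
    """Render (speaker, text) pairs as labelled blocks.

    Consecutive segments from the same speaker are merged under one label.
    Hypothetical helper; the segment shape is an assumption.
    """
    blocks = []
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        lines = [text for _, text in group]
        blocks.append(f"[{speaker}]\n" + "\n".join(lines))
    return "\n".join(blocks)


segments = [
    ("Speaker 1", "Hello and welcome to the show."),
    ("Speaker 1", "Today we have a special guest."),
    ("Speaker 2", "Thanks for having me!"),
    ("Speaker 2", "I'm excited to be here."),
]
print(format_speaker_text(segments))
```

Grouping only consecutive segments (rather than all segments per speaker) keeps the transcript in chronological order when speakers alternate.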
Requirements:
- pyannote-audio package: pip install pyannote-audio
- HuggingFace token (free): https://huggingface.co/settings/tokens
- Accept model license: https://huggingface.co/pyannote/speaker-diarization-3.1
pip install pyannote-audio
huggingface-cli login # or: export HF_TOKEN=hf_your_token_here
Batch Processing
# Multiple inputs (URLs downloaded concurrently, transcribed with shared model)
trans transcribe "URL1" "URL2" "URL3"
# Mix local files and URLs
trans transcribe recording.mp3 "URL1" meeting.mp4
# Quiet batch
trans transcribe -q "URL1" "URL2" "URL3"
Advanced Options
# Save output to a specific directory
trans transcribe --output-dir ~/transcripts "URL"
# Keep downloaded audio file
trans transcribe -k "URL"
# Add timestamp to filename (prevents overwrites)
trans transcribe -t "URL"
# Skip cache lookup
trans transcribe --no-cache "URL"
# Always use Whisper (skip native caption check)
trans transcribe --force-whisper "URL"
# TikTok with cookies
trans transcribe --cookies cookies.txt "https://tiktok.com/@user/video/123"
Cache Management
Transcripts are cached automatically (default TTL: 30 days):
# Show cache stats
trans cache stats
# Clear all cached transcripts
trans cache clear
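To illustrate what a TTL-based cache implies, here is a minimal sketch of an expiry check. It is not taken from the tool's source: the function name `is_expired` and the idea of storing a `cached_at` timestamp per entry are assumptions; only the 30-day default comes from the text above.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL_DAYS = 30  # default TTL stated in this README


def is_expired(cached_at, now=None, ttl_days=DEFAULT_TTL_DAYS):
    """Return True when an entry cached at `cached_at` is older than the TTL.

    Hypothetical helper; how the tool actually records entry age is unknown.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - cached_at > timedelta(days=ttl_days)


cached_at = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(is_expired(cached_at, now=cached_at + timedelta(days=31)))
```

Under this model, raising the TTL (for example with trans config set cache.ttl_days 60) would keep the same entry valid for longer.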
Persistent Configuration
Save your preferences so you don't have to repeat flags:
# Show current config (and config file path)
trans config show
# Set defaults
trans config set model small
trans config set format srt
trans config set output_dir ~/transcripts
trans config set clipboard true
trans config set quiet true
trans config set cache.ttl_days 60
trans config set diarization.hf_token hf_your_token_here
Config is stored in the OS-appropriate location:
- macOS: ~/Library/Application Support/trans/config.toml
- Linux: ~/.config/trans/config.toml
- Windows: %APPDATA%\trans\config.toml
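If a script needs to locate the config file, the three locations listed above can be resolved roughly as follows. This is a sketch under assumptions: the function `config_path` is hypothetical, it does not consult `XDG_CONFIG_HOME`, and the Windows fallback when `APPDATA` is unset is a guess. Running trans config show remains the authoritative way to find the path.

```python
import os
import sys
from pathlib import Path


def config_path(platform=sys.platform, env=None, home=None):
    """Return the expected config.toml location for the given platform.

    Hypothetical helper mirroring the paths listed in this README.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform == "darwin":
        return home / "Library" / "Application Support" / "trans" / "config.toml"
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        # Fallback is an assumption for when APPDATA is not set.
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "trans" / "config.toml"
    return home / ".config" / "trans" / "config.toml"


print(config_path())
```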
Examples
Quick transcription to clipboard
trans transcribe -c "https://youtube.com/watch?v=dQw4w9WgXcQ"
Create subtitles for video editing
trans transcribe -f srt -m small -o my_subtitles "https://youtube.com/watch?v=..."
Transcribe a foreign language video
trans transcribe -l es -f all "https://youtube.com/watch?v=..."
Batch research videos, quietly
trans transcribe -q \
"https://youtube.com/watch?v=video1" \
"https://youtube.com/watch?v=video2" \
"https://youtube.com/watch?v=video3"
Set-and-forget config workflow
trans config set model small
trans config set output_dir ~/transcripts
trans config set clipboard true
# Now every transcription uses these defaults:
trans transcribe "URL"
File Naming
- Default: Auto-generated from video/file title → How_to_Use_Python.txt
- Custom: trans transcribe -o my_notes "URL" → my_notes.txt
- With timestamp: trans transcribe -t "URL" → How_to_Use_Python_20260222_153045.txt
- Custom directory: trans transcribe --output-dir ~/docs "URL" → ~/docs/How_to_Use_Python.txt
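The naming rules above can be approximated in a few lines. This is a sketch, not the tool's actual sanitizer: the function `output_name` and the exact replacement rule (runs of non-word characters become a single underscore) are assumptions; the timestamp layout is inferred from the example filename.

```python
import re
from datetime import datetime


def output_name(title, ext="txt", stamp=None):
    """Build a filesystem-friendly filename from a title.

    Hypothetical helper; the real sanitization rules may differ.
    """
    # Collapse every run of non-word characters into one underscore.
    base = re.sub(r"[^\w]+", "_", title).strip("_")
    if stamp is not None:
        # Same layout as the example: YYYYMMDD_HHMMSS
        base = f"{base}_{stamp:%Y%m%d_%H%M%S}"
    return f"{base}.{ext}"


print(output_name("How to Use Python"))
print(output_name("How to Use Python", stamp=datetime(2026, 2, 22, 15, 30, 45)))
```

Appending a timestamp makes repeated runs on the same video produce distinct files, which is why -t prevents overwrites.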
Supported Formats
| Format | Extension | Use case |
|---|---|---|
| txt | .txt | Plain text, notes |
| srt | .srt | Video editing |
| vtt | .vtt | Web players |
| json | .json | Full metadata + timestamps |
| all | all above | Everything at once |
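The two subtitle formats differ mainly in their timestamp separator: SRT writes milliseconds after a comma and numbers each cue, while WebVTT uses a period and starts the file with a WEBVTT header line. The sketch below shows that difference; the helper names are hypothetical and this is not the tool's own writer.

```python
def format_timestamp(seconds, sep):
    """Format seconds as HH:MM:SS<sep>mmm (sep is ',' for SRT, '.' for VTT)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def srt_cue(index, start, end, text):
    """One numbered SRT cue."""
    return f"{index}\n{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}\n{text}\n"


def vtt_cue(start, end, text):
    """One WebVTT cue (the file itself must begin with a 'WEBVTT' line)."""
    return f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}\n{text}\n"


print(srt_cue(1, 0.0, 2.5, "Hello and welcome to the show."))
print("WEBVTT\n\n" + vtt_cue(0.0, 2.5, "Hello and welcome to the show."))
```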
Supported File Types
Audio: mp3, wav, m4a, flac, ogg, opus, aac, wma
Video (audio auto-extracted): mp4, mkv, avi, mov, webm, flv, wmv, m4v, mpeg, mpg
Command Reference
trans [--version] [--help] COMMAND
Commands:
transcribe Transcribe video/audio URLs or local files
cache Manage the transcript cache
config Manage persistent configuration
trans transcribe [OPTIONS] INPUTS...
Options:
-o, --output PATH Output base path (no extension, single input only)
--output-dir DIR Directory for output files
-m, --model MODEL Whisper model: tiny, base, small, medium, large
-l, --language LANG Language code (e.g. en, es). Auto-detect if unset.
-f, --format FORMAT Output format: txt, srt, vtt, json, all
-c, --clipboard Copy transcript to clipboard
-k, --keep-audio Keep downloaded audio file
-t, --timestamp Add timestamp to output filename
-q, --quiet Minimal output (errors only)
--cookies PATH Path to cookies.txt for authenticated downloads
--no-cache Skip cache lookup
--force-whisper Skip native captions, always use Whisper
-d, --diarize Enable speaker diarization
--num-speakers N Number of speakers (helps diarization accuracy)
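When driving the CLI from another program, it can help to assemble the argument list in one place. The sketch below uses only flags listed in the reference above; the wrapper functions `build_command` and `run` are hypothetical and not part of the package, and `run` assumes the trans executable is on PATH.

```python
import subprocess


def build_command(inputs, model=None, fmt=None, output_dir=None,
                  quiet=False, diarize=False):
    """Assemble an argv list for `trans transcribe` from documented options."""
    cmd = ["trans", "transcribe"]
    if model:
        cmd += ["--model", model]
    if fmt:
        cmd += ["--format", fmt]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if quiet:
        cmd.append("--quiet")
    if diarize:
        cmd.append("--diarize")
    return cmd + list(inputs)


def run(inputs, **options):
    """Invoke the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(inputs, **options), check=True)


print(build_command(["meeting.mp4"], model="small", fmt="srt", quiet=True))
```

Passing the arguments as a list avoids shell quoting issues with URLs that contain characters such as ? and &.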
Twitch Notes
Twitch videos rarely have native captions, so Whisper is used automatically. For long VODs use -m tiny or -m base for speed.
TikTok Notes
TikTok aggressively blocks datacenter IPs. If you see "IP address is blocked":
- Use cookies: Export from your browser with a "Get cookies.txt" extension:
trans transcribe --cookies cookies.txt "https://tiktok.com/@user/video/123"
- Run from a residential IP (home internet, not a VPS)
- Use a residential VPN
Troubleshooting
| Error | Fix |
|---|---|
| No such file or directory: 'ffmpeg' | brew install ffmpeg or apt install ffmpeg |
| Whisper is slow | Use a smaller model: -m tiny |
| Wrong language detected | Specify explicitly: -l en |
| TikTok blocked | See TikTok Notes above |
Development
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests (52 tests, all offline)
pytest test_trans.py
# Lint / format
ruff check trans/
black trans/
Credits
Built with:
- yt-dlp — Video/audio downloading
- faster-whisper — Speech recognition
- FFmpeg — Audio processing
- Typer — CLI framework
- pyannote-audio — Speaker diarization (optional)