Transcribe YouTube, TikTok, and Twitch videos to text with speaker diarization

Trans - Video Transcription Tool

Quick command-line tool to transcribe videos and audio files to text.

Features

Local file support: Transcribe audio (mp3, wav, m4a, flac, etc.) and video (mp4, mkv, avi, etc.) files directly
Multi-platform support: YouTube, TikTok, and Twitch (VODs and clips)
Automatic source selection: Tries native captions first (YouTube), falls back to Whisper AI
Speaker diarization: Identify who said what (requires pyannote-audio)
Multiple output formats: TXT, SRT, VTT, JSON, or all formats at once
Whisper model selection: Choose from tiny, base, small, medium, or large models
Language support: Auto-detect or specify language (en, es, fr, etc.)
Clipboard integration: Automatically copy transcripts to clipboard (cross-platform)
Batch processing: Process multiple videos in one command; URLs are downloaded concurrently
Persistent config: Save your preferred model, format, and output directory
Cache management: Transcripts are cached with a TTL; inspect or clear the cache via trans cache
Clean filenames: Auto-generated names based on video titles
Quiet mode: Minimal output for scripting
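
The automatic source selection listed above amounts to a try-then-fall-back pattern. Here is a minimal Python sketch of that idea, assuming two hypothetical callables, `fetch_captions` and `run_whisper`, that stand in for whatever trans does internally (neither name comes from the project):

```python
from typing import Callable, Optional, Tuple


def pick_transcript(
    url: str,
    fetch_captions: Callable[[str], Optional[str]],  # hypothetical: returns None when no captions exist
    run_whisper: Callable[[str], str],               # hypothetical: always produces text
    force_whisper: bool = False,
) -> Tuple[str, str]:
    """Return (source, text), preferring native captions unless told to skip them."""
    if not force_whisper:
        captions = fetch_captions(url)
        if captions:
            return "captions", captions
    # No captions (or captions skipped on purpose): fall back to speech recognition.
    return "whisper", run_whisper(url)
```

The `force_whisper` parameter mirrors the documented --force-whisper flag; how the real tool wires it up is not shown in this README.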

Installation

Via pip (recommended)

# Basic installation
pip install trans-cli

# With speaker diarization support (quoted so shells like zsh don't expand the brackets)
pip install "trans-cli[diarize]"

From source

git clone https://github.com/ree-see/trans.git
cd trans
pip install -e .

# With optional features
pip install -e ".[diarize]"

Requirements

Python 3.9+
FFmpeg (for audio extraction)

Install FFmpeg:

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Fedora
sudo dnf install ffmpeg

Usage

All transcription goes through the transcribe subcommand, invoked as trans transcribe:

Local Files

# Audio files
trans transcribe recording.mp3
trans transcribe interview.wav
trans transcribe ~/Downloads/podcast.m4a

# Video files (audio auto-extracted)
trans transcribe meeting.mp4
trans transcribe lecture.mkv

# With speaker identification
trans transcribe --diarize meeting.mp4

# Higher quality model
trans transcribe --model medium conference_talk.mp3

URLs

# YouTube (uses native captions when available)
trans transcribe "https://youtube.com/watch?v=..."

# Custom output name
trans transcribe -o my_video "https://tiktok.com/@user/video/123"

# Twitch VOD
trans transcribe "https://twitch.tv/videos/123456789"

# Twitch clip
trans transcribe "https://clips.twitch.tv/FunnyClipName"

# Copy to clipboard automatically
trans transcribe -c "https://youtube.com/watch?v=..."

Output Formats

# Plain text (default)
trans transcribe "URL"

# SRT subtitles
trans transcribe -f srt "URL"

# VTT subtitles
trans transcribe -f vtt "URL"

# JSON with metadata
trans transcribe -f json "URL"

# All formats at once
trans transcribe -f all "URL"

Whisper Models

trans transcribe -m tiny   "URL"    # Fastest, lower accuracy
trans transcribe -m base   "URL"    # Balanced (default)
trans transcribe -m small  "URL"    # Better accuracy
trans transcribe -m medium "URL"    # High accuracy
trans transcribe -m large  "URL"    # Best accuracy, slowest

Language Options

trans transcribe "URL"         # Auto-detect (default)
trans transcribe -l en "URL"   # English
trans transcribe -l es "URL"   # Spanish
trans transcribe -l fr "URL"   # French
trans transcribe -l ja "URL"   # Japanese

Speaker Diarization

Identify different speakers in the transcript:

trans transcribe --diarize "https://youtube.com/watch?v=..."

# Specify number of speakers (improves accuracy)
trans transcribe --diarize --num-speakers 2 "URL"

# With subtitles
trans transcribe -d -f srt "URL"

Output example (txt):

[Speaker 1]
Hello and welcome to the show.
Today we have a special guest.

[Speaker 2]
Thanks for having me!
I'm excited to be here.
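
The txt layout above groups consecutive lines from the same speaker under one header. A small Python sketch of that grouping, assuming segments arrive as (speaker, text) pairs (an illustrative shape, not necessarily the structure trans uses internally):

```python
from itertools import groupby
from typing import Iterable, Tuple


def render_speakers(segments: Iterable[Tuple[str, str]]) -> str:
    """Group consecutive (speaker, text) segments under a [Speaker] header."""
    blocks = []
    # groupby only merges adjacent items, so a speaker who returns later gets a new block.
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        lines = [text for _, text in group]
        blocks.append(f"[{speaker}]\n" + "\n".join(lines))
    return "\n\n".join(blocks)
```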

Requirements:

pyannote-audio package: pip install pyannote-audio
HuggingFace token (free): https://huggingface.co/settings/tokens
Accept model license: https://huggingface.co/pyannote/speaker-diarization-3.1

pip install pyannote-audio
huggingface-cli login          # or: export HF_TOKEN=hf_your_token_here

Batch Processing

# Multiple inputs (URLs downloaded concurrently, transcribed with shared model)
trans transcribe "URL1" "URL2" "URL3"

# Mix local files and URLs
trans transcribe recording.mp3 "URL1" meeting.mp4

# Quiet batch
trans transcribe -q "URL1" "URL2" "URL3"
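
The batch behaviour described above (concurrent downloads, then transcription with one shared model) can be pictured roughly as follows. This is a sketch under assumptions: `download` and `transcribe` are hypothetical stand-ins, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def process_batch(
    inputs: List[str],
    download: Callable[[str], str],    # hypothetical: input -> local audio path
    transcribe: Callable[[str], str],  # hypothetical: audio path -> text, reusing one loaded model
    max_workers: int = 4,
) -> List[str]:
    """Download all inputs concurrently, then transcribe them one by one in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        audio_paths = list(pool.map(download, inputs))  # map preserves input order
    return [transcribe(path) for path in audio_paths]
```

Downloads are I/O-bound, so threads help there; transcription is kept sequential so the model only has to be loaded once.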

Advanced Options

# Save output to a specific directory
trans transcribe --output-dir ~/transcripts "URL"

# Keep downloaded audio file
trans transcribe -k "URL"

# Add timestamp to filename (prevents overwrites)
trans transcribe -t "URL"

# Skip cache lookup
trans transcribe --no-cache "URL"

# Always use Whisper (skip native caption check)
trans transcribe --force-whisper "URL"

# TikTok with cookies
trans transcribe --cookies cookies.txt "https://tiktok.com/@user/video/123"

Cache Management

Transcripts are cached automatically (default TTL: 30 days):

# Show cache stats
trans cache stats

# Clear all cached transcripts
trans cache clear
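
For readers unfamiliar with TTL caching, the freshness check behind a 30-day default could look like the sketch below. The function name and the use of Unix timestamps are assumptions for illustration; the README does not describe how trans stores cache entries.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def is_fresh(cached_at: float, ttl_days: int = 30, now: Optional[float] = None) -> bool:
    """True if an entry stored at `cached_at` (Unix time) is still within its TTL."""
    now = time.time() if now is None else now
    return (now - cached_at) < ttl_days * SECONDS_PER_DAY
```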

Persistent Configuration

Save your preferences so you don't have to repeat flags:

# Show current config (and config file path)
trans config show

# Set defaults
trans config set model small
trans config set format srt
trans config set output_dir ~/transcripts
trans config set clipboard true
trans config set quiet true
trans config set cache.ttl_days 60
trans config set diarization.hf_token hf_your_token_here

Config is stored in the OS-appropriate location:

macOS: ~/Library/Application Support/trans/config.toml
Linux: ~/.config/trans/config.toml
Windows: %APPDATA%\trans\config.toml
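
The per-OS locations above, and dotted keys such as cache.ttl_days, can be modelled in a few lines of Python. This is only a sketch: the function names are hypothetical, and treating a dotted key as a path into nested tables is an assumption about how the config file is laid out.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from typing import Any, Dict


def config_path(platform: str, home: str, appdata: str = "") -> PurePath:
    """Build the config.toml location for a sys.platform-style value."""
    if platform == "win32":
        return PureWindowsPath(appdata) / "trans" / "config.toml"
    if platform == "darwin":
        return PurePosixPath(home) / "Library" / "Application Support" / "trans" / "config.toml"
    return PurePosixPath(home) / ".config" / "trans" / "config.toml"


def set_dotted(config: Dict[str, Any], key: str, value: Any) -> Dict[str, Any]:
    """Set a value at a dotted key, creating nested tables as needed (assumed layout)."""
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config
```

For example, `set_dotted({}, "cache.ttl_days", 60)` yields `{"cache": {"ttl_days": 60}}`.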

Examples

Quick transcription to clipboard

trans transcribe -c "https://youtube.com/watch?v=dQw4w9WgXcQ"

Create subtitles for video editing

trans transcribe -f srt -m small -o my_subtitles "https://youtube.com/watch?v=..."

Transcribe a foreign language video

trans transcribe -l es -f all "https://youtube.com/watch?v=..."

Batch research videos, quietly

trans transcribe -q \
  "https://youtube.com/watch?v=video1" \
  "https://youtube.com/watch?v=video2" \
  "https://youtube.com/watch?v=video3"

Set-and-forget config workflow

trans config set model small
trans config set output_dir ~/transcripts
trans config set clipboard true
# Now every transcription uses these defaults:
trans transcribe "URL"

File Naming

Default: Auto-generated from video/file title → How_to_Use_Python.txt
Custom: trans transcribe -o my_notes "URL" → my_notes.txt
With timestamp: trans transcribe -t "URL" → How_to_Use_Python_20260222_153045.txt
Custom directory: trans transcribe --output-dir ~/docs "URL" → ~/docs/How_to_Use_Python.txt
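
A rough Python sketch of how a title could become a filename like the ones above. The exact sanitising rules trans applies are not documented here, so the regular expressions and the "untitled" fallback are assumptions; only the timestamp layout is taken from the example.

```python
import re
from datetime import datetime
from typing import Optional


def output_name(title: str, ext: str = "txt", stamp: Optional[datetime] = None) -> str:
    """Turn a title into a filesystem-friendly name, optionally with a timestamp suffix."""
    base = re.sub(r"[^\w\s-]", "", title).strip()  # drop punctuation (assumed rule)
    base = re.sub(r"\s+", "_", base) or "untitled"  # whitespace runs -> underscores
    if stamp is not None:
        base += stamp.strftime("_%Y%m%d_%H%M%S")    # matches the 20260222_153045 pattern
    return f"{base}.{ext}"
```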

Supported Formats

Format	Extension	Use case
txt	.txt	Plain text, notes
srt	.srt	Video editing
vtt	.vtt	Web players
json	.json	Full metadata + timestamps
all	all above	Everything at once
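
The two subtitle formats differ mainly in their timestamp syntax: SRT writes milliseconds after a comma, WebVTT after a period. The sketch below illustrates that difference; it is not taken from trans's source, and the function names are made up for the example.

```python
def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one numbered SRT cue from start/end times in seconds."""
    return f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
```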

Supported File Types

Audio: mp3, wav, m4a, flac, ogg, opus, aac, wma
Video (audio auto-extracted): mp4, mkv, avi, mov, webm, flv, wmv, m4v, mpeg, mpg
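
Because local files and URLs can be mixed in one command, each input has to be classified first. One way to do that, sketched in Python using the extension lists above (the function and its return labels are illustrative, not part of trans):

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

AUDIO_EXTS = {"mp3", "wav", "m4a", "flac", "ogg", "opus", "aac", "wma"}
VIDEO_EXTS = {"mp4", "mkv", "avi", "mov", "webm", "flv", "wmv", "m4v", "mpeg", "mpg"}


def classify_input(arg: str) -> str:
    """Return 'url', 'audio', 'video', or 'unknown' for one command-line input."""
    if urlparse(arg).scheme in ("http", "https"):
        return "url"
    ext = PurePosixPath(arg).suffix.lower().lstrip(".")
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    return "unknown"
```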

Command Reference

trans [--version] [--help] COMMAND

Commands:
  transcribe   Transcribe video/audio URLs or local files
  cache        Manage the transcript cache
  config       Manage persistent configuration

trans transcribe [OPTIONS] INPUTS...

Options:
  -o, --output PATH        Output base path (no extension, single input only)
  --output-dir DIR         Directory for output files
  -m, --model MODEL        Whisper model: tiny, base, small, medium, large
  -l, --language LANG      Language code (e.g. en, es). Auto-detect if unset.
  -f, --format FORMAT      Output format: txt, srt, vtt, json, all
  -c, --clipboard          Copy transcript to clipboard
  -k, --keep-audio         Keep downloaded audio file
  -t, --timestamp          Add timestamp to output filename
  -q, --quiet              Minimal output (errors only)
  --cookies PATH           Path to cookies.txt for authenticated downloads
  --no-cache               Skip cache lookup
  --force-whisper          Skip native captions, always use Whisper
  -d, --diarize            Enable speaker diarization
  --num-speakers N         Number of speakers (helps diarization accuracy)
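
When driving the CLI from a Python script, it can help to assemble the argument list programmatically. The helper below is a hypothetical wrapper, not something trans ships; it only uses flags listed in the reference above and assumes the trans executable is on PATH.

```python
import subprocess
from typing import List, Optional


def build_command(
    inputs: List[str],
    model: Optional[str] = None,
    fmt: Optional[str] = None,
    language: Optional[str] = None,
    output_dir: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Assemble an argv list for `trans transcribe` from the documented options."""
    cmd = ["trans", "transcribe"]
    if model:
        cmd += ["--model", model]
    if fmt:
        cmd += ["--format", fmt]
    if language:
        cmd += ["--language", language]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if quiet:
        cmd.append("--quiet")
    return cmd + inputs


def run(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with URLs that contain `?` or `&`.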

Twitch Notes

Twitch videos rarely have native captions, so Whisper is used automatically. For long VODs, use -m tiny or -m base to keep transcription time down.

TikTok Notes

TikTok aggressively blocks datacenter IPs. If you see "IP address is blocked", try one of the following:

Use cookies: export them from your browser with a "Get cookies.txt" extension, then pass the file with --cookies:

trans transcribe --cookies cookies.txt "https://tiktok.com/@user/video/123"

Run from a residential IP (home internet, not a VPS)
Use a residential VPN

Troubleshooting

Error	Fix
`No such file or directory: 'ffmpeg'`	`brew install ffmpeg` or `apt install ffmpeg`
Whisper is slow	Use a smaller model: `-m tiny`
Wrong language detected	Specify explicitly: `-l en`
TikTok blocked	See TikTok Notes above
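
The first error in the table can be caught before a run by checking PATH for the executable. A small sketch using the standard library (the helper name is made up for the example):

```python
import shutil
from typing import Iterable, List


def missing_tools(required: Iterable[str] = ("ffmpeg",)) -> List[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]
```

An empty list means every required tool was found.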

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests (52 tests, all offline)
pytest test_trans.py

# Lint / format
ruff check trans/
black trans/

Credits

Built with:

yt-dlp — Video/audio downloading
faster-whisper — Speech recognition
FFmpeg — Audio processing
Typer — CLI framework
pyannote-audio — Speaker diarization (optional)

This version: 0.4.0 (released Feb 22, 2026)
