Local speaker diarization using MLX Whisper (macOS) or faster-whisper (Linux/CUDA) and Pyannote

VoxScriber

Professional speaker diarization running 100% locally. Supports MLX Whisper on Apple Silicon and faster-whisper on Linux/CUDA, combined with Pyannote 3.1.

Requirements

  • Python 3.10+
  • Hugging Face token (free, one-time model download)
  • For GPU: CUDA 12 + cuDNN 9 (optional, CPU works too)

That's it. No FFmpeg, no system packages, no sudo required.

Installation

pip install voxscriber

The right Whisper backend is installed automatically:

  • macOS Apple Silicon: MLX Whisper
  • Linux/other: faster-whisper (CUDA or CPU)
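To check ahead of time which backend a machine would get, the rule above can be sketched with the standard library. This is only an illustration of the platform check, not VoxScriber's installer logic; `expected_backend` is a hypothetical helper, and it assumes Apple Silicon reports itself as `Darwin` / `arm64`.

```python
import platform
from typing import Optional


def expected_backend(system: Optional[str] = None, machine: Optional[str] = None) -> str:
    """Guess the Whisper backend for a platform (hypothetical helper, not part of VoxScriber)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    # Assumption: Apple Silicon Macs report system "Darwin" and machine "arm64".
    if system == "Darwin" and machine == "arm64":
        return "MLX Whisper"
    # Everything else falls under the "Linux/other" rule.
    return "faster-whisper"


print(expected_backend())
```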

Set Up Hugging Face Token

VoxScriber uses pyannote models, which require a Hugging Face token.

Option 1: Interactive setup (recommended)

voxscriber-doctor

This will guide you through accepting the model terms and saving your token securely.

Option 2: Using huggingface-cli

# First, accept terms at https://huggingface.co/pyannote/speaker-diarization-3.1
huggingface-cli login

Your token will be saved to ~/.cache/huggingface/token and used automatically.

Option 3: Environment variable

export HF_TOKEN=your_token_here
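To confirm that a token is visible before running a long job, a small check along these lines may help. It is a sketch under an assumption: it looks at the `HF_TOKEN` environment variable first and then at the file written by `huggingface-cli login`. The lookup order VoxScriber itself uses is not stated here, and `find_hf_token` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_TOKEN_FILE = Path.home() / ".cache" / "huggingface" / "token"


def find_hf_token(
    env: Mapping[str, str] = os.environ,
    token_file: Path = DEFAULT_TOKEN_FILE,
) -> Optional[str]:
    """Return a Hugging Face token if one is configured (hypothetical helper)."""
    # Assumption: the environment variable takes precedence over the saved file.
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


# Report only whether a token exists; never print the token itself.
print("token found" if find_hf_token() else "no token found")
```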

Usage

# Basic
voxscriber meeting.m4a

# With known speaker count
voxscriber meeting.m4a --speakers 2

# All formats
voxscriber meeting.m4a --formats md,txt,json,srt,vtt

# Sentence-level subtitle segmentation for editing workflows
voxscriber meeting.m4a --formats srt,vtt --srt-mode sentence --srt-max-duration 15

# Print to console
voxscriber meeting.m4a --print

Python API

from voxscriber import DiarizationPipeline, PipelineConfig

config = PipelineConfig(
    num_speakers=2,
    language="en",
)
pipeline = DiarizationPipeline(config)
transcript = pipeline.process("meeting.m4a")

for segment in transcript.segments:
    print(f"{segment.speaker}: {segment.text}")
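Diarized output often contains several consecutive segments from the same speaker. The sketch below merges them into single turns, relying only on the `speaker` and `text` attributes used above. `Turn` and `merge_turns` are hypothetical names, not part of the VoxScriber API, and the demo uses stand-in data rather than a real transcript.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Turn:
    speaker: str
    text: str


def merge_turns(segments: Iterable) -> List[Turn]:
    """Merge consecutive segments that share a speaker (hypothetical helper)."""
    turns: List[Turn] = []
    for seg in segments:
        text = seg.text.strip()
        if turns and turns[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1].text = f"{turns[-1].text} {text}"
        else:
            turns.append(Turn(seg.speaker, text))
    return turns


# Stand-in segments; with the pipeline above, pass transcript.segments instead.
demo = [Turn("A", "Hello."), Turn("A", "Shall we start?"), Turn("B", "Yes.")]
for turn in merge_turns(demo):
    print(f"{turn.speaker}: {turn.text}")
```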

Output Formats

Format  Description
md      Markdown with bold speaker names
txt     Timestamped plain text
json    Structured data with word-level timestamps
srt     SubRip subtitles
vtt     WebVTT subtitles
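The two subtitle formats differ in timestamp syntax: SubRip writes `HH:MM:SS,mmm` while WebVTT writes `HH:MM:SS.mmm`. The sketch below shows that difference for anyone post-processing timings; `format_timestamp` is a hypothetical helper and is not taken from VoxScriber's source.

```python
def format_timestamp(seconds: float, separator: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm (hypothetical helper).

    Use separator="," for SRT and separator="." for WebVTT.
    """
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


print(format_timestamp(3661.5))       # SRT style
print(format_timestamp(3661.5, "."))  # WebVTT style
```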

Options

voxscriber --help

  --speakers, -s      Number of speakers (if known)
  --language, -l      Force language (e.g., 'en', 'es')
  --model, -m         Whisper model (default: large-v3-turbo on GPU/MLX, small on CPU)
  --formats, -f       Output formats (default: md,txt)
  --output, -o        Output directory
  --device            auto (default), mps, cuda, or cpu
  --srt-mode          Subtitle segmentation mode for srt/vtt: speaker|sentence
  --srt-max-duration  Maximum subtitle duration in seconds for srt/vtt
  --quiet, -q         Suppress progress
  --print             Print transcript to console
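For batch jobs it can be convenient to assemble the command line from Python. The sketch below builds an argument list using only flags listed above; `build_command` is a hypothetical helper, and the resulting list can be handed to `subprocess.run(cmd, check=True)` when the `voxscriber` executable is on the PATH.

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    audio: str,
    speakers: Optional[int] = None,
    language: Optional[str] = None,
    formats: Sequence[str] = (),
    output: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Build a voxscriber argument list from documented flags (hypothetical helper)."""
    cmd = ["voxscriber", audio]
    if speakers is not None:
        cmd += ["--speakers", str(speakers)]
    if language:
        cmd += ["--language", language]
    if formats:
        # The CLI takes formats as one comma-separated value, e.g. md,txt.
        cmd += ["--formats", ",".join(formats)]
    if output:
        cmd += ["--output", output]
    if quiet:
        cmd.append("--quiet")
    return cmd


cmd = build_command("meeting.m4a", speakers=2, formats=["srt", "vtt"], quiet=True)
print(shlex.join(cmd))  # shell-safe rendering of the argument list
```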

Performance

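Real-time factor (RTF) is processing time divided by audio duration, so lower is faster. VoxScriber runs at ~0.1-0.15x RTF on Apple Silicon (MLX) and ~0.15-0.25x RTF on NVIDIA GPUs (faster-whisper). At those rates, a 20-minute recording processes in roughly 2-5 minutes, depending on hardware.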
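RTF (real-time factor) is processing time divided by audio duration, so an estimate for any recording is just a multiplication. The sketch below applies the RTF ranges quoted above to a 20-minute file; `processing_minutes` is a hypothetical helper, and actual timings will vary with model and hardware.

```python
def processing_minutes(audio_minutes: float, rtf: float) -> float:
    """Estimated processing time: audio duration multiplied by the real-time factor."""
    return audio_minutes * rtf


# RTF ranges as quoted in the Performance section.
ranges = {
    "Apple Silicon (MLX)": (0.10, 0.15),
    "NVIDIA GPU (faster-whisper)": (0.15, 0.25),
}

for name, (low, high) in ranges.items():
    fastest = processing_minutes(20, low)
    slowest = processing_minutes(20, high)
    print(f"{name}: {fastest:.1f}-{slowest:.1f} min for 20 min of audio")
```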

Troubleshooting

Run the diagnostic tool to check your setup:

voxscriber-doctor

This will check FFmpeg availability and HF_TOKEN, and offer to fix common issues automatically.

Other Issues

Issue                     Solution
requires Python >= 3.10   Use Python 3.10+: python3.10 -m venv .venv
Installed wrong package   It's voxscriber (with 'r'), not voxscribe
HF_TOKEN required         Run voxscriber-doctor to set up authentication

Support

If you find VoxScriber useful, consider supporting its development:

Buy Me A Coffee GitHub Sponsors

License

MIT

