Local speaker diarization using MLX Whisper (macOS) or faster-whisper (Linux/CUDA) and Pyannote
VoxScriber
Professional speaker diarization running 100% locally. Supports MLX Whisper on Apple Silicon and faster-whisper on Linux/CUDA, combined with Pyannote 3.1.
Requirements
- Python 3.10+
- Hugging Face token (free, one-time model download)
- For GPU: CUDA 12 + cuDNN 9 (optional, CPU works too)
That's it. No FFmpeg, no system packages, no sudo required.
Installation
pip install voxscriber
The right Whisper backend is installed automatically:
- macOS Apple Silicon: MLX Whisper
- Linux/other: faster-whisper (CUDA or CPU)
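The platform rule above can be sketched as a small function. This is only an illustration of the rule, not VoxScriber's installer. The helper name pick_whisper_backend is hypothetical. A package would normally express this choice through PEP 508 environment markers (for example sys_platform == "darwin" and platform_machine == "arm64"), and how VoxScriber does it is not stated here.

```python
import platform


def pick_whisper_backend(system=None, machine=None):
    """Return the Whisper backend package for a platform (hypothetical helper).

    Mirrors the documented rule: macOS on Apple Silicon -> MLX Whisper,
    everything else -> faster-whisper.
    """
    system = system or platform.system()     # e.g. "Darwin", "Linux"
    machine = machine or platform.machine()  # e.g. "arm64"; an x86_64 Python under Rosetta reports "x86_64"
    if system == "Darwin" and machine == "arm64":
        return "mlx-whisper"
    return "faster-whisper"


if __name__ == "__main__":
    print(pick_whisper_backend())
```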
Set Up a Hugging Face Token
VoxScriber uses pyannote models, which require a Hugging Face token.
Option 1: Interactive setup (recommended)
voxscriber-doctor
This will guide you through accepting the model terms and saving your token securely.
Option 2: Using huggingface-cli
# First, accept terms at https://huggingface.co/pyannote/speaker-diarization-3.1
huggingface-cli login
Your token will be saved to ~/.cache/huggingface/token and used automatically.
Option 3: Environment variable
export HF_TOKEN=your_token_here
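When more than one option is configured, a tool has to choose one. The sketch below assumes HF_TOKEN takes priority over the token file that huggingface-cli login writes to ~/.cache/huggingface/token. That precedence and the helper name resolve_hf_token are assumptions for illustration, not documented VoxScriber behavior. The huggingface_hub library also provides its own get_token() function for this purpose.

```python
import os
from pathlib import Path

# Location written by `huggingface-cli login`, per Option 2 above.
DEFAULT_TOKEN_PATH = Path.home() / ".cache" / "huggingface" / "token"


def resolve_hf_token(env=None, token_path=DEFAULT_TOKEN_PATH):
    """Return a Hugging Face token, or None if none is configured.

    Assumed precedence (hypothetical): the HF_TOKEN environment variable
    first, then the token file saved by huggingface-cli.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    path = Path(token_path)
    if path.is_file():
        token = path.read_text(encoding="utf-8").strip()
        if token:
            return token
    return None
```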
Usage
# Basic
voxscriber meeting.m4a
# With known speaker count
voxscriber meeting.m4a --speakers 2
# All formats
voxscriber meeting.m4a --formats md,txt,json,srt,vtt
# Sentence-level subtitle segmentation for editing workflows
voxscriber meeting.m4a --formats srt,vtt --srt-mode sentence --srt-max-duration 15
# Print to console
voxscriber meeting.m4a --print
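To process several recordings from a script, you can drive the CLI through subprocess. The sketch below uses only flags listed under Options and assumes voxscriber is on PATH. The helper names build_voxscriber_command and transcribe_all are hypothetical.

```python
import subprocess


def build_voxscriber_command(audio_path, speakers=None, formats=None,
                             output_dir=None, print_transcript=False):
    """Assemble a voxscriber argv list from the documented CLI flags."""
    cmd = ["voxscriber", str(audio_path)]
    if speakers is not None:
        cmd += ["--speakers", str(speakers)]
    if formats:
        cmd += ["--formats", ",".join(formats)]
    if output_dir is not None:
        cmd += ["--output", str(output_dir)]
    if print_transcript:
        cmd.append("--print")
    return cmd


def transcribe_all(paths, **options):
    """Run voxscriber on each file in turn; raises CalledProcessError on failure."""
    for path in paths:
        subprocess.run(build_voxscriber_command(path, **options), check=True)
```

For example, transcribe_all(["meeting.m4a", "standup.m4a"], speakers=2, formats=["md", "srt"]) would run the CLI once per file. The file names here are placeholders.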
Python API
from voxscriber import DiarizationPipeline, PipelineConfig

config = PipelineConfig(
    num_speakers=2,
    language="en",
)
pipeline = DiarizationPipeline(config)
transcript = pipeline.process("meeting.m4a")
for segment in transcript.segments:
    print(f"{segment.speaker}: {segment.text}")
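Diarized output often splits one speaker's turn into several consecutive segments. The minimal sketch below merges them and assumes only that each segment has the speaker and text attributes used in the example above. The Segment dataclass is a stand-in for testing. merge_turns is a hypothetical helper, not part of the VoxScriber API, and the speaker labels in the demo are placeholders.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    # Stand-in for VoxScriber's segment objects; only `speaker` and `text`
    # (as used in the API example) are assumed to exist.
    speaker: str
    text: str


def merge_turns(segments):
    """Collapse consecutive segments from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(segments, key=lambda s: s.speaker):
        turns.append((speaker, " ".join(s.text.strip() for s in group)))
    return turns


if __name__ == "__main__":
    demo = [Segment("SPEAKER_00", "Hi."), Segment("SPEAKER_00", "Shall we start?"),
            Segment("SPEAKER_01", "Sure.")]
    for speaker, text in merge_turns(demo):
        print(f"{speaker}: {text}")
```

With a real transcript, the call would be merge_turns(transcript.segments).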
Output Formats
| Format | Description |
|---|---|
| md | Markdown with bold speaker names |
| txt | Timestamped plain text |
| json | Structured data with word-level timestamps |
| srt | SubRip subtitles |
| vtt | WebVTT subtitles |
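SRT and VTT differ mainly in the timestamp separator. SubRip puts a comma before the milliseconds and WebVTT uses a period. WebVTT files also begin with a WEBVTT header line. The minimal sketch below shows the timestamp arithmetic. format_timestamp and srt_cue are hypothetical helpers, and this sketch does not describe how VoxScriber labels speakers inside cues.

```python
def format_timestamp(seconds, fmt="srt"):
    """Format a time offset as SRT (HH:MM:SS,mmm) or WebVTT (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


def srt_cue(index, start, end, speaker, text):
    """Build one SubRip cue; prefixing the speaker name is an illustrative choice."""
    return (f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{speaker}: {text}\n")
```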
Options
voxscriber --help
--speakers, -s Number of speakers (if known)
--language, -l Force language (e.g., 'en', 'es')
--model, -m Whisper model (default: large-v3-turbo on GPU/MLX, small on CPU)
--formats, -f Output formats (default: md,txt)
--output, -o Output directory
--device auto (default), mps, cuda, or cpu
--srt-mode Subtitle segmentation mode for srt/vtt: speaker|sentence
--srt-max-duration Maximum subtitle duration in seconds for srt/vtt
--quiet, -q Suppress progress
--print Print transcript to console
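One plausible way to combine --srt-mode sentence with --srt-max-duration is sketched below. Words are grouped into a cue until sentence-ending punctuation appears, and a cue is cut early if adding the next word would push it past the maximum duration. This is an assumption about the behavior for illustration, not VoxScriber's implementation. The Word type and segment_sentences helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical word-level timing record (seconds).
    start: float
    end: float
    text: str


def segment_sentences(words, max_duration=15.0):
    """Group words into (start, end, text) cues at sentence ends or duration limits."""
    cues, current = [], []
    for word in words:
        # Cut the cue early if this word would push it past max_duration.
        if current and word.end - current[0].start > max_duration:
            cues.append(current)
            current = []
        current.append(word)
        # Close the cue at sentence-ending punctuation.
        if word.text.endswith((".", "?", "!")):
            cues.append(current)
            current = []
    if current:
        cues.append(current)  # flush trailing words without punctuation
    return [(c[0].start, c[-1].end, " ".join(w.text for w in c)) for c in cues]
```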
Performance
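The real-time factor (RTF) is processing time divided by audio duration. It is about 0.1-0.15x on Apple Silicon (MLX) and about 0.15-0.25x on NVIDIA GPUs (faster-whisper). At those rates, a 20-minute recording takes roughly 2-3 minutes on Apple Silicon and 3-5 minutes on an NVIDIA GPU, depending on hardware.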
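Because RTF is processing time divided by audio duration, estimated processing time is audio duration × RTF. The minimal sketch below applies that formula. The helper name is hypothetical, and the RTF ranges are the figures quoted in this section.

```python
def estimate_processing_minutes(audio_minutes, rtf_low, rtf_high):
    """Return the (low, high) processing-time estimate in minutes for an RTF range."""
    return audio_minutes * rtf_low, audio_minutes * rtf_high


if __name__ == "__main__":
    # RTF ranges quoted in the Performance section.
    for name, low, high in [("Apple Silicon (MLX)", 0.10, 0.15),
                            ("NVIDIA GPU (faster-whisper)", 0.15, 0.25)]:
        lo_min, hi_min = estimate_processing_minutes(20, low, high)
        print(f"{name}: {lo_min:.1f}-{hi_min:.1f} min for 20 min of audio")
```

For a 20-minute recording, this gives about 2-3 minutes on Apple Silicon and 3-5 minutes on an NVIDIA GPU.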
Troubleshooting
Run the diagnostic tool to check your setup:
voxscriber-doctor
This will check FFmpeg availability and HF_TOKEN, and offer to fix common issues automatically.
Other Issues
| Issue | Solution |
|---|---|
| requires Python >= 3.10 | Use Python 3.10+: python3.10 -m venv .venv |
| Installed wrong package | It's voxscriber (with 'r'), not voxscribe |
| HF_TOKEN required | Run voxscriber-doctor to set up authentication |
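Before a long batch run, a quick preflight check can catch the version and token issues in the table above. This is a minimal sketch, not voxscriber-doctor. The helper preflight_problems is hypothetical and checks only the Python version, HF_TOKEN, and the token file that huggingface-cli login saves.

```python
import os
import sys
from pathlib import Path


def preflight_problems(version_info=None, env=None, token_path=None):
    """Return human-readable problems matching the issues table (hypothetical helper)."""
    version_info = sys.version_info if version_info is None else version_info
    env = os.environ if env is None else env
    token_path = (Path.home() / ".cache" / "huggingface" / "token"
                  if token_path is None else Path(token_path))
    problems = []
    if tuple(version_info[:2]) < (3, 10):
        problems.append("requires Python >= 3.10: create a venv with python3.10 -m venv .venv")
    # Either HF_TOKEN or a token saved by huggingface-cli login is acceptable.
    if not env.get("HF_TOKEN") and not token_path.is_file():
        problems.append("HF_TOKEN required: run voxscriber-doctor to set up authentication")
    return problems


if __name__ == "__main__":
    for issue in preflight_problems():
        print("-", issue)
```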
Support
If you find VoxScriber useful, consider supporting its development:
License
MIT
Download files
Source distribution: voxscriber-0.2.6.tar.gz
Built distribution: voxscriber-0.2.6-py3-none-any.whl
File details
Details for the file voxscriber-0.2.6.tar.gz.
File metadata
- Download URL: voxscriber-0.2.6.tar.gz
- Upload date:
- Size: 869.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 58d0cca7670736151b07e491b9a4b74564cf932215cc887437bd81974308b513 |
| MD5 | 935c6d56eb2c024c79a947b08fb419b2 |
| BLAKE2b-256 | 4f26b9ddc825a45e114e91f82b811a13abb416e75978741a9ac806f99d34df3d |
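To check a downloaded archive against the SHA256 digest published for voxscriber-0.2.6.tar.gz, hash the file locally and compare the result. The minimal sketch below uses Python's hashlib. sha256_of and verify_download are hypothetical helper names. pip's hash-checking mode, which adds a --hash=sha256:... option to a requirements line, enforces the same check at install time. In that mode every requirement, including dependencies, must be pinned with hashes.

```python
import hashlib
from pathlib import Path

# SHA256 published for voxscriber-0.2.6.tar.gz.
EXPECTED_SHA256 = "58d0cca7670736151b07e491b9a4b74564cf932215cc887437bd81974308b513"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```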
Provenance
The following attestation bundles were made for voxscriber-0.2.6.tar.gz:
Publisher: publish.yml on dparedesi/voxscriber
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxscriber-0.2.6.tar.gz
- Subject digest: 58d0cca7670736151b07e491b9a4b74564cf932215cc887437bd81974308b513
- Sigstore transparency entry: 1012601761
- Permalink: dparedesi/voxscriber@17007c8e215f9768b7fb6106649eaf0db938f0b7
- Branch / Tag: refs/tags/v0.2.6
- Owner: https://github.com/dparedesi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@17007c8e215f9768b7fb6106649eaf0db938f0b7
- Trigger Event: release
File details
Details for the file voxscriber-0.2.6-py3-none-any.whl.
File metadata
- Download URL: voxscriber-0.2.6-py3-none-any.whl
- Upload date:
- Size: 29.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ecc864e3350898fdc2e651935b243cf3d5d7849e0b16e5b946a38abdb1efa7de |
| MD5 | 5b6e384ebab53c3fe827a958e6cfcd04 |
| BLAKE2b-256 | 8e0b0682dd4446afec0350fa701a49d1c6a31afcc82f104e44bb8212255fa7af |
Provenance
The following attestation bundles were made for voxscriber-0.2.6-py3-none-any.whl:
Publisher: publish.yml on dparedesi/voxscriber
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxscriber-0.2.6-py3-none-any.whl
- Subject digest: ecc864e3350898fdc2e651935b243cf3d5d7849e0b16e5b946a38abdb1efa7de
- Sigstore transparency entry: 1012601823
- Permalink: dparedesi/voxscriber@17007c8e215f9768b7fb6106649eaf0db938f0b7
- Branch / Tag: refs/tags/v0.2.6
- Owner: https://github.com/dparedesi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@17007c8e215f9768b7fb6106649eaf0db938f0b7
- Trigger Event: release