Audio denoising, VAD, speaker diarization, and transcription pipeline using Demucs, Silero VAD, pyannote, and Whisper
dinnote audio transcription
Processes audio through a four-step pipeline to produce a transcription JSON with per-speaker diarization: denoising (Demucs), voice activity detection (Silero VAD), speaker diarization (pyannote), and transcription (Whisper).
Installation
pip install dinnote
On first run, dinnote copies default config files to your platform config directory:
- Windows: %APPDATA%\dinnote\
- macOS: ~/Library/Application Support/dinnote/
- Linux: ~/.config/dinnote/
Edit config.yaml and vocab.txt to customize settings.
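To find those files from a script, the directories listed above can be resolved with the standard library alone. This is a minimal sketch: default_config_dir is a hypothetical helper, not part of dinnote, and the package's own lookup logic may differ (for example in how it treats a missing APPDATA variable).

```python
import os
import sys
from pathlib import Path


def default_config_dir(platform=None, env=None, home=None):
    """Return the per-platform dinnote config directory listed above.

    Hypothetical helper: dinnote's own lookup code may differ.
    """
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    if platform.startswith("win"):
        # Assumption: fall back to the usual Roaming folder if APPDATA is unset.
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # Assumption: Linux and other POSIX systems use ~/.config.
        base = home / ".config"
    return base / "dinnote"


config_dir = default_config_dir()
print(config_dir / "config.yaml")
print(config_dir / "vocab.txt")
```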
Speaker diarization requires a Hugging Face token with access to pyannote/speaker-diarization-3.1.
Set it via diarize.hf_token in config.yaml.
CLI usage
dinnote input/audio.mp3 # single file
dinnote input/ # all audio files in a folder
dinnote input/audio.mp3 -f # force re-run all steps
dinnote input/audio.mp3 -c path/to/config.yaml # custom config
dinnote input/audio.mp3 -o results/ # custom output dir
Each step is skipped if its output file already exists. Use -f to force every step to re-run.
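The skip behaviour amounts to a file-existence check per step. The sketch below illustrates the idea only; should_run and run_step are hypothetical names, and dinnote's internal check may look at more than whether the file exists.

```python
from pathlib import Path


def should_run(output_path, force=False):
    """Run a step when forced, or when its output file is missing."""
    return force or not Path(output_path).exists()


def run_step(name, output_path, work, force=False):
    """Call work(output_path) only when needed and report what happened."""
    if not should_run(output_path, force):
        return f"{name}: skipped"
    work(output_path)
    return f"{name}: ran"
```

Here work stands in for whatever produces the step's output file; passing force=True mirrors the -f flag.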
Output is written to output/<filename>/ and contains:
- <filename>_denoised.wav (vocals isolated from background noise)
- <filename>_vad.json (detected speech segment boundaries)
- <filename>_diarization.json (per-speaker turn boundaries from pyannote)
- <filename>_transcription.json (final transcription with timestamps and speaker labels)
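For scripting around the pipeline, the expected output paths can be derived from the input name. This sketch assumes that <filename> means the input file name without its extension and that the default output root is output/; expected_outputs is a hypothetical helper, not a dinnote function.

```python
from pathlib import Path

# Suffixes taken from the output list above.
SUFFIXES = (
    "_denoised.wav",
    "_vad.json",
    "_diarization.json",
    "_transcription.json",
)


def expected_outputs(input_path, output_root="output"):
    """List the four files a complete run is expected to produce."""
    # Assumption: <filename> is the input name without its extension.
    stem = Path(input_path).stem
    out_dir = Path(output_root) / stem
    return [out_dir / f"{stem}{suffix}" for suffix in SUFFIXES]


for path in expected_outputs("input/audio.mp3"):
    print(path)
```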
Python API
from pathlib import Path

import dinnote
from dinnote import PipelineConfig, VadConfig, DiarizeConfig, TranscribeConfig

# Run the full pipeline with defaults
dinnote.process_file(
    input_path=Path("recording.wav"),
    output_dir=Path("output"),
)

# Custom config
config = PipelineConfig(
    vad=VadConfig(threshold=0.4, max_segment_length_sec=20),
    diarize=DiarizeConfig(num_speakers=2),
    transcribe=TranscribeConfig(model="small", language="en"),
)
dinnote.process_file(Path("recording.wav"), Path("output"), config=config)

# Or use individual stages
from dinnote import denoise, vad, diarize, transcribe

denoised = denoise.run(Path("recording.wav"), Path("output/recording"), config={})
vad_file = vad.run(denoised, Path("output/recording"), config={})
diarization = diarize.run(denoised, Path("output/recording"), config={})
result = transcribe.run(denoised, Path("output/recording"), config={}, diarization_path=diarization)
Configuration
denoise:
  model: htdemucs  # htdemucs | htdemucs_ft | mdx | mdx_extra | htdemucs_6s
vad:
  threshold: 0.5  # 0.0–1.0, higher = requires clearer speech
  min_speech_duration_ms: 250
  min_silence_duration_ms: 100
  padding_ms: 500
  max_segment_length_sec: 30
  merge_within_sec: 1.0
diarize:
  # hf_token: hf_...
  num_speakers: null  # fix speaker count or leave null to let pyannote estimate
  min_speakers: null
  max_speakers: null
  min_turn_ms: 200  # turns shorter than this are discarded (ms)
transcribe:
  model: base  # tiny | base | small | medium | large
  language: en  # set to null to auto-detect
  temperature: null  # null = Whisper fallback sequence, 0 = greedy
  no_speech_threshold: 0.6
  logprob_threshold: -1.0
  compression_ratio_threshold: 2.4
  condition_on_previous_text: false
  vocab_file: null  # path to domain-specific vocabulary, defaults to vocab.txt in config dir
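As an illustration of the min_turn_ms setting, the sketch below drops speaker turns shorter than the threshold. The turn format (dicts with start and end in seconds plus a speaker label) is an assumption made for this example, not the documented layout of the diarization JSON, and drop_short_turns is a hypothetical name.

```python
def drop_short_turns(turns, min_turn_ms=200):
    """Keep only turns that last at least min_turn_ms milliseconds."""
    min_sec = min_turn_ms / 1000.0
    # Assumption: each turn is a dict with "start" and "end" in seconds.
    return [t for t in turns if t["end"] - t["start"] >= min_sec]


turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 0.1},  # 100 ms, dropped
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 3.5},  # 2.5 s, kept
]
print(drop_short_turns(turns))
```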
Add domain-specific vocabulary to vocab.txt to improve transcription accuracy on unusual words and jargon. For noisy or technical audio, set temperature: 0 to disable Whisper's fallback to higher-temperature decoding, and consider filtering out common hallucinations specific to your dataset.
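Hallucination filtering can be done as a post-processing pass over the transcription. The sketch below assumes each segment is a dict with a "text" key, which may not match the actual layout of <filename>_transcription.json; filter_hallucinations is a hypothetical helper and the blocklist entry is a placeholder to replace with phrases observed in your own data.

```python
def filter_hallucinations(segments, blocklist):
    """Drop segments whose whole text matches a blocklisted phrase."""
    blocked = {phrase.strip().lower() for phrase in blocklist}
    # Assumption: each segment is a dict carrying its text under "text".
    return [s for s in segments if s["text"].strip().lower() not in blocked]


segments = [
    {"text": " Thanks for watching! "},
    {"text": "Let's begin."},
]
print(filter_hallucinations(segments, ["thanks for watching!"]))
```

Matching on the whole segment text keeps the filter conservative: a blocklisted phrase that appears inside a longer, genuine sentence is left alone.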
If the number of speakers is known in advance, setting num_speakers gives more reliable diarization. Otherwise use min_speakers/max_speakers to constrain the range, or leave both null to let pyannote estimate freely.
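pyannote's diarization pipeline accepts num_speakers, min_speakers, and max_speakers as keyword arguments when it is called. How dinnote forwards the config values is not documented here, so the sketch below is one plausible mapping: speaker_kwargs is a hypothetical helper, and giving num_speakers precedence over the bounds is an assumption.

```python
def speaker_kwargs(num_speakers=None, min_speakers=None, max_speakers=None):
    """Build keyword arguments for a diarization call, omitting null values."""
    if num_speakers is not None:
        # Assumption: a fixed speaker count overrides any min/max bounds.
        return {"num_speakers": num_speakers}
    bounds = {"min_speakers": min_speakers, "max_speakers": max_speakers}
    return {key: value for key, value in bounds.items() if value is not None}


print(speaker_kwargs(num_speakers=2))
print(speaker_kwargs(min_speakers=2, max_speakers=4))
print(speaker_kwargs())
```

An empty result corresponds to leaving every setting null, which lets pyannote estimate the speaker count on its own.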