Audio denoising and transcription pipeline using Demucs, Silero VAD, and Whisper

dinscribe audio transcription

dinscribe processes audio through a three-step pipeline and produces a transcription JSON file: denoising (Demucs), voice activity detection (Silero VAD), and transcription (Whisper).

Installation

pip install dinscribe

This installs CPU-only torch by default. For GPU acceleration, install the CUDA build first:

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install dinscribe

dinscribe uses CUDA automatically when available and warns at startup if it is not.
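To check ahead of time which device is likely to be picked, a small probe like the one below can help. It is only a sketch: `pick_device` is a hypothetical helper, not part of dinscribe, and it relies solely on `torch.cuda.is_available()`.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA-enabled torch build can see a GPU, else "cpu".

    Hypothetical helper for illustration; dinscribe's own detection may differ.
    """
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment at all
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Detected device: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the CPU-only torch build is probably installed and the CUDA wheel should be installed first, as shown above.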

On first run, dinscribe copies default config files to your platform config directory:

  • Windows: %APPDATA%\dinscribe\
  • macOS: ~/Library/Application Support/dinscribe/
  • Linux: ~/.config/dinscribe/

Edit config.yaml and vocab.txt to customize settings.
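To locate that directory from a script, the platform rules listed above can be sketched as follows. `default_config_dir` is a hypothetical helper written for this example; dinscribe may resolve the directory differently (for instance, this sketch ignores `XDG_CONFIG_HOME` on Linux).

```python
import os
import sys
from pathlib import Path


def default_config_dir(platform=sys.platform, env=None, home=None) -> Path:
    """Mirror the documented per-platform config locations (illustrative only)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        # Windows: %APPDATA%\dinscribe\ (fallback path is an assumption)
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    elif platform == "darwin":
        # macOS: ~/Library/Application Support/dinscribe/
        base = home / "Library" / "Application Support"
    else:
        # Linux and other platforms: ~/.config/dinscribe/
        base = home / ".config"
    return base / "dinscribe"


if __name__ == "__main__":
    config_dir = default_config_dir()
    print(config_dir / "config.yaml")
    print(config_dir / "vocab.txt")
```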

CLI usage

dinscribe input/audio.mp3          # single file
dinscribe input/                   # all audio files in a folder
dinscribe input/audio.mp3 -f       # force re-run all steps
dinscribe input/audio.mp3 -c path/to/config.yaml   # custom config
dinscribe input/audio.mp3 -o results/              # custom output dir

Each step checks whether its output already exists and is skipped if it does. Use -f to force every step to re-run.
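The skip-unless-forced behavior can be pictured with a minimal sketch. `run_step` and its arguments are hypothetical names chosen for this example; they show the general pattern, not dinscribe's internal code.

```python
from pathlib import Path
from typing import Callable


def run_step(output: Path, step: Callable[[Path], None], force: bool = False) -> bool:
    """Run `step` unless `output` already exists. Returns True if the step ran."""
    if output.exists() and not force:
        # Cached result found: leave it alone (the behavior without -f)
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    step(output)  # the step is expected to write its result to `output`
    return True
```

A second call with the same `output` returns `False` without invoking `step`; passing `force=True` corresponds to the `-f` flag.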

Output is written to output/<filename>/ and contains:

  • <filename>_denoised.wav (vocals isolated from background noise)
  • <filename>_vad.json (detected speech segment boundaries)
  • <filename>_transcription.json (final transcription with timestamps)
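For scripts that need to find these files afterwards, the layout above can be expressed as a small helper. This is a sketch under one assumption: that `<filename>` means the input name without its extension, which matches the `output/recording` paths used in the Python API example. `expected_outputs` is a hypothetical name.

```python
from pathlib import Path


def expected_outputs(input_path: Path, output_root: Path = Path("output")) -> dict:
    """Map each pipeline artifact to the path where it should appear."""
    stem = input_path.stem  # assumption: <filename> is the name without extension
    folder = output_root / stem
    return {
        "denoised": folder / f"{stem}_denoised.wav",
        "vad": folder / f"{stem}_vad.json",
        "transcription": folder / f"{stem}_transcription.json",
    }


if __name__ == "__main__":
    for name, path in expected_outputs(Path("input/audio.mp3")).items():
        print(f"{name}: {path}")
```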

Python API

from pathlib import Path
import dinscribe
from dinscribe import PipelineConfig, VadConfig, TranscribeConfig

# Run the full pipeline with defaults
dinscribe.process_file(
    input_path=Path("recording.wav"),
    output_dir=Path("output"),
)

# Custom config
config = PipelineConfig(
    vad=VadConfig(threshold=0.4, max_segment_length_sec=20),
    transcribe=TranscribeConfig(model="small", language="en"),
)
dinscribe.process_file(Path("recording.wav"), Path("output"), config=config)

# Or use individual stages
from dinscribe import denoise, vad, transcribe

denoised = denoise.run(Path("recording.wav"), Path("output/recording"))
vad_file  = vad.run(denoised, Path("output/recording"))
result    = transcribe.run(denoised, vad_file, Path("output/recording"))
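When processing many recordings from Python, it can be useful to keep going after a single file fails. The sketch below wraps any callable with the `(input_path, output_dir)` shape used by `process_file` above. `find_audio_files`, `process_folder`, and the extension list are assumptions made for this example, not part of dinscribe.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List

# Assumption: the set of extensions dinscribe accepts is not documented here.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def find_audio_files(folder: Path, extensions: Iterable[str] = AUDIO_EXTENSIONS) -> List[Path]:
    """Return audio files directly inside `folder`, sorted by path."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in wanted)


def process_folder(
    folder: Path,
    output_dir: Path,
    process: Callable[[Path, Path], object],
) -> Dict[Path, Exception]:
    """Call `process` on every audio file and collect failures instead of stopping."""
    failures: Dict[Path, Exception] = {}
    for audio in find_audio_files(folder):
        try:
            process(audio, output_dir)
        except Exception as exc:  # keep going; report the failure at the end
            failures[audio] = exc
    return failures
```

With dinscribe installed, a call such as `process_folder(Path("input"), Path("output"), dinscribe.process_file)` would run the full pipeline on each file and return a mapping of any files that raised.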

Configuration

denoise:
  model: htdemucs        # htdemucs | htdemucs_ft | mdx | mdx_extra | htdemucs_6s

vad:
  threshold: 0.5         # 0.0–1.0, higher = requires clearer speech
  min_speech_duration_ms: 250
  min_silence_duration_ms: 100
  padding_ms: 500
  max_segment_length_sec: 30
  merge_within_sec: 1.0

transcribe:
  model: base            # tiny | base | small | medium | large
  language: en           # set to null to auto-detect
  temperature: null      # null = Whisper fallback sequence, 0 = greedy
  no_speech_threshold: 0.6
  logprob_threshold: -1.0
  compression_ratio_threshold: 2.4
  condition_on_previous_text: false
  vocab_file: null       # path to domain-specific vocabulary, defaults to vocab.txt in config dir
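To build intuition for how merge_within_sec and max_segment_length_sec might interact, here is one plausible reading: neighboring speech segments are joined when the silence between them is short, unless the joined segment would exceed the length cap. This is an assumption about the semantics, not dinscribe's actual implementation, and `merge_segments` is a hypothetical function.

```python
def merge_segments(segments, merge_within_sec=1.0, max_segment_length_sec=30.0):
    """Merge (start, end) pairs in seconds whose gap is at most `merge_within_sec`.

    A merge is skipped when the combined segment would be longer than
    `max_segment_length_sec`. Illustrative interpretation only.
    """
    merged = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            gap = start - prev_end
            if gap <= merge_within_sec and end - prev_start <= max_segment_length_sec:
                # Close enough and still under the cap: extend the previous segment
                merged[-1] = (prev_start, max(prev_end, end))
                continue
        merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(merge_segments([(0.0, 2.0), (2.5, 4.0), (10.0, 12.0)]))
    # → [(0.0, 4.0), (10.0, 12.0)]
```

Under this reading, raising merge_within_sec yields fewer, longer segments, while lowering max_segment_length_sec keeps them short.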

Add domain-specific vocabulary to vocab.txt to improve transcription accuracy on unusual words and jargon. For noisy or technical audio, set temperature: 0 to disable fallback to higher-temperature decoding, and consider filtering out common hallucinations specific to your dataset.


Source Distribution

dinscribe-0.1.2.tar.gz (13.3 kB view details)

Uploaded Source

Built Distribution

dinscribe-0.1.2-py3-none-any.whl (15.1 kB view details)

Uploaded Python 3

File details

Details for the file dinscribe-0.1.2.tar.gz.

File metadata

  • Download URL: dinscribe-0.1.2.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for dinscribe-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        60556b066e1f2562ffb9777e0ecc0ff2dfa3cbbcc6d8b7240e034bd368d8b2b2
MD5           145afbdbde682ee31a60e8934c564073
BLAKE2b-256   49046588f1afe6a3b84a9e993f70ffeec7f0c7326a98ba21db390a1e9d6ea2c8

File details

Details for the file dinscribe-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: dinscribe-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 15.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for dinscribe-0.1.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        9b484d628d6ac1799863049591bdea2c4e5761c2e4b2ec1f5adf227479cbd214
MD5           f06cea6d0b20980622e3ea432766bd4a
BLAKE2b-256   36e97644aeb4a9549ea4f731e54356b465103fea1523c540981b9527f8a411f5
