
Audio denoising and transcription pipeline using Demucs, Silero VAD, and Whisper

Project description

dinscribe audio transcription

Processes audio through a three-step pipeline to produce a transcription JSON: denoise, voice activity detection, transcribe.

Setup

python -m venv venv
source venv/Scripts/activate            # Git Bash on Windows; on Linux/macOS use: source venv/bin/activate
pip install -r requirements.txt

Run the full pipeline

python main.py input/audio.mp3          # Run for a single file
python main.py input/                   # Run for all audio files in a folder
python main.py input/audio.mp3 -f       # Force re-run all steps

Each step checks whether its output already exists and, if so, is skipped. Use -f to force every step to re-run, -o <output_dir> to write to a different output directory, and -c <config.yaml> to use a different config file.
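The skip-unless-forced behaviour described above can be sketched as follows. This is only an illustration of the idea, not dinscribe's actual code: the helper name run_step and its signature are hypothetical.

```python
from pathlib import Path
from typing import Callable


def run_step(output: Path, step: Callable[[Path], None], force: bool = False) -> bool:
    """Run `step` unless `output` already exists. Returns True if the step ran.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    if output.exists() and not force:
        return False  # an earlier run already produced this file, so skip the work
    output.parent.mkdir(parents=True, exist_ok=True)
    step(output)  # the step is expected to write its result to `output`
    return True
```

Passing force=True corresponds to the -f flag: the existence check is bypassed and the step runs again.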

Output is written to output/<filename>/ and contains:

  • <filename>_denoised.wav (vocals isolated from background noise)
  • <filename>_vad.json (detected speech segment boundaries)
  • <filename>_transcription.json (final transcription with timestamps)
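To locate these files from a script, the layout above can be expressed with pathlib. This is a sketch that assumes <filename> means the input file name without its extension; the function expected_outputs is a hypothetical helper, not part of dinscribe.

```python
from pathlib import Path


def expected_outputs(audio: str, output_dir: str = "output") -> dict[str, Path]:
    """Return the paths the pipeline is documented to write for one input file.

    Assumption: <filename> is the input's stem (name without extension).
    """
    stem = Path(audio).stem
    base = Path(output_dir) / stem
    return {
        "denoised": base / f"{stem}_denoised.wav",
        "vad": base / f"{stem}_vad.json",
        "transcription": base / f"{stem}_transcription.json",
    }
```

For example, expected_outputs("input/audio.mp3")["transcription"] points at output/audio/audio_transcription.json.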

Configuration

Edit config.yaml to adjust settings for each step. Some important options are:

  • denoise.model - Demucs model for vocal isolation (default: htdemucs)
  • vad.threshold - VAD speech detection sensitivity (default: 0.5)
  • transcribe.model - Whisper model size, from tiny through large (default: base)
  • transcribe.language - Transcription language code (default: en)
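As a rough picture of how these options fit together, the documented defaults can be written as a nested mapping and combined with user overrides. This is a sketch: the nesting is inferred from the dotted option names, and merge_config is a hypothetical helper, not how dinscribe necessarily loads config.yaml.

```python
import copy

# Defaults as documented above; the nested layout is inferred from the dotted names.
DEFAULTS = {
    "denoise": {"model": "htdemucs"},
    "vad": {"threshold": 0.5},
    "transcribe": {"model": "base", "language": "en"},
}


def merge_config(overrides: dict) -> dict:
    """Return the defaults with per-section overrides applied (hypothetical helper)."""
    merged = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged
```

For example, merge_config({"transcribe": {"model": "small"}}) switches to a larger Whisper model while keeping the default language and the other sections unchanged.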

Other tips for best results

Add domain-specific vocabulary to vocab.txt to improve transcription accuracy on unusual words and jargon. For noisy or technical audio, set temperature: 0 to disable fallback to higher-temperature decoding, and consider filtering out common hallucinations specific to your dataset.
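One simple way to filter hallucinations is to drop segments whose whole text matches a known phrase. The sketch below assumes each segment is a dict with a "text" key, which may not match the actual layout of the transcription JSON, and the phrase list is only an illustrative placeholder to replace with phrases observed in your own data.

```python
# Illustrative placeholder phrases; build your own list from your dataset.
HALLUCINATIONS = frozenset({"thanks for watching", "thank you for watching"})


def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing punctuation."""
    return text.strip().lower().rstrip(".!?")


def drop_hallucinations(segments: list[dict], phrases=HALLUCINATIONS) -> list[dict]:
    """Keep only segments whose full text is not a known hallucinated phrase.

    Assumption: every segment has a "text" key (hypothetical schema).
    """
    return [seg for seg in segments if normalize(seg["text"]) not in phrases]
```

Matching on the full normalized text, rather than on substrings, keeps the filter from removing genuine speech that merely contains one of the phrases.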

Run individual steps

Each step can also be run alone:

python denoise.py audio.mp3
python vad.py audio_denoised.wav
python transcribe.py audio_denoised.wav audio_vad.json
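The three commands can also be chained from a script. This sketch only builds the command lines; it assumes, as in the example above, that each step's output is named after the input's stem and can be found at that relative path. The function step_commands is hypothetical.

```python
import sys
from pathlib import Path


def step_commands(audio: str) -> list[list[str]]:
    """Build the denoise, VAD, and transcribe commands for one input file.

    Assumption: intermediate files are named <stem>_denoised.wav and <stem>_vad.json.
    """
    stem = Path(audio).stem
    denoised = f"{stem}_denoised.wav"
    vad = f"{stem}_vad.json"
    return [
        [sys.executable, "denoise.py", audio],
        [sys.executable, "vad.py", denoised],
        [sys.executable, "transcribe.py", denoised, vad],
    ]
```

Each command can then be passed to subprocess.run(cmd, check=True) in order, so that a failing step stops the chain before the next one starts.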
