Audio denoising and transcription pipeline using Demucs, Silero VAD, and Whisper
dinscribe audio transcription
Processes audio through a three-step pipeline to produce a transcription JSON: denoise, voice activity detection, transcribe.
Setup
python -m venv venv
source venv/Scripts/activate # Windows (Git Bash); on Linux/macOS use: source venv/bin/activate
pip install -r requirements.txt
Run the full pipeline
python main.py input/audio.mp3 # Run for a single file
python main.py input/ # Run for all audio files in a folder
python main.py input/audio.mp3 -f # Force re-run all steps
Each step checks whether its output already exists and skips it if so. Use -f to force all steps to re-run regardless, -o <output_dir> to specify a different output directory, and -c <config.yaml> to specify a different config file.
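The skip-or-re-run rule above can be pictured with a short sketch. The helper name `should_run` is hypothetical and not taken from dinscribe; the sketch only assumes that a step is skipped when its output file already exists and -f was not given.

```python
from pathlib import Path


def should_run(output_path: Path, force: bool = False) -> bool:
    """Decide whether a pipeline step needs to run (hypothetical helper)."""
    # With -f every step runs again; otherwise a step runs only when
    # its output file is missing.
    return force or not output_path.exists()
```

Checking only for the file's existence keeps the rule simple, but it also means a stale or partial output is reused until -f is passed.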
Output is written to output/<filename>/ and contains:
- <filename>_denoised.wav (vocals isolated from background noise)
- <filename>_vad.json (detected speech segment boundaries)
- <filename>_transcription.json (final transcription with timestamps)
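To locate these files from a script, the naming scheme can be reproduced as below. This is a sketch under two assumptions: <filename> is the input name without its extension (as in audio.mp3 giving audio_denoised.wav), and the default output directory is `output`. The function `expected_outputs` is a hypothetical helper, not part of dinscribe.

```python
from pathlib import Path


def expected_outputs(audio_path: str, output_dir: str = "output") -> dict[str, Path]:
    """Build the three output paths for one input file (hypothetical helper)."""
    stem = Path(audio_path).stem          # "input/audio.mp3" -> "audio"
    step_dir = Path(output_dir) / stem    # output/<filename>/
    return {
        "denoised": step_dir / f"{stem}_denoised.wav",
        "vad": step_dir / f"{stem}_vad.json",
        "transcription": step_dir / f"{stem}_transcription.json",
    }
```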
Configuration
Edit config.yaml to adjust settings for each step. Some important options are:
- denoise.model - Demucs model for vocal isolation (default: htdemucs)
- vad.threshold - VAD speech detection sensitivity (default: 0.5)
- transcribe.model - Whisper model size, tiny through large (default: base)
- transcribe.language - Transcription language code (default: en)
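One way to think about these options is as per-step defaults that a config file overrides. The sketch below mirrors the defaults listed above in a plain dictionary and overlays user values on top. `DEFAULTS` and `merge_config` are illustrative names, and how dinscribe actually loads config.yaml is not shown in this README.

```python
import copy

# Defaults as listed above; the nested layout is an assumption based on
# the dotted option names (denoise.model, vad.threshold, ...).
DEFAULTS = {
    "denoise": {"model": "htdemucs"},
    "vad": {"threshold": 0.5},
    "transcribe": {"model": "base", "language": "en"},
}


def merge_config(overrides: dict, defaults: dict = DEFAULTS) -> dict:
    """Overlay per-step overrides on the defaults without mutating them."""
    merged = copy.deepcopy(defaults)
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged
```

For example, `merge_config({"transcribe": {"model": "small"}})` changes only the Whisper model size and leaves the language and the other steps at their defaults.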
Other tips for best results
Add domain-specific vocabulary to vocab.txt to improve transcription accuracy on unusual words and jargon. For noisy or technical audio, set temperature: 0 to disable fallback to higher-temperature decoding, and consider filtering out common hallucinations specific to your dataset.
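Filtering hallucinations can be as simple as dropping segments whose text matches a blocklist. The sketch below assumes each transcription segment is a dictionary with a "text" key. That layout, the function name `drop_hallucinations`, and the example phrase are all assumptions for illustration, so adapt them to the actual structure of your transcription JSON.

```python
def _normalise(text: str) -> str:
    # Compare case-insensitively and ignore surrounding spaces/punctuation.
    return text.strip().lower().strip(".!? ")


def drop_hallucinations(segments: list[dict], phrases: list[str]) -> list[dict]:
    """Remove segments whose whole text is a known hallucinated phrase."""
    blocked = {_normalise(p) for p in phrases}
    return [s for s in segments if _normalise(s["text"]) not in blocked]
```

Matching on the whole segment text is deliberately conservative: a segment that merely contains a blocked phrase inside real speech is kept.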
Run individual steps
Each step can also be run alone:
python denoise.py audio.mp3
python vad.py audio_denoised.wav
python transcribe.py audio_denoised.wav audio_vad.json
File details
Details for the file dinscribe-0.1.1.tar.gz.
File metadata
- Download URL: dinscribe-0.1.1.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8706c48d716649e041833f7612a1c2012761fee0fd143fa638a1fc4901e65e19 |
| MD5 | 9f8112f85440f0b2846f8d063619630f |
| BLAKE2b-256 | e0c73709dfa904ab3f57c69f46c44ff06f96814558380fda5351018d6223c446 |
File details
Details for the file dinscribe-0.1.1-py3-none-any.whl.
File metadata
- Download URL: dinscribe-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 011701f9d6823b3bded9a86a9cfc08ceb15a5a51d4f5b48a5d1d26c363dfd750 |
| MD5 | 51ef4f843e52773331ccd13d44d5d18d |
| BLAKE2b-256 | aec3485f75efc5dd5ca5e172944eb199a7dc7e9ba8f8ad6c2ad64c8b10e0fd9f |