
Animal vocalization denoising

Project description

BIODENOISING: Animal vocalization denoising

This repository provides both the inference and training code. If you only plan to run inference, see the dedicated inference GitHub repository.

Check the biodenoising web page for demos and more info.

The proposed model is based on the Demucs architecture, originally developed for music source separation and real-time audio enhancement.

The pre-print is available on arXiv.

Quick start

  • Install (from PyPI)
pip install biodenoising
  • Install (from source, editable)
git clone https://github.com/earthspecies/biodenoising
cd biodenoising
pip install -r requirements.txt
pip install -e .
  • Denoise a folder (writes enhanced WAVs)
biodenoise \
  --method biodenoising16k_dns48 \
  --noisy_dir /path/to/noisy_audio \
  --out_dir   /path/to/output_dir \
  --device cuda
  • Adapt the model to your domain/dataset (multi-step fine-tuning)
biodenoise-adapt \
  --method biodenoising16k_dns48 \
  --noisy_dir /path/to/noisy_audio \
  --out_dir   /path/to/output_dir \
  --steps 3 \
  --epochs 10 \
  --device cuda

Notes:

  • --noisy_dir: directory with your input audio files
  • --out_dir: destination directory for outputs
  • --steps / --epochs (adapt): control adaptation passes and training epochs per step
  • --keep_original_sr: keep the original audio sample rate instead of resampling to the model rate (for high frequency vocalizations e.g. bats, belugas)
  • --selection_table: enable event-based masking using selection tables (csv/tsv/txt) next to audio files
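
The `--keep_original_sr` note above implies a resampling step: by default, input audio is resampled to the model's rate (16 kHz for `biodenoising16k_dns48`). As a minimal sketch of that idea only, here is a plain linear-interpolation resampler; the package itself may use a different (higher-quality) resampling method, and `to_model_rate` is a hypothetical helper, not part of the API:

```python
import numpy as np

def to_model_rate(audio, sr, model_sr=16000):
    """Resample a mono signal to the model sample rate via linear interpolation.

    Illustrative only: a production pipeline would typically use a
    polyphase or sinc resampler instead.
    """
    if sr == model_sr:
        return audio
    n_out = int(round(len(audio) * model_sr / sr))
    t_in = np.arange(len(audio)) / sr      # input sample times in seconds
    t_out = np.arange(n_out) / model_sr    # output sample times in seconds
    return np.interp(t_out, t_in, audio)

x = np.random.randn(48000)                 # 1 s of audio at 48 kHz
y = to_model_rate(x, 48000)
print(len(y))                              # 16000 samples: 1 s at the model rate
```

With `--keep_original_sr`, this step is skipped, which matters for high-frequency vocalizations whose energy would otherwise be lost below the 8 kHz Nyquist limit of a 16 kHz model.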

New features

  • Domain adaptation with adapt.py: Fine-tune the pretrained biodenoising16k_dns48 model on your own recordings using pseudo-clean targets generated from your data. This multi-step procedure (configure with --steps and --epochs) adapts the model to your target domain/dataset and can improve performance when the target acoustics differ from the original training data.

  • Event-aware processing with --selection_table: When annotations (selection tables) are available next to your audio files, enabling --selection_table will restrict processing to annotated events. This can:

    • Improve denoising quality by removing the background outside the vocalizations.
    • Improve adaptation quality by using event-restricted targets and extracting the noise between events.

Colab and Notebooks

Example notebooks are available in the scripts/ directory of the repository.

Installation

First, install Python >= 3.8 (we recommend miniconda).

Through pip (if you just want to use the pre-trained model out of the box)

Just run

pip install biodenoising

Development

Clone this repository and install the dependencies. We recommend using a fresh virtualenv or Conda environment.

git clone https://github.com/earthspecies/biodenoising
cd biodenoising
pip install -r requirements.txt
pip install -e .

Usage

Once the package is installed, you can generate the denoised files with:

biodenoise \
  --method biodenoising16k_dns48 \
  --noisy_dir <path to the dir with the noisy files> \
  --out_dir   <path to store enhanced files>

Notes:

  • Provide --noisy_dir (a directory of audio files); alternatively, the tool can be extended to accept JSON manifests as in legacy flows.
  • The path given to --model_path (when overriding the pretrained) should point to a best.th file, not checkpoint.th.
  • Use --selection_table to restrict processing to annotated events; use --keep_original_sr to keep the input sampling rate. For more details regarding possible arguments, see the CLI help:
biodenoise --help

Training

Training proceeds in three steps. First, generate the pseudo-clean training data:

python generate_training.py \
  --out_dir /home/$USER/data/biodenoising16k/ \
  --noisy_dir /home/$USER/data/biodenoising16k/dev/noisy/ \
  --rir_dir /home/$USER/data/biodenoising16k/rir/ \
  --method biodenoising16k_dns48 \
  --transform none \
  --device cuda

Next, prepare the CSV files needed for training:

python prepare_experiments.py \
  --data_dir /home/$USER/data/biodenoising16k/ \
  --transform none \
  --method biodenoising16k_dns48

Finally, train the model:

python train.py dset=biodenoising16k_biodenoising16k_dns48_none_step0 seed=0

Domain Adaptation

Biodenoising is a generic tool and may underperform on some recordings. To improve performance in a specific domain, you can leverage domain adaptation: multiple steps of training on pseudo-clean targets fine-tune the model for your specific audio domain.

Basic Usage

python adapt.py \
  --method biodenoising16k_dns48 \
  --noisy_dir /path/to/noisy/audio/ \
  --out_dir /path/to/output/directory/ \
  --steps 3 \
  --epochs 10

Advanced Options

The adaptation script supports numerous parameters to fine-tune the adaptation process:

usage: python adapt.py [-h] [--steps STEPS] [--noisy_dir NOISY_DIR] [--noise_dir NOISE_DIR]
                      [--test_dir TEST_DIR] [--out_dir OUT_DIR] [--noisy_estimate]
                      [--cfg CONFIG] [--epochs EPOCHS] [-v] [--method {biodenoising16k_dns48}] 
                      [--segment SEGMENT] [--highpass HIGHPASS] [--peak_height PEAK_HEIGHT]
                      [--transform {none,time_scale}] [--revecho REVECHO]
                      [--use_top USE_TOP] [--num_valid NUM_VALID] [--antialiasing]
                      [--force_sample_rate FORCE_SAMPLE_RATE]
                      [--time_scale_factor TIME_SCALE_FACTOR] [--noise_reduce]
                      [--amp_scale] [--interactive] [--window_size WINDOW_SIZE]
                      [--device DEVICE] [--dry DRY] [--num_workers NUM_WORKERS]
                      [--annotations] [--annotations_begin_column ANNOTATIONS_BEGIN_COLUMN]
                      [--annotations_end_column ANNOTATIONS_END_COLUMN]
                      [--annotations_label_column ANNOTATIONS_LABEL_COLUMN]
                      [--annotations_label_value ANNOTATIONS_LABEL_VALUE]
                      [--annotations_extension ANNOTATIONS_EXTENSION]
                      [--processed_dir PROCESSED_DIR] [--selection_table] [--keep_original_sr]

Adaptation parameters:
  --steps STEPS          Number of steps to use for adaptation (default: 5)
  --epochs EPOCHS        Number of epochs per step (default: 5)
  --noisy_dir NOISY_DIR  Path to the directory with noisy wav files
  --noise_dir NOISE_DIR  Path to the directory with noise wav files
  --test_dir TEST_DIR    For evaluation: path to directory containing clean.json and noise.json files
  --out_dir OUT_DIR      Directory for enhanced wav files (default: "enhanced")
  --noisy_estimate       Compute noise as the difference between noisy and estimated signal
  --processed_dir PROCESSED_DIR
                        Directory for storing preprocessed audio segments
  
Model parameters:
  --method {biodenoising16k_dns48}
                        Method to use for denoising (default: "biodenoising16k_dns48")
  --device DEVICE        Device to use (default: "cuda")
  --dry DRY              Dry/wet knob coefficient. 0 is only denoised, 1 only input signal (default: 0)

Audio processing:
  --segment SEGMENT      Minimum segment size in seconds (default: 4)
  --highpass HIGHPASS    Apply a highpass filter with this cutoff before separating (default: 20)
  --peak_height PEAK_HEIGHT
                        Filter segments with rms lower than this value (default: 0.008)
  --transform {none,time_scale}
                        Transform input by pitch shifting or time scaling (default: "none")
  --revecho REVECHO      Revecho probability (default: 0)
  --antialiasing         Use an antialiasing filter when using time scaling (default: False)
  --force_sample_rate FORCE_SAMPLE_RATE
                        Force the model to take samples of this sample rate
  --time_scale_factor TIME_SCALE_FACTOR
                        If model has different sample rate, play audio slower/faster with this factor before resampling to the model sample rate
  --noise_reduce         Use noisereduce preprocessing
  --amp_scale            Scale to the amplitude of the input
  --window_size WINDOW_SIZE
                        Size of the window for continuous processing (default: 0)
  --selection_table      Enable event masking via selection tables (csv/tsv/txt) located next to audio files
  --keep_original_sr     Keep the original sample rate instead of resampling to model's sample rate

Annotation options:
  --annotations          Use annotation files to extract segments from audio files (default: False)
  --annotations_begin_column ANNOTATIONS_BEGIN_COLUMN
                        Column name for segment start time in annotation files (default: "Begin")
  --annotations_end_column ANNOTATIONS_END_COLUMN
                        Column name for segment end time in annotation files (default: "End")
  --annotations_label_column ANNOTATIONS_LABEL_COLUMN
                        Column name for segment label in annotation files (default: None)
  --annotations_label_value ANNOTATIONS_LABEL_VALUE
                        Filter annotations by this label value (default: None)
  --annotations_extension ANNOTATIONS_EXTENSION
                        Extension of annotation files (default: ".csv")

Training options:
  --use_top USE_TOP      Use the top ratio of files for training, sorted by rms (default: 1.0)
  --num_valid NUM_VALID  Number of files to use for validation (default: 0)
  --interactive          Pause at each step to allow deleting files and continue
  --num_workers NUM_WORKERS
                        Number of workers (default: 5)

Configuration:
  --cfg CONFIG           Path to YAML configuration file (default: "biodenoising/conf/config_adapt.yaml")
  -v, --verbose          Enable verbose logging
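
As a rough illustration of how --peak_height and --use_top could interact during data selection (the names `rms` and `select_segments` below are hypothetical helpers, not part of the package):

```python
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(x ** 2)))

def select_segments(segments, peak_height=0.008, use_top=1.0):
    """Drop segments whose RMS is below peak_height, then keep only the
    loudest use_top fraction (sorted by RMS, descending)."""
    kept = [s for s in segments if rms(s) >= peak_height]
    kept.sort(key=rms, reverse=True)
    n = max(1, int(round(use_top * len(kept)))) if kept else 0
    return kept[:n]

quiet = np.full(100, 0.001)   # below the default 0.008 threshold
loud  = np.full(100, 0.1)
mid   = np.full(100, 0.05)
print(len(select_segments([quiet, loud, mid], use_top=0.5)))  # 1
```

The intent is that adaptation trains on the cleanest, most energetic segments of your data rather than on everything indiscriminately.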

The --interactive option allows manual inspection of the generated files and deletion of those for which the model performs poorly, i.e. a simple active-learning loop.
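
The --dry knob listed above is a standard dry/wet mix. A one-line sketch of the convention stated in the help text (0 = only the denoised estimate, 1 = only the input signal); `dry_wet` is an illustrative name, not the package API:

```python
import numpy as np

def dry_wet(noisy, denoised, dry=0.0):
    """Mix the denoised estimate back with the input.

    dry=0 returns only the denoised signal; dry=1 returns only the input.
    Intermediate values can restore some natural ambience."""
    return dry * noisy + (1.0 - dry) * denoised

noisy = np.array([1.0, 1.0])
denoised = np.array([0.0, 0.0])
print(dry_wet(noisy, denoised, dry=0.25))  # [0.25 0.25]
```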

Example Workflow

  1. Collect domain-specific noisy audio: Gather audio samples from your target domain
  2. Run adaptation:
    python adapt.py \
      --method biodenoising16k_dns48 \
      --noisy_dir /path/to/domain/audio/ \
      --out_dir ./adapted_model/ \
      --steps 3 \
      --segment 2 \
      --highpass 100
    
  3. Use your adapted model: The adaptation process creates a fine-tuned model in the output directory

Tips for Effective Adaptation

  • Use at least 5-10 minutes of audio from your target domain
  • For wildlife recordings with specific frequency ranges, adjust the --highpass parameter
  • If your recordings have specific noise characteristics, consider providing examples in --noise_dir
  • The adaptation process works best with audio that has a good signal-to-noise ratio
  • Use --interactive mode to inspect and manually filter generated files during adaptation

Using Annotations for Targeted Adaptation

The adaptation process supports using annotation files to extract specific segments:

python adapt.py \
  --method biodenoising16k_dns48 \
  --noisy_dir /path/to/audio/ \
  --out_dir ./adapted_model/ \
  --annotations \
  --annotations_label_column "Call_Type" \
  --annotations_label_value "Whistle"

This allows you to target adaptation to specific vocalizations or sound events in your recordings.
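
Conceptually, the annotation options amount to reading a table, optionally filtering rows by a label column, and keeping the begin/end times. A self-contained sketch using Python's csv module (`load_events` is a hypothetical helper, not the package API; the column names mirror the defaults documented above):

```python
import csv
import io

# Inline stand-in for an annotation file next to an audio recording.
table = """Begin,End,Call_Type
0.5,1.2,Whistle
2.0,2.7,Click
3.1,3.9,Whistle
"""

def load_events(fh, begin_col="Begin", end_col="End",
                label_col=None, label_value=None):
    """Return (begin, end) pairs in seconds, optionally filtered by label."""
    events = []
    for row in csv.DictReader(fh):
        if label_col and label_value and row.get(label_col) != label_value:
            continue
        events.append((float(row[begin_col]), float(row[end_col])))
    return events

events = load_events(io.StringIO(table),
                     label_col="Call_Type", label_value="Whistle")
print(events)  # [(0.5, 1.2), (3.1, 3.9)]
```

Only the "Whistle" rows survive the filter, so adaptation would see just those segments.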

Using Selection Tables for Event-Based Processing

Both the denoising and adaptation processes support using selection tables for event-based processing:

# For denoising with selection tables
python -m biodenoising.denoiser.denoise \
  --input /path/to/audio/ \
  --output /path/to/output/ \
  --selection_table

# For adaptation with selection tables
python adapt.py \
  --method biodenoising16k_dns48 \
  --noisy_dir /path/to/audio/ \
  --out_dir ./adapted_model/ \
  --selection_table

Selection tables are CSV, TSV, or TXT files located next to your audio files with the same base name. They should contain columns with start and end times in seconds. The system automatically detects columns with names like 'start', 'beginning', 'begin time', 'begin' for start times and 'end', 'end time' for end times.

When --selection_table is enabled:

  • Only audio within the specified event intervals is processed for denoising
  • Noise extraction focuses on gaps between events (with 0.2s buffer before and 0.4s after each event)
  • The final output is masked to preserve only the denoised events
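
The behavior described above can be sketched in a few lines. The helper names are illustrative, assuming only the column aliases and the 0.2 s / 0.4 s buffers quoted in this section:

```python
import numpy as np

START_NAMES = {"start", "beginning", "begin time", "begin"}
END_NAMES = {"end", "end time"}

def find_columns(header):
    """Guess the start/end columns by name, case-insensitively."""
    start = next(c for c in header if c.strip().lower() in START_NAMES)
    end = next(c for c in header if c.strip().lower() in END_NAMES)
    return start, end

def event_mask(n_samples, sr, events):
    """Binary mask: 1 inside annotated events, 0 elsewhere."""
    mask = np.zeros(n_samples)
    for begin, end in events:
        mask[int(begin * sr):int(end * sr)] = 1.0
    return mask

def noise_intervals(duration, events, pre=0.2, post=0.4):
    """Gaps between events, padded 0.2 s before and 0.4 s after each event."""
    cursor, gaps = 0.0, []
    for begin, end in sorted(events):
        if begin - pre > cursor:
            gaps.append((cursor, begin - pre))
        cursor = max(cursor, end + post)
    if cursor < duration:
        gaps.append((cursor, duration))
    return gaps

sr = 16000
events = [(1.0, 2.0), (4.0, 5.0)]          # two 1 s events in a 10 s file
print(find_columns(["Begin", "End", "Call_Type"]))  # ('Begin', 'End')
print(int(event_mask(10 * sr, sr, events).sum()))   # 32000 samples inside events
print(noise_intervals(10.0, events))
```

Multiplying the denoised signal by `event_mask` yields the masked final output; the `noise_intervals` gaps are where noise examples would be harvested for adaptation.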

Citation

If you use this code in your research, please cite:

@misc{miron2024biodenoisinganimalvocalizationdenoising,
      title={Biodenoising: animal vocalization denoising without access to clean data}, 
      author={Marius Miron and Sara Keen and Jen-Yu Liu and Benjamin Hoffman and Masato Hagiwara and Olivier Pietquin and Felix Effenberger and Maddie Cusimano},
      year={2024},
      eprint={2410.03427},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2410.03427}, 
}

License

This model is released under the CC-BY-NC 4.0 license, as found in the LICENSE file.
