PyTorch-based audio source separation toolkit
Asteroid: Audio Source Separation on Steroids
Warning: under development.
Asteroid is a PyTorch-based source separation and speech enhancement API
that enables fast experimentation on common datasets.
It comes with source code that supports a wide range of architectures
and a set of recipes that reproduce results from published papers.
Asteroid is intended to be a community-based project,
so hop on and help us!
Guiding principles
- User friendliness. Asteroid's API offers simple solutions for most common use cases.
- Modularity. Building blocks are designed to be seamlessly plugged together. Filterbanks, encoders, maskers, decoders and losses are all common building blocks that can be combined flexibly to create new systems.
- Extensibility. Extending Asteroid with new features is simple. Adding a new filterbank, separator, architecture, dataset or even a recipe takes little effort.
- Reproducibility. Recipes provide an easy way to reproduce results, with data preparation, training and evaluation in a single script.
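The modularity principle can be illustrated with a minimal sketch in plain PyTorch. It does not use Asteroid's own classes: `TinySeparator` and all of its parameters are hypothetical names chosen for illustration, and the one-layer masker is a stand-in for a real separation network. The point is only that an encoder, a masker and a decoder are independent modules that can be swapped without touching the others.

```python
import torch
from torch import nn


class TinySeparator(nn.Module):
    """Hypothetical encoder -> masker -> decoder model (not Asteroid's API)."""

    def __init__(self, n_src=2, n_filters=64, kernel_size=16, stride=8):
        super().__init__()
        self.n_src = n_src
        # Encoder: a learnable analysis filterbank over the raw waveform.
        self.encoder = nn.Conv1d(1, n_filters, kernel_size, stride=stride, bias=False)
        # Masker: estimates one mask per source (a single 1x1 conv here).
        self.masker = nn.Sequential(
            nn.Conv1d(n_filters, n_src * n_filters, kernel_size=1),
            nn.Sigmoid(),
        )
        # Decoder: a learnable synthesis filterbank back to the time domain.
        self.decoder = nn.ConvTranspose1d(n_filters, 1, kernel_size, stride=stride, bias=False)

    def forward(self, wav):
        # wav: (batch, time)
        tf_rep = self.encoder(wav.unsqueeze(1))  # (batch, n_filters, frames)
        batch, n_filters, frames = tf_rep.shape
        masks = self.masker(tf_rep).reshape(batch, self.n_src, n_filters, frames)
        masked = masks * tf_rep.unsqueeze(1)  # (batch, n_src, n_filters, frames)
        # Decode every source independently by folding sources into the batch.
        out = self.decoder(masked.reshape(batch * self.n_src, n_filters, frames))
        return out.reshape(batch, self.n_src, -1)  # (batch, n_src, time)


mixture = torch.randn(4, 8000)        # 4 random "mixtures" of 8000 samples
estimates = TinySeparator()(mixture)  # one waveform per source
print(estimates.shape)                # torch.Size([4, 2, 8000])
```

Replacing `self.masker` with a deeper network, or the encoder and decoder with fixed filterbanks, leaves the rest of the model unchanged, which is the kind of recombination the principles above describe.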
Highlights
Installation
To install Asteroid, clone the repo and install it using pip or python:
git clone https://github.com/mpariente/AsSteroid
cd AsSteroid
# Install with pip (in editable mode)
pip install -e .
# Install with python
python setup.py install
Running a recipe
cd egs/wham/ConvTasNet
./run.sh
More information in egs/README.md.
Recipes
- ConvTasNet (Luo et al.)
- TasNet (Luo et al.)
- Deep clustering (Hershey et al. and Isik et al.)
- Chimera ++ (for ) (Luo et al. and Wang et al.)
- FurcaNeXt (Shi et al.)
- DualPathRNN (Luo et al.)
- Two step learning (Tzinis et al.)
Writing your own recipe
Contributing
See our contributing guidelines.
Codebase structure
├── asteroid                 # Python package / Source code
│   ├── data                 # Data classes, DataLoader makers.
│   ├── engine               # Training classes: losses, optimizers and trainer.
│   ├── filterbanks          # Common filterbanks and related classes.
│   ├── masknn               # Separation building blocks and architectures.
│   └── utils.py
├── examples                 # Simple asteroid examples
└── egs                      # Recipes for all datasets and systems.
    ├── wham                 # Recipes for one dataset (WHAM)
    │   ├── ConvTasNet       # ConvTasNet system on the WHAM dataset.
    │   │   └── ...          # Recipe's structure. See egs/README.md for more info
    │   ├── Your recipe      # More recipes on the same dataset (Including yours)
    │   ├── ...
    │   └── DualPathRNN
    └── Your dataset         # More datasets (Including yours)
Building the docs
To build the docs, you'll need Sphinx, a theme and some other packages:
# Start by installing the required packages
cd docs/
pip install -r requirements.txt
# Build the docs
make html
# View it! (Replace firefox with your favorite browser)
firefox build/html/index.html
If you rebuild the docs, don't forget to run make clean first.
You can add the following alias to your .bashrc, source it, and then run run_docs from the docs/ folder:
alias run_docs='make clean; make html; firefox build/html/index.html'
Why Asteroid?
Audio source separation and speech enhancement are fast-evolving fields with a growing number of papers submitted to conferences each year. While datasets such as wsj0-{2, 3}mix, WHAM or MS-SNSD are being shared, there has been little effort to create common codebases for the development and evaluation of source separation and speech enhancement algorithms. Here is one!
Remote TensorBoard visualization
# Launch tensorboard remotely (default port is 6006)
tensorboard --logdir exp/tmp/lightning_logs/ --port tf_port
# Open a port-forwarding connection. Add the -Nf option to avoid opening a remote shell.
ssh -L local_port:localhost:tf_port user@ip
Then open http://localhost:local_port/.