Hub for music machine learning generating audio

coopertunes

Amazing README coming soon!

Installation

It is recommended to use a conda environment to set up the coopertunes module. To install conda on your machine, follow the official conda installation instructions. If you already have conda installed, create a virtual environment:

conda create -n coopertunes python=3.10

and activate it:

conda activate coopertunes

Clone coopertunes repository:

git clone git@github.com:Szakulli07/coopertunes.git
cd coopertunes
conda develop .

Before installing the coopertunes module, you need to install the PyTorch framework. Version 2.0.1 is recommended:

pip install torch==2.0.1 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

Now install the coopertunes module:

pip install -e .

Knowledge

There are some things worth knowing before you start working with "coopertunes". We list the most relevant ones here.

Repo concepts

Generally, we try to keep a clean structure that is similar for all models.

  1. HParams - classes where all your hyperparameters should be stored. It is beneficial to keep them in one place because it makes tracking easy. When debugging, you don't need to go through 4 classes to find out where an hparam comes from. When analysing, you simply dump the hparams to a file, and that is all you need to describe your experiment for reproducibility.
  2. Models - files with model architectures. Try to keep everything connected with one model in one module or, if possible, in one file, as in Hugging Face Transformers. This makes debugging easier and keeps models less dependent on each other. For example, many models might have a ResidualLayer, but it would probably look different in each of them.
  3. Datasets - nothing fancy here, just classes for working with data. Don't be afraid to make as many specific datasets as you need.
  4. Supervisors - the core of this repo. These are classes that handle model training. Their task is to load data, train models, save checkpoints, log metrics and generally do everything needed to train a model. Keeping such a class separate from the model itself makes later inference code less cluttered.
  5. Datatools - functions and tools for managing datasets for a given model. With them, you should be able to download and preprocess the dataset for a given model.
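
The HParams idea from point 1 can be sketched with a plain dataclass that serializes itself to JSON. This is only an illustration and not the actual coopertunes class: the name ExampleHParams, its fields and the to_json/from_json helpers are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ExampleHParams:
    """Hypothetical hyperparameter container; all fields are illustrative."""

    learning_rate: float = 1e-4
    batch_size: int = 16
    sample_rate: int = 22050
    checkpoint_dir: str = "checkpoints"

    def to_json(self) -> str:
        # Every field ends up in one JSON document describing the experiment.
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text: str) -> "ExampleHParams":
        # Rebuild the exact same hyperparameters from a previous dump.
        return cls(**json.loads(text))

    def dump(self, path: Path) -> None:
        # Write the JSON description next to checkpoints or logs.
        path.write_text(self.to_json())


hparams = ExampleHParams(batch_size=32)
restored = ExampleHParams.from_json(hparams.to_json())
```

Because everything lives in one object, a supervisor could call dump once at the start of training and the resulting file would be enough to rerun the experiment.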

Repo automation

Automation is the basis of effective work. For now there is not much of it, but please use what exists and improve it.

  1. Code quality - pylint, pycodestyle, mypy. They all help you make code prettier. What's more, they let you find errors before they happen. (What makes Python beautiful is dynamic typing. What makes Python horrible is dynamic typing.)
  2. Versioning - release-please is a pipeline that creates a changelog based on your conventional commits. When you want to change the repo version, simply make a commit of type fix or feat, or mark it as breaking with !.
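
To make the versioning rule concrete, here is a rough sketch of how a conventional commit header maps to a semantic version bump. It follows the plain Conventional Commits convention (fix is a patch, feat is a minor, ! or a BREAKING CHANGE footer is a major); release-please itself has more configuration and special cases, so treat bump_kind and bump_version as hypothetical helpers, not as its real logic.

```python
import re
from typing import Optional

# "type(optional scope)!: description", e.g. "feat(melgan)!: new checkpoint format"
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: .+")


def bump_kind(message: str) -> Optional[str]:
    """Return 'major', 'minor', 'patch' or None for a commit message."""
    lines = message.splitlines()
    match = _HEADER.match(lines[0]) if lines else None
    if match is None:
        return None  # not a conventional commit
    if match["breaking"] or "BREAKING CHANGE:" in message:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return None  # docs, chore, refactor... do not change the version


def bump_version(version: str, kind: str) -> str:
    """Apply a bump to a 'MAJOR.MINOR.PATCH' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    if kind == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump kind: {kind}")
```

For example, bump_kind("feat: add GANSynth") gives "minor", and bump_version("0.2.0", "minor") gives "0.3.0".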

In the future there should be pipelines for automatic tests, building Docker images, serving models and so on.

Libraries

We take advantage of many Python libraries. Some of them are worth mentioning and spending some time learning.

  1. torch-summary - the easiest on the list. It lets you pretty-print your model: all its layers, parameters and more. It is a nice way to get first insights.
  2. einops - if you do not like x.permute(0, 2, 3, 1), this lib is for you. It offers many nicer operations, for example rearrange. You can change a shape with rearrange(x, 'b c t -> 1 b (c t)'). You simply won't need all those comments you wrote next to those pesky operations to remember the tensor shape at that point.
  3. librosa - the audio bible. Everything you need for audio (unless you want to backpropagate through it...). All data preparation and visualization is possible thanks to it. It is also a standard lib that most big research centers use.
  4. deepspeed - multi-GPU training, multi-node training, multiple precisions, ZeRO optimizations and much more. Once you start training models that can't be trained within 1.5h on Google Colab, you will probably need it.
  5. pretty_midi - a library designed for handling MIDI (Musical Instrument Digital Interface) files in a user-friendly and efficient manner. It allows us to easily manipulate MIDI data, including reading, writing, and visualizing MIDI files, as well as extracting musical information such as notes, instruments, and timing details.

Models

For now we have included, or are including, 6 models. Some insights about them:

  • MelSpecVAE - we used the moiseshorta implementation (the classic one) as the baseline. Originally it was a set of autogenerated notebooks in TensorFlow. Now it is the biggest VAE that ever existed. We ported it to PyTorch and added some nice features like gradient clipping, gradient accumulation, an lr scheduler... And it is still not working. It is just a simple VAE that generates mel spectrograms from noise. It assumes that if two things are close in latent space, they will be close in mel spectrogram space. And this assumption is just bad. Maybe you can use it for really simple techno, but that's all.
  • MelSpecVQVAE - an addition we propose to the normal MelSpecVAE that utilizes vector quantization (VQ). It is a nice technique that says: hey, maybe the world does not need to be that continuous. It is now a common trend, and its more complicated variations (RVQ and friends) are used in SOTA networks like SoundStream or EnCodec. For now it can be used like MelSpecVAE, to generate mels from noise, but we think it could be used in style transfer with more success.
  • MelGAN - a very good vocoder. It is used to create raw audio from mel spectrograms. There are better options like HiFi-GAN and many more complicated ones, but MelGAN's advantage is its simplicity. Many vocoders utilize GAN ideas, so this approach is worth knowing. To run a successful training you will probably need about 10-60h of audio and 2 weeks of GPU time. Just remember that you will quickly hear MelGAN producing "good" results, but put it on speakers and listen carefully, because we are aiming for "very good".
  • PerformanceRNN - as stated in the original publication: "Performance RNN, an LSTM-based recurrent neural network designed to model polyphonic music with expressive timing and dynamics." It is able to recreate sequences learned from MIDI files and to create guided MIDI files itself. The main backbone of this network is a Gated Recurrent Unit (GRU).
  • PerformanceRNNattentive - as PerformanceRNN generates note after note for each context, we applied a self-attention module to all of the GRU's inputs. This allowed the model to create music with more "piano keys pressed" at the same time, and with better overall quality (in our opinion), in the same number of epochs as PerformanceRNN.
  • GANSynth - based on GANSynth, which is a PGGAN used for generating spectrograms from noise for music, with ACGAN pitch conditioning. The authors' implementation is part of Google Magenta, which is developed in TensorFlow. This project should be one of the first GANSynth implementations in PyTorch. GANSynth is trained on the NSynth dataset. It is one of the biggest models in the project; because of computational limits we couldn't benchmark it.
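
The core of the VQ idea behind MelSpecVQVAE is a nearest-neighbour lookup: each continuous latent vector is replaced by the closest entry of a finite codebook. The NumPy sketch below shows only that lookup step with made-up numbers; it is not the repo's implementation, and a trainable VQ-VAE additionally needs a straight-through gradient estimator and codebook/commitment losses.

```python
import numpy as np


def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent vector (n, d) to its nearest codebook entry (k, d)."""
    # Squared Euclidean distance between every latent and every code: (n, k).
    diff = latents[:, None, :] - codebook[None, :, :]
    distances = (diff ** 2).sum(axis=-1)
    indices = distances.argmin(axis=1)
    return codebook[indices], indices


# Toy codebook with 3 codes in a 2-dimensional latent space.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [3.0, 3.5], [0.7, 0.6]])

quantized, indices = quantize(latents, codebook)
print(indices)
```

After quantization the decoder only ever sees one of the codebook vectors, which is what makes the latent space discrete instead of continuous.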

Future works

In this section we would like to share some opinions about what could be done with this repo and how to work with it. It could easily be handed to a team of 6 people or to several smaller ones.

  1. Adding newer models. For now we have added some rather old ones that serve mostly for demonstration. Currently, in the era of VALL-E or Audiobox, concentrating on one SOTA model (without an open implementation) is also a time-consuming option.
  2. Refactoring and standardization. Currently many things need some touches. Adding deepspeed to all models, standardizing checkpointing, deduplication... those are things that could be done. I would also combine this with serving the repo on Docker Hub and on Hugging Face.
  3. Adding more automated data processing for each model. In the future, data should be automatically downloaded and preprocessed if the directory pointed to by hparams is empty.
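
The behaviour described in point 3 could look roughly like the sketch below: check whether the dataset directory is missing or empty, and only then run the download and preprocessing step. The names ensure_dataset and fake_prepare are hypothetical; a real version would call the model's datatools and read the directory from its hparams.

```python
import tempfile
from pathlib import Path
from typing import Callable


def is_missing_or_empty(directory: Path) -> bool:
    """True if the directory does not exist or contains no entries."""
    return not directory.is_dir() or not any(directory.iterdir())


def ensure_dataset(directory: Path, prepare: Callable[[Path], None]) -> bool:
    """Run `prepare` only when the dataset directory has no data yet.

    Returns True if preparation was executed, False if data was already there.
    """
    if not is_missing_or_empty(directory):
        return False
    directory.mkdir(parents=True, exist_ok=True)
    prepare(directory)
    return True


def fake_prepare(directory: Path) -> None:
    # Stand-in for a real download + preprocessing step.
    (directory / "sample.txt").write_text("placeholder")


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "dataset"
    first_run = ensure_dataset(data_dir, fake_prepare)   # prepares the data
    second_run = ensure_dataset(data_dir, fake_prepare)  # skips, data exists
```

A supervisor could call such a helper before building its dataset object, so that the first training run fetches the data and later runs start immediately.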
