Fx-Encoder++ for audio effects representation

Fx-Encoder++

Convert audio effects from your music into encoded representations suitable for audio effects processing and analysis tasks.

Paper | Hugging Face

About Fx-Encoder++

We adopt the codebase of CLAP for this project.

Fx-Encoder++ is an audio effects representation learning approach based on SimCLR.

Architecture

fxencoder_plusplus

Usage

Installation

pip install fxencoder_plusplus

Usage

Notice: The input to Fx-Encoder++ must be stereo, i.e. a tensor of shape [batch, 2, seq_len].
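Since the input has to be stereo, mono or single-channel files need to be expanded to two channels before they are passed to the model. The following is a minimal sketch of one way to do that on the NumPy array returned by a loader such as librosa; the helper name `to_stereo` is hypothetical and is not part of the fxencoder_plusplus package.

```python
import numpy as np


def to_stereo(wav: np.ndarray) -> np.ndarray:
    """Return a [2, seq_len] float32 array from mono or stereo audio.

    Hypothetical helper for illustration; not part of fxencoder_plusplus.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 1:
        # Mono signal of shape [seq_len]: copy it to left and right channels.
        return np.stack([wav, wav], axis=0)
    if wav.ndim != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {wav.shape}")
    if wav.shape[0] == 1:
        # Single-channel signal of shape [1, seq_len]: duplicate the channel.
        return np.repeat(wav, 2, axis=0)
    if wav.shape[0] == 2:
        # Already stereo: return unchanged.
        return wav
    raise ValueError(f"expected 1 or 2 channels, got {wav.shape[0]}")
```

The result can then be turned into a batched tensor with `torch.from_numpy(to_stereo(wav)).unsqueeze(0)`, which has shape [1, 2, seq_len].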

Initialize Models

from fxencoder_plusplus import load_model 

# Load default base model (auto-downloads if needed)
DEVICE = 'cuda'
model = load_model(
    'default',
    device=DEVICE,
)

Extract audio effects representations from mixture tracks or stem tracks, where a single representation encodes the overall audio effects style of the entire input.

import torch 
import librosa 
audio_path = librosa.example('trumpet')
wav, sr = librosa.load(audio_path, sr=44100, mono=False)
wav = torch.from_numpy(wav).unsqueeze(0).unsqueeze(0).repeat(1, 2, 1).to(DEVICE) # [1, 2, seq_len]

fx_emb = model.get_fx_embedding(wav)
print(fx_emb.shape) # [1, embed_dim], [1, 128]

# To get the embedding before the projection layer, pass normalized=False:
fx_emb = model.get_fx_embedding(wav, normalized=False)
print(fx_emb.shape) # [1, embed_dim], [1, 2048]
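Once two tracks have been embedded, their effects styles can be compared. The sketch below uses cosine similarity, which is a common choice for contrastively trained embeddings; whether it matches the metric used by the authors is an assumption, and the function name `cosine_similarity` is introduced here for illustration only. It works on NumPy arrays, so a tensor from the model would first be converted with `fx_emb.detach().cpu().numpy()`.

```python
import numpy as np


def cosine_similarity(a, b, eps: float = 1e-8) -> float:
    """Cosine similarity between two embedding vectors.

    Inputs of shape [1, embed_dim] or [embed_dim] are flattened first.
    """
    a = np.asarray(a, dtype=np.float64).reshape(-1)
    b = np.asarray(b, dtype=np.float64).reshape(-1)
    # Guard against division by zero for all-zero vectors.
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return float(np.dot(a, b) / denom)
```

Values close to 1 indicate similar embeddings, and values near 0 indicate unrelated ones.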

Extract instrument-specific audio effects representations from mixture tracks. For example, extract the audio effects representation of just the vocals within a full mix.

  1. Audio Reference:
import torchaudio 
import julius 
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)
mixture = mixture.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]

query_path = "/path/to/inst.wav"
query, sr = torchaudio.load(query_path, frame_offset=441000, num_frames=441000)
query = query.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]
query = julius.resample_frac(query, int(44100), int(48000))

_, fx_emb = model.get_fx_embedding_by_audio_query(mixture, query)
print(fx_emb.shape) # [1, embed_dim], [1, 128]
  2. Text Reference:
import torchaudio 
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)
mixture = mixture.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]

query = "the sound of vocals"

_, fx_emb = model.get_fx_embedding_by_text_query(mixture, query)
print(fx_emb.shape) # [1, embed_dim], [1, 128]

Training

Env

  1. Create environment with conda
conda create --name fxenc python=3.10.14
  2. Install dependencies
pip install -r requirements.txt 

Prepare Fx-Normalized Dataset

Because the datasets are subject to copyright restrictions, we unfortunately cannot share the preprocessed data directly.

  1. Download MUSDB and MoisesDB
  2. Follow FxNorm-automix to prepare the audio effects normalized dataset

Run

bash scripts/train_proposed.sh 

Evaluation

We provide a retrieval-based evaluation pipeline. The steps below use the MUSDB dataset as an example.

  1. Follow FxNorm-automix to prepare the audio effects normalized dataset
  2. Synthesize the evaluation dataset: see build_musdb.py
  3. Run the retrieval-based evaluation: see eval_retrieval.py
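To illustrate the general idea of retrieval-based evaluation, the sketch below scores how often each query embedding retrieves its matching candidate. It is not the logic of eval_retrieval.py: the function name `top1_retrieval_accuracy`, the use of cosine similarity, and the convention that query i matches candidate i are all assumptions made for this example.

```python
import numpy as np


def top1_retrieval_accuracy(queries: np.ndarray, candidates: np.ndarray) -> float:
    """Fraction of queries whose nearest candidate has the same index.

    queries, candidates: arrays of shape [n, embed_dim], where row i of
    `queries` is assumed to match row i of `candidates`.
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    similarity = q @ c.T                   # [n, n] similarity matrix
    predicted = similarity.argmax(axis=1)  # best candidate per query
    return float((predicted == np.arange(len(q))).mean())
```

In practice the rows would be embeddings from `model.get_fx_embedding`, with each query and its matching candidate sharing the same effects chain.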

LICENSE

This library is released under the CC BY-NC 4.0 license. Please refer to the LICENSE file for more details.

