Fx-Encoder++ for audio effects representation

Fx-Encoder++

Convert audio effects from your music into encoded representations suitable for audio effects processing and analysis tasks.

About Fx-Encoder++

We adopt the codebase of CLAP for this project.

Fx-Encoder++ is an audio effects representation learning method based on SimCLR.
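For readers unfamiliar with SimCLR, the following is a minimal NumPy sketch of the generic NT-Xent contrastive loss that SimCLR popularised. It is an illustration only, not the training objective used in this repository: the function name `nt_xent_loss`, the temperature value, and the batch layout are all assumptions made for the example.

```python
import numpy as np

def nt_xent_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """Generic SimCLR-style NT-Xent loss (hypothetical helper, not from this package).

    z_a[i] and z_b[i] are embeddings of two views of the same item, shape [N, D].
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)              # [2N, D]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit-normalise each row
    sim = z @ z.T / temperature                         # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                      # a sample is never its own candidate
    # The positive for row i is its other view: i + N in the first half, i - N in the second.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

The loss is small when each pair of views is closer to each other than to every other item in the batch, which is the property a contrastive encoder is trained to achieve.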

Architecture

[Figure: Fx-Encoder++ architecture]

Usage

Installation

pip install fxencoder_plusplus

Usage

Notice: The input to Fx-Encoder++ should be stereo, i.e. a tensor with two channels such as [1, 2, seq_len].

Initialize Models

from fxencoder_plusplus import load_model 

# Load default base model (auto-downloads if needed)
DEVICE = 'cuda'
model = load_model(
    'default',
    device=DEVICE,
)

Extract audio effects representations from mixture tracks or stem tracks, where a single representation encodes the overall audio effects style of the entire input.

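import torch
import librosa

# librosa's bundled trumpet example is a mono recording
audio_path = librosa.example('trumpet')
wav, sr = librosa.load(audio_path, sr=44100, mono=True)  # [seq_len]
# Add batch and channel dims, then duplicate the channel to get stereo
wav = torch.from_numpy(wav).unsqueeze(0).unsqueeze(0).repeat(1, 2, 1).to(DEVICE) # [1, 2, seq_len]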

fx_emb = model.get_fx_embedding(wav)
print(fx_emb.shape) # [1, embed_dim], [1, 128]
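The example above relies on the input file being mono. As a minimal sketch, a small helper can bring either a mono or a stereo array from `librosa.load` into the `[1, 2, seq_len]` layout before it is converted to a tensor. The name `to_stereo_batch` is hypothetical and not part of `fxencoder_plusplus`.

```python
import numpy as np

def to_stereo_batch(wav: np.ndarray) -> np.ndarray:
    """Return a float32 array of shape [1, 2, seq_len] (hypothetical helper).

    Accepts mono input of shape [seq_len] or [1, seq_len],
    or stereo input of shape [2, seq_len].
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 1:                      # mono: [seq_len] -> [1, seq_len]
        wav = wav[np.newaxis, :]
    if wav.ndim != 2:
        raise ValueError(f"expected [seq_len] or [channels, seq_len], got {wav.shape}")
    if wav.shape[0] == 1:                  # duplicate the single channel
        wav = np.repeat(wav, 2, axis=0)
    elif wav.shape[0] != 2:
        raise ValueError(f"expected 1 or 2 channels, got {wav.shape[0]}")
    return wav[np.newaxis, ...]            # add the batch dimension
```

The result can then be passed on with `torch.from_numpy(to_stereo_batch(wav)).to(DEVICE)`.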

Extract instrument-specific audio effects representations from mixture tracks. For example, extract the audio effects representation of just the vocals within a full mix.

  1. Audio Reference:
import torchaudio 
import julius 
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)
mixture = mixture.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]

query_path = "/path/to/inst.wav"
query, sr = torchaudio.load(query_path, frame_offset=441000, num_frames=441000)
query = query.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]
query = julius.resample_frac(query, 44100, 48000) # resample the query from 44.1 kHz to 48 kHz

_, fx_emb = model.get_fx_embedding_by_audio_query(mixture, query)
print(fx_emb.shape) # [1, embed_dim], [1, 128]
  2. Text Reference:
import torchaudio 
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)
mixture = mixture.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]

query = "the sound of vocals"

_, fx_emb = model.get_fx_embedding_by_text_query(mixture, query)
print(fx_emb.shape) # [1, embed_dim], [1, 128]
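Both query examples pass `num_frames=441000` to `torchaudio.load`, which corresponds to 10 seconds only if the file is sampled at 44.1 kHz. The sketch below shows the arithmetic for deriving `frame_offset` and `num_frames` from a duration. The helper names are hypothetical, and the 10-second segment length is simply taken from the examples above, not a documented requirement.

```python
def seconds_to_frames(seconds: float, sample_rate: int = 44100) -> int:
    """Convert a duration in seconds to a number of audio frames."""
    return int(round(seconds * sample_rate))

def segment_args(index: int, segment_seconds: float = 10.0,
                 sample_rate: int = 44100) -> dict:
    """Keyword arguments selecting the index-th fixed-length segment of a file."""
    n = seconds_to_frames(segment_seconds, sample_rate)
    return {"frame_offset": index * n, "num_frames": n}
```

For instance, `torchaudio.load(query_path, **segment_args(1))` selects the same second 10-second window as the audio query example, assuming a 44.1 kHz file.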

Training

Env

  1. Create environment with conda
conda create --name fxenc python=3.10.14
conda activate fxenc
  2. Install dependencies
pip install -r requirements.txt 

Prepare Fx-Normalized Dataset

Because the datasets are subject to copyright restrictions, we unfortunately cannot share the preprocessed datasets directly.

  1. Download MUSDB and MoisesDB.
  2. Follow FxNorm-automix to prepare the audio-effects-normalized dataset.

Run

bash scripts/train_proposed.sh 

Evaluation

We provide a retrieval-based evaluation pipeline, using the MUSDB dataset as an example.

  1. Follow FxNorm-automix to prepare the audio-effects-normalized dataset.
  2. Synthesize the evaluation dataset: see build_musdb.py.
  3. Run the retrieval-based evaluation: see eval_retrieval.py.
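To give a sense of what a retrieval-based evaluation measures, here is a minimal NumPy sketch of top-1 retrieval accuracy over embeddings, using cosine similarity. It is an illustration under assumptions and does not reproduce eval_retrieval.py: the function name, the use of integer labels, and the choice of cosine similarity are all hypothetical.

```python
import numpy as np

def top1_retrieval_accuracy(query_emb: np.ndarray, query_labels,
                            cand_emb: np.ndarray, cand_labels) -> float:
    """Fraction of queries whose nearest candidate (by cosine similarity)
    carries the same label. Hypothetical helper, not from this package.

    query_emb: [num_queries, D], cand_emb: [num_candidates, D].
    """
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sim = q @ c.T                          # [num_queries, num_candidates]
    best = sim.argmax(axis=1)              # index of the closest candidate per query
    hits = np.asarray(cand_labels)[best] == np.asarray(query_labels)
    return float(hits.mean())
```

In this setup each label would stand for one audio effects configuration, so a high score means that embeddings of tracks processed with the same effects lie close together.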
