Fx-Encoder++ for audio effects representation
Fx-Encoder++
Convert audio effects from your music into encoded representations suitable for audio effects processing and analysis tasks.
About Fx-Encoder++
Fx-Encoder++ is an audio effects representation learning model based on SimCLR.
The codebase is adapted from CLAP.
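For readers unfamiliar with SimCLR, the sketch below shows its standard contrastive objective (NT-Xent) in NumPy. It is an illustration of the general SimCLR technique only: the function name, the temperature value, and the batch layout are assumptions, and this page does not state that Fx-Encoder++ uses exactly this form.

```python
import numpy as np


def nt_xent_loss(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """NT-Xent loss from SimCLR for two batches of paired embeddings.

    z_a[i] and z_b[i] are embeddings of two views of the same item
    (a positive pair); every other embedding in the batch is a negative.
    The temperature default is an illustrative assumption.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)               # [2n, d]
    z = z / np.linalg.norm(z, axis=1, keepdims=True)     # unit-normalize each row
    sim = z @ z.T / temperature                          # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                       # exclude self-similarity
    # Index of the positive for each row: i <-> i + n
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Row-wise log-softmax, computed in a numerically stable way
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())
```

The loss is small when each embedding is closer to its paired view than to any other embedding in the batch, which is the property a contrastive encoder is trained to achieve.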
Architecture
Usage
Installation
pip install fxencoder_plusplus
Usage
Note: The input to Fx-Encoder++ must be stereo. The examples below pass a tensor of shape [1, 2, seq_len].
Initialize Models
from fxencoder_plusplus import load_model
# Load default base model (auto-downloads if needed)
DEVICE = 'cuda'
model = load_model(
'default',
device=DEVICE,
)
Extract audio effects representations from mixture tracks or stem tracks, where a single representation encodes the overall audio effects style of the entire input.
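import torch
import librosa
# The librosa trumpet example is mono, so load() returns a 1-D array of shape [seq_len]
audio_path = librosa.example('trumpet')
wav, sr = librosa.load(audio_path, sr=44100, mono=False)
# Add batch and channel dimensions, then duplicate the single channel to get stereo
wav = torch.from_numpy(wav).unsqueeze(0).unsqueeze(0).repeat(1, 2, 1).to(DEVICE) # [1, 2, seq_len]
fx_emb = model.get_fx_embedding(wav)
print(fx_emb.shape) # [1, embed_dim] = [1, 128]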
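The `unsqueeze`/`repeat` chain above only works for a mono file, because a stereo file loaded with `mono=False` already has a channel dimension. As a minimal sketch, a small helper can normalize either case to the [1, 2, seq_len] layout. The helper name `to_stereo_batch` is hypothetical and not part of fxencoder_plusplus.

```python
import numpy as np


def to_stereo_batch(wav: np.ndarray) -> np.ndarray:
    """Return audio shaped [1, 2, seq_len] from mono or stereo input.

    Accepts [seq_len], [1, seq_len], or [2, seq_len] arrays, such as those
    returned by librosa.load(..., mono=False).
    """
    if wav.ndim == 1:
        wav = wav[np.newaxis, :]            # [seq_len] -> [1, seq_len]
    if wav.ndim != 2 or wav.shape[0] not in (1, 2):
        raise ValueError(f"expected mono or stereo audio, got shape {wav.shape}")
    if wav.shape[0] == 1:
        wav = np.repeat(wav, 2, axis=0)     # duplicate the mono channel
    return wav[np.newaxis, ...]             # add the batch dimension
```

The result can then be converted as before, for example with `torch.from_numpy(to_stereo_batch(wav)).to(DEVICE)`.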
Extract instrument-specific audio effects representations from mixture tracks. For example, extract the audio effects representation of just the vocals within a full mix.
- Audio Reference:
import torchaudio
import julius
# Load the first 441000 frames of the mixture (10 s for a 44.1 kHz file)
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)
mixture = mixture.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]
# Load 441000 frames of the reference instrument, starting at frame 441000
query_path = "/path/to/inst.wav"
query, sr = torchaudio.load(query_path, frame_offset=441000, num_frames=441000)
query = query.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]
# Resample the query from 44.1 kHz to 48 kHz
query = julius.resample_frac(query, 44100, 48000)
_, fx_emb = model.get_fx_embedding_by_audio_query(mixture, query)
print(fx_emb.shape) # [1, embed_dim] = [1, 128]
- Text Reference:
import torchaudio
# Load the first 441000 frames of the mixture (10 s for a 44.1 kHz file)
mixture_path = "/path/to/mixture.wav"
mixture, sr = torchaudio.load(mixture_path, num_frames=441000)
mixture = mixture.unsqueeze(0).to(DEVICE) # [1, channel, seq_len]
# Describe the target instrument in plain text
query = "the sound of vocals"
_, fx_emb = model.get_fx_embedding_by_text_query(mixture, query)
print(fx_emb.shape) # [1, embed_dim] = [1, 128]
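The examples above hard-code `441000`, which corresponds to 10 seconds at 44.1 kHz. The sketch below converts a time window in seconds into the `frame_offset` and `num_frames` arguments of `torchaudio.load`. The helper name `segment_frames` is hypothetical, and the 44.1 kHz default assumes the files match the sample rate implied by the examples.

```python
def segment_frames(start_sec: float, duration_sec: float, sample_rate: int = 44100) -> tuple[int, int]:
    """Convert a time window in seconds to (frame_offset, num_frames)."""
    if start_sec < 0 or duration_sec <= 0:
        raise ValueError("start_sec must be >= 0 and duration_sec must be > 0")
    frame_offset = int(round(start_sec * sample_rate))
    num_frames = int(round(duration_sec * sample_rate))
    return frame_offset, num_frames


# The audio-query example reads 10 s starting at the 10 s mark of a 44.1 kHz file
frame_offset, num_frames = segment_frames(start_sec=10.0, duration_sec=10.0)
```

The two values can be passed directly, as in `torchaudio.load(query_path, frame_offset=frame_offset, num_frames=num_frames)`.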
Training
Env
- Create environment with conda
conda create --name fxenc python=3.10.14
- Activate the environment and install the dependencies
conda activate fxenc
pip install -r requirements.txt
Prepare Fx-Normalized Dataset
Because the datasets are subject to copyright restrictions, we unfortunately cannot share the preprocessed datasets directly.
- Download MUSDB and MoisesDB.
- Follow FxNorm-automix to prepare the audio effects normalized dataset.
Run
bash scripts/train_proposed.sh
Evaluation
We provide a retrieval-based evaluation pipeline, using the MUSDB dataset as an example:
- Follow FxNorm-automix to prepare the audio effects normalized dataset.
- Synthesize the evaluation dataset with build_musdb.py.
- Run the retrieval-based evaluation with eval_retrieval.py.
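To make the idea of retrieval-based evaluation concrete, here is a minimal sketch of a top-1 retrieval metric over embeddings. It is not the logic of eval_retrieval.py, which is not shown on this page: the function name, the use of cosine similarity, and the top-1 criterion are all assumptions chosen for illustration.

```python
import numpy as np


def top1_retrieval_accuracy(queries: np.ndarray, candidates: np.ndarray, targets: np.ndarray) -> float:
    """Fraction of queries whose nearest candidate (cosine similarity) is the target.

    queries:    [num_queries, embed_dim]
    candidates: [num_candidates, embed_dim]
    targets:    [num_queries], index of the correct candidate for each query
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    sim = q @ c.T                      # [num_queries, num_candidates]
    predicted = sim.argmax(axis=1)     # best-matching candidate per query
    return float((predicted == targets).mean())
```

In practice the rows would be embeddings such as those returned by `model.get_fx_embedding`, moved to the CPU and converted to NumPy arrays.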
File details
Details for the file fxencoder_plusplus-0.1.4.tar.gz.
File metadata
- Download URL: fxencoder_plusplus-0.1.4.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9868543701e0e6e06ee3d80bd1f89450f75b63cdc75961d02d5aea17317f1671 |
| MD5 | 40bb2b43a93e35538f63084d08b82059 |
| BLAKE2b-256 | 3a862590cc50840d00243c266196a70101a4146b680b387be15f8d89ffa19209 |
File details
Details for the file fxencoder_plusplus-0.1.4-py3-none-any.whl.
File metadata
- Download URL: fxencoder_plusplus-0.1.4-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 04e70cdc1a7f02994bcbc5fd132577c9d47c71da53beee14b61fefb3b3152696 |
| MD5 | eebccc2f6de43ab0796ac1fdd15ce026 |
| BLAKE2b-256 | 401ff0eb699916d5b81718a7783a90fe5879e12c4492166f66c45f59be94c577 |