This package is written for text-to-audio/music generation.
AudioLDM 2
This repo currently supports Text-to-Audio Generation (including Music)
Web APP
- Prepare running environment
conda create -n audioldm python=3.8; conda activate audioldm
pip3 install audioldm2
git clone https://github.com/haoheliu/AudioLDM2; cd AudioLDM2
- Start the web application (powered by Gradio)
python3 app.py
- A link will be printed in the terminal. Open it in your browser to try the demo.
Commandline Usage
- Prepare running environment
# Optional
conda create -n audioldm python=3.8; conda activate audioldm
# Install AudioLDM
pip3 install git+https://github.com/haoheliu/AudioLDM2.git
- Generate based on a text prompt
audioldm2 -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
- Generate based on a list of text prompts stored in a file
audioldm2 -tl batch.lst
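The layout of batch.lst is not spelled out here. As a minimal sketch, assuming the file holds one prompt per line (an assumption, not something this page confirms), the list could be prepared from Python. The helper name write_prompt_list and the second prompt are hypothetical.

```python
from pathlib import Path


def write_prompt_list(prompts, path="batch.lst"):
    """Write one prompt per line (assumed layout for the -tl option)."""
    cleaned = [p.strip() for p in prompts if p.strip()]  # drop blank entries
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return Path(path)


prompts = [
    "Musical constellations twinkling in the night sky, forming a cosmic melody.",
    "Rain falling on a tin roof.",  # hypothetical example prompt
]
list_file = write_prompt_list(prompts)
print(list_file.read_text(encoding="utf-8"))
```

The resulting file can then be passed to the command above with -tl.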
Random Seed Matters
The model may sometimes perform poorly (the output sounds weird or low quality) after switching to different hardware. In that case, try several random seeds and keep the one that works best on your hardware.
audioldm2 --seed 1234 -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
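Trying several seeds by hand gets tedious, so one option is to generate the command lines from a short script. The sketch below only builds and prints the commands; it uses the --seed and -t flags shown above, while the helper name seed_commands and the seed values 0 and 42 are arbitrary choices for illustration.

```python
import shlex

PROMPT = "Musical constellations twinkling in the night sky, forming a cosmic melody."


def seed_commands(prompt, seeds):
    """Return one audioldm2 argument list per seed."""
    return [["audioldm2", "--seed", str(seed), "-t", prompt] for seed in seeds]


for cmd in seed_commands(PROMPT, [0, 42, 1234]):
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

To run a command instead of printing it, each argument list can be handed to subprocess.run(cmd, check=True), provided audioldm2 is installed and on the PATH. Listen to the results and keep the seed that sounds best.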
Pretrained Models
You can choose a model checkpoint with the --model_name option:
audioldm2 --model_name "audioldm2-full-large-650k" -t "Musical constellations twinkling in the night sky, forming a cosmic melody."
Three checkpoints are currently available:
- audioldm2-full (default): This checkpoint can perform both sound effect and music generation.
- audioldm2-music-665k: This checkpoint is specialized for music generation.
- audioldm2-full-large-650k: This checkpoint is a larger version of audioldm2-full.
Evaluation results on the AudioCaps and MusicCaps evaluation sets:
Coming soon.
Other options
usage: audioldm2 [-h] [-t TEXT] [-tl TEXT_LIST] [-s SAVE_PATH]
                 [--model_name {audioldm2-full,audioldm2-music-665k,audioldm2-full-large-650k}]
                 [-b BATCHSIZE] [--ddim_steps DDIM_STEPS] [-gs GUIDANCE_SCALE]
                 [-n N_CANDIDATE_GEN_PER_TEXT] [--seed SEED]

optional arguments:
  -h, --help            show this help message and exit
  -t TEXT, --text TEXT  Text prompt to the model for audio generation
  -tl TEXT_LIST, --text_list TEXT_LIST
                        A file that contains text prompt to the model for audio generation
  -s SAVE_PATH, --save_path SAVE_PATH
                        The path to save model output
  --model_name {audioldm2-full,audioldm2-music-665k,audioldm2-full-large-650k}
                        The checkpoint you gonna use
  -b BATCHSIZE, --batchsize BATCHSIZE
                        Generate how many samples at the same time
  --ddim_steps DDIM_STEPS
                        The sampling step for DDIM
  -gs GUIDANCE_SCALE, --guidance_scale GUIDANCE_SCALE
                        Guidance scale (Large => better quality and relavancy to text; Small => better diversity)
  -n N_CANDIDATE_GEN_PER_TEXT, --n_candidate_gen_per_text N_CANDIDATE_GEN_PER_TEXT
                        Automatic quality control. This number control the number of candidates (e.g., generate three audios and choose the best to show you). A Larger value usually lead to better quality with heavier computation
  --seed SEED           Change this value (any integer number) will lead to a different generation result.
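When several of these options are combined, it can help to assemble the command programmatically and reject an unknown checkpoint name before anything is launched. The following is a rough sketch rather than part of the package: build_command is a hypothetical helper, the flags and checkpoint names are copied from the help text above, and the prompt and numeric values in the example call are arbitrary.

```python
import shlex

# Checkpoint names copied from the --model_name choices in the help text.
MODEL_NAMES = ("audioldm2-full", "audioldm2-music-665k", "audioldm2-full-large-650k")


def build_command(text, model_name="audioldm2-full", save_path=None,
                  ddim_steps=None, guidance_scale=None, n_candidates=None, seed=None):
    """Assemble an audioldm2 argument list; options left as None are omitted."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown checkpoint: {model_name!r}")
    cmd = ["audioldm2", "-t", text, "--model_name", model_name]
    optional = [
        ("-s", save_path),
        ("--ddim_steps", ddim_steps),
        ("-gs", guidance_scale),
        ("-n", n_candidates),
        ("--seed", seed),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# The numeric values below are arbitrary illustrations, not recommended settings.
cmd = build_command(
    "A dog barking in the distance.",  # hypothetical prompt
    model_name="audioldm2-full-large-650k",
    guidance_scale=3.5,
    n_candidates=3,
    seed=1234,
)
print(shlex.join(cmd))
```

Options that are not passed are simply left out of the command, so the tool's own defaults apply to them.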
Cite this work
If you find this tool useful, please consider citing:
AudioLDM 2 paper coming soon
@article{liu2023audioldm,
title={AudioLDM: Text-to-Audio Generation with Latent Diffusion Models},
author={Liu, Haohe and Chen, Zehua and Yuan, Yi and Mei, Xinhao and Liu, Xubo and Mandic, Danilo and Wang, Wenwu and Plumbley, Mark D},
journal={arXiv preprint arXiv:2301.12503},
year={2023}
}
Hashes for audioldm2-0.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 16a43c99bc46dfb7368db54b60b3db934d61df49c223ae3b74ced96df4fcfc6f
MD5 | f79cb3b35a6d67f1eff17d3333f56193
BLAKE2b-256 | 5af4c4ca7d812bbb773179ab9d1609fccd53903a932ba7efbbaa00db59d79286