echomimic
Project description
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
Gallery
Audio Driven (Sing)
Audio Driven (English)
Audio Driven (Chinese)
Landmark Driven
Audio + Selected Landmark Driven
(Some demo images above are sourced from image websites. If there is any infringement, we will immediately remove them and apologize.)
Installation
Download the Codes
git clone https://github.com/BadToBest/EchoMimic
cd EchoMimic
Python Environment Setup
- Tested System Environment: CentOS 7.2 / Ubuntu 22.04, CUDA >= 11.7
- Tested GPUs: A100 (80G) / RTX 4090D (24G) / V100 (16G)
- Tested Python Version: 3.8 / 3.10 / 3.11
Create conda environment (Recommended):
conda create -n echomimic python=3.8
conda activate echomimic
Install packages with pip
pip install -r requirements.txt
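After the requirements are installed, it can help to verify that PyTorch sees the GPU before moving on. A minimal sanity check, assuming PyTorch is pulled in by requirements.txt:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"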
Download ffmpeg-static
Download and decompress ffmpeg-static, then
export FFMPEG_PATH=/path/to/ffmpeg-4.4-amd64-static
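For example, assuming the ffmpeg-4.4-amd64-static.tar.xz archive has been downloaded into the current directory (the archive name and version may differ for your build), the decompress-and-export step looks like:

tar -xvf ffmpeg-4.4-amd64-static.tar.xz
export FFMPEG_PATH=$(pwd)/ffmpeg-4.4-amd64-static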
Download pretrained weights
git lfs install
git clone https://huggingface.co/BadToBest/EchoMimic pretrained_weights
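If git-lfs is not available, the same weights can be fetched with the huggingface_hub CLI instead. An alternative sketch, assuming huggingface_hub is installed:

pip install -U huggingface_hub
huggingface-cli download BadToBest/EchoMimic --local-dir pretrained_weights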
The pretrained_weights directory is organized as follows:
./pretrained_weights/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── face_locator.pth
├── sd-vae-ft-mse
│ └── ...
├── sd-image-variations-diffusers
│ └── ...
└── audio_processor
└── whisper_tiny.pt
Here, denoising_unet.pth, reference_unet.pth, motion_module.pth, and face_locator.pth are the main EchoMimic checkpoints. The other models in this hub can also be downloaded from their original repositories; we thank the authors for their excellent work.
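Before running inference, a quick listing can confirm that the layout matches the tree above (a minimal check, not an EchoMimic-specific command):

ls pretrained_weights
# expected: denoising_unet.pth, reference_unet.pth, motion_module.pth, face_locator.pth,
# plus the sd-vae-ft-mse, sd-image-variations-diffusers and audio_processor directories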
Audio-Driven Algo Inference
Run the Python inference script:
python -u infer_audio2vid.py
Audio-Driven Algo Inference on Your Own Cases
Edit the inference config file ./configs/prompts/animation.yaml, and add your own case:
test_cases:
  "path/to/your/image":
    - "path/to/your/audio"
Then run the Python inference script:
python -u infer_audio2vid.py
Release Plans
Status | Milestone | ETA
---|---|---
🚀 | The inference source code of the Audio-Driven algo released on GitHub | 9th July, 2024
🚀 | Pretrained models trained on English and Mandarin Chinese to be released | 9th July, 2024
🚀 | The inference source code of the Pose-Driven algo released on GitHub | 13th July, 2024
🚀 | Pretrained models with better pose control to be released | 13th July, 2024
🚀 | Pretrained models with better singing performance to be released | TBD
🚀 | Large-scale and high-resolution Chinese-based talking head dataset | TBD
Acknowledgements
We would like to thank the contributors to the AnimateDiff, Moore-AnimateAnyone and MuseTalk repositories for their open research and exploration.
We are also grateful to V-Express and hallo for their outstanding work in the area of diffusion-based talking heads.
If we have missed any open-source projects or related articles, we will add them to the acknowledgements of this work immediately.
Citation
If you find our work useful for your research, please consider citing the paper:
@misc{chen2024echomimic,
  title={EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning},
  author={Zhiyuan Chen and Jiajiong Cao and Zhiquan Chen and Yuming Li and Chenguang Ma},
  year={2024},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
File details
Details for the file echomimic-0.0.1.dev0-py3-none-any.whl
File metadata
- Download URL: echomimic-0.0.1.dev0-py3-none-any.whl
- Upload date:
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1d20e98d250c43623cb212454168f4ad91253e470c7816f42f8b10a474e424ad
MD5 | 956cbb8a986aabe16a5bc55a739cd089
BLAKE2b-256 | c47c4b1277c520695fb9272a8d800aa7cca1e28b0ea5ed9c195bbf7a3f06ba5c