Audio2Face Client & Streaming Service
A comprehensive Python package providing both a client library and a FastAPI streaming server for NVIDIA Omniverse Audio2Face (A2F). Generate facial blendshapes from audio, with support for emotion control, real-time streaming, and production-ready robustness.
Overview
This repository contains two main components:
- a2f_client - A Python client library for programmatic control of Audio2Face via HTTP API
- app - A FastAPI streaming server that processes audio files and streams blendshape data in real-time chunks
Key Use Cases:
- Real-time facial animation for avatars and VR/AR applications
- Batch processing of audio files for animation pipelines
- ML research and experimentation with facial expressions
- Integration into larger media production workflows
Key Features
Client Library (a2f_client)
- Simple Python API for Audio2Face automation
- Emotion Control with customizable intensity weights
- Chunked Processing for large audio files
- Automatic USD Model Loading with ARKit blendshapes
- Robust Error Handling and logging
- Comprehensive Testing with pytest suite
Streaming Server (app)
- Real-time Streaming via FastAPI with JSON-line responses
- Dynamic FPS Optimization based on processing performance
- Parallel Processing using multiple A2F client instances
- Emotion Support for expressive facial animation
- Production Ready with logging, health checks, and error handling
- Multi-worker Support with file lock management
Repository Structure

```
.
├── a2f_client/            # Python client library
│   ├── modules/           # Internal HTTP API wrappers
│   ├── assets/            # Default USD facial models
│   ├── samples/           # Example audio files
│   ├── tests/             # Client unit tests
│   └── ...
├── app/                   # FastAPI streaming server
│   ├── config.py          # Server configuration
│   ├── main.py            # FastAPI application
│   ├── streaming_logic.py # Core streaming logic
│   └── tests/             # Server integration tests
├── pyproject.toml         # Poetry configuration
└── README.md
```
Installation
Prerequisites
- Python 3.10+
- NVIDIA Omniverse Audio2Face (locally installed)
- Poetry (recommended) or pip
Setup

```bash
# Clone the repository
git clone git@github.com:alexthillen/a2f-client.git
cd a2f-client

# Install with Poetry
poetry install
poetry shell

# Or with pip
pip install -e .
```
Environment Configuration
Set your Audio2Face headless script path:
Definitions:
- Headless Script: Executable that launches Audio2Face in server mode without UI, required for API automation.
Linux:

```bash
export A2F_HEADLESS_SCRIPT="$HOME/.local/share/ov/pkg/audio2face-2023.2.0/audio2face_headless.sh"
```

Windows:

```bat
set A2F_HEADLESS_SCRIPT=%USERPROFILE%\AppData\Local\ov\pkg\audio2face-2023.2.0\audio2face_headless.bat
```
Quick Start
Python Client Usage

```python
from a2f_client import A2FClient

# Initialize client
client = A2FClient(port=8192)

# Process audio with emotions
client.set_audio("path/to/audio.wav")
client.set_emotions({"joy": 0.8, "sadness": 0.2})

# Generate blendshapes for a specific time window
blendshapes = client.generate_blendshapes(start=0.5, end=1.0, fps=30)
print(f"Generated {blendshapes['blendshapes']['numFrames']} frames")
```
Streaming Server
Start the server:

```bash
python app/main.py
# Server starts at http://localhost:8000
```
Process audio via HTTP:

```python
import json

import requests

# Simple audio processing
with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/process-audio/?fps=20",
        files={"audio_file": f},
        stream=True,
    )

for line in response.iter_lines():
    if not line:
        continue  # skip blank keep-alive lines
    chunk_data = json.loads(line)
    print(f"Chunk {chunk_data['chunk_id']}: {chunk_data['result']['numFrames']} frames")
```
With emotion control:

```python
emotions = {"joy": 0.8, "amazement": 0.3, "anger": 0.1}

with open("audio.wav", "rb") as f:
    response = requests.post(
        "http://localhost:8000/process-audio/",
        files={"audio_file": f},
        data={"emotions": json.dumps(emotions)},
        stream=True,
    )
```
Configuration
Client Settings (a2f_client/settings.py)
| Variable | Default | Description |
|---|---|---|
| `A2F_HEADLESS_SCRIPT` | Platform-specific path | Path to A2F headless launcher |
| `A2F_PORT` | `8190` | Port for A2F headless server |
| `A2F_BASE_URL` | `http://localhost` | Base URL for A2F API |
| `A2F_DEFAULT_USD_MODEL` | `assets/mark_arkit_solved_default.usd` | Default facial model |
| `A2F_DEFAULT_OUTPUT_DIR` | `tmp/blendshapes` | Output directory for exports |
Server Settings (app/config.py)
| Variable | Default | Description |
|---|---|---|
| `A2F_FASTAPI_HOST` | `0.0.0.0` | Server host address |
| `A2F_FASTAPI_PORT` | `8000` | Server port |
| `A2F_FASTAPI_DEFAULT_WORKERS` | `2` | Number of worker processes |
| `A2F_FASTAPI_CLIENTS_PER_WORKER` | `2` | A2F clients per worker |
| `A2F_FASTAPI_TMP_DIR` | `tmp` | Temporary file directory |
Testing
Client Tests

```bash
# Run client unit tests
pytest a2f_client/tests/

# Test specific functionality
pytest a2f_client/tests/test_blendshapes.py::test_chunking_vs_full_export_equivalence
```

Server Tests

```bash
# Start the server first
python app/main.py &

# Run integration tests
pytest app/tests/

# Test streaming with emotions
pytest app/tests/test_generation.py::test_streaming_endpoint_with_emotions
```
Test Coverage
- Client: Chunked vs full export probabilistic equivalence, emotion setting, audio loading
- Server: Streaming endpoints, dynamic FPS, emotion processing, error handling
Advanced Features
Dynamic FPS Optimization
The streaming server automatically adjusts the frame rate based on processing performance:

```python
# Update rule used internally:
alpha = chunk_duration / processing_time
safe_fps = max(min_fps, floor(num_clients * current_fps * alpha - 7))
current_fps = min(max_fps, (current_fps + safe_fps) / 2)
```

This keeps throughput high while maintaining output quality.
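As a runnable illustration, the update rule can be wrapped in a small self-contained function. The `-7` safety margin comes from the snippet above, and the 10-30 FPS bounds follow the note at the end of this README; the parameter names are our assumptions, not the server's exact internals:

```python
import math

def update_fps(current_fps: float, chunk_duration: float, processing_time: float,
               num_clients: int, min_fps: float = 10.0, max_fps: float = 30.0) -> float:
    """One step of the dynamic FPS update described above.

    alpha > 1 means the last chunk was processed faster than real time,
    so the frame rate can rise; alpha < 1 means it should fall.
    """
    alpha = chunk_duration / processing_time
    # Conservative estimate of a sustainable rate, with a fixed 7 fps safety margin.
    safe_fps = max(min_fps, math.floor(num_clients * current_fps * alpha - 7))
    # Move halfway toward the safe rate, clamped to the allowed range.
    return min(max_fps, (current_fps + safe_fps) / 2)
```

For example, a 0.5 s chunk processed in 0.25 s by two clients pushes the rate up to the 30 FPS ceiling, while a chunk processed at half real time by one client pulls a 20 FPS rate down toward the floor.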
Emotion System
Supported Emotions:
amazement, anger, cheekiness, disgust, fear, grief, joy, outofbreath, pain, sadness

Usage:

```python
# Client
client.set_emotions({"joy": 1.0, "sadness": 0.3})

# Server API
emotions = {"joy": 0.8, "anger": 0.2}
requests.post(url, data={"emotions": json.dumps(emotions)}, ...)
```
Parallel Processing
The server uses multiple A2F client instances with file locking to prevent port conflicts:

```python
# Automatic port allocation with locking
streaming_manager = StreamingManager(clients_per_worker=2)
# Handles ports 8190, 8191, 8192, etc. automatically
```
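A lock-file-based port allocation scheme can be sketched as follows. This is our own illustration built on the `.lock_*` files mentioned in the Troubleshooting section, not the actual `StreamingManager` code; `acquire_port` and `release_port` are hypothetical names:

```python
import os

def acquire_port(base_port: int = 8190, max_clients: int = 16, lock_dir: str = ".") -> int:
    """Claim the first free A2F port by atomically creating a `.lock_<port>` file.

    O_CREAT | O_EXCL makes the creation atomic, so two workers can never
    claim the same port; stale files must be removed by hand (`rm .lock_*`).
    """
    for port in range(base_port, base_port + max_clients):
        lock_path = os.path.join(lock_dir, f".lock_{port}")
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return port
        except FileExistsError:
            continue  # port already claimed by another client
    raise RuntimeError("no free A2F port available")

def release_port(port: int, lock_dir: str = ".") -> None:
    """Remove the lock file so the port can be reused."""
    os.remove(os.path.join(lock_dir, f".lock_{port}"))
```

Relying on atomic file creation (rather than checking existence first) is what makes this safe across multiple worker processes.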
API Reference
Client Methods
Definitions:
- Blendshape: Numerical weights representing facial muscle deformations for 3D animation
- USD Model: Universal Scene Description file containing 3D facial geometry and blendshape targets
| Method | Parameters | Returns | Description |
|---|---|---|---|
| `set_audio(path)` | `path: str` | `None` | Load audio file and initialize model |
| `set_emotions(emotions)` | `emotions: Dict[str, float]` | `None` | Set emotion weights (0.0-1.0) |
| `generate_blendshapes()` | `start, end, fps, use_a2e` | `Dict` | Export blendshapes for time range |
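For long files, chunked processing can be driven by iterating fixed-length time windows. The `chunk_windows` helper below is our own illustration; the commented driver loop uses the client methods from the table above:

```python
from typing import Iterator, Tuple

def chunk_windows(duration: float, chunk: float = 1.0) -> Iterator[Tuple[float, float]]:
    """Yield consecutive (start, end) windows covering [0, duration]."""
    start = 0.0
    while start < duration:
        yield start, min(start + chunk, duration)
        start += chunk

# Hypothetical driver loop, assuming the client API from the table above:
# client = A2FClient(port=8192)
# client.set_audio("path/to/long_audio.wav")
# for start, end in chunk_windows(duration=12.5, chunk=1.0):
#     result = client.generate_blendshapes(start=start, end=end, fps=30)
```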
Server Endpoints
| Endpoint | Method | Parameters | Description |
|---|---|---|---|
| `/process-audio/` | POST | `audio_file, fps, emotions` | Stream blendshape chunks |
| `/` | GET | - | Health check |
| `/health` | GET | - | Detailed system status |
Troubleshooting
Common Issues
- "Headless script not found"
  - Set the `A2F_HEADLESS_SCRIPT` environment variable to the correct path
  - Verify your Audio2Face installation
- Port conflicts
  - The server uses ports 8190+ for A2F clients
  - Check for running A2F instances: `netstat -tlnp | grep 819`
- File lock errors
  - Remove stale lock files: `rm .lock_*`
  - Ensure the server shuts down cleanly
- Audio format issues
  - Use WAV format for best compatibility
  - Check the sample rate (44.1 kHz recommended)
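To check a file's sample rate before uploading, the standard library suffices. The 44.1 kHz value is the recommendation above; `wav_sample_rate` is our own helper, not part of the package:

```python
import wave

def wav_sample_rate(path: str) -> int:
    """Return the sample rate of a WAV file in Hz."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate()

# Example: warn if a file is not at the recommended 44.1 kHz.
# if wav_sample_rate("audio.wav") != 44100:
#     print("consider resampling to 44.1 kHz")
```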
Performance Optimization
For High Throughput:

```bash
export A2F_FASTAPI_DEFAULT_WORKERS=3
export A2F_FASTAPI_CLIENTS_PER_WORKER=3
# Uses 9 A2F client instances in total
```

For Low Latency:

```python
streaming_manager.chunk_size = 0.1  # Smaller chunks
```
Important Terms & Theorems
Definitions:
- Blendshape: A vector of weights $\in [0, 1]^n$ representing facial expressions, where $n$ is the number of facial control points
- Chunking: Temporal decomposition of audio into overlapping or non-overlapping segments for parallel processing
- A2F Instance: An individual Audio2Face headless server process bound to a specific port
- Emotion Weight: A continuous scalar $\in [0, 1]$ controlling the intensity of a named emotional expression

Chunk Equivalence Definition: Chunked blendshape generation is $(1\%, 0.1)$-chunk-equivalent to full-export generation if fewer than $1\%$ of all blendshape weights disagree by more than $0.1$ in absolute value or by more than $0.1 \times 10^{-6}$ in relative error.
Implementation verified in `test_chunking_vs_full_export_equivalence`.
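Read literally, the definition can be sketched in plain Python. This is our illustration of the stated criterion, not the code of `test_chunking_vs_full_export_equivalence`; we treat a weight as disagreeing when either the absolute or the relative threshold is exceeded:

```python
def chunk_equivalent(chunked, full, frac=0.01, abs_tol=0.1, rel_tol=0.1e-6) -> bool:
    """Check (frac, abs_tol)-chunk-equivalence between two equal-length weight lists."""
    assert len(chunked) == len(full) and full

    def disagrees(a: float, b: float) -> bool:
        abs_err = abs(a - b)
        # Guard against division by zero when both weights are ~0.
        rel_err = abs_err / max(abs(a), abs(b), 1e-12)
        return abs_err > abs_tol or rel_err > rel_tol

    bad = sum(disagrees(a, b) for a, b in zip(chunked, full))
    return bad / len(full) < frac
```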
Summary Table
| Component | Purpose | Key Features | Test Coverage |
|---|---|---|---|
| a2f_client | Python API client | Audio loading, emotion control, chunked export | Unit tests, equivalence validation |
| app/streaming | Real-time server | FastAPI, parallel processing, dynamic FPS | Integration tests, streaming validation |
| assets/ | 3D models | USD facial models with ARKit blendshapes | Used in all tests |
| samples/ | Test data | Reference audio files | 100% test coverage |
Fun Fact: The dynamic FPS optimization can automatically adjust from 10 FPS to 30 FPS based on your hardware's processing capability!