Speaker recognition for Home Assistant using Resemblyzer
Speaker Recognition for Home Assistant
Identify speakers by their voice using machine learning. This project provides a complete speaker recognition solution for Home Assistant, including a REST API service, Python client library, custom integration, and Home Assistant addon.
Features
- Voice-based speaker identification using neural embeddings
- Native Home Assistant integration with STT and conversation agents
- Easy deployment via Home Assistant addon or standalone Docker
- REST API for flexible integration with any platform
- Python client library for programmatic access
- High accuracy powered by Resemblyzer voice embeddings
- Fast recognition with cached embeddings
- Configurable via UI or YAML
Installation
Home Assistant Addon
The easiest way to use speaker recognition in Home Assistant:
- Add this repository to your Home Assistant addon store
- Install the Speaker Recognition addon
- Configure the addon settings:
  - Host: 0.0.0.0 (default)
  - Port: 8099 (default)
  - Embeddings Directory: /share/speaker_recognition/embeddings
  - Log Level: info
- Start the addon
- Install the Speaker Recognition integration via the UI
Python Package
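Install the client-only package (no ML dependencies). The distribution is published as hass-speaker-recognition, while the importable module is speaker_recognition:
pip install hass-speaker-recognition
Install with server capabilities (requires Python <3.10):
pip install "hass-speaker-recognition[server]"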
Docker
Run the standalone service:
docker run -d \
  -p 8099:8099 \
  -v "$(pwd)/embeddings:/app/embeddings" \
  ghcr.io/eulemitkeule/speaker-recognition:latest
Usage
Training
Train the system with voice samples for each speaker:
Using Python Client
import asyncio

from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import AudioInput, TrainingRequest, VoiceSample


async def main() -> None:
    # Open a client session and submit one voice sample per speaker.
    async with SpeakerRecognitionClient("http://localhost:8099") as client:
        training = await client.train(
            TrainingRequest(
                voice_samples=[
                    VoiceSample(
                        user="Alice",
                        audio_input=AudioInput(
                            audio_data="<base64-encoded-audio>",
                            sample_rate=16000,
                        ),
                    ),
                    VoiceSample(
                        user="Bob",
                        audio_input=AudioInput(
                            audio_data="<base64-encoded-audio>",
                            sample_rate=16000,
                        ),
                    ),
                ]
            )
        )
        print(f"Trained {training.speakers_count} speakers")


# The client is async, so it has to run inside an event loop.
asyncio.run(main())
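The `audio_data` placeholder above has to be filled with base64 text. The following is a minimal sketch of one way to produce it from a WAV file using only the standard library. It assumes the service accepts the base64 encoding of the complete WAV file bytes; the README does not say whether a full file or raw PCM is expected, so check that before relying on it. The helper names `encode_audio`, `wav_sample_rate`, and `make_audio_input` are hypothetical and not part of the package.

```python
import base64
import io
import wave


def encode_audio(audio_bytes: bytes) -> str:
    """Return the base64 text form of the given audio bytes."""
    return base64.b64encode(audio_bytes).decode("ascii")


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate from the header of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getframerate()


def make_audio_input(wav_bytes: bytes) -> dict:
    """Build a dict shaped like the audio_input object in the API docs.

    Assumption: the whole WAV file (header included) is what gets encoded.
    """
    return {
        "audio_data": encode_audio(wav_bytes),
        "sample_rate": wav_sample_rate(wav_bytes),
    }
```

With a file on disk, `make_audio_input(pathlib.Path("alice.wav").read_bytes())` would yield the two values to pass to `AudioInput` or to place in the JSON body.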
Using REST API
curl -X POST http://localhost:8099/train \
  -H "Content-Type: application/json" \
  -d '{
    "voice_samples": [
      {
        "user": "Alice",
        "audio_input": {
          "audio_data": "<base64-audio>",
          "sample_rate": 16000
        }
      }
    ]
  }'
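For callers that cannot use curl or the bundled client, the same request can be assembled with Python's standard library. This sketch only mirrors the curl call above (same path, header, and body shape); `build_train_request` is a hypothetical helper, not something the package provides.

```python
from __future__ import annotations

import json
import urllib.request


def build_train_request(base_url: str, samples: list[dict]) -> urllib.request.Request:
    """Create a POST /train request carrying the given voice samples."""
    body = json.dumps({"voice_samples": samples}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_train_request(
    "http://localhost:8099",
    [
        {
            "user": "Alice",
            "audio_input": {"audio_data": "<base64-audio>", "sample_rate": 16000},
        }
    ],
)
# To actually send it (requires a running service):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network traffic happens.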
Recognition
Identify a speaker from audio:
Using Python Client
import asyncio

from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import AudioInput, RecognitionRequest


async def main() -> None:
    # Send one audio clip and print the best-matching speaker.
    async with SpeakerRecognitionClient("http://localhost:8099") as client:
        result = await client.recognize(
            RecognitionRequest(
                audio_input=AudioInput(
                    audio_data="<base64-encoded-audio>",
                    sample_rate=16000,
                )
            )
        )
        print(f"Speaker: {result.speaker} (confidence: {result.confidence:.2%})")


asyncio.run(main())
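Because the result carries a confidence value, a caller may want to ignore weak matches instead of trusting every answer. The sketch below shows one simple way to do that. The function `accept_speaker` and the 0.75 cut-off are illustrative assumptions, not values recommended by the project; a sensible threshold would need to be tuned on real recordings.

```python
from typing import Optional


def accept_speaker(
    speaker: str, confidence: float, min_confidence: float = 0.75
) -> Optional[str]:
    """Return the speaker name only if the match is confident enough.

    min_confidence is an arbitrary example value, not a project default.
    """
    return speaker if confidence >= min_confidence else None
```

For example, `accept_speaker(result.speaker, result.confidence)` could be used to fall back to an "unknown speaker" path when it returns `None`.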
Home Assistant Integration
Once the integration is configured:
- Configure the backend in the main integration entry
- Map voices to users in the integration settings
- Add an STT entity as a sub-entry for speech-to-text with speaker ID
- Add a Conversation Agent as a sub-entry for voice commands with speaker context
The integration will automatically identify speakers and make the information available to your automations.
API Documentation
Endpoints
GET /health
Health check endpoint.
Response:
{
  "status": "healthy"
}
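A health endpoint like this is typically polled before sending real requests, for instance right after starting the container. The following is a small standard-library sketch of such a check. It relies only on the documented response body; the function names and the 2-second timeout are assumptions for illustration.

```python
import json
import urllib.error
import urllib.request


def is_healthy_body(body: bytes) -> bool:
    """Return True if the body is JSON of the form {"status": "healthy"}."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Covers malformed JSON and undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str, timeout: float = 2.0) -> bool:
    """Call GET /health and report whether the service looks ready."""
    url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200 and is_healthy_body(response.read())
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or HTTP error status.
        return False
```

Calling `check_health("http://localhost:8099")` in a short retry loop is one way to wait for the service to come up.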
POST /train
Train the model with voice samples.
Request:
{
  "voice_samples": [
    {
      "user": "string",
      "audio_input": {
        "audio_data": "base64-string",
        "sample_rate": 16000
      }
    }
  ]
}
Response:
{
  "speakers_count": 2,
  "message": "Training completed successfully"
}
POST /recognize
Recognize a speaker from audio.
Request:
{
  "audio_input": {
    "audio_data": "base64-string",
    "sample_rate": 16000
  }
}
Response:
{
  "speaker": "Alice",
  "confidence": 0.95
}
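When calling the endpoint without the bundled client, it can help to turn the JSON response into a typed value early. This is a minimal sketch based only on the two fields shown in the response above; `RecognitionResult` and `parse_recognition` here are hypothetical stand-ins and are not the models shipped in `speaker_recognition.models`.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionResult:
    """Typed view of the documented /recognize response fields."""

    speaker: str
    confidence: float


def parse_recognition(body: str) -> RecognitionResult:
    """Parse a /recognize response body; raises KeyError if a field is missing."""
    payload = json.loads(body)
    return RecognitionResult(
        speaker=str(payload["speaker"]),
        confidence=float(payload["confidence"]),
    )
```

Failing loudly on a missing field is a deliberate choice here: a silent default would hide a mismatch between client and server versions.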
Configuration
Addon Configuration
host: "0.0.0.0"
port: 8099
log_level: "info"
access_log: true
embeddings_dir: "/share/speaker_recognition/embeddings"
Environment Variables
- HOST: Server host (default: 0.0.0.0)
- PORT: Server port (default: 8099)
- LOG_LEVEL: Logging level (default: info)
- ACCESS_LOG: Enable access logs (default: true)
- EMBEDDINGS_DIR: Directory for storing embeddings (default: ./embeddings)
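To make the defaults concrete, here is a sketch of how these variables could be read into a settings object. It is not the project's actual loader: `ServerSettings`, `load_settings`, and the set of strings treated as "true" are assumptions made for illustration. Only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    """Defaults copied from the documented environment variables."""

    host: str = "0.0.0.0"
    port: int = 8099
    log_level: str = "info"
    access_log: bool = True
    embeddings_dir: str = "./embeddings"


# Assumption: which spellings count as "enabled" is not documented.
_TRUTHY = {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Build settings from an environment mapping, falling back to defaults."""
    defaults = ServerSettings()
    return ServerSettings(
        host=env.get("HOST", defaults.host),
        port=int(env.get("PORT", defaults.port)),
        log_level=env.get("LOG_LEVEL", defaults.log_level),
        access_log=env.get("ACCESS_LOG", "true").strip().lower() in _TRUTHY,
        embeddings_dir=env.get("EMBEDDINGS_DIR", defaults.embeddings_dir),
    )
```

Passing the mapping as a parameter keeps the function easy to test, for example `load_settings({"PORT": "9000"})`.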
Development
Prerequisites
- Python 3.9 (for server development)
- Python 3.8+ (for client-only development)
- uv package manager
Setup
# Clone the repository
git clone https://github.com/eulemitkeule/speaker-recognition.git
cd speaker-recognition
# Install dependencies
uv sync --all-groups
# Run tests
uv run pytest tests/ -v
# Run linting
uv run ruff check .
# Run type checking
uv run mypy --strict speaker_recognition
Running Locally
# Start the server
uv run python -m speaker_recognition
# Or with custom options
uv run python -m speaker_recognition --host 0.0.0.0 --port 8099
Project Structure
speaker-recognition/
├── speaker_recognition/          # Main package
│   ├── api.py                    # FastAPI application
│   ├── client.py                 # HTTP client
│   ├── models.py                 # Pydantic models
│   └── recognizer.py             # Recognition logic
├── custom_components/            # Home Assistant integration
│   └── speaker_recognition/
├── speaker_recognition_addon/    # Home Assistant addon
├── tests/                        # Test suite
└── example_data/                 # Example audio files
Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests and linting
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code Quality
- Follow PEP 8 style guidelines
- Use descriptive variable and function names
- Add type annotations
- Write tests for new features
- Keep methods focused and concise
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Resemblyzer - Neural voice embeddings
- Home Assistant - Home automation platform
- FastAPI - Modern web framework
Support
- Report bugs
- Request features
- Documentation
Made with ❤️ for the Home Assistant community
Download files
File details
Details for the file hass_speaker_recognition-1.0.4.tar.gz.
File metadata
- Download URL: hass_speaker_recognition-1.0.4.tar.gz
- Upload date:
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb41eda55bd4edcdaca69413fdf7917e05df94fb3c22ed624e91f76566ab153c |
| MD5 | e6a10d79dc7254561f00bf165b8abd28 |
| BLAKE2b-256 | 62ae951f419958695ba33087647e097270ef250aa72900cbada26d719193e000 |
Provenance
The following attestation bundles were made for hass_speaker_recognition-1.0.4.tar.gz:
Publisher: publish.yml on EuleMitKeule/speaker-recognition
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: hass_speaker_recognition-1.0.4.tar.gz
  - Subject digest: bb41eda55bd4edcdaca69413fdf7917e05df94fb3c22ed624e91f76566ab153c
  - Sigstore transparency entry: 782062967
  - Sigstore integration time:
- Permalink: EuleMitKeule/speaker-recognition@176ff46945d2efef7decd806630bc685f370d0a6
- Branch / Tag: refs/tags/1.0.4
- Owner: https://github.com/EuleMitKeule
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@176ff46945d2efef7decd806630bc685f370d0a6
- Trigger Event: release
File details
Details for the file hass_speaker_recognition-1.0.4-py3-none-any.whl.
File metadata
- Download URL: hass_speaker_recognition-1.0.4-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab82c3f26e9c48eae13414ec0c78959ea5458b5c6358f4fea073fc2788ebd1b1 |
| MD5 | d809906b18fd50747b43f070df61b463 |
| BLAKE2b-256 | 13067310d263f7044c6755915023d1f0554c1100f789a6dbd4cd2dc694b7c8a1 |
Provenance
The following attestation bundles were made for hass_speaker_recognition-1.0.4-py3-none-any.whl:
Publisher: publish.yml on EuleMitKeule/speaker-recognition
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: hass_speaker_recognition-1.0.4-py3-none-any.whl
  - Subject digest: ab82c3f26e9c48eae13414ec0c78959ea5458b5c6358f4fea073fc2788ebd1b1
  - Sigstore transparency entry: 782062969
  - Sigstore integration time:
- Permalink: EuleMitKeule/speaker-recognition@176ff46945d2efef7decd806630bc685f370d0a6
- Branch / Tag: refs/tags/1.0.4
- Owner: https://github.com/EuleMitKeule
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@176ff46945d2efef7decd806630bc685f370d0a6
- Trigger Event: release