Speaker recognition for Home Assistant using Resemblyzer

Speaker Recognition for Home Assistant

Identify speakers by their voice using machine learning. This project provides a complete speaker recognition solution for Home Assistant, including a REST API service, Python client library, custom integration, and Home Assistant addon.

✨ Features

🎤 Voice-based speaker identification using neural embeddings
🏠 Native Home Assistant integration with STT and conversation agents
🐳 Easy deployment via Home Assistant addon or standalone Docker
🔌 REST API for flexible integration with any platform
📦 Python client library for programmatic access
🎯 High accuracy powered by Resemblyzer voice embeddings
⚡ Fast recognition with cached embeddings
🔧 Configurable via UI or YAML
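As a rough illustration of what embedding-based identification usually involves, the sketch below compares a query embedding against enrolled embeddings with cosine similarity and picks the closest one. It is not taken from this project's recognizer; the function names and the toy 3-dimensional vectors are assumptions made for the example, and real voice embeddings are much longer vectors.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    query: List[float], enrolled: Dict[str, List[float]]
) -> Optional[Tuple[str, float]]:
    """Return (speaker, similarity) for the closest enrolled embedding, or None if nothing is enrolled."""
    if not enrolled:
        return None
    scores = {name: cosine_similarity(query, emb) for name, emb in enrolled.items()}
    name = max(scores, key=lambda candidate: scores[candidate])
    return name, scores[name]


# Hypothetical enrolled speakers with toy embeddings.
enrolled = {"Alice": [1.0, 0.0, 0.0], "Bob": [0.0, 1.0, 0.0]}
match = best_match([0.9, 0.1, 0.0], enrolled)
```

Caching the enrolled embeddings means only the query audio has to be embedded at recognition time, which is presumably what the "cached embeddings" feature refers to.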

🚀 Installation

Home Assistant Addon

The easiest way to use speaker recognition in Home Assistant:

1. Add this repository to your Home Assistant addon store.
2. Install the Speaker Recognition addon.
3. Configure the addon settings:
   - Host: 0.0.0.0 (default)
   - Port: 8099 (default)
   - Embeddings Directory: /share/speaker_recognition/embeddings
   - Log Level: info
4. Start the addon.
5. Install the Speaker Recognition integration via the UI.

Python Package

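Install the client-only package (no ML dependencies). The distribution is published on PyPI as hass-speaker-recognition; the import name is speaker_recognition:

pip install hass-speaker-recognition

Install with server capabilities (requires Python <3.10). The quotes keep shells such as zsh from interpreting the brackets:

pip install "hass-speaker-recognition[server]"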

Docker

Run the standalone service:

docker run -d \
  -p 8099:8099 \
  -v ./embeddings:/app/embeddings \
  ghcr.io/eulemitkeule/speaker-recognition:latest

📖 Usage

Training

Train the system with voice samples for each speaker:

Using Python Client

import asyncio

from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import AudioInput, TrainingRequest, VoiceSample


async def main() -> None:
    # The client is an async context manager, so it must be used inside a coroutine.
    async with SpeakerRecognitionClient("http://localhost:8099") as client:
        training = await client.train(
            TrainingRequest(
                voice_samples=[
                    # One VoiceSample per recording; replace the placeholders
                    # with real base64-encoded audio.
                    VoiceSample(
                        user="Alice",
                        audio_input=AudioInput(
                            audio_data="<base64-encoded-audio>",
                            sample_rate=16000,
                        ),
                    ),
                    VoiceSample(
                        user="Bob",
                        audio_input=AudioInput(
                            audio_data="<base64-encoded-audio>",
                            sample_rate=16000,
                        ),
                    ),
                ]
            )
        )
        print(f"Trained {training.speakers_count} speakers")


asyncio.run(main())
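
The `<base64-encoded-audio>` placeholder has to be filled with real data. The helper below is a minimal sketch that assumes the service accepts the base64 text of a complete WAV file; the README does not say whether raw PCM or a particular container is expected, so check that before relying on it. `encode_wav` is a hypothetical name, not part of the client library.

```python
import base64
import wave
from pathlib import Path
from typing import Tuple


def encode_wav(path: Path) -> Tuple[str, int]:
    """Return (base64 text of the file's bytes, sample rate read from the WAV header)."""
    # Read the sample rate from the header instead of hard-coding 16000.
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
    # Assumption: the whole file, header included, is what gets base64-encoded.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return encoded, sample_rate
```

The two return values map onto the `audio_data` and `sample_rate` fields of `AudioInput`.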

Using REST API

curl -X POST http://localhost:8099/train \
  -H "Content-Type: application/json" \
  -d '{
    "voice_samples": [
      {
        "user": "Alice",
        "audio_input": {
          "audio_data": "<base64-audio>",
          "sample_rate": 16000
        }
      }
    ]
  }'
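
The same request can be sent from Python without the client library. This sketch uses only the standard library and mirrors the JSON body of the curl call above; `build_train_payload` and `post_json` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def build_train_payload(samples: List[Tuple[str, str, int]]) -> Dict[str, Any]:
    """Build the /train request body from (user, base64_audio, sample_rate) tuples."""
    return {
        "voice_samples": [
            {"user": user, "audio_input": {"audio_data": audio, "sample_rate": rate}}
            for user, audio, rate in samples
        ]
    }


def post_json(url: str, payload: Dict[str, Any], timeout: float = 30.0) -> Dict[str, Any]:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_train_payload([("Alice", "<base64-audio>", 16000)])
# With the service running locally:
# result = post_json("http://localhost:8099/train", payload)
```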

Recognition

Identify a speaker from audio:

Using Python Client

import asyncio

from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import AudioInput, RecognitionRequest


async def main() -> None:
    # The client is an async context manager, so it must be used inside a coroutine.
    async with SpeakerRecognitionClient("http://localhost:8099") as client:
        result = await client.recognize(
            RecognitionRequest(
                audio_input=AudioInput(
                    audio_data="<base64-encoded-audio>",
                    sample_rate=16000,
                )
            )
        )
        print(f"Speaker: {result.speaker} (confidence: {result.confidence:.2%})")


asyncio.run(main())
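
Because the result carries a confidence value, callers may want to ignore weak matches rather than act on them. The sketch below shows one way to do that. `Recognition` is a hypothetical stand-in for the client's result object, and the 0.75 cut-off is an arbitrary assumption, not a value recommended by the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recognition:
    """Hypothetical stand-in with the same two fields as the recognition result."""

    speaker: str
    confidence: float


def accepted_speaker(result: Recognition, min_confidence: float = 0.75) -> Optional[str]:
    """Return the speaker name only when the confidence reaches the threshold."""
    return result.speaker if result.confidence >= min_confidence else None
```

A suitable threshold depends on the microphones and rooms involved, so it is worth tuning against real recordings.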

Home Assistant Integration

Once the integration is configured:

1. Configure the backend in the main integration entry.
2. Map voices to users in the integration settings.
3. Add an STT entity as a sub-entry for speech-to-text with speaker ID.
4. Add a Conversation Agent as a sub-entry for voice commands with speaker context.

The integration will automatically identify speakers and make the information available to your automations.

🔌 API Documentation

Endpoints

`GET /health`

Health check endpoint.

Response:

{
  "status": "healthy"
}
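
A script can poll this endpoint before sending training or recognition requests. The following is a minimal standard-library sketch; the function names, the default base URL, and the 5-second timeout are assumptions for illustration.

```python
import json
import urllib.error
import urllib.request


def is_healthy_response(body: bytes) -> bool:
    """True when the body is a JSON object of the form {"status": "healthy"}."""
    try:
        data = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def check_health(base_url: str = "http://localhost:8099", timeout: float = 5.0) -> bool:
    """Query GET /health; any connection or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return is_healthy_response(response.read())
    except (urllib.error.URLError, OSError):
        return False
```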

`POST /train`

Train the model with voice samples.

Request:

{
  "voice_samples": [
    {
      "user": "string",
      "audio_input": {
        "audio_data": "base64-string",
        "sample_rate": 16000
      }
    }
  ]
}

Response:

{
  "speakers_count": 2,
  "message": "Training completed successfully"
}

`POST /recognize`

Recognize a speaker from audio.

Request:

{
  "audio_input": {
    "audio_data": "base64-string",
    "sample_rate": 16000
  }
}

Response:

{
  "speaker": "Alice",
  "confidence": 0.95
}
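
When calling the endpoint without the client library, it can help to validate the response before using it. This sketch parses a body shaped like the example above. `RecognizeResponse` and `parse_recognize_response` are hypothetical names, and the README does not describe what the service returns for an unknown speaker, so that case is not handled here.

```python
import json
from typing import NamedTuple


class RecognizeResponse(NamedTuple):
    speaker: str
    confidence: float


def parse_recognize_response(body: str) -> RecognizeResponse:
    """Parse a /recognize response body, raising ValueError on missing or mistyped fields."""
    data = json.loads(body)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    speaker = data.get("speaker")
    confidence = data.get("confidence")
    if not isinstance(speaker, str):
        raise ValueError("'speaker' must be a string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    return RecognizeResponse(speaker, float(confidence))
```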

⚙️ Configuration

Addon Configuration

host: "0.0.0.0"
port: 8099
log_level: "info"
access_log: true
embeddings_dir: "/share/speaker_recognition/embeddings"

Environment Variables

HOST: Server host (default: 0.0.0.0)
PORT: Server port (default: 8099)
LOG_LEVEL: Logging level (default: info)
ACCESS_LOG: Enable access logs (default: true)
EMBEDDINGS_DIR: Directory for storing embeddings (default: ./embeddings)
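
To make the defaults concrete, the sketch below reads these variables into a small settings object. It is not the server's actual loader. `ServerSettings`, `load_settings`, and the set of strings treated as true for ACCESS_LOG are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    log_level: str
    access_log: bool
    embeddings_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return ServerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8099")),
        log_level=env.get("LOG_LEVEL", "info"),
        # Assumption: these spellings mean "enabled"; the server may accept others.
        access_log=env.get("ACCESS_LOG", "true").strip().lower() in {"1", "true", "yes", "on"},
        embeddings_dir=env.get("EMBEDDINGS_DIR", "./embeddings"),
    )
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests, for example `load_settings({"PORT": "9000"})`.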

🛠️ Development

Prerequisites

Python 3.9 (for server development)
Python 3.8+ (for client-only development)
uv package manager

Setup

# Clone the repository
git clone https://github.com/eulemitkeule/speaker-recognition.git
cd speaker-recognition

# Install dependencies
uv sync --all-groups

# Run tests
uv run pytest tests/ -v

# Run linting
uv run ruff check .

# Run type checking
uv run mypy --strict speaker_recognition

Running Locally

# Start the server
uv run python -m speaker_recognition

# Or with custom options
uv run python -m speaker_recognition --host 0.0.0.0 --port 8099

Project Structure

speaker-recognition/
├── speaker_recognition/          # Main package
│   ├── api.py                   # FastAPI application
│   ├── client.py                # HTTP client
│   ├── models.py                # Pydantic models
│   └── recognizer.py            # Recognition logic
├── custom_components/           # Home Assistant integration
│   └── speaker_recognition/
├── speaker_recognition_addon/   # Home Assistant addon
├── tests/                       # Test suite
└── example_data/               # Example audio files

🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a feature branch (`git checkout -b feature/amazing-feature`).
3. Make your changes.
4. Run tests and linting.
5. Commit your changes (`git commit -m 'Add amazing feature'`).
6. Push to the branch (`git push origin feature/amazing-feature`).
7. Open a Pull Request.

Code Quality

Follow PEP 8 style guidelines
Use descriptive variable and function names
Add type annotations
Write tests for new features
Keep methods focused and concise

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Resemblyzer - Neural voice embeddings
Home Assistant - Home automation platform
FastAPI - Modern web framework

📞 Support

Made with ❤️ for the Home Assistant community

Project details

Release history

1.0.12 (Dec 30, 2025)
1.0.11 (Dec 30, 2025)
1.0.10 (Dec 30, 2025, this version)
1.0.9 (Dec 30, 2025)
1.0.8 (Dec 30, 2025)
1.0.7 (Dec 30, 2025)
1.0.6 (Dec 30, 2025)
1.0.5 (Dec 30, 2025)
1.0.4 (Dec 30, 2025)
1.0.3 (Dec 30, 2025)

Download files

Source Distribution

hass_speaker_recognition-1.0.10.tar.gz (12.0 kB), uploaded Dec 30, 2025, Source

Built Distribution

hass_speaker_recognition-1.0.10-py3-none-any.whl (13.0 kB), uploaded Dec 30, 2025, Python 3

File details

Details for the file hass_speaker_recognition-1.0.10.tar.gz.

File metadata

Download URL: hass_speaker_recognition-1.0.10.tar.gz
Upload date: Dec 30, 2025
Size: 12.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hass_speaker_recognition-1.0.10.tar.gz
Algorithm	Hash digest
SHA256	`3522c7727d31a97f2cc15e8d68eab2531125e28f1f70f6a36423c15b73d2c3c8`
MD5	`84dde9282dd49be6735c0f8f20bcfba2`
BLAKE2b-256	`f86e5eac33f9099e0b8f13be413df1754d911aa9153a983858db4892d0607f8c`

Provenance

The following attestation bundles were made for hass_speaker_recognition-1.0.10.tar.gz:

Publisher: publish.yml on EuleMitKeule/speaker-recognition

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hass_speaker_recognition-1.0.10.tar.gz
- Subject digest: 3522c7727d31a97f2cc15e8d68eab2531125e28f1f70f6a36423c15b73d2c3c8
- Sigstore transparency entry: 782067194
- Sigstore integration time: Dec 30, 2025
Source repository:
- Permalink: EuleMitKeule/speaker-recognition@6d693bf3a5a91b01711ffe3251c5cb1c3636a9ea
- Branch / Tag: refs/tags/1.0.10
- Owner: https://github.com/EuleMitKeule
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6d693bf3a5a91b01711ffe3251c5cb1c3636a9ea
- Trigger Event: release

File details

Details for the file hass_speaker_recognition-1.0.10-py3-none-any.whl.

File metadata

Download URL: hass_speaker_recognition-1.0.10-py3-none-any.whl
Upload date: Dec 30, 2025
Size: 13.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hass_speaker_recognition-1.0.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`270be8c9c7ca8023d4faa97e19010d2d575c69a36fbb20a2d349837d7c69a9ff`
MD5	`f6e3c0bfeab34f856285c8ece7c46d01`
BLAKE2b-256	`9a9683cbb90028b179ddd5fc3e6fb198e83434887dd98c1e8aa6ee872d25a743`

Provenance

The following attestation bundles were made for hass_speaker_recognition-1.0.10-py3-none-any.whl:

Publisher: publish.yml on EuleMitKeule/speaker-recognition

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hass_speaker_recognition-1.0.10-py3-none-any.whl
- Subject digest: 270be8c9c7ca8023d4faa97e19010d2d575c69a36fbb20a2d349837d7c69a9ff
- Sigstore transparency entry: 782067198
- Sigstore integration time: Dec 30, 2025
Source repository:
- Permalink: EuleMitKeule/speaker-recognition@6d693bf3a5a91b01711ffe3251c5cb1c3636a9ea
- Branch / Tag: refs/tags/1.0.10
- Owner: https://github.com/EuleMitKeule
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6d693bf3a5a91b01711ffe3251c5cb1c3636a9ea
- Trigger Event: release
