Speaker recognition for Home Assistant using Resemblyzer

Speaker Recognition for Home Assistant

Identify speakers by their voice using machine learning. This project provides a complete speaker recognition solution for Home Assistant, including a REST API service, Python client library, custom integration, and Home Assistant addon.

✨ Features

  • 🎤 Voice-based speaker identification using neural embeddings
  • 🏠 Native Home Assistant integration with STT and conversation agents
  • 🐳 Easy deployment via Home Assistant addon or standalone Docker
  • 🔌 REST API for flexible integration with any platform
  • 📦 Python client library for programmatic access
  • 🎯 High accuracy powered by Resemblyzer voice embeddings
  • ⚡ Fast recognition with cached embeddings
  • 🔧 Configurable via UI or YAML

🚀 Installation

Home Assistant Addon

The easiest way to use speaker recognition in Home Assistant:

  1. Add this repository to your Home Assistant addon store
  2. Install the Speaker Recognition addon
  3. Configure the addon settings:
    • Host: 0.0.0.0 (default)
    • Port: 8099 (default)
    • Embeddings Directory: /share/speaker_recognition/embeddings
    • Log Level: info
  4. Start the addon
  5. Install the Speaker Recognition integration via the UI

Python Package

Install the client-only package (no ML dependencies):

pip install speaker-recognition

Install with server capabilities (requires Python <3.10):

pip install "speaker-recognition[server]"

Docker

Run the standalone service:

docker run -d \
  -p 8099:8099 \
  -v ./embeddings:/app/embeddings \
  ghcr.io/eulemitkeule/speaker-recognition:latest

📖 Usage

Training

Train the system with voice samples for each speaker:

Using Python Client

import asyncio

from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import TrainingRequest, VoiceSample, AudioInput


async def main() -> None:
    # `async with` is only valid inside a coroutine, so the call lives in main().
    async with SpeakerRecognitionClient("http://localhost:8099") as client:
        training = await client.train(
            TrainingRequest(
                voice_samples=[
                    VoiceSample(
                        user="Alice",
                        audio_input=AudioInput(
                            audio_data="<base64-encoded-audio>",
                            sample_rate=16000
                        )
                    ),
                    VoiceSample(
                        user="Bob",
                        audio_input=AudioInput(
                            audio_data="<base64-encoded-audio>",
                            sample_rate=16000
                        )
                    )
                ]
            )
        )
        print(f"Trained {training.speakers_count} speakers")


asyncio.run(main())
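The `audio_data` field above takes base64 text. A minimal sketch of producing such a string with the standard library follows. It assumes the service accepts 16-bit mono PCM samples; the source does not say whether raw PCM or a WAV container is expected, so verify this against the running service. `sine_pcm16` and `pcm16_to_base64` are hypothetical helper names, and the sine tone only stands in for a real voice recording.

```python
import base64
import math
import struct


def sine_pcm16(freq_hz: float, seconds: float, sample_rate: int = 16000) -> bytes:
    """Generate a mono sine tone as little-endian 16-bit PCM (placeholder for real speech)."""
    n_samples = round(seconds * sample_rate)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n_samples)
    )
    return b"".join(struct.pack("<h", sample) for sample in samples)


def pcm16_to_base64(pcm: bytes) -> str:
    """Encode raw audio bytes as ASCII base64 text suitable for a JSON field."""
    return base64.b64encode(pcm).decode("ascii")


# Assumption: raw PCM is an acceptable payload for the service.
audio_data = pcm16_to_base64(sine_pcm16(440.0, 0.1))
```

The resulting `audio_data` string can be passed to `AudioInput(audio_data=..., sample_rate=16000)` in place of the placeholder.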

Using REST API

curl -X POST http://localhost:8099/train \
  -H "Content-Type: application/json" \
  -d '{
    "voice_samples": [
      {
        "user": "Alice",
        "audio_input": {
          "audio_data": "<base64-audio>",
          "sample_rate": 16000
        }
      }
    ]
  }'
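For callers who want neither curl nor the client package, the same request can be sketched with the Python standard library. The endpoint path and JSON shape are taken from the curl call above; `build_train_request` is a hypothetical helper name, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import Mapping


def build_train_request(
    base_url: str, samples: Mapping[str, str], sample_rate: int = 16000
) -> urllib.request.Request:
    """Build (but do not send) a POST /train request mirroring the curl call."""
    payload = {
        "voice_samples": [
            {
                "user": user,
                "audio_input": {"audio_data": audio_b64, "sample_rate": sample_rate},
            }
            for user, audio_b64 in samples.items()
        ]
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/train",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_train_request("http://localhost:8099", {"Alice": "<base64-audio>"})
# To send it against a running service:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```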

Recognition

Identify a speaker from audio:

Using Python Client

import asyncio

from speaker_recognition import SpeakerRecognitionClient
from speaker_recognition.models import RecognitionRequest, AudioInput


async def main() -> None:
    # `async with` is only valid inside a coroutine, so the call lives in main().
    async with SpeakerRecognitionClient("http://localhost:8099") as client:
        result = await client.recognize(
            RecognitionRequest(
                audio_input=AudioInput(
                    audio_data="<base64-encoded-audio>",
                    sample_rate=16000
                )
            )
        )
        print(f"Speaker: {result.speaker} (confidence: {result.confidence:.2%})")


asyncio.run(main())
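Because the result carries a confidence value, a caller may want to treat low-confidence matches as "unknown" rather than act on them. A minimal sketch of that idea follows. The 0.75 cut-off and the `accept_speaker` helper are illustrative assumptions, not values or functions from this project, and the service may already apply its own threshold.

```python
from typing import Optional


def accept_speaker(
    speaker: str, confidence: float, min_confidence: float = 0.75
) -> Optional[str]:
    """Return the speaker name only when the confidence clears the threshold."""
    if confidence >= min_confidence:
        return speaker
    return None  # treat as unknown speaker


print(accept_speaker("Alice", 0.95))  # Alice
print(accept_speaker("Alice", 0.40))  # None
```

A suitable threshold depends on the recording conditions, so it is worth tuning against real samples from each household member.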

Home Assistant Integration

Once the integration is configured:

  1. Configure the backend in the main integration entry
  2. Map voices to users in the integration settings
  3. Add STT entity as a sub-entry for speech-to-text with speaker ID
  4. Add Conversation Agent as a sub-entry for voice commands with speaker context

The integration will automatically identify speakers and make the information available to your automations.

🔌 API Documentation

Endpoints

GET /health

Health check endpoint.

Response:

{
  "status": "healthy"
}
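The health endpoint is useful for waiting until the service is ready, for example right after the container starts. A minimal polling sketch using only the standard library follows. `fetch_health` and `wait_until_healthy` are hypothetical helper names, and the attempt count and delay are arbitrary defaults; only the `/health` path and the `{"status": "healthy"}` body come from the documentation above.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_health(base_url: str = "http://localhost:8099") -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
        return json.load(response)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay_s: float = 1.0,
) -> bool:
    """Poll until the service reports status "healthy" or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "healthy":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # service not reachable yet, or the body was not valid JSON
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Calling `wait_until_healthy()` with no arguments polls `http://localhost:8099/health`; passing a custom `fetch` callable makes the loop easy to exercise without a running server.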

POST /train

Train the model with voice samples.

Request:

{
  "voice_samples": [
    {
      "user": "string",
      "audio_input": {
        "audio_data": "base64-string",
        "sample_rate": 16000
      }
    }
  ]
}

Response:

{
  "speakers_count": 2,
  "message": "Training completed successfully"
}

POST /recognize

Recognize a speaker from audio.

Request:

{
  "audio_input": {
    "audio_data": "base64-string",
    "sample_rate": 16000
  }
}

Response:

{
  "speaker": "Alice",
  "confidence": 0.95
}
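When calling the endpoint without the client package, the response body can be validated before use. The sketch below parses the JSON shown above into a small dataclass. `ParsedRecognition` and `parse_recognition` are hypothetical names (the package ships its own Pydantic models), and the assumption that `confidence` lies between 0 and 1 is inferred from the example value, not stated by the source.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedRecognition:
    speaker: str
    confidence: float


def parse_recognition(body: str) -> ParsedRecognition:
    """Parse a /recognize response body, rejecting malformed payloads."""
    data = json.loads(body)
    speaker = data["speaker"]
    confidence = float(data["confidence"])
    if not isinstance(speaker, str):
        raise ValueError("speaker must be a string")
    # Assumption: confidence is a fraction in [0, 1], as in the example response.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return ParsedRecognition(speaker=speaker, confidence=confidence)


result = parse_recognition('{"speaker": "Alice", "confidence": 0.95}')
print(f"{result.speaker}: {result.confidence:.0%}")  # Alice: 95%
```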

โš™๏ธ Configuration

Addon Configuration

host: "0.0.0.0"
port: 8099
log_level: "info"
access_log: true
embeddings_dir: "/share/speaker_recognition/embeddings"

Environment Variables

  • HOST: Server host (default: 0.0.0.0)
  • PORT: Server port (default: 8099)
  • LOG_LEVEL: Logging level (default: info)
  • ACCESS_LOG: Enable access logs (default: true)
  • EMBEDDINGS_DIR: Directory for storing embeddings (default: ./embeddings)
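To illustrate how the variables and defaults listed above fit together, here is a minimal sketch of reading them into a typed settings object. It is not the server's actual configuration code: `ServerSettings` and `load_settings` are hypothetical names, and the set of strings accepted as "true" for `ACCESS_LOG` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int
    log_level: str
    access_log: bool
    embeddings_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read the documented variables, falling back to the documented defaults."""
    # Assumption: these spellings count as "true"; the real server may differ.
    truthy = {"1", "true", "yes", "on"}
    return ServerSettings(
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8099")),
        log_level=env.get("LOG_LEVEL", "info"),
        access_log=env.get("ACCESS_LOG", "true").strip().lower() in truthy,
        embeddings_dir=env.get("EMBEDDINGS_DIR", "./embeddings"),
    )


print(load_settings({"PORT": "9000"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test.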

๐Ÿ› ๏ธ Development

Prerequisites

  • Python 3.9 (for server development)
  • Python 3.8+ (for client-only development)
  • uv package manager

Setup

# Clone the repository
git clone https://github.com/eulemitkeule/speaker-recognition.git
cd speaker-recognition

# Install dependencies
uv sync --all-groups

# Run tests
uv run pytest tests/ -v

# Run linting
uv run ruff check .

# Run type checking
uv run mypy --strict speaker_recognition

Running Locally

# Start the server
uv run python -m speaker_recognition

# Or with custom options
uv run python -m speaker_recognition --host 0.0.0.0 --port 8099

Project Structure

speaker-recognition/
├── speaker_recognition/          # Main package
│   ├── api.py                    # FastAPI application
│   ├── client.py                 # HTTP client
│   ├── models.py                 # Pydantic models
│   └── recognizer.py             # Recognition logic
├── custom_components/            # Home Assistant integration
│   └── speaker_recognition/
├── speaker_recognition_addon/    # Home Assistant addon
├── tests/                        # Test suite
└── example_data/                 # Example audio files

๐Ÿค Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests and linting
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code Quality

  • Follow PEP 8 style guidelines
  • Use descriptive variable and function names
  • Add type annotations
  • Write tests for new features
  • Keep methods focused and concise

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

๐Ÿ“ž Support


Made with โค๏ธ for the Home Assistant community
