Full-duplex speech LLM client for MichiAI

A full-duplex speech LLM with ~75ms latency.

MichiAI is a lightweight, multimodal speech large language model designed for full-duplex interaction.
Unlike traditional serial pipelines (ASR → LLM → TTS), MichiAI can listen and speak simultaneously, mimicking natural human conversation with ultra-low latency.
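To make the contrast with a serial pipeline concrete, here is a toy sketch of listening and speaking overlapping in time, written with Python's `asyncio`. It is only an illustration of the scheduling idea: the names `listen`, `speak`, and `run_full_duplex` and the integer "frames" are hypothetical and are not part of the MichiAI client or model.

```python
import asyncio


async def listen(incoming, log):
    # Hypothetical listener: records each incoming frame, then yields control.
    for frame in incoming:
        log.append(("heard", frame))
        await asyncio.sleep(0)


async def speak(outgoing, log):
    # Hypothetical speaker: records each outgoing frame, then yields control.
    for frame in outgoing:
        log.append(("spoke", frame))
        await asyncio.sleep(0)


async def full_duplex(incoming, outgoing):
    # Both coroutines run concurrently instead of one finishing before the other starts.
    log = []
    await asyncio.gather(listen(incoming, log), speak(outgoing, log))
    return log


def run_full_duplex(incoming, outgoing):
    return asyncio.run(full_duplex(incoming, outgoing))


for event in run_full_duplex([0, 1, 2], [0, 1, 2]):
    print(event)
```

In a serial ASR → LLM → TTS pipeline, every "heard" event would precede every "spoke" event. Here the two kinds of event are interleaved in the log.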

Read the Blog Post

⚡ Quick Specs

| Feature | Specification |
| --- | --- |
| Model Size | 530M parameters |
| Latency (TTFA) | ~75ms (tested on RTX 4090) |
| Architecture | Continuous Embeddings + Rectified Flow Matching |
| Base Backbone | SmolLM-360M |
| Key Innovation | No Coherence Loss / Single Step Decoding |

🌟 Key Features

  • Full-Duplex Capability: Handles interjections and backchanneling implicitly. It "hears" while it "talks."
  • Continuous Audio Latents: Bypasses the slow decoding of traditional RVQ (Residual Vector Quantization) models. This enables high-fidelity audio with far fewer forward passes.
  • Zero-Shot Voice Cloning: Captures vocal timbre and style from an audio prompt only a few seconds long.
  • Multimodal Input: Supports mixed text and audio prompting, making it compatible with existing RAG (Retrieval-Augmented Generation) frameworks.
  • No Coherence Loss: Retains the reasoning and linguistic capabilities of the underlying text LLM without the typical degradation seen in speech-to-speech models.
  • Paralinguistics: Naturally models breathing, laughing, and emotional prosody learned directly from the dataset.
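For readers unfamiliar with RVQ, the following is a minimal pure-Python sketch of residual vector quantization, the approach the Continuous Audio Latents feature avoids. The tiny two-stage codebooks and the helper names (`nearest`, `rvq_encode`, `rvq_decode`) are made up for illustration and come from no particular codec. The point is that each frame becomes one code per stage, and a model generating such audio has to predict every one of them, whereas a continuous latent is a single vector per frame.

```python
def nearest(codebook, vec):
    # Index of the codeword closest to vec under squared L2 distance.
    def dist(codeword):
        return sum((a - b) ** 2 for a, b in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(codebooks, vec):
    # Each stage quantizes the residual left over by the previous stages.
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks, codes):
    # The reconstruction is the sum of the chosen codeword from every stage.
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Illustrative codebooks: a coarse stage followed by a fine stage.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]

codes = rvq_encode(CODEBOOKS, [1.2, 0.8])
print(codes, rvq_decode(CODEBOOKS, codes))
```

Real codecs typically use more stages and much larger codebooks, so the number of discrete tokens per frame grows with the number of stages.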

🤖 Architecture Overview

1. The Listening Head

A multimodal encoder that maps raw audio into continuous embeddings while simultaneously generating text tokens, so the model captures both the semantic meaning and the emotional context of what it hears.

2. The Speaking Head

Predicts audio embeddings using Rectified Flow Matching, which enables fast, high-quality, and diverse speech generation. A lightweight, causal HiFi-GAN vocoder then converts the embeddings into a waveform for real-time streaming.
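As a rough intuition for why rectified flow pairs well with single-step decoding, the sketch below integrates a toy velocity field with Euler steps. It assumes the idealized case in which the flow from noise to the target is a perfectly straight line, so one step from t=0 already lands on the target. The `toy_velocity` function is a stand-in that is handed the target directly. In a real model the velocity would come from a trained network and the target would be unknown, and none of these names reflect MichiAI's actual code.

```python
def toy_velocity(x, t, target):
    # Idealized straight-line velocity: points from x toward target and is
    # constant along the path. A trained network would replace this (assumption).
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def euler_sample(x0, target, num_steps=1):
    # Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps.
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.5, -1.0]      # stand-in for a noise sample
target = [2.0, 3.0]      # stand-in for an audio embedding

print(euler_sample(noise, target, num_steps=1))
print(euler_sample(noise, target, num_steps=4))
```

With a straight path, one step and four steps reach the same point. The more a learned flow curves, the more steps are needed to follow it accurately, which is why straightening the paths is what makes few-step sampling viable.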

📊 Performance Comparison

Despite being more than an order of magnitude smaller than the models below and trained on over a thousand times less audio, MichiAI maintains high reasoning capabilities by efficiently utilizing pretrained text knowledge.

| Model | Parameters | Audio Training Data | Approach |
| --- | --- | --- | --- |
| Hertz-dev | 8.5B | 20,000,000 hours | Quantized |
| Moshi | 7B | 7,000,000 hours | Quantized |
| Qwen-Omni | 7B+ | 8,000,000+ hours | Quantized |
| MichiAI | 530M | ~5,000 hours | Continuous |
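The size gap in the table is easier to appreciate as ratios. The short script below recomputes them from the figures listed above. It treats the "7B+" and "8,000,000+" entries as lower bounds and the "~5,000" entry as exactly 5,000, which is a simplifying assumption.

```python
# (parameters, audio training hours), taken from the table above.
MODELS = {
    "Hertz-dev": (8.5e9, 20_000_000),
    "Moshi": (7e9, 7_000_000),
    "Qwen-Omni": (7e9, 8_000_000),   # listed as "7B+" and "8,000,000+": lower bounds
    "MichiAI": (530e6, 5_000),       # listed as "~5,000": approximate
}


def relative_to(baseline="MichiAI"):
    # Ratio of each model's parameters and audio hours to the baseline's.
    base_params, base_hours = MODELS[baseline]
    return {
        name: (params / base_params, hours / base_hours)
        for name, (params, hours) in MODELS.items()
    }


for name, (param_ratio, hour_ratio) in relative_to().items():
    print(f"{name}: {param_ratio:.1f}x parameters, {hour_ratio:.0f}x audio hours")
```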

🚀 Roadmap

  • Core Architecture: Continuous Embeddings + Flow Matching implementation.
  • Scaling: Implementing a larger LLM backbone.
  • Conversational Tuning: Training on specific dialogue datasets for better turn-taking.
  • Multilingual Support: Integrating non-English datasets.
  • Hugging Face Space: Launching a live interactive demo.
  • API Client: Releasing an API client in this repo.
