# AgentKit
A developer SDK for building personalized voice AI assistants with memory, learning, and mobile APK generation.
AgentKit is to personal AI assistants what Firebase is to apps — a complete backend + mobile shell that you configure once and deploy in under 30 minutes.
## Features
- Voice Pipeline — Streaming STT → LLM → TTS with <500ms first-audio latency
- Memory System — Markdown (simple) or Qdrant vector (semantic search)
- Learning Engine — Detects corrections, learns from mistakes, makes proactive recommendations
- Multi-provider — Sarvam/Deepgram (STT), Gemini/OpenAI (LLM), Sarvam/ElevenLabs (TTS)
- Mobile Shell — React Native app with VoiceOrb interface, builds to Android APK
- CLI — `init`, `serve`, `build`, `deploy` commands for the full lifecycle
## Quick Start
```bash
# Install
pip install agentkit-sdk

# Create a new agent project (interactive)
agentkit init my-agent

# Enter the project
cd my-agent

# Add your API keys
# Edit .env and fill in the required keys

# Start the server
agentkit serve
```
The server starts at `http://localhost:8000` with:

- Playground: `http://localhost:8000/playground` — browser-based test UI
- WebSocket: `ws://localhost:8000/ws/voice` — real-time voice/text endpoint
- Health: `http://localhost:8000/health` — server status check
- REST: `POST /api/chat` — text chat endpoint
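From Python, the REST endpoint can be called with nothing beyond the standard library. The request schema isn't spelled out here, so the `{"message": ...}` payload below is an assumption; check the playground's network tab for the exact field names:

```python
import json
import urllib.request

def build_chat_request(base_url: str, message: str) -> urllib.request.Request:
    # Payload field name ("message") is assumed, not documented above.
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps({"message": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(base_url: str, message: str) -> dict:
    # Requires a running `agentkit serve` on base_url.
    with urllib.request.urlopen(build_chat_request(base_url, message)) as resp:
        return json.loads(resp.read())

# chat("http://localhost:8000", "Hello!")
```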
## CLI Commands
| Command | Description |
|---|---|
| `agentkit init <name>` | Interactive project setup — picks providers, generates config |
| `agentkit serve` | Start FastAPI server with playground |
| `agentkit serve --validate-only` | Check config without starting |
| `agentkit build android` | Build Android APK from AgentShell template |
| `agentkit deploy --platform railway` | Deploy to Railway |
| `agentkit deploy --platform render` | Deploy to Render |
| `agentkit deploy --platform docker` | Generate Dockerfile |
## Configuration

`agent.config.yaml` — generated by `agentkit init`:
```yaml
agent:
  name: my-agent
  persona: "You are a helpful personal assistant..."
  language: hinglish  # english / hindi / hinglish

voice:
  enabled: true
  stt:
    provider: sarvam  # sarvam / deepgram
    api_key: ${SARVAM_API_KEY}
  tts:
    provider: sarvam  # sarvam / elevenlabs
    voice: meera
    api_key: ${SARVAM_API_KEY}

llm:
  provider: gemini  # gemini / openai
  model: gemini-2.0-flash
  api_key: ${GEMINI_API_KEY}
  temperature: 0.7

memory:
  type: markdown  # markdown / vector
  backend: local  # local / qdrant
  episodic_window: 20
  semantic_top_k: 5

learning:
  enabled: true
  correction_detection: true
  implicit_feedback: true
  profile_extraction: true

deployment:
  type: self-host
  port: 8000
```
API keys are referenced as `${VAR_NAME}` and resolved from your `.env` file at startup.
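The substitution rule can be illustrated in a few lines of Python (a sketch of the idea only; AgentKit's actual loader may handle missing keys differently, e.g. by raising an error):

```python
import re

def resolve_refs(value: str, env: dict) -> str:
    # Replace every ${VAR_NAME} with its value from the environment / .env.
    # Missing keys become empty strings here; the real loader may raise instead.
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), value)

print(resolve_refs("${SARVAM_API_KEY}", {"SARVAM_API_KEY": "sk-example-123"}))
# sk-example-123
```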
## Custom Providers
Every provider slot (STT, LLM, TTS, Memory) is pluggable. Use a built-in name or a dotted import path to your own class:
```yaml
# Built-in provider
llm:
  provider: gemini
```

```yaml
# Custom provider — any class that extends BaseLLM
llm:
  provider: my_package.llm.OllamaLLM
  api_key: ${OLLAMA_API_KEY}
  model: llama3
  base_url: http://localhost:11434
```
Your custom class must extend the appropriate base class (`BaseSTT`, `BaseLLM`, `BaseTTS`, or `BaseMemory`). All config keys under the provider section are passed as constructor kwargs automatically.
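The kwargs-passing behavior can be sketched like this (illustrative only; `build_provider` and `DummyLLM` are hypothetical names, not AgentKit's actual factory):

```python
def build_provider(provider_cls, section: dict):
    # Everything under the provider section except the "provider" key
    # becomes a constructor keyword argument.
    kwargs = {k: v for k, v in section.items() if k != "provider"}
    return provider_cls(**kwargs)

class DummyLLM:
    def __init__(self, api_key: str, model: str = "llama3", **kwargs):
        self.api_key, self.model = api_key, model

llm = build_provider(DummyLLM, {"provider": "dummy", "api_key": "sk-x", "model": "llama3"})
print(llm.model)  # llama3
```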
Writing a custom LLM provider:

```python
# my_package/llm.py
from agentkit.providers.llm.base import BaseLLM, Message


class OllamaLLM(BaseLLM):
    def __init__(self, api_key: str, model: str = "llama3",
                 base_url: str = "http://localhost:11434", **kwargs):
        self.api_key = api_key
        self.model = model
        self.base_url = base_url

    async def chat_stream(self, messages, system, memory_context=""):
        # Your streaming implementation
        ...

    async def chat(self, messages, system, memory_context=""):
        # Your non-streaming implementation
        ...

    async def close(self):
        pass
```
Registering at runtime (alternative to dotted paths):

```python
from agentkit.providers import registry
from my_package.llm import OllamaLLM

registry.register("llm", "ollama", OllamaLLM)
# Now you can use `provider: ollama` in config
```
During `agentkit init`, select `custom` when prompted for a provider to enter your class path interactively.
| Category | Base Class | Built-in Providers |
|---|---|---|
| STT | `BaseSTT` | `sarvam`, `deepgram` |
| LLM | `BaseLLM` | `gemini`, `openai` |
| TTS | `BaseTTS` | `sarvam`, `elevenlabs` |
| Memory | `BaseMemory` | `markdown`, `vector` |
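A dotted provider path like `my_package.llm.OllamaLLM` resolves to a class via a standard import, roughly like this (a sketch, not AgentKit's actual loader):

```python
import importlib

def resolve_class(spec: str):
    # Split "pkg.module.ClassName" into module path and class name, then import.
    module_path, _, class_name = spec.rpartition(".")
    return getattr(importlib.import_module(module_path), class_name)

# Works for any importable class; a stdlib example:
print(resolve_class("json.JSONDecoder").__name__)  # JSONDecoder
```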
## API Keys

Add these to your `.env` file based on your chosen providers:
| Provider | Variable | Get it at |
|---|---|---|
| Sarvam AI | `SARVAM_API_KEY` | sarvam.ai |
| Gemini | `GEMINI_API_KEY` | aistudio.google.com |
| OpenAI | `OPENAI_API_KEY` | platform.openai.com |
| Deepgram | `DEEPGRAM_API_KEY` | deepgram.com |
| ElevenLabs | `ELEVENLABS_API_KEY` | elevenlabs.io |
## WebSocket Protocol

Connect to `ws://localhost:8000/ws/voice` and exchange JSON messages:
Send text:

```json
{"type": "text", "text": "Hello, what's the weather?"}
```

Send audio:

```json
{"type": "audio", "data": [/* byte array */]}
```

Receive responses:

```json
{"type": "audio", "data": "base64-encoded-audio"}
{"type": "text", "text": "The assistant's text response"}
{"type": "done"}
```
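Since messages are plain JSON and audio arrives base64-encoded, a small codec covers both directions. This helper works with any Python WebSocket client; decoding audio back to raw bytes is inferred from the format above, so treat it as a sketch:

```python
import base64
import json

def encode_text(text: str) -> str:
    # Client -> server text message
    return json.dumps({"type": "text", "text": text})

def decode_message(raw: str) -> dict:
    # Server -> client message; audio payloads are base64-encoded strings
    msg = json.loads(raw)
    if msg.get("type") == "audio":
        msg["data"] = base64.b64decode(msg["data"])
    return msg

print(decode_message('{"type": "done"}'))  # {'type': 'done'}
```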
## Architecture

```
agentkit init → agent.config.yaml + .env
        ↓
agentkit serve → FastAPI server
        ├── /ws/voice   (WebSocket)
        ├── /api/chat   (REST)
        ├── /playground (browser UI)
        └── /health
        ↓
STT → LLM → TTS   (streaming pipeline)
  ↕           ↕
Memory      Learning
(md/vector) (corrections)
```
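The streaming pipeline at the bottom of the diagram can be sketched with toy stand-ins for the three stages (the real providers stream audio incrementally; these stubs only show the data flow):

```python
import asyncio

async def stt(audio: bytes) -> str:
    return "hello"                       # stand-in transcriber

async def llm(text: str):
    for token in ("Hi", " there", "!"):  # stand-in token stream
        yield token

async def tts(token: str) -> bytes:
    return token.encode()                # stand-in synthesizer

async def pipeline(audio: bytes) -> bytes:
    # STT -> LLM -> TTS: synthesis starts as soon as the first token arrives
    text = await stt(audio)
    out = b""
    async for token in llm(text):
        out += await tts(token)
    return out

print(asyncio.run(pipeline(b"...")))  # b'Hi there!'
```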
## Development

```bash
# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/
```
## License
MIT
## Project details
### Source distribution: `agentkit_sdk-0.2.0.tar.gz`

- Size: 27.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `99437c68f1fe4368848bdafd9f05dcb5eca40652d77defb7497c2d3f42d48b5a` |
| MD5 | `4440921ed63fe057a3cd7db01614cded` |
| BLAKE2b-256 | `cbdbc9e20e1e679426e98ae6880e49cd8a3768cec9789c109eea7f1fca1c3f00` |
### Built distribution: `agentkit_sdk-0.2.0-py3-none-any.whl`

- Size: 34.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3d49c3da9bf1892d821bd82247a2ff9148b154ad58d42bc4448c4d01ce421fc2` |
| MD5 | `f3877f1bc86cc30906a908b9e8039c70` |
| BLAKE2b-256 | `38cbeb481abd3f5736413e8873c3900b22e74ddfa55827fdcd00ee85bf5832a9` |