
Developer SDK for building personalized voice AI assistants

Project description

AgentKit

A developer SDK for building personalized voice AI assistants with memory, learning, and mobile APK generation.

AgentKit is to personal AI assistants what Firebase is to apps — a complete backend + mobile shell that you configure once and deploy in under 30 minutes.

Features

  • Voice Pipeline — Streaming STT → LLM → TTS with <500ms first-audio latency
  • Memory System — Markdown (simple) or Qdrant vector (semantic search)
  • Learning Engine — Detects corrections, learns from mistakes, makes proactive recommendations
  • Multi-provider — Sarvam/Deepgram (STT), Gemini/OpenAI (LLM), Sarvam/ElevenLabs (TTS)
  • Mobile Shell — React Native app with VoiceOrb interface, builds to Android APK
  • CLI — init, serve, build, deploy commands for the full lifecycle

Quick Start

# Install
pip install agentkit-sdk

# Create a new agent project (interactive)
agentkit init my-agent

# Enter the project
cd my-agent

# Add your API keys
# Edit .env and fill in the required keys

# Start the server
agentkit serve

The server starts at http://localhost:8000 with:

  • Playground: http://localhost:8000/playground — browser-based test UI
  • WebSocket: ws://localhost:8000/ws/voice — real-time voice/text endpoint
  • Health: http://localhost:8000/health — server status check
  • REST: POST /api/chat — text chat endpoint
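
For a quick smoke test from Python, you can hit these endpoints with requests. This is a minimal sketch: /health is the documented status check, but the exact JSON body that POST /api/chat expects is an assumption here (a "text" field, mirroring the WebSocket message format below), so adjust it to match what the playground sends if your version differs.

import requests

# Confirm the server is up (GET /health is the documented status check)
print(requests.get("http://localhost:8000/health").status_code)  # expect 200

# Text chat over REST. The body shape is an assumption, not a documented schema:
# a JSON object with a "text" field, mirroring the WebSocket protocol.
resp = requests.post(
    "http://localhost:8000/api/chat",
    json={"text": "Hello, what's the weather?"},
)
print(resp.status_code, resp.text)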

CLI Commands

Command                               Description
agentkit init <name>                  Interactive project setup — picks providers, generates config
agentkit serve                        Start FastAPI server with playground
agentkit serve --validate-only        Check config without starting
agentkit build android                Build Android APK from AgentShell template
agentkit deploy --platform railway    Deploy to Railway
agentkit deploy --platform render     Deploy to Render
agentkit deploy --platform docker     Generate Dockerfile

Configuration

agent.config.yaml — generated by agentkit init:

agent:
  name: my-agent
  persona: "You are a helpful personal assistant..."
  language: hinglish  # english / hindi / hinglish

voice:
  enabled: true
  stt:
    provider: sarvam    # sarvam / deepgram
    api_key: ${SARVAM_API_KEY}
  tts:
    provider: sarvam    # sarvam / elevenlabs
    voice: meera
    api_key: ${SARVAM_API_KEY}

llm:
  provider: gemini      # gemini / openai
  model: gemini-2.0-flash
  api_key: ${GEMINI_API_KEY}
  temperature: 0.7

memory:
  type: markdown        # markdown / vector
  backend: local        # local / qdrant
  episodic_window: 20
  semantic_top_k: 5

learning:
  enabled: true
  correction_detection: true
  implicit_feedback: true
  profile_extraction: true

deployment:
  type: self-host
  port: 8000

API keys are referenced as ${VAR_NAME} and resolved from your .env file at startup.
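
The resolution itself happens inside the SDK when the server starts; conceptually it behaves roughly like the sketch below (illustrative only, not AgentKit's actual loading code):

import os
import re

def resolve_env_refs(value: str) -> str:
    # Replace each ${VAR_NAME} with the matching environment variable,
    # which the SDK loads from .env before parsing agent.config.yaml.
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

resolve_env_refs("${GEMINI_API_KEY}")  # -> the key defined in your .env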

Custom Providers

Every provider slot (STT, LLM, TTS, Memory) is pluggable. Use a built-in name or a dotted import path to your own class:

# Built-in provider
llm:
  provider: gemini

# Custom provider — any class that extends BaseLLM
llm:
  provider: my_package.llm.OllamaLLM
  api_key: ${OLLAMA_API_KEY}
  model: llama3
  base_url: http://localhost:11434

Your custom class must extend the appropriate base class (BaseSTT, BaseLLM, BaseTTS, or BaseMemory). All config keys under the provider section are passed as constructor kwargs automatically.
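
For the custom llm block above, that means the SDK effectively constructs your class like this (illustrative; the actual loader is internal to AgentKit):

import os
from my_package.llm import OllamaLLM

# Every key under llm: except "provider" becomes a constructor kwarg,
# with ${...} references already resolved from .env.
llm = OllamaLLM(
    api_key=os.environ["OLLAMA_API_KEY"],
    model="llama3",
    base_url="http://localhost:11434",
)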

Writing a custom LLM provider:

# my_package/llm.py
from agentkit.providers.llm.base import BaseLLM, Message

class OllamaLLM(BaseLLM):
    def __init__(self, api_key: str, model: str = "llama3", base_url: str = "http://localhost:11434", **kwargs):
        self.model = model
        self.base_url = base_url

    async def chat_stream(self, messages, system, memory_context=""):
        # Your streaming implementation
        ...

    async def chat(self, messages, system, memory_context=""):
        # Your non-streaming implementation
        ...

    async def close(self):
        pass

Registering at runtime (alternative to dotted paths):

from agentkit.providers import registry
from my_package.llm import OllamaLLM

registry.register("llm", "ollama", OllamaLLM)
# Now you can use provider: ollama in config

During agentkit init, select "custom" when prompted for a provider to enter your class path interactively.

Category   Base Class   Built-in Providers
STT        BaseSTT      sarvam, deepgram
LLM        BaseLLM      gemini, openai
TTS        BaseTTS      sarvam, elevenlabs
Memory     BaseMemory   markdown, vector

API Keys

Add these to your .env file based on your chosen providers:

Provider     Variable             Get it at
Sarvam AI    SARVAM_API_KEY       sarvam.ai
Gemini       GEMINI_API_KEY       aistudio.google.com
OpenAI       OPENAI_API_KEY       platform.openai.com
Deepgram     DEEPGRAM_API_KEY     deepgram.com
ElevenLabs   ELEVENLABS_API_KEY   elevenlabs.io

WebSocket Protocol

Connect to ws://localhost:8000/ws/voice and exchange JSON messages:

Send text:

{"type": "text", "text": "Hello, what's the weather?"}

Send audio:

{"type": "audio", "data": [/* byte array */]}

Receive responses:

{"type": "audio", "data": "base64-encoded-audio"}
{"type": "text", "text": "The assistant's text response"}
{"type": "done"}

Architecture

agentkit init → agent.config.yaml + .env
                        ↓
agentkit serve → FastAPI server
                  ├── /ws/voice (WebSocket)
                  ├── /api/chat (REST)
                  ├── /playground (browser UI)
                  └── /health
                        ↓
              STT → LLM → TTS (streaming pipeline)
                ↕         ↕
            Memory     Learning
          (md/vector)  (corrections)

Development

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/

License

MIT

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentkit_sdk-0.3.0.tar.gz (29.4 kB)


Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentkit_sdk-0.3.0-py3-none-any.whl (37.0 kB)


File details

Details for the file agentkit_sdk-0.3.0.tar.gz.

File metadata

  • Download URL: agentkit_sdk-0.3.0.tar.gz
  • Upload date:
  • Size: 29.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentkit_sdk-0.3.0.tar.gz
Algorithm Hash digest
SHA256 17ead93ad72e86126a6812dabbe124fca08acfd820099ea71715419aaa8e868f
MD5 1052d907d50d33aaf3afb31792b9502c
BLAKE2b-256 d40bbb05ef99adc2251cd153b469a0ea956acc1ee95c0eb81e7c315a6c096197

See more details on using hashes here.

File details

Details for the file agentkit_sdk-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: agentkit_sdk-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 37.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentkit_sdk-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b78706593eada6d3d9ea60ad4afeb5a570a6f6879d3944848aa2d8c6dddd8ff3
MD5 b65c27ac7eeef2d711b7a3573f66fdf3
BLAKE2b-256 90f53c3c1ceb79e4435f4cac411e0564c90219f1bbda3a1796e0424a6e711286

See more details on using hashes here.
