Developer SDK for building personalized voice AI assistants

Project description

AgentKit

A developer SDK for building personalized voice AI assistants with memory, learning, and mobile APK generation.

AgentKit is to personal AI assistants what Firebase is to apps — a complete backend + mobile shell that you configure once and deploy in under 30 minutes.

Features

Voice Pipeline — Streaming STT → LLM → TTS with <500ms first-audio latency
Memory System — Markdown (simple) or Qdrant vector (semantic search)
Learning Engine — Detects corrections, learns from mistakes, makes proactive recommendations
Multi-provider — Sarvam/Deepgram (STT), Gemini/OpenAI (LLM), Sarvam/ElevenLabs (TTS)
Mobile Shell — React Native app with VoiceOrb interface, builds to Android APK
CLI — init, serve, build, deploy commands for the full lifecycle

Quick Start

# Install
pip install agentkit-sdk

# Create a new agent project
agentkit init my-agent

# Enter the project
cd my-agent

# Add your API keys to .env

# Start the server
python main.py

Programmatic Usage

Use AgentKit like any other Python SDK — import, instantiate, call:

import asyncio
from agentkit import Agent

agent = Agent(
    persona="You are a helpful assistant",
    llm_provider="gemini",
    llm_api_key="your-gemini-key",
    llm_model="gemini-2.0-flash",
)

async def main():
    # Simple chat
    response = await agent.chat("What is Python?")
    print(response.text)

    # Streaming response
    async for token in agent.chat_stream("Tell me a joke"):
        print(token, end="", flush=True)

    await agent.close()

asyncio.run(main())

Register Tools

from agentkit import Agent

agent = Agent(
    persona="You are a helpful assistant with tools",
    llm_provider="gemini",
    llm_api_key="your-key",
)

@agent.tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"25°C and sunny in {city}"

@agent.tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))

Load from Config

If you prefer YAML configuration (generated by agentkit init):

from agentkit import Agent

agent = Agent.from_config("agent.config.yaml")
response = await agent.chat("Hello!")

# Or start the server
agent.serve(port=8000)

Full Voice Pipeline

Process audio through STT → LLM → TTS:

from agentkit import Agent

agent = Agent(
    persona="You are a voice assistant",
    llm_provider="gemini",
    llm_api_key="your-key",
    stt_provider="sarvam",
    stt_api_key="your-sarvam-key",
    tts_provider="sarvam",
    tts_api_key="your-sarvam-key",
)

# Process audio
with open("question.wav", "rb") as f:
    response = await agent.process_audio(f.read())

print(response.text)    # Transcription + answer
# response.audio        # Spoken answer as WAV bytes

# Or get audio with text chat
response = await agent.chat("Hello!", include_audio=True)
with open("answer.wav", "wb") as f:
    f.write(response.audio)

Async Context Manager

async with Agent(persona="...", llm_provider="gemini", llm_api_key="key") as agent:
    response = await agent.chat("Hello!")
    print(response.text)
# Resources automatically cleaned up

CLI Commands

Command	Description
`agentkit init <name>`	Interactive project setup — picks providers, generates config
`agentkit serve`	Start FastAPI server with playground
`agentkit serve --validate-only`	Check config without starting
`agentkit build android`	Build Android APK from AgentShell template
`agentkit deploy --platform railway`	Deploy to Railway
`agentkit deploy --platform render`	Deploy to Render
`agentkit deploy --platform docker`	Generate Dockerfile

Project Structure

When you run agentkit init my-agent, you get:

my-agent/
├── main.py              ← Start here — run your agent
├── example.py           ← Standalone code usage (no YAML)
├── agent.config.yaml    ← Agent configuration
├── tools.py             ← Custom tools for your agent
├── .env                 ← API keys
├── .env.example         ← API key template
├── .gitignore
└── memory/              ← Conversation memory storage

The server starts at http://localhost:8000 with:

Playground: http://localhost:8000/playground — browser-based test UI
WebSocket: ws://localhost:8000/ws/voice — real-time voice/text endpoint
Health: http://localhost:8000/health — server status check
REST: POST /api/chat — text chat endpoint

Configuration

agent.config.yaml — generated by agentkit init:

agent:
  name: my-agent
  persona: "You are a helpful personal assistant..."
  language: hinglish  # english / hindi / hinglish

voice:
  enabled: true
  stt:
    provider: sarvam    # sarvam / deepgram
    api_key: ${SARVAM_API_KEY}
  tts:
    provider: sarvam    # sarvam / elevenlabs
    voice: meera
    api_key: ${SARVAM_API_KEY}

llm:
  provider: gemini      # gemini / openai
  model: gemini-2.0-flash
  api_key: ${GEMINI_API_KEY}
  temperature: 0.7

memory:
  type: markdown        # markdown / vector
  backend: local        # local / qdrant
  episodic_window: 20
  semantic_top_k: 5

learning:
  enabled: true
  correction_detection: true
  implicit_feedback: true
  profile_extraction: true

deployment:
  type: self-host
  port: 8000

API keys are referenced as ${VAR_NAME} and resolved from your .env file at startup.

Custom Providers

Every provider slot (STT, LLM, TTS, Memory) is pluggable. Use a built-in name or a dotted import path to your own class:

# Built-in provider
llm:
  provider: gemini

# Custom provider — any class that extends BaseLLM
llm:
  provider: my_package.llm.OllamaLLM
  api_key: ${OLLAMA_API_KEY}
  model: llama3
  base_url: http://localhost:11434

Your custom class must extend the appropriate base class (BaseSTT, BaseLLM, BaseTTS, or BaseMemory). All config keys under the provider section are passed as constructor kwargs automatically.

Writing a custom LLM provider:

# my_package/llm.py
from agentkit import BaseLLM, Message

class OllamaLLM(BaseLLM):
    def __init__(self, api_key: str, model: str = "llama3", base_url: str = "http://localhost:11434", **kwargs):
        self.model = model
        self.base_url = base_url

    async def chat_stream(self, messages, system, memory_context=""):
        # Your streaming implementation
        ...

    async def chat(self, messages, system, memory_context=""):
        # Your non-streaming implementation
        ...

    async def close(self):
        pass

Registering at runtime (alternative to dotted paths):

from agentkit import registry
from my_package.llm import OllamaLLM

registry.register("llm", "ollama", OllamaLLM)
# Now you can use provider: ollama in config

During agentkit init, select "custom" when prompted for a provider to enter your class path interactively.

Category	Base Class	Built-in Providers
STT	`BaseSTT`	`sarvam`, `deepgram`
LLM	`BaseLLM`	`gemini`, `openai`
TTS	`BaseTTS`	`sarvam`, `elevenlabs`
Memory	`BaseMemory`	`markdown`, `vector`

API Keys

Add these to your .env file based on your chosen providers:

Provider	Variable	Get it at
Sarvam AI	`SARVAM_API_KEY`	sarvam.ai
Gemini	`GEMINI_API_KEY`	aistudio.google.com
OpenAI	`OPENAI_API_KEY`	platform.openai.com
Deepgram	`DEEPGRAM_API_KEY`	deepgram.com
ElevenLabs	`ELEVENLABS_API_KEY`	elevenlabs.io

WebSocket Protocol

Connect to ws://localhost:8000/ws/voice and exchange JSON messages:

Send text:

{"type": "text", "text": "Hello, what's the weather?"}

Send audio:

{"type": "audio", "data": [/* byte array */]}

Receive responses:

{"type": "audio", "data": "base64-encoded-audio"}
{"type": "text", "text": "The assistant's text response"}
{"type": "done"}

Architecture

agentkit init → agent.config.yaml + main.py + .env
                        ↓
python main.py / agentkit serve → FastAPI server
                  ├── /ws/voice (WebSocket)
                  ├── /api/chat (REST)
                  ├── /playground (browser UI)
                  └── /health
                        ↓
              STT → LLM → TTS (streaming pipeline)
                ↕         ↕
            Memory     Learning
          (md/vector)  (corrections)

Development

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Lint
ruff check src/

License

MIT

Project details

Release history Release notifications | RSS feed

This version

0.6.0

May 22, 2026

0.5.21

May 21, 2026

0.5.20

May 21, 2026

0.5.19

May 21, 2026

0.5.18

May 21, 2026

0.5.17

May 21, 2026

0.5.16

May 21, 2026

0.5.15

May 21, 2026

0.5.14

May 21, 2026

0.5.13

May 21, 2026

0.5.12

May 21, 2026

0.5.11

May 21, 2026

0.5.10

May 21, 2026

0.5.9

May 21, 2026

0.5.8

May 21, 2026

0.5.7

May 21, 2026

0.5.6

May 18, 2026

0.5.5

May 18, 2026

0.5.4

May 18, 2026

0.5.3

May 18, 2026

0.5.2

May 18, 2026

0.5.1

May 18, 2026

0.5.0

May 18, 2026

0.4.0

May 17, 2026

0.3.5

May 17, 2026

0.3.4

May 17, 2026

0.3.3

May 17, 2026

0.3.2

May 17, 2026

0.3.1

May 16, 2026

0.3.0

May 16, 2026

0.2.1

May 16, 2026

0.2.0

May 16, 2026

0.1.0

May 16, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentkit_sdk-0.6.0.tar.gz (78.4 kB view details)

Uploaded May 22, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

agentkit_sdk-0.6.0-py3-none-any.whl (91.0 kB view details)

Uploaded May 22, 2026 Python 3

File details

Details for the file agentkit_sdk-0.6.0.tar.gz.

File metadata

Download URL: agentkit_sdk-0.6.0.tar.gz
Upload date: May 22, 2026
Size: 78.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentkit_sdk-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`cdf1bcab51f0c63f37633ef33f7d66bb48402d29dd44cccf7a1348be96f1fe91`
MD5	`ab1bebd701184de83aaaffca82060315`
BLAKE2b-256	`98a655b40b20e84006caa0bb23782b3a4d912ffb6d221a908091bf7c88bb2544`

See more details on using hashes here.

File details

Details for the file agentkit_sdk-0.6.0-py3-none-any.whl.

File metadata

Download URL: agentkit_sdk-0.6.0-py3-none-any.whl
Upload date: May 22, 2026
Size: 91.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for agentkit_sdk-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bc61ba882ba5873434295b0811ff611cce5f0388212c443fa715002f8c844bb5`
MD5	`50cf8ef61786e200b104eec70c97535e`
BLAKE2b-256	`8fa7ed72dcbe0cf7de0b9e427ac7007f66ebac8ccbe39b17f814433f70c1a8b5`

See more details on using hashes here.

agentkit-sdk 0.6.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

AgentKit

Features

Quick Start

Programmatic Usage

Register Tools

Load from Config

Full Voice Pipeline

Async Context Manager

CLI Commands

Project Structure

Configuration

Custom Providers

API Keys

WebSocket Protocol

Architecture

Development

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes