PATY — Declarative voice agent deployment on Pipecat
PATY — Please & Thank You
Declarative voice agent deployment on Pipecat. Write a YAML config, run `paty run config.yaml`, get a working voice agent. No `bot.py` to write.
Prerequisites
- Python 3.11+
- uv — the Python package manager used to install and run PATY. Install with:
curl -LsSf https://astral.sh/uv/install.sh | sh
- Platform-specific toolchain for local inference:
  - Apple Silicon (macOS arm64): nothing extra — the `mlx` extra pulls in MLX.
  - NVIDIA GPU (CUDA): a working CUDA toolchain so `llama-cpp-python` can build with GPU offload. See the llama-cpp-python CUDA build docs.
  - CPU-only: a C/C++ toolchain (`build-essential` on Linux, Xcode Command Line Tools on macOS) for `llama-cpp-python`.
Installation
git clone https://github.com/PATYai/PATY.git
cd PATY/cli
Pick the extra that matches your hardware and sync:
# Apple Silicon (M1/M2/M3/M4)
uv sync --extra mlx
# NVIDIA GPU
uv sync --extra cuda
# CPU-only fallback
uv sync --extra cpu
The extras install Pipecat plus the local inference backend (MLX or llama-cpp-python). Skip them only if you plan to point PATY at remote services.
External services
- LLM — PATY spawns a managed inference server automatically (`mlx_lm.server` on Apple Silicon, `llama_cpp.server` on CUDA/CPU). No separate Ollama install is required; models are pulled from Hugging Face on first run.
- TTS on CUDA/CPU — the `kokoro` provider expects an OpenAI-compatible Kokoro FastAPI server at `http://localhost:8880/v1`. The easiest way is the Docker image from remsky/Kokoro-FastAPI. Apple Silicon runs Kokoro in-process via `mlx-audio` and needs nothing extra.
- Piper (CPU alternative) — `tts: piper` downloads its voice model on first use; no server needed.
First run
uv run paty run examples/paty.yaml
On first launch PATY will:
- Detect your platform and memory, then pick a hardware profile.
- Download the LLM weights (a few GB — the first start is slow; subsequent runs hit the Hugging Face cache).
- Download the Whisper STT model on first use.
- Start the managed LLM server, warm it up, then open a local mic/speaker transport so you can talk to the agent.
Press Ctrl+C to stop.
Development install
uv sync --extra mlx --extra dev # or --extra cuda / --extra cpu
uv run pytest tests/ -v # run tests
uv run ruff check paty/ tests/ # lint
uv run ruff format --check paty/ tests/ # format check
Config
The YAML config is PATY's primary interface. A minimal example:
pak:
  persona: "You are a receptionist for Dr. Smith's dental office."

pipeline:
  stt: whisper
  llm: ollama
  tts: kokoro
  vad: silero

hardware:
  profile: auto  # or: apple-16gb, apple-24gb, cuda-24gb, cpu-only

sip:
  provider: voip-ms
  host: sip.voip.ms
  username: "100000"
  password: "${SIP_PASSWORD}"
  did: "+13035551234"

tracing:
  enabled: true
  console: true
Pipeline entries accept string shorthand (`stt: whisper`) or expanded form:
pipeline:
  stt:
    provider: whisper
    model: large-v3-turbo
  llm:
    provider: ollama
    model: qwen3:14b
    base_url: http://localhost:11434/v1
  tts:
    provider: kokoro
    voice: af_bella
    base_url: http://localhost:8880/v1
Environment variables in ${VAR} syntax are interpolated at load time.
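The interpolation rule can be sketched in a few lines. This is a hypothetical illustration of the behavior described above, not PATY's actual loader (`paty/utils/env.py` may differ); the function name is made up:

```python
import os
import re

# Matches ${NAME} where NAME is a valid environment variable identifier.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(text: str) -> str:
    """Replace each ${NAME} with os.environ[NAME]; fail fast if unset."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return _VAR.sub(repl, text)

os.environ["SIP_PASSWORD"] = "s3cret"
print(interpolate('password: "${SIP_PASSWORD}"'))  # password: "s3cret"
```

Raising on an unset variable (rather than substituting an empty string) surfaces config mistakes at load time instead of at dial time.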
CLI Commands
paty run <config.yaml> Start the voice agent
paty bus tail Subscribe to a running bus and print events
paty bus tui Live conversation view subscribed to the bus
paty profiles List hardware profiles and their model selections
paty pak list List installed PAKs
paty pak active Print the currently active PAK
paty pak validate <path> Validate a PAK directory
paty pak switch <name> Set the active PAK (applies on next `paty run`)
paty init Scaffold a starter config (coming soon)
paty doctor Check dependencies (coming soon)
paty eject <config.yaml> Generate standalone bot.py (coming soon)
PAKs (Personality Augmentation Kits)
A PAK bundles a persona (system prompt) and voice settings (TTS provider/voice, optional LLM pin) into a self-contained directory. PATY ships a default `paty` PAK; additional PAKs can be installed under `~/.paty/paks/<name>/`. Each PAK directory contains:
pak.yaml # manifest: name, version, voice config
soul.md # the system prompt / persona document
A PAK-style paty.yaml:
pak:
  active: paty  # name of an installed PAK; bundled default is "paty"

hardware:
  profile: auto
For an ad-hoc persona without a PAK directory, set pak.persona instead of pak.active (the two are mutually exclusive). A transient PAK is synthesized from the inline text and routed through the same voice-resolution pipeline as a registered PAK. If neither field is set, the bundled paty PAK is loaded automatically.
A user-provided `pipeline.tts.voice` or `pipeline.llm.model` overrides what the PAK declares — useful for debugging or for forcing every PAK onto a single voice.
PAKs may pin voice.llm.model to a specific LLM. This is allowed but expensive — switching to or from a differently-pinned PAK forces a full LLM reload. PATY logs a loud warning at startup when a pin disagrees with the resolved hardware profile.
Note: hot-swap is not yet implemented.
`paty pak switch <name>` updates the active pointer; the change applies on the next `paty run`. A follow-up will land in-process swap (TTS replaced live, LLM warmed up where compatible).
Event Bus
PATY can publish session events over a WebSocket so other processes (e.g. a TUI) can observe what the pipeline is doing without being coupled to it. Enable it in the config:
bus:
  enabled: true  # publish session events for subscribers
  host: 127.0.0.1
  port: 8765
With the bus enabled, paty run starts a local WebSocket server at ws://host:port. Subscribers receive two frame types:
- Text frames — JSON control events with envelope `{v, seq, ts_ms, session_id, type, data}`. Types cover session lifecycle (`session.started`, `session.ended`), user turn (`user.speech_started/stopped`, `user.transcript.partial/final`), agent turn (`agent.thinking_started`, `agent.response.delta/completed`, `agent.speech_started/stopped`), derived `state.changed` (idle/listening/thinking/speaking), `metrics.tick`, `input.muted`, and `error`/`log`.
- Binary frames — a 16-byte header followed by PCM16LE audio samples. Header: `magic` (1), `version` (1), `stream` (1: 1=mic, 2=agent), `reserved` (1), `sample_rate` (u16 LE), `channels` (u16 LE), `seq` (u32 LE), `ts_ms` (u32 LE) since session start.
The server fans out to any number of subscribers; control events never drop (overflow disconnects the slow subscriber), audio frames drop-oldest under backpressure.
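Given the stated header layout, a subscriber can parse a binary frame with `struct`. This is an illustrative sketch; the real decoder is `paty.bus.codec.unpack_audio_frame` and may differ in detail (the `0xA5` magic value here is invented):

```python
import struct

# 16-byte little-endian header: magic, version, stream, reserved,
# sample_rate (u16), channels (u16), seq (u32), ts_ms (u32).
HEADER = struct.Struct("<BBBBHHII")

def parse_audio_frame(frame: bytes) -> dict:
    magic, version, stream, _reserved, rate, channels, seq, ts_ms = \
        HEADER.unpack_from(frame)
    pcm = frame[HEADER.size:]  # PCM16LE samples follow the header
    return {"stream": "mic" if stream == 1 else "agent",
            "sample_rate": rate, "channels": channels,
            "seq": seq, "ts_ms": ts_ms, "pcm": pcm}

# Build a fake agent-stream frame (160 zero samples at 16 kHz) and parse it.
frame = HEADER.pack(0xA5, 1, 2, 0, 16000, 1, 7, 1250) + b"\x00\x00" * 160
info = parse_audio_frame(frame)
print(info["stream"], info["sample_rate"], len(info["pcm"]))  # agent 16000 320
```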
Bus actions
Subscribers can also send JSON commands to the bus to control the agent. Each command is a single JSON object:
{"action": "mute.toggle"}
{"action": "mute.set", "muted": true}
| Action | Payload | Effect |
|---|---|---|
| `mute.toggle` | — | Flip the mic mute. While muted, mic audio is dropped before reaching STT, so PATY can't hear you. |
| `mute.set` | `muted: bool` | Set the mute to an explicit state. |
Every state change is broadcast back as an `input.muted` event with `{muted: bool}` so all subscribers stay in sync.
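A minimal sketch of a subscriber's side of this exchange, using the command and event shapes shown above; the WebSocket transport itself is omitted, and the helper names are invented:

```python
import json

def mute_toggle_cmd() -> str:
    """Command to flip the mic mute."""
    return json.dumps({"action": "mute.toggle"})

def mute_set_cmd(muted: bool) -> str:
    """Command to set the mute to an explicit state."""
    return json.dumps({"action": "mute.set", "muted": muted})

def handle_event(state: dict, text_frame: str) -> None:
    """Apply an input.muted broadcast so local state tracks the agent."""
    event = json.loads(text_frame)
    if event.get("type") == "input.muted":
        state["muted"] = event["data"]["muted"]

state = {"muted": False}
broadcast = json.dumps({"type": "input.muted", "data": {"muted": True}})
handle_event(state, broadcast)
print(state["muted"])  # True
```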
paty bus tail
Connects to a running bus and pretty-prints events as they arrive. Useful for verifying the bus end-to-end and as a reference implementation for TUI subscribers.
# terminal 1 — run the agent with bus.enabled: true
uv run paty run examples/paty.yaml
# terminal 2 — tail the bus
uv run paty bus tail # defaults to ws://127.0.0.1:8765
uv run paty bus tail --url ws://remote:8765 # different host/port
uv run paty bus tail --no-audio # hide audio frame lines
paty bus tui
Full-screen view of the same stream — transcript on the left, avatar top-right, equalizer bottom-right.
uv run paty bus tui # defaults to ws://127.0.0.1:8765
uv run paty bus tui --url ws://remote:8765
Built on Rich's immediate-mode Live: hold state in memory, rebuild the renderable tree on each event, let the library diff and repaint. Layout carves the terminal into named regions and each widget is a pure (state) -> Renderable function, so swapping a stub for real content is a one-file edit.
paty/tui/
├── __init__.py — exports run
├── app.py — event loop, UIState, repaint
├── conversation.py — Conversation/Turn
├── layout.py — root split tree
└── widgets/
├── __init__.py
├── transcript.py — conversation renderer
├── avatar.py — stub face keyed off agent state
└── equalizer.py — stub bar chart (zero levels for now)
The avatar reacts to state.changed events out of the box (idle/listening/thinking/speaking). The equalizer is a visual stub — wiring it to real levels means subscribing to the bus's binary audio frames (paty.bus.codec.unpack_audio_frame) and computing per-band RMS.
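One way to turn a PCM16LE payload into bar levels is sketched below. Note this slices the buffer in time rather than by frequency — true per-band levels would need an FFT — and the function name is invented:

```python
import math
import struct

def band_rms(pcm: bytes, bands: int = 8) -> list[float]:
    """Crude equalizer levels: RMS of each time slice, normalized to 0..1."""
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    if not samples:
        return [0.0] * bands
    step = max(1, len(samples) // bands)
    levels = []
    for i in range(bands):
        chunk = samples[i * step:(i + 1) * step] or (0,)
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        levels.append(rms / 32768.0)  # int16 full scale
    return levels

silence = b"\x00\x00" * 512  # 512 zero samples
print(band_rms(silence))  # eight 0.0 levels
```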
Hardware Profiles
When profile: auto, PATY detects your platform and memory to pick the best profile.
| Profile | STT | LLM | TTS | Memory Budget |
|---|---|---|---|---|
| apple-16gb | distil-whisper-large-v3 | qwen3:8b Q4 | kokoro | ~5.5GB |
| apple-24gb | large-v3-turbo | qwen3:14b Q4 | kokoro | ~9.5GB |
| cuda-24gb | distil-large-v2 | qwen3:14b Q4 | kokoro | ~9.5GB |
| cpu-only | distil-medium-en | qwen3:4b Q4 | piper | ~3GB |
Architecture
PATY is a runtime resolver, not a code generator. It parses YAML, detects hardware, resolves config keys to Pipecat service constructors, builds a live Pipeline, and starts the runner.
YAML config
→ config loader (ruamel.yaml + Pydantic validation)
→ hardware detector (platform, GPU, memory)
→ service resolver (config keys → Pipecat service instances)
→ pipeline builder (services → Pipecat Pipeline)
→ runner (starts Pipecat PipelineRunner)
Every phase is traced via OpenTelemetry. Once the pipeline starts, Pipecat's built-in OTel tracing takes over for per-turn STT/LLM/TTS spans.
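The service-resolution step can be pictured as a (provider, platform) → factory table, as `resolve/registry.py` below suggests. This is a hedged sketch with invented factory names, not PATY's actual registry:

```python
from typing import Callable

# Stand-in factories; the real ones construct Pipecat service instances.
def make_whisper_mlx(cfg: dict) -> str: return f"WhisperMLX({cfg.get('model')})"
def make_whisper_cpp(cfg: dict) -> str: return f"WhisperCpp({cfg.get('model')})"

REGISTRY: dict[tuple[str, str], Callable[[dict], object]] = {
    ("whisper", "apple"): make_whisper_mlx,
    ("whisper", "cuda"): make_whisper_cpp,
    ("whisper", "cpu"): make_whisper_cpp,
}

def resolve(provider: str, platform: str, cfg: dict) -> object:
    """Look up the factory for this provider on this platform and call it."""
    try:
        factory = REGISTRY[(provider, platform)]
    except KeyError:
        raise ValueError(f"no {provider} backend for {platform}") from None
    return factory(cfg)

print(resolve("whisper", "apple", {"model": "large-v3-turbo"}))
```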
Package Structure
paty/
├── cli.py # click CLI commands
├── config/
│ ├── schema.py # Pydantic models
│ └── loader.py # YAML loading + env interpolation
├── tracing/
│ └── setup.py # OpenTelemetry TracerProvider init
├── hardware/
│ ├── detect.py # platform/GPU/memory detection
│ └── profiles.py # named profiles → model defaults
├── resolve/
│ ├── registry.py # (provider, platform) → factory tables
│ └── resolver.py # config + platform → Pipecat services
├── pipeline/
│ └── builder.py # services → Pipeline + PipelineTask
├── bus/
│ ├── events.py # event types + envelope
│ ├── codec.py # binary audio frame pack/unpack
│ ├── server.py # WebSocketBus (fan-out, backpressure)
│ ├── observer.py # Pipecat frame → bus event translator
│ └── tail.py # `paty bus tail` client
├── tui/
│ ├── app.py # `paty bus tui` event loop + UIState
│ ├── conversation.py # Conversation/Turn state
│ ├── layout.py # Rich Layout split tree
│ └── widgets/
│ ├── transcript.py
│ ├── avatar.py
│ └── equalizer.py
└── utils/
└── env.py # ${VAR} interpolation
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file paty-0.0.3.tar.gz.
File metadata
- Download URL: paty-0.0.3.tar.gz
- Size: 316.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37814f660607d0b9b4d120124f4549244d2907c7b0e43100b5475556853c45f5 |
| MD5 | bd6da247c6d101d839bda67dbe843602 |
| BLAKE2b-256 | 1714c45760e7fbde9d0230711b74d695a679109ff5ea2bb31f032693fc9918e4 |
Provenance
The following attestation bundles were made for paty-0.0.3.tar.gz:
Publisher: release.yml on PATYai/PATY
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paty-0.0.3.tar.gz
- Subject digest: 37814f660607d0b9b4d120124f4549244d2907c7b0e43100b5475556853c45f5
- Sigstore transparency entry: 1416376979
- Permalink: PATYai/PATY@a630f14bf5887372318a81c0f9d8f8afc73c47ee
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/PATYai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a630f14bf5887372318a81c0f9d8f8afc73c47ee
- Trigger Event: push
File details
Details for the file paty-0.0.3-py3-none-any.whl.
File metadata
- Download URL: paty-0.0.3-py3-none-any.whl
- Size: 62.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7e1405145a689b429fdc6f33e18ebb1d3f5d0a38451f3a39fc0fff7e52d8558 |
| MD5 | 5f0de2b433b607f43cbf16e66c1bb017 |
| BLAKE2b-256 | d8e843741a4bf78169cde10e79e9d3882edf24a99e8a792124af3c11bef314c6 |
Provenance
The following attestation bundles were made for paty-0.0.3-py3-none-any.whl:
Publisher: release.yml on PATYai/PATY
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: paty-0.0.3-py3-none-any.whl
- Subject digest: e7e1405145a689b429fdc6f33e18ebb1d3f5d0a38451f3a39fc0fff7e52d8558
- Sigstore transparency entry: 1416377105
- Permalink: PATYai/PATY@a630f14bf5887372318a81c0f9d8f8afc73c47ee
- Branch / Tag: refs/tags/v0.0.3
- Owner: https://github.com/PATYai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a630f14bf5887372318a81c0f9d8f8afc73c47ee
- Trigger Event: push