
PATY — Please & Thank You

Declarative voice agent deployment on Pipecat. `uv tool install paty && paty run` and you're talking to a voice agent. No bot.py to write, no YAML required to get started.

Quickstart

curl -LsSf https://astral.sh/uv/install.sh | sh   # if you don't already have uv
uv tool install paty
paty run

paty run with no argument loads a bundled default config (friendly paty persona, auto-detected hardware profile). The first paty run will detect your platform and tell you exactly which extra to install for local inference:

uv tool install 'paty[mlx]'   # Apple Silicon
uv tool install 'paty[cuda]'  # NVIDIA GPU
uv tool install 'paty[cpu]'   # CPU fallback

Then paty run again. On first launch PATY will:

  1. Pick a hardware profile from your platform and memory.
  2. Download the LLM weights from Hugging Face (a few GB — first start is slow, subsequent runs hit the cache).
  3. Download the Whisper STT model on first use.
  4. Start the managed LLM server, warm it up, then open a local mic/speaker transport so you can talk to the agent.

Press Ctrl+C to stop.

To run a config of your own:

paty run path/to/your-config.yaml

Platform notes

  • Apple Silicon (macOS arm64): the [mlx] extra pulls in MLX. No system toolchain required.
  • NVIDIA GPU (CUDA): the [cuda] extra installs llama-cpp-python with GPU offload, which needs a working CUDA toolchain at install time. See the llama-cpp-python CUDA build docs.
  • CPU-only: the [cpu] extra needs a C/C++ toolchain (build-essential on Linux, Xcode Command Line Tools on macOS) for llama-cpp-python.

External services

  • LLM — PATY spawns a managed inference server automatically (mlx_lm.server on Apple Silicon, llama_cpp.server on CUDA/CPU). No separate Ollama install is required; models are pulled from Hugging Face on first run.
  • TTS on CUDA/CPU — the kokoro provider expects an OpenAI-compatible Kokoro FastAPI server at http://localhost:8880/v1. The easiest way is the Docker image from remsky/Kokoro-FastAPI. Apple Silicon runs Kokoro in-process via mlx-audio and needs nothing extra.
  • Piper (CPU alternative) — tts: piper downloads its voice model on first use; no server needed.

Contributing / dev install

git clone https://github.com/PATYai/PATY.git
cd PATY/cli
uv sync --extra mlx --extra dev          # or --extra cuda / --extra cpu
uv run pytest tests/ -v                  # run tests
uv run ruff check paty/ tests/           # lint
uv run ruff format --check paty/ tests/  # format check

Config

The YAML config is PATY's primary interface. A minimal example:

pak:
  persona: "You are a receptionist for Dr. Smith's dental office."

pipeline:
  stt: whisper
  llm: ollama
  tts: kokoro
  vad: silero

hardware:
  profile: auto    # or: apple-16gb, apple-24gb, cuda-24gb, cpu-only

sip:
  provider: voip-ms
  host: sip.voip.ms
  username: "100000"
  password: "${SIP_PASSWORD}"
  did: "+13035551234"

tracing:
  enabled: true
  console: true

Pipeline entries accept string shorthand (stt: whisper) or expanded form:

pipeline:
  stt:
    provider: whisper
    model: large-v3-turbo
  llm:
    provider: ollama
    model: qwen3:14b
    base_url: http://localhost:11434/v1
  tts:
    provider: kokoro
    voice: af_bella
    base_url: http://localhost:8880/v1

Environment variables in ${VAR} syntax are interpolated at load time.
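The interpolation behavior can be sketched as a small regex substitution (the real implementation lives in paty/utils/env.py; this standalone sketch only illustrates the ${VAR} syntax, and leaving unset variables untouched is an assumption):

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(text: str) -> str:
    """Replace every ${VAR} with os.environ['VAR']; unknown vars are left as-is."""
    def repl(match: re.Match) -> str:
        return os.environ.get(match.group(1), match.group(0))
    return _VAR.sub(repl, text)

os.environ["SIP_PASSWORD"] = "hunter2"
print(interpolate('password: "${SIP_PASSWORD}"'))  # → password: "hunter2"
```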

CLI Commands

paty run [config.yaml]       Start the voice agent (no arg → bundled default)
paty bus tail                Subscribe to a running bus and print events
paty bus tui                 Live conversation view subscribed to the bus
paty profiles                List hardware profiles and their model selections
paty pak list                List installed PAKs
paty pak active              Print the currently active PAK
paty pak validate <path>     Validate a PAK directory
paty pak switch <name>       Set the active PAK (applies on next `paty run`)
paty init                    Scaffold a starter config (coming soon)
paty doctor                  Check dependencies (coming soon)
paty eject <config.yaml>     Generate standalone bot.py (coming soon)

PAKs (Personality Augmentation Kits)

A PAK bundles a persona (system prompt) and voice settings (TTS provider/voice, optional LLM pin) into a self-contained directory. PATY ships a default paty PAK; additional PAKs can be installed under ~/.paty/paks/<name>/. Each PAK directory contains:

pak.yaml      # manifest: name, version, voice config
soul.md       # the system prompt / persona document

A PAK-style paty.yaml:

pak:
  active: paty           # name of an installed PAK; bundled default is "paty"
hardware:
  profile: auto

For an ad-hoc persona without a PAK directory, set pak.persona instead of pak.active (the two are mutually exclusive). A transient PAK is synthesized from the inline text and routed through the same voice-resolution pipeline as a registered PAK. If neither field is set, the bundled paty PAK is loaded automatically.
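The resolution rule reads roughly like this sketch (function and field names beyond pak.persona / pak.active are hypothetical; the real logic lives in the paty package):

```python
def resolve_pak(pak_cfg: dict) -> dict:
    """Illustrative sketch of pak: resolution. persona and active are
    mutually exclusive; with neither set, the bundled "paty" PAK loads."""
    persona = pak_cfg.get("persona")
    active = pak_cfg.get("active")
    if persona and active:
        raise ValueError("pak.persona and pak.active are mutually exclusive")
    if persona:
        # Synthesize a transient PAK from the inline persona text.
        return {"name": "<transient>", "soul": persona}
    # Fall back to the named PAK, or the bundled default.
    return {"name": active or "paty", "soul": None}

print(resolve_pak({"persona": "You are a receptionist."})["name"])  # → <transient>
print(resolve_pak({})["name"])                                      # → paty
```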

User-provided pipeline.tts.voice or pipeline.llm.model override what the PAK declares — useful for debugging or forcing every PAK onto a single voice.

PAKs may pin voice.llm.model to a specific LLM. This is allowed but expensive — switching to or from a differently-pinned PAK forces a full LLM reload. PATY logs a loud warning at startup when a pin disagrees with the resolved hardware profile.

Note: hot-swap is not yet implemented. paty pak switch <name> updates the active pointer; the change applies on the next paty run. A follow-up will land in-process swap (TTS replaced live, LLM warmed up where compatible).

Event Bus

PATY can publish session events over a WebSocket so other processes (e.g. a TUI) can observe what the pipeline is doing without being coupled to it. Enable it in the config:

bus:
  enabled: true            # publish session events for subscribers
  host: 127.0.0.1
  port: 8765

With the bus enabled, paty run starts a local WebSocket server at ws://host:port. Subscribers receive two frame types:

  • Text frames — JSON control events with envelope {v, seq, ts_ms, session_id, type, data}. Types cover session lifecycle (session.started, session.ended), user turn (user.speech_started/stopped, user.transcript.partial/final), agent turn (agent.thinking_started, agent.response.delta/completed, agent.speech_started/stopped), derived state.changed (idle/listening/thinking/speaking), metrics.tick, input.muted, and error/log.
  • Binary frames — a 16-byte header followed by PCM16LE audio samples. Header: magic(1), version(1), stream(1: 1=mic, 2=agent), reserved(1), sample_rate(u16 LE), channels(u16 LE), seq(u32 LE), ts_ms(u32 LE) since session start.
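The 16-byte header maps directly onto a struct layout. A minimal pack/unpack sketch (the real codec is paty.bus.codec.unpack_audio_frame; the magic byte value here is a placeholder, not the real constant):

```python
import struct

HEADER = struct.Struct("<BBBBHHII")  # 1+1+1+1+2+2+4+4 = 16 bytes, little-endian
MAGIC = 0xA5  # placeholder; the real magic byte is defined in paty.bus.codec

def pack_audio_frame(stream: int, sample_rate: int, channels: int,
                     seq: int, ts_ms: int, pcm: bytes) -> bytes:
    # stream: 1 = mic, 2 = agent; ts_ms is milliseconds since session start
    return HEADER.pack(MAGIC, 1, stream, 0, sample_rate, channels, seq, ts_ms) + pcm

def unpack_audio_frame(frame: bytes) -> dict:
    magic, version, stream, _reserved, rate, ch, seq, ts_ms = HEADER.unpack_from(frame)
    return {"stream": stream, "sample_rate": rate, "channels": ch,
            "seq": seq, "ts_ms": ts_ms, "pcm": frame[HEADER.size:]}

frame = pack_audio_frame(stream=1, sample_rate=16000, channels=1,
                         seq=7, ts_ms=1250, pcm=b"\x00\x01" * 4)
print(unpack_audio_frame(frame)["sample_rate"])  # → 16000
```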

The server fans out to any number of subscribers; control events are never dropped (a subscriber that overflows is disconnected), while audio frames are dropped oldest-first under backpressure.
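Drop-oldest backpressure falls out naturally from a bounded deque. A minimal sketch of the per-subscriber audio buffer (class and attribute names are illustrative, not the real paty.bus.server internals):

```python
from collections import deque

class AudioFanoutQueue:
    """Per-subscriber audio buffer with drop-oldest backpressure (illustrative)."""
    def __init__(self, maxlen: int = 64):
        # A deque with maxlen silently evicts the oldest entry when full.
        self.frames = deque(maxlen=maxlen)
        self.dropped = 0

    def push(self, frame: bytes) -> None:
        if len(self.frames) == self.frames.maxlen:
            self.dropped += 1  # the oldest frame is about to be evicted
        self.frames.append(frame)

q = AudioFanoutQueue(maxlen=2)
for i in range(5):
    q.push(bytes([i]))
print(list(q.frames), q.dropped)  # → [b'\x03', b'\x04'] 3
```

Control events would use an unbounded queue instead, with the subscriber disconnected if it falls too far behind.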

Bus actions

Subscribers can also send JSON commands to the bus to control the agent. Each command is a single JSON object:

{"action": "mute.toggle"}
{"action": "mute.set", "muted": true}
| Action | Payload | Effect |
| --- | --- | --- |
| `mute.toggle` | (none) | Flip the mic mute. While muted, mic audio is dropped before reaching STT, so PATY can't hear you. |
| `mute.set` | `muted: bool` | Set the mute to an explicit state. |

Every state change is broadcast back as an input.muted event with {muted: bool} so all subscribers stay in sync.
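The action handling above can be sketched as a small dispatcher; the event shape matches the documented input.muted envelope data, while the class and method names are illustrative:

```python
import json

class MuteState:
    """Illustrative handler for bus mute actions."""
    def __init__(self):
        self.muted = False

    def handle(self, message: str):
        cmd = json.loads(message)
        if cmd.get("action") == "mute.toggle":
            self.muted = not self.muted
        elif cmd.get("action") == "mute.set":
            self.muted = bool(cmd["muted"])
        else:
            return None  # unknown actions are ignored
        # Broadcast the new state back to every subscriber.
        return {"type": "input.muted", "data": {"muted": self.muted}}

state = MuteState()
print(state.handle('{"action": "mute.toggle"}'))
print(state.handle('{"action": "mute.set", "muted": false}'))
```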

paty bus tail

Connects to a running bus and pretty-prints events as they arrive. Useful for verifying the bus end-to-end and as a reference implementation for TUI subscribers.

# terminal 1 — run the agent (the bundled default has bus.enabled: true)
paty run

# terminal 2 — tail the bus
paty bus tail                           # defaults to ws://127.0.0.1:8765
paty bus tail --url ws://remote:8765    # different host/port
paty bus tail --no-audio                # hide audio frame lines

paty bus tui

Full-screen view of the same stream — transcript on the left, avatar top-right, equalizer bottom-right.

paty bus tui                            # defaults to ws://127.0.0.1:8765
paty bus tui --url ws://remote:8765

Built on Rich's immediate-mode Live: hold state in memory, rebuild the renderable tree on each event, let the library diff and repaint. Layout carves the terminal into named regions and each widget is a pure (state) -> Renderable function, so swapping a stub for real content is a one-file edit.

paty/tui/
├── __init__.py            — exports run
├── app.py                 — event loop, UIState, repaint
├── conversation.py        — Conversation/Turn
├── layout.py              — root split tree
└── widgets/
    ├── __init__.py
    ├── transcript.py      — conversation renderer
    ├── avatar.py          — stub face keyed off agent state
    └── equalizer.py       — stub bar chart (zero levels for now)

The avatar reacts to state.changed events out of the box (idle/listening/thinking/speaking). The equalizer is a visual stub — wiring it to real levels means subscribing to the bus's binary audio frames (paty.bus.codec.unpack_audio_frame) and computing per-band RMS.
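The pure (state) -> Renderable pattern can be sketched without Rich at all (the real widgets return Rich renderables and repaint via Live; the faces and field names here are illustrative stubs):

```python
from dataclasses import dataclass

@dataclass
class UIState:
    agent_state: str = "idle"   # idle / listening / thinking / speaking
    transcript: tuple = ()

def avatar(state: UIState) -> str:
    """Pure widget: same state in, same renderable out (a plain string here)."""
    faces = {"idle": "(-_-)", "listening": "(o_o)",
             "thinking": "(?_?)", "speaking": "(^o^)"}
    return faces.get(state.agent_state, "(-_-)")

def repaint(state: UIState) -> str:
    # Rebuild the whole view from state on every event; Live diffs and repaints.
    return "\n".join(state.transcript) + "\n" + avatar(state)

print(repaint(UIState(agent_state="thinking", transcript=("you: hi", "paty: ..."))))
```

Because each widget is a pure function of state, swapping the stub equalizer for a real one is just replacing one function body.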

Hardware Profiles

When profile: auto, PATY detects your platform and memory to pick the best profile.

| Profile | STT | LLM | TTS | Memory Budget |
| --- | --- | --- | --- | --- |
| apple-16gb | distil-whisper-large-v3 | qwen3:8b Q4 | kokoro | ~5.5GB |
| apple-24gb | large-v3-turbo | qwen3:14b Q4 | kokoro | ~9.5GB |
| cuda-24gb | distil-large-v2 | qwen3:14b Q4 | kokoro | ~9.5GB |
| cpu-only | distil-medium-en | qwen3:4b Q4 | piper | ~3GB |
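The auto selection can be pictured as a simple decision over platform and memory. A hedged sketch mirroring the table above (thresholds and parameter names are assumptions; the real detection lives in paty/hardware/detect.py and may use GPU VRAM rather than system memory for CUDA):

```python
import platform

def pick_profile(system: str, machine: str, has_cuda: bool, mem_gb: float) -> str:
    """Illustrative profile selection; thresholds are assumptions."""
    if system == "Darwin" and machine == "arm64":
        return "apple-24gb" if mem_gb >= 24 else "apple-16gb"
    if has_cuda and mem_gb >= 24:
        return "cuda-24gb"
    return "cpu-only"

# On a real machine the inputs would come from platform/psutil-style probes:
print(pick_profile(platform.system(), platform.machine(), has_cuda=False, mem_gb=16))
```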

Architecture

PATY is a runtime resolver, not a code generator. It parses YAML, detects hardware, resolves config keys to Pipecat service constructors, builds a live Pipeline, and starts the runner.

YAML config
  → config loader (ruamel.yaml + Pydantic validation)
  → hardware detector (platform, GPU, memory)
  → service resolver (config keys → Pipecat service instances)
  → pipeline builder (services → Pipecat Pipeline)
  → runner (starts Pipecat PipelineRunner)

Every phase is traced via OpenTelemetry. Once the pipeline starts, Pipecat's built-in OTel tracing takes over for per-turn STT/LLM/TTS spans.
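The resolver flow amounts to plain function composition. A sketch with stub phases (all names here are hypothetical; the real phases live in the packages listed below and each is wrapped in an OpenTelemetry span):

```python
def run_from_config(path: str, phases: dict) -> str:
    """Each phase is a plain function, composed in the order shown above."""
    config = phases["load_config"](path)
    hw = phases["detect_hardware"]()
    services = phases["resolve_services"](config, hw)
    pipeline = phases["build_pipeline"](services)
    return phases["start_runner"](pipeline)

stub_phases = {
    "load_config": lambda p: {"path": p},
    "detect_hardware": lambda: {"profile": "cpu-only"},
    "resolve_services": lambda cfg, hw: ["stt", "llm", "tts"],
    "build_pipeline": lambda services: tuple(services),
    "start_runner": lambda pipeline: f"running {len(pipeline)} services",
}
print(run_from_config("paty.yaml", stub_phases))  # → running 3 services
```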

Package Structure

paty/
├── cli.py                 # click CLI commands
├── config/
│   ├── schema.py          # Pydantic models
│   └── loader.py          # YAML loading + env interpolation
├── tracing/
│   └── setup.py           # OpenTelemetry TracerProvider init
├── hardware/
│   ├── detect.py          # platform/GPU/memory detection
│   └── profiles.py        # named profiles → model defaults
├── resolve/
│   ├── registry.py        # (provider, platform) → factory tables
│   └── resolver.py        # config + platform → Pipecat services
├── pipeline/
│   └── builder.py         # services → Pipeline + PipelineTask
├── bus/
│   ├── events.py          # event types + envelope
│   ├── codec.py           # binary audio frame pack/unpack
│   ├── server.py          # WebSocketBus (fan-out, backpressure)
│   ├── observer.py        # Pipecat frame → bus event translator
│   └── tail.py            # `paty bus tail` client
├── tui/
│   ├── app.py             # `paty bus tui` event loop + UIState
│   ├── conversation.py    # Conversation/Turn state
│   ├── layout.py          # Rich Layout split tree
│   └── widgets/
│       ├── transcript.py
│       ├── avatar.py
│       └── equalizer.py
└── utils/
    └── env.py             # ${VAR} interpolation
