
Official Python SDK for PyAI — speech-to-text (Hear), text-to-speech (Speak), realtime voice agents (Omni), and call compliance (Trace).


pyai-sdk (Python SDK)

Official Python SDK for PyAI — the all-in-one voice AI platform: lightning-fast speech-to-text, ultra-realistic text-to-speech, end-to-end realtime voice agents, and automatic call compliance. Zero third-party dependencies (standard library only); Python 3.9+.

PyAI products

  • Hear — Lightning-fast, telephony-native speech-to-text. Whisper-compatible transcription tuned for real phone-call audio, with live streaming partials so your app reacts mid-sentence, plus async batch transcription for big archives. POST /v1/audio/transcriptions
  • Speak — Ultra-realistic text-to-speech that starts speaking in tens of milliseconds. Stream lifelike, expressive voices, choose from 36 studio-quality presets, or clone any voice instantly — for free. POST /v1/audio/speech
  • Omni (flagship) — One API for a complete, end-to-end voice AI agent. A single WebSocket where your agent listens, thinks, and speaks — grounded in your knowledge bases and tools, with human-like turn-taking and instant barge-in — no STT, LLM, or TTS to stitch together yourself. wss://api.pyai.com/v1/omni
  • Trace (flagship) — The compliance API that keeps your AI agents safe. Trace automatically checks every call for HIPAA, TCPA, and PII risks (plus your own brand-voice rules), flags the exact rule broken, redacts sensitive data, and seals each call with a tamper-evident audit trail — so a risky conversation never slips through. GET /v1/trace/interactions
  • Cue — Realtime turn detection + knowledge-grounded context for your own stack. Bring your own LLM and voice; Cue nails the hard part — knowing the instant a speaker finishes and surfacing the right context. wss://api.pyai.com/v1/audio/transcriptions/stream
  • Telephony — Instant managed phone numbers for your voice agents. Provision a US number and route live calls straight into an Omni agent — no carrier contracts, no telephony glue. POST /v1/telephony/numbers

The API contract is the OpenAPI document at https://api.pyai.com/openapi.json. This SDK wraps it with typed errors, automatic retries, and realtime URL helpers.
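Because the contract is a plain OpenAPI document, you can list every operation it declares using only the standard library. This is a minimal sketch that assumes the spec is reachable without authentication and follows the usual OpenAPI 3.x layout (operations keyed by HTTP method under "paths"); load_spec and list_operations are illustrative helpers, not part of pyai-sdk.

```python
import json
import urllib.request

# Operation keys a path item may contain (OpenAPI 3.x); other keys such as
# "parameters" or "summary" are skipped.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def load_spec(url="https://api.pyai.com/openapi.json", timeout=10):
    """Download and parse the OpenAPI contract (makes a network request)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def list_operations(spec):
    """Return sorted (METHOD, path) pairs declared under the spec's "paths"."""
    return sorted(
        (method.upper(), path)
        for path, item in spec.get("paths", {}).items()
        for method in item
        if method.lower() in HTTP_METHODS
    )
```

For example, `for method, path in list_operations(load_spec()): print(method, path)` prints one line per endpoint, which is a quick way to see which routes are deployed before reaching for the SDK.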

Install

pip install pyai-sdk

Quickstart

import os
from pyai import PyAI, new_idempotency_key

pyai = PyAI(api_key=os.environ["PYAI_API_KEY"])

# Text-to-speech
audio = pyai.audio.speech(input="Hello from PyAI.", voice="stock_sarah_style2")  # mp3 by default
with open("hello.mp3", "wb") as f:
    f.write(audio)

# Voices
voices = pyai.voices.list(gender="female")

# Async transcription (safe retry with an idempotency key)
job = pyai.transcription_jobs.create(
    audio_url="https://example.com/call.wav",
    diarize=True,
    idempotency_key=new_idempotency_key(),
)
done = pyai.transcription_jobs.get(job["job_id"])  # current job state; it may still be running
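A single transcription_jobs.get call returns the job's state at that moment, so a batch job usually has to be polled until it finishes. This is a minimal polling sketch; wait_for_job is a hypothetical helper, and the "status" field and its terminal values in the usage comment are assumptions, not taken from the SDK (check the job schema in the OpenAPI contract).

```python
import time


def wait_for_job(get_job, job_id, is_done, interval=2.0, timeout=600.0):
    """Poll get_job(job_id) until is_done(job) is true; raise TimeoutError after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        if is_done(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id!r} not finished after {timeout}s")
        time.sleep(interval)


# Usage (hypothetical: the "status" field and its values are assumptions):
# done = wait_for_job(pyai.transcription_jobs.get, job["job_id"],
#                     is_done=lambda j: j.get("status") in ("completed", "failed"))
```

Taking the getter and the completion predicate as arguments keeps the loop independent of the job schema, and time.monotonic() keeps the deadline immune to wall-clock changes.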

Speak audio formats (incl. telephony G.711)

audio.speech encodes server-side into any of eight formats via response_format, so telephony callers no longer hand-roll a resampler + μ-law encoder — the audio comes back already in the shape you need:

import base64

# Twilio/SIP-ready in one param: raw 8 kHz mono μ-law, no client-side DSP.
ulaw = pyai.audio.speech(
    input="Your appointment is confirmed.",
    voice="stock_sarah_style2",
    response_format="g711_ulaw",   # -> audio/basic, forced 8 kHz
)
media_frame_payload = base64.b64encode(ulaw).decode()  # base64 payload for a Twilio media message

response_format                  Sample rates (Hz)               Content-Type
mp3 (default)                    8000 / 16000 / 24000 / 48000    audio/mpeg
wav                              8000 / 16000 / 24000 / 48000    audio/wav
opus                             8000 / 16000 / 24000 / 48000    audio/ogg
aac                              8000 / 16000 / 24000 / 48000    audio/aac
flac                             8000 / 16000 / 24000 / 48000    audio/flac
pcm (raw int16 LE, no header)    8000 / 16000 / 24000 / 48000    audio/pcm
g711_ulaw                        8000 (forced)                   audio/basic
g711_alaw                        8000 (forced)                   audio/basic

The accepted values are exported as SPEECH_FORMATS and SPEECH_SAMPLE_RATES (plus a SpeechFormat Literal for type checkers); any other response_format is rejected with HTTP 400 unsupported_format. sample_rate is optional: omit it to get the engine's native 24 kHz (g711_* is always 8 kHz), and omit response_format to get the default mp3. See examples/speak-telephony-formats for the full before/after.
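To catch a bad combination before spending a request, you can check parameters locally against the format table. This is a minimal sketch: the sets are hand-copied from the table rather than imported (the SDK's SPEECH_FORMATS / SPEECH_SAMPLE_RATES are the authoritative source), and the ValueError messages are local, not the server's error codes.

```python
# Hand-copied from the format table; prefer the SDK's exported constants.
FORMATS = {"mp3", "wav", "opus", "aac", "flac", "pcm", "g711_ulaw", "g711_alaw"}
RATES = {8000, 16000, 24000, 48000}
NATIVE_RATE = 24000  # used when sample_rate is omitted


def effective_sample_rate(response_format="mp3", sample_rate=None):
    """Return the output rate in Hz implied by the table, or raise ValueError."""
    if response_format not in FORMATS:
        raise ValueError(f"unsupported_format: {response_format!r}")
    if response_format.startswith("g711_"):
        return 8000  # G.711 output is always 8 kHz
    if sample_rate is None:
        return NATIVE_RATE
    if sample_rate not in RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    return sample_rate
```

For example, effective_sample_rate("wav", 16000) returns 16000, while effective_sample_rate("g711_ulaw", 48000) returns 8000 because G.711 is forced to 8 kHz.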

Realtime (Omni)

The API key is passed as a WebSocket subprotocol. Build the URL and subprotocol with the SDK helpers, then connect with your preferred WebSocket library (e.g. websockets):

import asyncio
import websockets

url = pyai.realtime_url(product="omni", agent_id="agent_123")
subprotocol = pyai.realtime_subprotocol()

async def main():
    async with websockets.connect(url, subprotocols=[subprotocol]) as ws:
        async for frame in ws:
            print(frame)

asyncio.run(main())

Omni uses the native wss://api.pyai.com/v1/omni surface (the default for product="omni"); product="flow" uses /v1/realtime. The older /v2/omni/chat URL is deprecated but still works.
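To make the product-to-path mapping concrete, here is a hypothetical stand-in for pyai.realtime_url. It assumes agent_id and any other options travel as query-string parameters, which the text above does not state; in real code, call pyai.realtime_url and treat its output as authoritative.

```python
from urllib.parse import urlencode

BASE = "wss://api.pyai.com"
# Path per product as described above; "omni" is the default.
PRODUCT_PATHS = {"omni": "/v1/omni", "flow": "/v1/realtime"}


def build_realtime_url(product="omni", **params):
    """Hypothetical sketch of a realtime URL builder (query layout is assumed)."""
    try:
        path = PRODUCT_PATHS[product]
    except KeyError:
        raise ValueError(f"unknown product: {product!r}") from None
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}{path}?{query}" if query else f"{BASE}{path}"
```

Keeping the mapping in one dictionary makes it easy to compare against what pyai.realtime_url returns when you are debugging a connection to the wrong surface (for instance, a stale /v2/omni/chat URL).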

Streaming speech-to-text (Hear / Cue)

Python's standard library has no WebSocket client, so the SDK provides a URL builder (hear_stream_url) plus the subprotocol helper; pair them with websockets (or websocket-client). The wire protocol: stream binary PCM16 or Opus frames, send {"type":"commit"} to force finalization, and read JSON frames whose type is partial, partial_stable, speech_final, final, or error:

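import asyncio
import json
import websockets

url = pyai.hear_stream_url(sample_rate=16000)

async def transcribe(pcm_chunks):
    # pcm_chunks: an async iterable of raw PCM16 byte chunks at 16 kHz.
    async with websockets.connect(url, subprotocols=[pyai.realtime_subprotocol()]) as ws:

        async def send_audio():
            async for pcm16 in pcm_chunks:
                await ws.send(pcm16)                        # binary audio frame
            await ws.send(json.dumps({"type": "commit"}))  # force-finalize

        # Send and receive concurrently so partials arrive while audio is still streaming.
        sender = asyncio.create_task(send_audio())
        try:
            async for frame in ws:                          # ends when the server closes
                print(json.loads(frame))
        finally:
            sender.cancel()

asyncio.run(transcribe(mic_source()))  # mic_source(): your own async generator of PCM16 chunks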
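The example leaves mic_source() to you. For testing without a microphone, a minimal sketch of a replacement reads a mono 16-bit WAV file with the standard wave module and yields 20 ms PCM16 chunks, optionally paced at real time. wav_source is a hypothetical helper; the 20 ms chunk size is a common choice, not a documented requirement, and the file's sample rate must match the sample_rate passed to hear_stream_url.

```python
import asyncio
import wave


async def wav_source(path, frame_ms=20, realtime=True):
    """Yield raw PCM16 chunks of frame_ms each from a mono 16-bit WAV file.

    If realtime is True, sleep between chunks to pace the upload like a live mic.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM WAV")
        frames_per_chunk = wav.getframerate() * frame_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
            if realtime:
                await asyncio.sleep(frame_ms / 1000)
```

At 16 kHz each 20 ms chunk is 320 samples, or 640 bytes. Usage: `asyncio.run(transcribe(wav_source("call_16k.wav")))`.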

For Cue (turn detection + KB context), send {"type": "config", "grounding": true} as the first text frame after connecting; final/speech_final frames then carry a grounding list of top KB passages.
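A minimal sketch of client-side frame handling that dispatches on the frame types listed above and keeps the grounding passages from the latest final frame. The "text" key is an assumption (the text above does not name the transcript field), and because it is not stated whether speech_final and final repeat the same utterance, the sketch records both, tagged by type. TranscriptState and CUE_CONFIG are hypothetical names.

```python
import json


class TranscriptState:
    """Accumulates Hear/Cue stream frames by type (assumes a "text" field)."""

    def __init__(self):
        self.partial = ""     # latest in-progress hypothesis
        self.finals = []      # (frame type, text) for each speech_final/final frame
        self.grounding = []   # KB passages from the most recent final-type frame

    def handle(self, raw):
        """Parse one JSON text frame, update state, and return the decoded dict."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind in ("partial", "partial_stable"):
            self.partial = msg.get("text", "")
        elif kind in ("speech_final", "final"):
            self.finals.append((kind, msg.get("text", "")))
            self.partial = ""
            self.grounding = msg.get("grounding", [])
        elif kind == "error":
            raise RuntimeError(f"stream error: {msg}")
        return msg


# First text frame to send after connecting when you want Cue grounding.
CUE_CONFIG = json.dumps({"type": "config", "grounding": True})
```

In the streaming loop, `await ws.send(CUE_CONFIG)` right after connecting, then call `state.handle(frame)` for each text frame instead of printing it.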

Sync STT, telephony output, and more APIs

# Synchronous speech-to-text
with open("call.wav", "rb") as f:
    text = pyai.audio.transcriptions.create(file=f, language="en")["text"]

# Telephony-ready TTS: raw 8 kHz G.711 for Twilio/SIP, encoded server-side —
# no client-side resampler or μ-law encoder. Just base64 it into a media frame.
ulaw = pyai.audio.speech(input="Hi there", response_format="g711_ulaw")

# Voice clones (Speak)
with open("ref.wav", "rb") as f:
    clone = pyai.clones.create(name="Brand VO", file=f)
pyai.clones.delete(clone["id"])

# Managed phone numbers (Telephony)
avail = pyai.telephony.numbers.available(area_code="415")["data"]
num = pyai.telephony.numbers.buy(phone_number=avail[0]["phone_number"], agent_id="agent_123")
pyai.telephony.numbers.assign(num["id"], "agent_123")  # (re)point the number at an agent later
pyai.telephony.numbers.release(num["id"])

# Compliance (Trace)
fails = pyai.trace.interactions.list(verdict="FAIL")["data"]
pyai.trace.config.set(agent_id="agent_123", enabled=True)
exposure = pyai.trace.exposure(window_days=30)

# Per-call eval scorecard (timeline + quality metrics). These fields are additive
# and forward-compatible: they appear once the engine emits them, so reading them
# is always safe (call_timeline returns [] and quality_metrics is None until then).
if fails:
    call_id = fails[0]["id"]
    timeline = pyai.trace.call_timeline(call_id)                       # list[dict] of turns
    quality = pyai.trace.interactions.get(call_id).get("quality_metrics")
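The telephony comment above says to base64 the G.711 bytes into a media frame. Assuming a Twilio bidirectional Media Stream, a minimal sketch of that step splits the audio into 20 ms chunks (160 bytes at 8 kHz, one byte per μ-law sample) and wraps each in the JSON "media" message Twilio accepts on the stream's WebSocket. twilio_media_messages is a hypothetical helper, stream_sid comes from Twilio's "start" event, and the 20 ms frame size is a common choice rather than a PyAI requirement; check Twilio's Media Streams docs for the current message shape.

```python
import base64
import json

ULAW_BYTES_PER_20MS = 160  # 8000 samples/s * 1 byte/sample * 0.020 s


def twilio_media_messages(ulaw, stream_sid, frame_bytes=ULAW_BYTES_PER_20MS):
    """Split raw 8 kHz G.711 μ-law bytes into Twilio Media Streams "media" messages."""
    for start in range(0, len(ulaw), frame_bytes):
        payload = base64.b64encode(ulaw[start:start + frame_bytes]).decode("ascii")
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": payload},
        })
```

Because g711_ulaw output is raw (no WAV header), the bytes can be chunked directly; each yielded string is sent as one text frame on the Twilio WebSocket.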

Reproducible runs (evals)

audio.speech and audio.transcriptions.create accept optional seed and temperature parameters for deterministic eval runs. Both are forward-compatible: the engine honors them once it supports them and ignores them until then, so passing them is always safe:

pyai.audio.speech(input="Hello", voice="stock_sarah_style2", seed=42, temperature=0)
with open("call.wav", "rb") as f:
    pyai.audio.transcriptions.create(file=f, seed=42)
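Since seed and temperature are ignored until the engine supports them, it can be worth checking whether runs are actually reproducible on the current deployment. A minimal sketch hashes the output of several identical calls and compares digests; outputs_match is a hypothetical helper, and byte equality assumes the encoded output carries no per-request metadata (a container format could differ even when the audio does not).

```python
import hashlib


def outputs_match(synthesize, runs=2):
    """Call synthesize() `runs` times; True if every run returned identical bytes.

    synthesize: zero-argument callable, e.g.
        lambda: pyai.audio.speech(input="Hello", voice="stock_sarah_style2",
                                  seed=42, temperature=0)
    """
    digests = {hashlib.sha256(synthesize()).hexdigest() for _ in range(runs)}
    return len(digests) == 1
```

Comparing SHA-256 digests rather than raw bytes keeps memory flat when many runs are checked; choosing response_format="pcm" avoids container metadata altogether.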

CLI (pyai)

The package installs a pyai command (also python -m pyai). pyai doctor introspects your key/scopes via GET /v1/me (skipped gracefully if the route isn't deployed yet), checks endpoint liveness, runs a Speak→Hear round-trip, and prints remediation hints:

export PYAI_API_KEY=pyai_test_...
pyai doctor
# PASS  key (/v1/me)  — env=test; 3 scope(s): hear:transcribe, voice:synthesize, hear:stream
# PASS  speak→hear round-trip  — synth 45210 bytes → "the quick brown fox…"
# Diagnosis: healthy. Key, endpoint, and a Speak→Hear round-trip all work.

pyai smoke   # lighter: models + voices + speak

Errors

Failures raise PyAIError with a stable code (branch on it, not the message):

from pyai import PyAIError

try:
    pyai.audio.speech(input="hi")
except PyAIError as err:
    if err.code == "credit_exhausted":
        ...  # out of prepaid credit: add credit or use a sandbox key
    else:
        raise  # don't silently swallow other failures

Common codes: unauthorized, forbidden, credit_exhausted, rate_limit_exceeded, concurrency_limit_exceeded, idempotency_conflict. Responses with status 429 or 5xx are retried automatically (honoring Retry-After); tune the retry count with PyAI(api_key=..., max_retries=...).
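The SDK already retries 429/5xx for you; the sketch below only illustrates the kind of delay policy that description implies, for readers who wrap other HTTP calls the same way. It is not the SDK's implementation: retry_delay, its max_retries default of 2, and the backoff constants are hypothetical. It honors Retry-After in either of its standard forms (delta-seconds or HTTP-date) and otherwise falls back to capped exponential backoff with full jitter.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _retry_after_seconds(value):
    """Parse a Retry-After value (delta-seconds or HTTP-date); None if unparseable."""
    try:
        return max(0.0, float(value))
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when is None:
        return None
    if when.tzinfo is None:  # "-0000" dates parse as naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def retry_delay(status, attempt, retry_after=None, max_retries=2, base=0.5, cap=30.0):
    """Seconds to wait before retry `attempt` (0-based), or None to stop retrying."""
    if attempt >= max_retries or not (status == 429 or 500 <= status <= 599):
        return None
    if retry_after is not None:
        seconds = _retry_after_seconds(retry_after)
        if seconds is not None:
            return seconds
    # Capped exponential backoff with "full jitter".
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

Full jitter spreads retries from many clients across the whole backoff window, which avoids synchronized retry storms after a shared 429.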

Develop

python -m unittest discover -s tests -v   # no network; transport injected



Download files


Source Distribution

pyai_sdk-0.2.0.tar.gz (19.7 kB)

Uploaded Source

Built Distribution


pyai_sdk-0.2.0-py3-none-any.whl (16.9 kB)

Uploaded Python 3

File details

Details for the file pyai_sdk-0.2.0.tar.gz.

File metadata

  • Download URL: pyai_sdk-0.2.0.tar.gz
  • Upload date:
  • Size: 19.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyai_sdk-0.2.0.tar.gz
Algorithm Hash digest
SHA256 1659270c91b900cd2fc76e8941b4f17a6c86bcfc4de2924a795a134debe77869
MD5 bda78adb84518cded4e61f494770dc4d
BLAKE2b-256 f23d0570017b02fba8f0b95f2209fd004c4380f3f5d254786737dca813e589dc
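To check a downloaded archive against the published digest, stream it through SHA-256. This minimal sketch reads in chunks so it works on Python 3.9+ (hashlib.file_digest only exists from 3.11); sha256_of is an illustrative helper. For installs, pip's hash-checking mode (a `--hash=sha256:...` entry per requirement with `--require-hashes`) does the same check automatically; list both the sdist and wheel digests, since pip may fetch either.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


# Published SHA256 for pyai_sdk-0.2.0.tar.gz (from the table above).
EXPECTED = "1659270c91b900cd2fc76e8941b4f17a6c86bcfc4de2924a795a134debe77869"
# assert sha256_of("pyai_sdk-0.2.0.tar.gz") == EXPECTED
```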


Provenance

The following attestation bundles were made for pyai_sdk-0.2.0.tar.gz:

Publisher: publish-sdk-pypi.yml on atomsai/pyai-platform-backend

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyai_sdk-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: pyai_sdk-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 16.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pyai_sdk-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c772b3680fa995140c93933fe93aeb139a978398efd171d47aedd4ff50bf3e3c
MD5 542f74fa0baa7d01b71f577cf156fa0f
BLAKE2b-256 3f40582fc33edaef0d49c1732a92956da84cb99a8d97ef9ed418fe94c9566ba5


Provenance

The following attestation bundles were made for pyai_sdk-0.2.0-py3-none-any.whl:

Publisher: publish-sdk-pypi.yml on atomsai/pyai-platform-backend

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
