Official Python SDK for PyAI — speech-to-text (Hear), text-to-speech (Speak), realtime voice agents (Omni), and call compliance (Trace).
pyai-sdk (Python SDK)
Official Python SDK for PyAI — the all-in-one voice AI platform: lightning-fast speech-to-text, ultra-realistic text-to-speech, end-to-end realtime voice agents, and automatic call compliance. Zero third-party dependencies (standard library only); Python 3.9+.
PyAI products
- Hear: Lightning-fast, telephony-native speech-to-text. Whisper-compatible transcription tuned for real phone-call audio, with live streaming partials so your app reacts mid-sentence, plus async batch transcription for big archives.
  POST /v1/audio/transcriptions
- Speak: Ultra-realistic text-to-speech that starts speaking in tens of milliseconds. Stream lifelike, expressive voices, choose from 36 studio-quality presets, or clone any voice instantly, for free.
  POST /v1/audio/speech
- Omni (flagship): One API for a complete, end-to-end voice AI agent. A single WebSocket where your agent listens, thinks, and speaks, grounded in your knowledge bases and tools, with human-like turn-taking and instant barge-in. There is no STT, LLM, or TTS to stitch together yourself.
  wss://api.pyai.com/v1/omni
- Trace (flagship): The compliance API that keeps your AI agents safe. Trace automatically checks every call for HIPAA, TCPA, and PII risks (plus your own brand-voice rules), flags the exact rule broken, redacts sensitive data, and seals each call with a tamper-evident audit trail, so a risky conversation never slips through.
  GET /v1/trace/interactions
- Cue: Realtime turn detection plus knowledge-grounded context for your own stack. Bring your own LLM and voice; Cue handles the hard part: knowing the instant a speaker finishes and surfacing the right context.
  wss://api.pyai.com/v1/audio/transcriptions/stream
- Telephony: Instant managed phone numbers for your voice agents. Provision a US number and route live calls straight into an Omni agent, with no carrier contracts and no telephony glue.
  POST /v1/telephony/numbers
The contract is https://api.pyai.com/openapi.json. This SDK wraps it with
typed errors, automatic retries, and realtime URL helpers.
Install
pip install pyai-sdk
Quickstart
import os

from pyai import PyAI, new_idempotency_key

pyai = PyAI(api_key=os.environ["PYAI_API_KEY"])

# Text-to-speech. The default response_format is mp3, so request wav
# explicitly to match the file extension.
audio = pyai.audio.speech(
    input="Hello from PyAI.",
    voice="stock_sarah_style2",
    response_format="wav",
)
with open("hello.wav", "wb") as f:
    f.write(audio)

# Voices
voices = pyai.voices.list(gender="female")

# Async transcription (safe retry with an idempotency key)
job = pyai.transcription_jobs.create(
    audio_url="https://example.com/call.wav",
    diarize=True,
    idempotency_key=new_idempotency_key(),
)
# Fetch the job's current state; it may still be running at this point.
done = pyai.transcription_jobs.get(job["job_id"])
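The `transcription_jobs.get` call above returns the job as it stands at that moment, which may not be finished. Below is a minimal polling sketch. It assumes the job dict carries a `status` field with terminal values such as `"completed"` and `"failed"`; those names are hypothetical, so check the OpenAPI contract for the real ones. `wait_for_job` is not part of the SDK, and the `fetch` callable is injected so the loop can run without a network.

```python
import time
from typing import Any, Callable, Dict, Iterable

# Hypothetical terminal states; the real values live in the OpenAPI contract.
TERMINAL_STATUSES = ("completed", "failed")


def wait_for_job(
    fetch: Callable[[], Dict[str, Any]],
    terminal: Iterable[str] = TERMINAL_STATUSES,
    interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch() until the job reaches a terminal status or the timeout passes."""
    terminal_set = set(terminal)
    waited = 0.0  # seconds spent sleeping, not wall-clock time
    while True:
        job = fetch()
        if job.get("status") in terminal_set:
            return job
        if waited >= timeout:
            raise TimeoutError(f"job still {job.get('status')!r} after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Back off gradually so long jobs do not hammer the API.
        interval = min(interval * 2, max_interval)
```

With the quickstart client, this would be called as `done = wait_for_job(lambda: pyai.transcription_jobs.get(job["job_id"]))`.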
Speak audio formats (incl. telephony G.711)
audio.speech encodes server-side into any of eight formats via response_format,
so telephony callers no longer hand-roll a resampler + μ-law encoder — the audio
comes back already in the shape you need:
import base64

# Twilio/SIP-ready in one param: raw 8 kHz mono μ-law, no client-side DSP.
ulaw = pyai.audio.speech(
    input="Your appointment is confirmed.",
    voice="stock_sarah_style2",
    response_format="g711_ulaw",  # -> audio/basic, forced 8 kHz
)
media_frame_payload = base64.b64encode(ulaw).decode()  # straight into Twilio
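For a live call you would usually send the μ-law bytes in small frames rather than as one payload. The sketch below splits 8 kHz μ-law audio into 20 ms frames (160 bytes each) and wraps each one in a JSON `media` message. The message shape follows Twilio's Media Streams format as we understand it, so verify it against Twilio's documentation. `twilio_media_messages` and the `MZ123` stream SID are hypothetical and not part of this SDK.

```python
import base64
import json
from typing import Iterator

ULAW_BYTES_PER_MS = 8  # 8 kHz mono, one byte per sample


def twilio_media_messages(ulaw: bytes, stream_sid: str, frame_ms: int = 20) -> Iterator[str]:
    """Yield JSON "media" messages, each carrying frame_ms of mu-law audio."""
    frame_size = ULAW_BYTES_PER_MS * frame_ms  # 160 bytes for 20 ms
    for start in range(0, len(ulaw), frame_size):
        chunk = ulaw[start:start + frame_size]
        yield json.dumps({
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(chunk).decode("ascii")},
        })
```

Each yielded string can be sent as a text frame on the Twilio WebSocket, for example `for msg in twilio_media_messages(ulaw, "MZ123"): await ws.send(msg)`.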
| response_format | sample rates (Hz) | Content-Type |
|---|---|---|
| mp3 (default) | 8000 / 16000 / 24000 / 48000 | audio/mpeg |
| wav | 8000 / 16000 / 24000 / 48000 | audio/wav |
| opus | 8000 / 16000 / 24000 / 48000 | audio/ogg |
| aac | 8000 / 16000 / 24000 / 48000 | audio/aac |
| flac | 8000 / 16000 / 24000 / 48000 | audio/flac |
| pcm (raw int16 LE, no header) | 8000 / 16000 / 24000 / 48000 | audio/pcm |
| g711_ulaw | 8000 (forced) | audio/basic |
| g711_alaw | 8000 (forced) | audio/basic |
The accepted set is exported as SPEECH_FORMATS / SPEECH_SAMPLE_RATES (and a
SpeechFormat Literal for type-checkers). Any other value is a
400 unsupported_format. sample_rate is optional — omit it for the engine's
native 24 kHz (g711_* is always 8 kHz); omit response_format for the default
mp3. See
examples/speak-telephony-formats for
the full before/after.
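The format rules above can be checked client-side before a request goes out. This sketch uses local stand-ins that mirror the table; the SDK's own SPEECH_FORMATS / SPEECH_SAMPLE_RATES exports should be preferred in real code. `resolve_speech_format` is a hypothetical helper, and raising on an unlisted sample rate is an assumption about how the server treats it.

```python
from typing import Optional, Tuple

# Local stand-ins mirroring the documented table, not the SDK's exports.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "wav": "audio/wav",
    "opus": "audio/ogg",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "g711_ulaw": "audio/basic",
    "g711_alaw": "audio/basic",
}
SAMPLE_RATES = (8000, 16000, 24000, 48000)


def resolve_speech_format(
    response_format: str = "mp3", sample_rate: Optional[int] = None
) -> Tuple[str, int, str]:
    """Return (format, effective sample rate, Content-Type) or raise ValueError."""
    if response_format not in CONTENT_TYPES:
        raise ValueError(f"unsupported_format: {response_format!r}")
    if response_format.startswith("g711_"):
        rate = 8000  # G.711 is always 8 kHz, whatever was requested
    elif sample_rate is None:
        rate = 24000  # the engine's native rate
    elif sample_rate in SAMPLE_RATES:
        rate = sample_rate
    else:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    return response_format, rate, CONTENT_TYPES[response_format]
```

For example, `resolve_speech_format("g711_ulaw", 16000)` resolves to 8000 Hz and `audio/basic`, matching the forced rate in the table.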
Realtime (Omni)
Keys travel as a WebSocket subprotocol. Use the helpers with your preferred WS
library (e.g. websockets):
import asyncio

import websockets

url = pyai.realtime_url(product="omni", agent_id="agent_123")
subprotocol = pyai.realtime_subprotocol()

async def main():
    # The API key travels as the WebSocket subprotocol.
    async with websockets.connect(url, subprotocols=[subprotocol]) as ws:
        async for frame in ws:
            print(frame)

asyncio.run(main())
Omni uses the native wss://api.pyai.com/v1/omni surface (the default for product="omni"); product="flow" uses /v1/realtime. The older /v2/omni/chat URL is deprecated but still works.
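To make the product-to-path mapping concrete, here is a rough sketch of how such a URL could be assembled by hand. It is not the SDK's implementation, and `pyai.realtime_url` should be preferred. `build_realtime_url` is a hypothetical name, and passing `agent_id` as a query parameter is an assumption; only the two paths come from the description above.

```python
from typing import Optional
from urllib.parse import urlencode

# Paths taken from the text above; everything else here is illustrative.
REALTIME_PATHS = {"omni": "/v1/omni", "flow": "/v1/realtime"}


def build_realtime_url(
    product: str = "omni",
    base: str = "wss://api.pyai.com",
    **params: Optional[str],
) -> str:
    """Join the base host, the product's path, and any non-empty query parameters."""
    try:
        path = REALTIME_PATHS[product]
    except KeyError:
        raise ValueError(f"unknown product: {product!r}") from None
    # Assumption: options such as agent_id travel in the query string.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}{path}?{query}" if query else f"{base}{path}"
```

Under these assumptions, `build_realtime_url(agent_id="agent_123")` gives `wss://api.pyai.com/v1/omni?agent_id=agent_123`.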
Streaming speech-to-text (Hear / Cue)
The standard library has no production-grade WebSocket client, so the SDK gives
you a URL builder (hear_stream_url) plus the subprotocol helper; pair them with
websockets (or websocket-client). The wire protocol: stream binary PCM16/opus
frames, send {"type":"commit"} to force-finalize, and read JSON frames of type
partial / partial_stable / speech_final / final / error:
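import asyncio
import json

import websockets

url = pyai.hear_stream_url(sample_rate=16000)

async def transcribe(pcm_chunks):
    # pcm_chunks: an async iterator of binary PCM16 frames.
    async with websockets.connect(url, subprotocols=[pyai.realtime_subprotocol()]) as ws:
        async for pcm16 in pcm_chunks:
            await ws.send(pcm16)
        await ws.send(json.dumps({"type": "commit"}))  # force-finalize
        # Reads JSON frames until the server closes the connection.
        async for frame in ws:
            print(json.loads(frame))

# mic_source() is a placeholder for your own async audio source.
asyncio.run(transcribe(mic_source()))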
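The example above needs an async source of PCM16 frames. One possible stand-in for a microphone is a generator that slices an in-memory buffer of mono 16-bit PCM into fixed-length frames. The 20 ms frame length is an assumption rather than a documented requirement, and `pcm16_chunks` is a hypothetical helper, not part of the SDK.

```python
import asyncio
from typing import AsyncIterator


async def pcm16_chunks(
    pcm: bytes,
    sample_rate: int = 16000,
    frame_ms: int = 20,
    realtime: bool = False,
) -> AsyncIterator[bytes]:
    """Yield frame_ms slices of mono 16-bit PCM, optionally paced at real time."""
    bytes_per_frame = sample_rate * 2 * frame_ms // 1000  # 2 bytes per int16 sample
    for start in range(0, len(pcm), bytes_per_frame):
        yield pcm[start:start + bytes_per_frame]
        if realtime:
            # Pace the stream like a live source instead of sending it all at once.
            await asyncio.sleep(frame_ms / 1000)
```

With a raw buffer loaded from disk, the call becomes `asyncio.run(transcribe(pcm16_chunks(raw_pcm)))`. At 16 kHz each 20 ms frame is 640 bytes.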
For Cue (turn detection + KB context), send {"type": "config", "grounding": true}
as the first text frame after connecting; final/speech_final frames then carry
a grounding list of top KB passages.
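A small state holder can make the frame types easier to consume. The sketch below tracks the latest partial and collects finalized text plus the Cue grounding list. The `text` and `message` field names are assumptions (only `type` and `grounding` are described above), and treating speech_final and final the same way is a simplification. `TranscriptState` is hypothetical and not part of the SDK.

```python
import json
from typing import Any, List, Optional

# First text frame to send when Cue grounding is wanted (from the text above).
CUE_CONFIG_FRAME = json.dumps({"type": "config", "grounding": True})


class TranscriptState:
    """Accumulates streaming transcription frames into partial and final text."""

    def __init__(self) -> None:
        self.partial = ""
        self.finals: List[str] = []
        self.grounding: List[Any] = []

    def feed(self, raw: str) -> Optional[str]:
        """Apply one JSON frame; return the text when an utterance is finalized."""
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "error":
            # Assumption: error frames carry a human-readable "message".
            raise RuntimeError(frame.get("message", "stream error"))
        text = frame.get("text", "")  # assumption: transcripts arrive under "text"
        if kind in ("partial", "partial_stable"):
            self.partial = text
            return None
        if kind in ("speech_final", "final"):
            self.partial = ""
            self.finals.append(text)
            self.grounding = frame.get("grounding", [])
            return text
        return None  # ignore unknown frame types for forward compatibility
```

Inside the receive loop this would look like `text = state.feed(frame)`, after sending `CUE_CONFIG_FRAME` as the first text frame for Cue.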
Sync STT, telephony output, and more APIs
# Synchronous speech-to-text
with open("call.wav", "rb") as f:
    text = pyai.audio.transcriptions.create(file=f, language="en")["text"]

# Telephony-ready TTS: raw 8 kHz G.711 for Twilio/SIP, encoded server-side, so
# no client-side resampler or μ-law encoder. Just base64 it into a media frame.
ulaw = pyai.audio.speech(input="Hi there", response_format="g711_ulaw")

# Voice clones (Speak)
with open("ref.wav", "rb") as f:
    clone = pyai.clones.create(name="Brand VO", file=f)
pyai.clones.delete(clone["id"])

# Managed phone numbers (Telephony)
avail = pyai.telephony.numbers.available(area_code="415")["data"]
num = pyai.telephony.numbers.buy(phone_number=avail[0]["phone_number"], agent_id="agent_123")
pyai.telephony.numbers.assign(num["id"], "agent_123")
pyai.telephony.numbers.release(num["id"])

# Compliance (Trace)
fails = pyai.trace.interactions.list(verdict="FAIL")["data"]
pyai.trace.config.set(agent_id="agent_123", enabled=True)
exposure = pyai.trace.exposure(window_days=30)

# Per-call eval scorecard (timeline + quality metrics). Additive and
# forward-compatible: present once the engine emits them, so reading is always
# safe (call_timeline returns [] until then).
if fails:  # skip when no call has a FAIL verdict
    timeline = pyai.trace.call_timeline(fails[0]["id"])  # list[dict] of turns
    quality = pyai.trace.interactions.get(fails[0]["id"]).get("quality_metrics")
Reproducible runs (evals)
audio.speech and audio.transcriptions.create take optional seed and
temperature for deterministic eval runs. They're forward-compatible — honored
once the engine supports them and otherwise ignored — so it's always safe to pass:
pyai.audio.speech(input="Hello", voice="stock_sarah_style2", seed=42, temperature=0)
with open("call.wav", "rb") as f:
    pyai.audio.transcriptions.create(file=f, seed=42)
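Because seed and temperature may be ignored until the engine supports them, an eval harness may want to confirm that repeated runs actually match. A minimal sketch: hash the bytes from several runs and compare the digests. `is_reproducible` and `output_digests` are hypothetical helpers, not part of the SDK; the `run` callable is whatever produces the output under test.

```python
import hashlib
from typing import Callable, List


def output_digests(run: Callable[[], bytes], runs: int = 3) -> List[str]:
    """Call run() several times and return the SHA-256 hex digest of each output."""
    return [hashlib.sha256(run()).hexdigest() for _ in range(runs)]


def is_reproducible(run: Callable[[], bytes], runs: int = 3) -> bool:
    """True when every run produced byte-identical output."""
    return len(set(output_digests(run, runs))) == 1
```

For Speak this could be called as `is_reproducible(lambda: pyai.audio.speech(input="Hello", voice="stock_sarah_style2", seed=42, temperature=0))`. Each run is a real API request, so keep `runs` small.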
CLI (pyai)
The package installs a pyai command (also python -m pyai). pyai doctor
introspects your key/scopes via GET /v1/me (skipped gracefully if the route
isn't deployed yet), checks endpoint liveness, runs a Speak→Hear round-trip, and
prints remediation hints:
export PYAI_API_KEY=pyai_test_...
pyai doctor
# PASS key (/v1/me) — env=test; 3 scope(s): hear:transcribe, voice:synthesize, hear:stream
# PASS speak→hear round-trip — synth 45210 bytes → "the quick brown fox…"
# Diagnosis: healthy. Key, endpoint, and a Speak→Hear round-trip all work.
pyai smoke # lighter: models + voices + speak
Errors
Failures raise PyAIError with a stable code (branch on it, not the message):
from pyai import PyAIError

try:
    pyai.audio.speech(input="hi")
except PyAIError as err:
    if err.code == "credit_exhausted":
        ...  # out of prepaid credit: add credit or use a sandbox key
    else:
        raise  # let codes you do not handle propagate
Common codes: unauthorized, forbidden, credit_exhausted,
rate_limit_exceeded, concurrency_limit_exceeded, idempotency_conflict.
429/5xx are retried automatically (honoring Retry-After); tune with
PyAI(api_key, max_retries=...).
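For intuition about what honoring Retry-After involves, here is a sketch of a delay calculation. It is not the SDK's actual algorithm: the base delay, the doubling, and the cap are all assumptions, and `retry_delay` is a hypothetical name. Only the delta-seconds form of Retry-After is handled.

```python
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Prefer the server's hint, clamped to the range [0, cap].
            return min(max(float(retry_after), 0.0), cap)
        except ValueError:
            pass  # the HTTP-date form is not handled in this sketch
    # Otherwise fall back to capped exponential backoff.
    return min(base * (2 ** attempt), cap)
```

A production client would typically add random jitter to the fallback delay so that many clients do not retry in lockstep.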
Develop
python -m unittest discover -s tests -v # no network; transport injected
File details
Details for the file pyai_sdk-0.2.0.tar.gz.
File metadata
- Download URL: pyai_sdk-0.2.0.tar.gz
- Upload date:
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1659270c91b900cd2fc76e8941b4f17a6c86bcfc4de2924a795a134debe77869 |
| MD5 | bda78adb84518cded4e61f494770dc4d |
| BLAKE2b-256 | f23d0570017b02fba8f0b95f2209fd004c4380f3f5d254786737dca813e589dc |
Provenance
The following attestation bundles were made for pyai_sdk-0.2.0.tar.gz:
Publisher: publish-sdk-pypi.yml on atomsai/pyai-platform-backend
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyai_sdk-0.2.0.tar.gz
- Subject digest: 1659270c91b900cd2fc76e8941b4f17a6c86bcfc4de2924a795a134debe77869
- Sigstore transparency entry: 1851133880
- Permalink: atomsai/pyai-platform-backend@165fac2b5bca55aa97ed6c82ec9ec7b73c0ffffd
- Branch / Tag: refs/tags/sdk-py-v0.2.0
- Owner: https://github.com/atomsai
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-sdk-pypi.yml@165fac2b5bca55aa97ed6c82ec9ec7b73c0ffffd
- Trigger Event: push
File details
Details for the file pyai_sdk-0.2.0-py3-none-any.whl.
File metadata
- Download URL: pyai_sdk-0.2.0-py3-none-any.whl
- Upload date:
- Size: 16.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c772b3680fa995140c93933fe93aeb139a978398efd171d47aedd4ff50bf3e3c |
| MD5 | 542f74fa0baa7d01b71f577cf156fa0f |
| BLAKE2b-256 | 3f40582fc33edaef0d49c1732a92956da84cb99a8d97ef9ed418fe94c9566ba5 |
Provenance
The following attestation bundles were made for pyai_sdk-0.2.0-py3-none-any.whl:
Publisher: publish-sdk-pypi.yml on atomsai/pyai-platform-backend
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pyai_sdk-0.2.0-py3-none-any.whl
- Subject digest: c772b3680fa995140c93933fe93aeb139a978398efd171d47aedd4ff50bf3e3c
- Sigstore transparency entry: 1851133977
- Permalink: atomsai/pyai-platform-backend@165fac2b5bca55aa97ed6c82ec9ec7b73c0ffffd
- Branch / Tag: refs/tags/sdk-py-v0.2.0
- Owner: https://github.com/atomsai
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-sdk-pypi.yml@165fac2b5bca55aa97ed6c82ec9ec7b73c0ffffd
- Trigger Event: push