An open source framework for voice (and multimodal) assistants

Pipecat is an open source Python framework for building voice and multimodal conversational agents. It handles the complex orchestration of AI services, network transport, audio processing, and multimodal interactions, letting you focus on creating engaging experiences.

Key features

  • Voice-first Design: Built-in speech recognition, TTS, and conversation handling
  • Flexible Integration: Works with popular AI services (OpenAI, ElevenLabs, etc.)
  • Pipeline Architecture: Build complex apps from simple, reusable components
  • Real-time Processing: Frame-based pipeline architecture for fluid interactions
  • Production Ready: Enterprise-grade WebRTC and WebSocket support
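
To make the pipeline idea above concrete, here is a minimal sketch of frames flowing through a chain of processors. It is plain Python and does not use Pipecat itself: `ToyTextFrame`, `UppercaseProcessor`, `CollectProcessor`, and `ToyPipeline` are hypothetical names invented for this illustration, and the real framework's classes and behavior may differ.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyTextFrame:
    """Hypothetical stand-in for a frame that carries a piece of text."""
    text: str


class UppercaseProcessor:
    """Toy processor: returns a new frame with the text upper-cased."""

    async def process(self, frame):
        return ToyTextFrame(frame.text.upper())


class CollectProcessor:
    """Toy sink: records every frame that reaches the end of the chain."""

    def __init__(self):
        self.received = []

    async def process(self, frame):
        self.received.append(frame)
        return frame


class ToyPipeline:
    """Passes each frame through the processors in the order given."""

    def __init__(self, processors):
        self.processors = processors

    async def push(self, frame):
        for processor in self.processors:
            frame = await processor.process(frame)
        return frame


async def run_toy_pipeline():
    sink = CollectProcessor()
    pipeline = ToyPipeline([UppercaseProcessor(), sink])
    await pipeline.push(ToyTextFrame("hello"))
    await pipeline.push(ToyTextFrame("world"))
    return [frame.text for frame in sink.received]


if __name__ == "__main__":
    print(asyncio.run(run_toy_pipeline()))
```

Each processor only knows how to handle one frame at a time, which is what lets small components be reordered and reused.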

💡 Looking to build structured conversations? Check out Pipecat Flows for managing complex conversational states and transitions.

Getting started

You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.

# Install the module
pip install pipecat-ai

# Set up your environment
cp dot-env.template .env
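
Values in a `.env` file are not visible to Python until something loads them into the process environment (for example a loader such as python-dotenv, or exporting them in your shell). Assuming that has happened, a small helper like the one below can fail early with a clear message when a key is missing. The helper name `require_env` and the variable name in the usage note are hypothetical, not part of Pipecat.

```python
import os


def require_env(name, environ=os.environ):
    """Return a non-empty environment value or raise a descriptive error."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

For instance, `require_env("CARTESIA_API_KEY")` would return the key if that (assumed) variable is set, and raise `RuntimeError` otherwise.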

To keep things lightweight, only the core framework is included by default. If you need support for third-party AI services, you can add the necessary dependencies with:

pip install "pipecat-ai[option,...]"

Available options include:

| Category | Services | Install Command Example |
| --- | --- | --- |
| Speech-to-Text | AssemblyAI, Azure, Deepgram, Gladia, Whisper | pip install "pipecat-ai[deepgram]" |
| LLMs | Anthropic, Azure, Fireworks AI, Gemini, Grok, Groq, NVIDIA NIM, Ollama, OpenAI, Together AI | pip install "pipecat-ai[openai]" |
| Text-to-Speech | AWS, Azure, Cartesia, Deepgram, ElevenLabs, Google, LMNT, OpenAI, PlayHT, Rime, XTTS | pip install "pipecat-ai[cartesia]" |
| Speech-to-Speech | Gemini Multimodal Live, OpenAI Realtime | pip install "pipecat-ai[openai]" |
| Transport | Daily (WebRTC), WebSocket, Local | pip install "pipecat-ai[daily]" |
| Video | Tavus, Simli | pip install "pipecat-ai[tavus,simli]" |
| Vision & Image | Moondream, fal | pip install "pipecat-ai[moondream]" |
| Audio Processing | Silero VAD, Krisp, Noisereduce | pip install "pipecat-ai[silero]" |
| Analytics & Metrics | Canonical AI, Sentry | pip install "pipecat-ai[canonical]" |

📚 View full services documentation →
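
Several options can be combined in one install, as in the `pipecat-ai[tavus,simli]` example. If you generate install commands from a script, a sketch like the following builds the requirement string from a list of option names. The function `extras_requirement` is a hypothetical helper written for this illustration; it does not check that the option names actually exist.

```python
def extras_requirement(options, package="pipecat-ai"):
    """Build a pip requirement string such as 'pipecat-ai[daily,cartesia]'."""
    # Drop blanks and duplicates while keeping the caller's order.
    seen = []
    for option in options:
        option = option.strip()
        if option and option not in seen:
            seen.append(option)
    return f"{package}[{','.join(seen)}]" if seen else package
```

Quote the result when passing it to a shell, since square brackets are glob characters in some shells.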

Code examples

  • Foundational — small snippets that build on each other, introducing one or two concepts at a time
  • Example apps — complete applications that you can use as starting points for development

A simple voice agent running locally

Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use Daily for real-time media transport, and Cartesia for text-to-speech.

import asyncio

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

async def main():
    # Use Daily as a real-time media transport (WebRTC)
    transport = DailyTransport(
        room_url=...,
        token="",  # leave empty; note that the token is _not_ your API key
        bot_name="Bot Name",
        params=DailyParams(audio_out_enabled=True),
    )

    # Use Cartesia for text-to-speech
    tts = CartesiaTTSService(
        api_key=...,
        voice_id=...,
    )

    # Simple pipeline that converts text to speech and sends the audio out
    pipeline = Pipeline([tts, transport.output()])

    # Create a runner that can run one or more pipeline tasks
    runner = PipelineRunner()

    # Wrap the pipeline in a task that the runner can execute
    task = PipelineTask(pipeline)

    # Register an event handler to play audio when the first
    # participant joins the transport's WebRTC session
    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        participant_name = participant.get("info", {}).get("userName", "")
        # Queue a TextFrame that will be spoken by the TTS service (Cartesia)
        await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))

    # Register an event handler to exit the application when the user leaves
    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        await task.queue_frame(EndFrame())

    # Run the pipeline task
    await runner.run(task)


if __name__ == "__main__":
    asyncio.run(main())

Run it with:

python app.py

Daily provides a prebuilt WebRTC user interface. While the app is running, you can visit the room at https://<yourdomain>.daily.co/<room_url> and listen to the bot say hello!
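
The `@transport.event_handler(...)` decorator in the example follows a common registration pattern: the decorator stores an async callback under an event name, and the transport later awaits it when that event occurs. The sketch below shows the pattern in isolation. `ToyTransport` and its `emit` method are hypothetical and written only for this illustration; they are not how `DailyTransport` is implemented.

```python
import asyncio


class ToyTransport:
    """Hypothetical transport that dispatches named events to async handlers."""

    def __init__(self):
        self._handlers = {}

    def event_handler(self, event_name):
        # Decorator factory: stores the decorated coroutine under event_name.
        def decorator(handler):
            self._handlers.setdefault(event_name, []).append(handler)
            return handler

        return decorator

    async def emit(self, event_name, *args):
        # Await every handler registered for this event, transport first.
        for handler in self._handlers.get(event_name, []):
            await handler(self, *args)


async def run_toy_transport():
    transport = ToyTransport()
    greetings = []

    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        name = participant.get("info", {}).get("userName", "")
        greetings.append(f"Hello there, {name}!")

    # Simulate a participant joining; a real transport would do this itself.
    await transport.emit("on_first_participant_joined", {"info": {"userName": "Ada"}})
    return greetings


if __name__ == "__main__":
    print(asyncio.run(run_toy_transport()))
```

Emitting an event with no registered handlers is a no-op here, so handlers stay optional.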

WebRTC for production use

WebSockets are fine for server-to-server communication or for initial development. But for production use, client-server audio should go over a protocol designed for real-time media transport, such as WebRTC. (For an explanation of the difference between WebSockets and WebRTC, see this post.)

One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.

Sign up here and create a room in the developer Dashboard.

Hacking on the framework itself

Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:

python3 -m venv venv
source venv/bin/activate

From the root of this repo, run the following:

pip install -r dev-requirements.txt
python -m build

This builds the package. To use the package locally (e.g. to run sample files), run:

pip install --editable ".[option,...]"

If you want to use this package from another directory, you can run:

pip install "path_to_this_repo[option,...]"

Running tests

From the root directory, run:

pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob="*pipeline_source*" src tests
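
The `--doctest-modules` flag tells pytest to also collect and run the examples embedded in docstrings of the Python modules it scans. As a sketch of what such a test looks like, the function below carries two doctest examples. The function `greeting` is a hypothetical example and is not part of the Pipecat codebase.

```python
def greeting(name):
    """Build the greeting text for a participant.

    >>> greeting("Ada")
    'Hello there, Ada!'
    >>> greeting("")
    'Hello there!'
    """
    return f"Hello there, {name}!" if name else "Hello there!"
```

If the return value ever stops matching the text shown in the docstring, the doctest fails, which keeps documented examples in step with the code.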

Setting up your editor

This project uses strict PEP 8 formatting via Ruff.

Emacs

You can use use-package to install the emacs-lazy-ruff package and configure the ruff arguments:

(use-package lazy-ruff
  :ensure t
  :hook ((python-mode . lazy-ruff-mode))
  :config
  (setq lazy-ruff-format-command "ruff format")
  (setq lazy-ruff-only-format-block t)
  (setq lazy-ruff-only-format-region t)
  (setq lazy-ruff-only-format-buffer t))

Ruff was installed in the virtual environment described above, so you should be able to use pyvenv-auto to load that environment automatically inside Emacs.

(use-package pyvenv-auto
  :ensure t
  :defer t
  :hook ((python-mode . pyvenv-auto-run)))

Visual Studio Code

Install the Ruff extension. Then edit the user settings (Ctrl-Shift-P Open User Settings (JSON)) and set it as the default Python formatter, and enable formatting on save:

"[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true
}

Contributing

We welcome contributions from the community! Whether you're fixing bugs, improving documentation, or adding new features, here's how you can help:

  • Found a bug? Open an issue
  • Have a feature idea? Start a discussion
  • Want to contribute code? Check our CONTRIBUTING.md guide
  • Documentation improvements? Docs PRs are always welcome

Before submitting a pull request, please check existing issues and PRs to avoid duplicates.

We aim to review all contributions promptly and provide constructive feedback to help get your changes merged.

Getting help

➡️ Join our Discord

➡️ Read the docs

➡️ Reach us on X
