An open source framework for voice (and multimodal) assistants
Pipecat
Pipecat is a framework for building voice (and multimodal) conversational agents: things like personal coaches, meeting assistants, story-telling toys for kids, customer support bots, intake flows, and snarky social companions.
Take a look at some example apps:
Getting started with voice agents
You can get started with Pipecat running on your local machine, then move your agent processes to the cloud when you’re ready. You can also add a 📞 telephone number, 🖼️ image output, 📺 video input, use different LLMs, and more.
# install the module
pip install pipecat-ai
# set up an .env file with API keys
cp dot-env.template .env
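Once the .env file exists, the keys in it still have to reach your process environment. Many projects use the python-dotenv package for this. The sketch below does a similar job with only the standard library, so the mechanics are visible. The helper name load_env_file is an illustrative assumption and is not part of Pipecat.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines and export them into os.environ.

    Hypothetical helper for illustration only. It skips blank lines,
    comments, and lines without "=", and it does not handle multi-line
    values or variable expansion.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        # Variables already set in the real environment take precedence.
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


if __name__ == "__main__":
    # Print only the names, never the secret values.
    print(sorted(load_env_file()))
```

Calling load_env_file() early in your program, before any service is constructed, keeps API keys out of the source code.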
By default, in order to minimize dependencies, only the basic framework functionality is available. Some third-party AI services require additional dependencies that you can install with:
pip install "pipecat-ai[option,...]"
Your project may or may not need these, so they're made available as optional requirements. Here is a list:
- AI services: anthropic, assemblyai, aws, azure, deepgram, gladia, google, fal, lmnt, moondream, openai, openpipe, playht, silero, whisper, xtts
- Transports: local, websocket, daily
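As a small illustration of how the option names combine, the sketch below builds the requirement string that pip expects from a list of extras. The function name extras_requirement and the validation step are assumptions made for this example. The set of known names is copied from the list above (plus the krisp extra used later in this document) and may not be complete for your installed version.

```python
# Option names copied from the list above; treat as possibly incomplete.
AI_SERVICES = {
    "anthropic", "assemblyai", "aws", "azure", "deepgram", "gladia",
    "google", "fal", "lmnt", "moondream", "openai", "openpipe",
    "playht", "silero", "whisper", "xtts",
}
TRANSPORTS = {"local", "websocket", "daily"}
FILTERS = {"krisp"}  # installed in the Krisp section of this document


def extras_requirement(options, package="pipecat-ai"):
    """Return a pip requirement string such as 'pipecat-ai[daily,openai]'.

    Hypothetical helper: rejects names that are not in the sets above so
    that typos are caught before pip is invoked.
    """
    known = AI_SERVICES | TRANSPORTS | FILTERS
    unknown = sorted(set(options) - known)
    if unknown:
        raise ValueError(f"unknown options: {', '.join(unknown)}")
    # Drop duplicates while keeping the order the caller gave.
    unique = list(dict.fromkeys(options))
    if not unique:
        return package
    return f"{package}[{','.join(unique)}]"


if __name__ == "__main__":
    print(extras_requirement(["daily", "openai", "silero"]))
    # prints: pipecat-ai[daily,openai,silero]
```

The resulting string can be passed to pip in quotes, for example pip install "pipecat-ai[daily,openai,silero]".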
Code examples
- foundational — small snippets that build on each other, introducing one or two concepts at a time
- example apps — complete applications that you can use as starting points for development
A simple voice agent running locally
Here is a very basic Pipecat bot that greets a user when they join a real-time session. We'll use Daily for real-time media transport, and Cartesia for text-to-speech.
import asyncio

from pipecat.frames.frames import EndFrame, TextFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineTask
from pipecat.pipeline.runner import PipelineRunner
from pipecat.services.cartesia import CartesiaTTSService
from pipecat.transports.services.daily import DailyParams, DailyTransport

async def main():
    # Use Daily as a real-time media transport (WebRTC)
    transport = DailyTransport(
        room_url=...,
        token="",  # leave empty. Note: token is _not_ your api key
        bot_name="Bot Name",
        params=DailyParams(audio_out_enabled=True))

    # Use Cartesia for Text-to-Speech
    tts = CartesiaTTSService(
        api_key=...,
        voice_id=...
    )

    # Simple pipeline that will process text to speech and output the result
    pipeline = Pipeline([tts, transport.output()])

    # Create Pipecat processor that can run one or more pipelines tasks
    runner = PipelineRunner()

    # Assign the task callable to run the pipeline
    task = PipelineTask(pipeline)

    # Register an event handler to play audio when a
    # participant joins the transport WebRTC session
    @transport.event_handler("on_first_participant_joined")
    async def on_first_participant_joined(transport, participant):
        participant_name = participant.get("info", {}).get("userName", "")
        # Queue a TextFrame that will get spoken by the TTS service (Cartesia)
        await task.queue_frame(TextFrame(f"Hello there, {participant_name}!"))

    # Register an event handler to exit the application when the user leaves.
    @transport.event_handler("on_participant_left")
    async def on_participant_left(transport, participant, reason):
        await task.queue_frame(EndFrame())

    # Run the pipeline task
    await runner.run(task)

if __name__ == "__main__":
    asyncio.run(main())
Run it with:
python app.py
Daily provides a prebuilt WebRTC user interface. While the app is running, you can visit https://<yourdomain>.daily.co/<room_url> and listen to the bot say hello!
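To make the flow of frames in the example easier to picture, here is a toy model written with plain asyncio and no Pipecat imports. It is not Pipecat's implementation: ToyTextFrame, ToyEndFrame, fake_tts, and run_toy_pipeline are hypothetical names, and the "TTS" step only produces a string standing in for audio. It illustrates one idea from the example: frames are queued, pass through each processor in order, and an end frame stops the run.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ToyTextFrame:
    text: str


@dataclass
class ToyEndFrame:
    pass


async def fake_tts(frame):
    # Stand-in for a text-to-speech service: text in, "audio" out.
    if isinstance(frame, ToyTextFrame):
        return f"<audio for: {frame.text}>"
    return frame


async def run_toy_pipeline(queue, processors, sink):
    """Pull frames from the queue until a ToyEndFrame arrives."""
    while True:
        frame = await queue.get()
        if isinstance(frame, ToyEndFrame):
            break
        # Each processor receives the output of the previous one.
        for process in processors:
            frame = await process(frame)
        sink.append(frame)


async def demo():
    queue = asyncio.Queue()
    played = []  # plays the role of the transport output
    await queue.put(ToyTextFrame("Hello there, Ada!"))
    await queue.put(ToyEndFrame())
    await run_toy_pipeline(queue, [fake_tts], played)
    return played


if __name__ == "__main__":
    print(asyncio.run(demo()))
    # prints: ['<audio for: Hello there, Ada!>']
```

In the real example, the event handlers play the role of the two queue.put calls: one queues a TextFrame when a participant joins, the other queues an EndFrame when they leave.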
WebRTC for production use
WebSockets are fine for server-to-server communication or for initial development. But for production use, you’ll need client-server audio to use a protocol designed for real-time media transport. (For an explanation of the difference between WebSockets and WebRTC, see this post.)
One way to get up and running quickly with WebRTC is to sign up for a Daily developer account. Daily gives you SDKs and global infrastructure for audio (and video) routing. Every account gets 10,000 audio/video/transcription minutes free each month.
Sign up here and create a room in the developer Dashboard.
What is VAD?
Voice Activity Detection (VAD) is very important for knowing when a user has finished speaking to your bot. If you are not using press-to-talk, and want Pipecat to detect when the user has finished talking, VAD is an essential component of a natural-feeling conversation.
Pipecat makes use of WebRTC VAD by default when using a WebRTC transport layer. Optionally, you can use Silero VAD for improved accuracy at the cost of higher CPU usage.
pip install "pipecat-ai[silero]"
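To give a feel for what a voice activity detector does, the sketch below implements a much simpler technique than either WebRTC VAD or Silero: an energy threshold with a short "hangover", so that brief pauses do not end the utterance. All names, the threshold, and the hangover length are illustrative assumptions. This is not how Pipecat's VAD works internally.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of one chunk of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech_end(chunks, threshold=500.0, hangover_chunks=3):
    """Return the index of the chunk at which speech is judged to have ended.

    A chunk counts as speech when its RMS energy exceeds `threshold`.
    End of speech is declared after `hangover_chunks` consecutive quiet
    chunks that follow at least one speech chunk. Returns None if speech
    never started or never ended.
    """
    speaking = False
    quiet_run = 0
    for index, chunk in enumerate(chunks):
        if rms(chunk) > threshold:
            speaking = True
            quiet_run = 0
        elif speaking:
            quiet_run += 1
            if quiet_run >= hangover_chunks:
                return index
    return None


if __name__ == "__main__":
    quiet = [0] * 160
    loud = [2000, -2000] * 80
    # A short pause at index 3 is bridged; the long pause ends the turn.
    chunks = [quiet, loud, loud, quiet, loud, quiet, quiet, quiet, quiet]
    print(detect_speech_end(chunks))  # prints: 7
```

A model-based detector such as Silero replaces the rms comparison with a learned speech probability, which is where the accuracy gain and the extra CPU cost come from.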
Running the Krisp Audio Filter
To use the Krisp Filter in this project, you’ll need access to the Krisp C++ SDK.
Step 1: Obtain Access to the Krisp SDK
- Create a Krisp Account: If you don’t already have an account, sign up at Krisp to access the SDK.
- Download the SDK: Once you have an account, follow the instructions on the Krisp platform to download the Krisp desktop SDK.
- Export the path to your Krisp SDK:
export KRISP_SDK_PATH=/PATH/TO/KRISP/SDK
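A quick way to confirm the variable is visible to Python before installing is sketched below. The function name krisp_sdk_path_problem is a hypothetical helper for this example. It only checks that KRISP_SDK_PATH is set and points to an existing directory, and makes no assumption about what the SDK directory contains.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def krisp_sdk_path_problem(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return a human-readable problem, or None if the path looks usable."""
    value = environ.get("KRISP_SDK_PATH", "")
    if not value:
        return "KRISP_SDK_PATH is not set"
    if not Path(value).is_dir():
        return f"KRISP_SDK_PATH does not point to a directory: {value}"
    return None


if __name__ == "__main__":
    problem = krisp_sdk_path_problem()
    print(problem or "KRISP_SDK_PATH looks OK")
```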
Step 2: Install the pipecat-krisp Module
Once the environment variable KRISP_SDK_PATH is exported, activate your Python virtual environment and install the module with pip:
source venv/bin/activate
pip install "pipecat-ai[krisp]"
Hacking on the framework itself
Note that you may need to set up a virtual environment before following the instructions below. For instance, you might need to run the following from the root of the repo:
python3 -m venv venv
source venv/bin/activate
From the root of this repo, run the following:
pip install -r dev-requirements.txt
python -m build
This builds the package. To use the package locally (e.g. to run sample files), run
pip install --editable ".[option,...]"
If you want to use this package from another directory, you can run:
pip install "path_to_this_repo[option,...]"
Running tests
From the root directory, run:
pytest --doctest-modules --ignore-glob="*to_be_updated*" --ignore-glob=*pipeline_source* src tests
Setting up your editor
This project uses strict PEP 8 formatting via Ruff.
Emacs
You can use use-package to install the emacs-lazy-ruff package and configure ruff arguments:
(use-package lazy-ruff
  :ensure t
  :hook ((python-mode . lazy-ruff-mode))
  :config
  (setq lazy-ruff-format-command "ruff format")
  (setq lazy-ruff-only-format-block t)
  (setq lazy-ruff-only-format-region t)
  (setq lazy-ruff-only-format-buffer t))
ruff was installed in the venv environment described before, so you should be able to use pyvenv-auto to automatically load that environment inside Emacs.
(use-package pyvenv-auto
  :ensure t
  :defer t
  :hook ((python-mode . pyvenv-auto-run)))
Visual Studio Code
Install the Ruff extension. Then edit the user settings (Ctrl-Shift-P Open User Settings (JSON)), set Ruff as the default Python formatter, and enable formatting on save:
"[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true
}
Getting help