The realtime communication library for Python
FastRTC
The Real-Time Communication Library for Python.
Turn any Python function into a real-time audio and video stream over WebRTC or WebSockets.
Installation
pip install fastrtc
To use built-in pause detection (see ReplyOnPause) and text to speech (see Text To Speech), install the vad and tts extras:
pip install "fastrtc[vad, tts]"
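To confirm that the install succeeded, one option is to ask Python for the installed distribution's version. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of FastRTC.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints a version string if `pip install fastrtc` succeeded, otherwise None.
    print(installed_version("fastrtc"))
```

This only checks the base package. It does not verify that the dependencies pulled in by the vad or tts extras are present.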
Key Features
- 🗣️ Automatic Voice Detection and Turn Taking built-in, so you only need to worry about the logic for responding to the user.
- 💻 Automatic UI - Use the .ui.launch() method to launch the WebRTC-enabled built-in Gradio UI.
- 🔌 Automatic WebRTC Support - Use the .mount(app) method to mount the stream on a FastAPI app and get a WebRTC endpoint for your own frontend!
- ⚡️ WebSocket Support - Use the .mount(app) method to mount the stream on a FastAPI app and get a WebSocket endpoint for your own frontend!
- 📞 Automatic Telephone Support - Use the fastphone() method of the stream to launch the application and get a free temporary phone number!
- 🤖 Completely customizable backend - A Stream can easily be mounted on a FastAPI app, so you can extend it to fit your production application. See the Talk To Claude demo for an example of how to serve a custom JS frontend.
Docs
Examples
See the Cookbook for examples of how to use the library.
- 🗣️👀 Gemini Audio Video Chat: Stream BOTH your webcam video and audio feeds to Google Gemini. You can also upload images to augment your conversation!
- 🗣️ Google Gemini Real Time Voice API: Talk to Gemini in real time using Google's voice API.
- 🗣️ OpenAI Real Time Voice API: Talk to ChatGPT in real time using OpenAI's voice API.
- 🤖 Hello Computer: Say "computer" before asking your question!
- 🤖 Llama Code Editor: Create and edit HTML pages with just your voice! Powered by SambaNova Systems.
- 🗣️ Talk to Claude: Use the Anthropic and Play.Ht APIs to have an audio conversation with Claude.
- 🎵 Whisper Transcription: Have Whisper transcribe your speech in real time!
- 📷 Yolov10 Object Detection: Run the Yolov10 model on a user webcam stream in real time!
- 🗣️ Kyutai Moshi: Kyutai's Moshi is a novel speech-to-speech model for modeling human conversations.
- 🗣️ Hello Llama: Stop Word Detection: A code editor built with Llama 3.3 70b that is triggered by the phrase "Hello Llama". Build a Siri-like coding assistant in 100 lines of code!
Usage
This is a shortened version of the official usage guide.
- .ui.launch(): Launch a built-in UI for easily testing and sharing your stream. Built with Gradio.
- .fastphone(): Get a free temporary phone number to call into your stream. Hugging Face token required.
- .mount(app): Mount the stream on a FastAPI app. Perfect for integrating with your already existing production system.
Quickstart
Echo Audio
from fastrtc import Stream, ReplyOnPause
import numpy as np

def echo(audio: tuple[int, np.ndarray]):
    # The function will be passed the audio until the user pauses
    # Implement any iterator that yields audio
    # See "LLM Voice Chat" for a more complete example
    yield audio

stream = Stream(
    handler=ReplyOnPause(echo),
    modality="audio",
    mode="send-receive",
)
LLM Voice Chat
from fastrtc import (
    ReplyOnPause, AdditionalOutputs, Stream,
    audio_to_bytes, aggregate_bytes_to_16bit
)
import gradio as gr
import numpy as np  # needed for np.ndarray and np.frombuffer below
from groq import Groq
import anthropic
from elevenlabs import ElevenLabs

groq_client = Groq()
claude_client = anthropic.Anthropic()
tts_client = ElevenLabs()

# See "Talk to Claude" in Cookbook for an example of how to keep
# track of the chat history.
def response(
    audio: tuple[int, np.ndarray],
):
    # Speech to text: transcribe the user's audio
    prompt = groq_client.audio.transcriptions.create(
        file=("audio-file.mp3", audio_to_bytes(audio)),
        model="whisper-large-v3-turbo",
        response_format="verbose_json",
    ).text
    # Send the transcript to the LLM
    response = claude_client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    # Join the text blocks of the reply into one string
    response_text = " ".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )
    # Text to speech: stream the reply back as 24 kHz PCM audio
    iterator = tts_client.text_to_speech.convert_as_stream(
        text=response_text,
        voice_id="JBFqnCBsd6RMkjVDRZzb",
        model_id="eleven_multilingual_v2",
        output_format="pcm_24000"
    )
    for chunk in aggregate_bytes_to_16bit(iterator):
        audio_array = np.frombuffer(chunk, dtype=np.int16).reshape(1, -1)
        yield (24000, audio_array)

stream = Stream(
    modality="audio",
    mode="send-receive",
    handler=ReplyOnPause(response),
)
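A streaming TTS response can arrive in arbitrarily sized byte chunks, so a chunk may end halfway through a 2-byte sample, and np.frombuffer with dtype=np.int16 rejects buffers whose size is not a multiple of 2. The sketch below illustrates the general re-chunking idea in plain Python. It is an assumption about the kind of work a helper like aggregate_bytes_to_16bit has to do, not FastRTC's actual implementation, and `rechunk_to_16bit` is a hypothetical name.

```python
from typing import Iterable, Iterator


def rechunk_to_16bit(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield chunks whose length is a multiple of 2 bytes (one int16 sample)."""
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        usable = len(data) - (len(data) % 2)  # length of the largest even prefix
        leftover = data[usable:]              # zero or one trailing byte
        if usable:
            yield data[:usable]
    # A final odd byte cannot form a complete sample, so it is dropped.


if __name__ == "__main__":
    stream_chunks = [b"\x01\x02\x03", b"\x04", b"\x05\x06\x07"]
    for piece in rechunk_to_16bit(stream_chunks):
        print(len(piece), piece)
```

Carrying the odd byte over to the next chunk keeps sample boundaries aligned without buffering the whole response, so playback can begin as soon as the first chunk arrives.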
Webcam Stream
from fastrtc import Stream
import numpy as np

def flip_vertically(image):
    return np.flip(image, axis=0)

stream = Stream(
    handler=flip_vertically,
    modality="video",
    mode="send-receive",
)
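np.flip(image, axis=0) reverses the array along its first axis. Assuming each frame is laid out as (height, width, channels), that means the row order is reversed while every row's contents stay the same. The sketch below shows the same operation on a nested list standing in for a frame; `flip_rows` is a hypothetical helper for illustration only.

```python
def flip_rows(image):
    """Reverse the row order of a nested-list image (top row becomes bottom)."""
    return image[::-1]


if __name__ == "__main__":
    frame = [
        ["a", "b"],  # top row
        ["c", "d"],  # bottom row
    ]
    for row in flip_rows(frame):
        print(row)
```

Reversing within each row instead (axis=1 under the same layout assumption) would mirror the frame left to right.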
Object Detection
from fastrtc import Stream
import gradio as gr
import cv2
from huggingface_hub import hf_hub_download
from .inference import YOLOv10

model_file = hf_hub_download(
    repo_id="onnx-community/yolov10n", filename="onnx/model.onnx"
)

# git clone https://huggingface.co/spaces/fastrtc/object-detection
# for YOLOv10 implementation
model = YOLOv10(model_file)

def detection(image, conf_threshold=0.3):
    image = cv2.resize(image, (model.input_width, model.input_height))
    new_image = model.detect_objects(image, conf_threshold)
    return cv2.resize(new_image, (500, 500))

stream = Stream(
    handler=detection,
    modality="video",
    mode="send-receive",
    additional_inputs=[
        gr.Slider(minimum=0, maximum=1, step=0.01, value=0.3)
    ]
)
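The slider listed in additional_inputs presumably supplies the conf_threshold argument of detection. How model.detect_objects applies that value is not shown here; the sketch below assumes the usual meaning of a confidence threshold, which is to discard detections the model scored below it. `Detection` and `filter_by_confidence` are hypothetical names, not part of FastRTC or the YOLOv10 implementation.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    label: str
    score: float  # model confidence in the range [0, 1]


def filter_by_confidence(
    detections: List[Detection], conf_threshold: float = 0.3
) -> List[Detection]:
    """Keep only detections whose score is at least conf_threshold."""
    return [d for d in detections if d.score >= conf_threshold]


if __name__ == "__main__":
    candidates = [
        Detection("person", 0.91),
        Detection("dog", 0.42),
        Detection("kite", 0.12),
    ]
    for kept in filter_by_confidence(candidates, conf_threshold=0.3):
        print(kept.label, kept.score)
```

Raising the threshold trades missed objects for fewer false positives, which is why exposing it as a slider is convenient for tuning on a live webcam feed.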
Running the Stream
Run:
Gradio
stream.ui.launch()
Telephone (Audio Only)
```py
stream.fastphone()
```
FastAPI
from fastapi import FastAPI
from fastapi.responses import HTMLResponse

app = FastAPI()
stream.mount(app)

# Optional: Add routes
@app.get("/")
async def _():
    return HTMLResponse(content=open("index.html").read())

# uvicorn app:app --host 0.0.0.0 --port 8000
File details
Details for the file fastrtc-0.0.34-py3-none-any.whl.
File metadata
- Download URL: fastrtc-0.0.34-py3-none-any.whl
- Upload date:
- Size: 2.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6da3831da9c2257810b6c5e5fd7cdd4f1636d716f47ab49d8bb3f2d5d9e499ac |
| MD5 | 4a172d89923edaad4d52a86bf6e079ee |
| BLAKE2b-256 | afe1ea3968b064a183d9ac595dcbde0e6c67033afe01852d5854a0001f7c9faa |