
Stream images in real time with WebRTC


Gradio WebRTC ⚡️


Stream video and audio in real time with Gradio using WebRTC.

Installation

pip install gradio_webrtc

To use built-in pause detection (see Conversational AI), install the vad extra:

pip install "gradio_webrtc[vad]"

Examples:

  1. Object Detection from Webcam with YOLOv10 📷
  2. Streaming Object Detection from Video with RT-DETR 🎥
  3. Text-to-Speech 🗣️
  4. Conversational AI 🤖🗣️

Usage

The WebRTC component supports the following four use cases:

  1. Streaming video from the user webcam to the server and back
  2. Streaming video from the server to the client
  3. Streaming audio from the server to the client
  4. Streaming audio from the client to the server and back (conversational AI)

Streaming Video from the User Webcam to the Server and Back

import gradio as gr
from gradio_webrtc import WebRTC


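def detection(image, conf_threshold=0.3):
    # ... your detection code here ...
    # `image` is a (height, width, 3) RGB numpy array; return an array in the same format.
    return image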


with gr.Blocks() as demo:
    image = WebRTC(label="Stream", mode="send-receive", modality="video")
    conf_threshold = gr.Slider(
        label="Confidence Threshold",
        minimum=0.0,
        maximum=1.0,
        step=0.05,
        value=0.30,
    )
    image.stream(
        fn=detection,
        inputs=[image, conf_threshold],
        outputs=[image], time_limit=10
    )

if __name__ == "__main__":
    demo.launch()
  • Set the mode parameter to "send-receive" and modality to "video".
  • The stream event's fn parameter is a function that receives the next frame from the webcam as a numpy array and returns the processed frame also as a numpy array.
  • Numpy arrays are in (height, width, 3) format where the color channels are in RGB format.
  • The inputs parameter should be a list where the first element is the WebRTC component. The only output allowed is the WebRTC component.
  • The time_limit parameter is the maximum time in seconds the video stream will run. If the time limit is reached, the video stream will stop.
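
To make the frame contract above concrete, here is a minimal sketch of a function that could be passed as fn. It is only a stand-in for a real detector: the dimming logic is purely illustrative and not part of the library. It shows the shape of the contract, where an RGB array of shape (height, width, 3) comes in and an array of the same shape and dtype goes out.

```python
import numpy as np


def detection(image: np.ndarray, conf_threshold: float = 0.3) -> np.ndarray:
    """Placeholder for a real detector: dim every pixel darker than the threshold."""
    # Mean of the R, G, B channels scaled to [0, 1]; shape (height, width).
    brightness = image.astype(np.float32).mean(axis=2) / 255.0
    mask = brightness < conf_threshold
    out = image.copy()
    # Halve the intensity of the masked pixels; the frame keeps its shape and dtype.
    out[mask] = out[mask] // 2
    return out


# Small synthetic frame: left half mid-grey, right half bright.
frame = np.full((4, 6, 3), 100, dtype=np.uint8)
frame[:, 3:] = 200
processed = detection(frame, conf_threshold=0.5)
```

Because the function only touches numpy arrays, it can be tested on synthetic frames like the one above before it is wired into the stream event.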

Streaming Video from the Server to the Client

import gradio as gr
from gradio_webrtc import WebRTC
import cv2

def generation():
    url = "https://download.tsi.telecom-paristech.fr/gpac/dataset/dash/uhd/mux_sources/hevcds_720p30_2M.mp4"
    cap = cv2.VideoCapture(url)
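    iterating = True
    while iterating:
        iterating, frame = cap.read()
        # cap.read() returns (False, None) once the video ends; do not yield that frame.
        if iterating:
            yield frame
    cap.release()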

with gr.Blocks() as demo:
    output_video = WebRTC(label="Video Stream", mode="receive", modality="video")
    button = gr.Button("Start", variant="primary")
    output_video.stream(
        fn=generation, inputs=None, outputs=[output_video],
        trigger=button.click
    )

if __name__ == "__main__":
    demo.launch()
  • Set the "mode" parameter to "receive" and "modality" to "video".
  • The stream event's fn parameter is a generator function that yields the next frame from the video as a numpy array.
  • The only output allowed is the WebRTC component.
  • The trigger parameter is the Gradio event that starts the WebRTC connection. In this case, it is the button click event.
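
If no video file is at hand, a generator that builds frames in memory may be enough to try out receive mode. The sketch below is an illustration, not part of the library. It assumes the component accepts the same (height, width, 3) uint8 layout that cv2 produces, and the parameter names are hypothetical.

```python
import numpy as np


def generation(num_frames: int = 30, height: int = 120, width: int = 160):
    """Yield frames with a white vertical bar sweeping from left to right."""
    bar_width = 10
    for i in range(num_frames):
        frame = np.zeros((height, width, 3), dtype=np.uint8)
        # Move the bar one bar-width further on every frame, wrapping at the right edge.
        start = (i * bar_width) % width
        frame[:, start:start + bar_width] = 255
        yield frame


frames = list(generation(num_frames=3))
```

The bar is white on black, so the result looks the same whether the channels are read as RGB or BGR.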

Streaming Audio from the Server to the Client

import gradio as gr
import numpy as np
from gradio_webrtc import WebRTC
from pydub import AudioSegment

def generation(num_steps):
    for _ in range(num_steps):
        # Replace this path with an audio file on your own machine.
        segment = AudioSegment.from_file("/Users/freddy/sources/gradio/demo/audio_debugger/cantina.wav")
        yield (segment.frame_rate, np.array(segment.get_array_of_samples()).reshape(1, -1))

with gr.Blocks() as demo:
    audio = WebRTC(label="Stream", mode="receive", modality="audio")
    num_steps = gr.Slider(
        label="Number of Steps",
        minimum=1,
        maximum=10,
        step=1,
        value=5,
    )
    button = gr.Button("Generate")

    audio.stream(
        fn=generation, inputs=[num_steps], outputs=[audio],
        trigger=button.click
    )

if __name__ == "__main__":
    demo.launch()
  • Set the "mode" parameter to "receive" and "modality" to "audio".
  • The stream event's fn parameter is a generator function that yields the next audio segment as a tuple of (frame_rate, audio_samples).
  • The numpy array should be of shape (1, num_samples).
  • The outputs parameter should be a list with the WebRTC component as the only element.
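
The tuple format described above can also be produced without an audio file. The following sketch generates a 440 Hz tone in chunks. The default frame rate, the chunk length, and the use of 16-bit integer samples are assumptions made for illustration (16-bit samples are what pydub typically returns for a 16-bit WAV file).

```python
import numpy as np


def generation(num_steps: int, frame_rate: int = 24000, seconds_per_step: float = 0.5):
    """Yield `num_steps` chunks of a 440 Hz tone as (frame_rate, samples) tuples."""
    samples_per_step = int(frame_rate * seconds_per_step)
    for step in range(num_steps):
        # Continue the time axis from the previous chunk so the tone stays continuous.
        t = (np.arange(samples_per_step) + step * samples_per_step) / frame_rate
        wave = 0.3 * np.sin(2 * np.pi * 440.0 * t)
        # Convert to 16-bit integers and reshape to (1, num_samples).
        yield frame_rate, (wave * 32767).astype(np.int16).reshape(1, -1)


chunks = list(generation(num_steps=2))
```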

Conversational AI

import gradio as gr
import numpy as np
from gradio_webrtc import WebRTC, StreamHandler
from queue import Empty, Queue


class EchoHandler(StreamHandler):
    def __init__(self) -> None:
        super().__init__()
        self.queue = Queue()

    def receive(self, frame: tuple[int, np.ndarray] | np.ndarray) -> None:
        self.queue.put(frame)

    def emit(self) -> tuple[int, np.ndarray] | np.ndarray | None:
        # emit must not block: return None when no frame is ready yet.
        try:
            return self.queue.get_nowait()
        except Empty:
            return None
    
    def copy(self) -> StreamHandler:
        return EchoHandler()


with gr.Blocks() as demo:
    with gr.Column():
        with gr.Group():
            audio = WebRTC(
                label="Stream",
                rtc_configuration=None,
                mode="send-receive",
                modality="audio",
            )

        audio.stream(fn=EchoHandler(), inputs=[audio], outputs=[audio], time_limit=15)


if __name__ == "__main__":
    demo.launch()
  • Instead of passing a function to the stream event's fn parameter, pass a StreamHandler implementation. The StreamHandler above simply echoes the audio back to the client.
  • The StreamHandler class has three methods: receive, emit, and copy. The receive method is called when a new frame is received from the client, and the emit method returns the next frame to send to the client. The copy method is called at the beginning of the stream to ensure each user has a unique stream handler.
  • An audio frame is represented as a tuple of (frame_rate, audio_samples) where audio_samples is a numpy array of shape (num_channels, num_samples).
  • You can also specify the audio layout ("mono" or "stereo") in the emit method by returning it as the third element of the tuple. If not specified, the default is "mono".
  • The time_limit parameter is the maximum time in seconds the conversation will run. If the time limit is reached, the audio stream will stop.
  • The emit method SHOULD NOT block. If a frame is not ready to be sent, the method should return None.
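
The receive/emit/copy contract described above can be tried out in isolation. The sketch below uses a plain Python class so that it runs without gradio_webrtc installed. In a real app the class would subclass StreamHandler as in the example above. The name GainHandler and its gain parameter are hypothetical.

```python
from queue import Empty, Queue

import numpy as np


class GainHandler:
    """Plain-Python stand-in for a StreamHandler subclass that halves the volume."""

    def __init__(self, gain: float = 0.5) -> None:
        self.gain = gain
        self.queue: Queue = Queue()

    def receive(self, frame: tuple[int, np.ndarray]) -> None:
        frame_rate, samples = frame
        # Scale the samples, keeping the original dtype and (channels, samples) shape.
        scaled = (samples * self.gain).astype(samples.dtype)
        self.queue.put((frame_rate, scaled))

    def emit(self):
        # Never block: hand back None when nothing is ready to send.
        try:
            return self.queue.get_nowait()
        except Empty:
            return None

    def copy(self) -> "GainHandler":
        # Each connection gets its own handler and therefore its own queue.
        return GainHandler(self.gain)


handler = GainHandler()
first = handler.emit()  # nothing queued yet
handler.receive((48000, np.array([[100, -200, 300]], dtype=np.int16)))
second = handler.emit()
other_user = handler.copy()
```

Keeping per-stream state, here the queue, inside the instance is what makes copy meaningful: frames queued for one user are never emitted to another.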

An easy way to get started with Conversational AI is to use the ReplyOnPause stream handler. This will automatically run your function when the speaker has stopped speaking. In order to use ReplyOnPause, the [vad] extra dependencies must be installed.

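import gradio as gr
import numpy as np
from gradio_webrtc import WebRTC, ReplyOnPause

# None is used for local development; see the Deployment section for cloud environments.
rtc_configuration = None


def response(audio: tuple[int, np.ndarray]):
    """This function must yield audio frames"""
    ...
    # generated_audio and sampling_rate come from your own model code (elided above).
    for numpy_array in generated_audio:
        yield (sampling_rate, numpy_array, "mono")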


with gr.Blocks() as demo:
    gr.HTML(
    """
    <h1 style='text-align: center'>
    Chat (Powered by WebRTC ⚡️)
    </h1>
    """
    )
    with gr.Column():
        with gr.Group():
            audio = WebRTC(
                label="Stream",
                rtc_configuration=rtc_configuration,
                mode="send-receive",
                modality="audio",
            )
        audio.stream(fn=ReplyOnPause(response), inputs=[audio], outputs=[audio], time_limit=60)


demo.launch(ssr_mode=False)
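
As a starting point, the response function passed to ReplyOnPause can be tested without any model. The sketch below echoes the speaker's audio back in fixed-size chunks. It assumes the input arrives as a (sampling_rate, samples) tuple, as the type hint above suggests, and the chunk_size parameter is a hypothetical addition for illustration.

```python
import numpy as np


def response(audio: tuple[int, np.ndarray], chunk_size: int = 4800):
    """Echo the speaker's audio back in fixed-size chunks (placeholder for a model)."""
    sampling_rate, samples = audio
    samples = samples.reshape(1, -1)  # make sure the shape is (1, num_samples)
    for start in range(0, samples.shape[1], chunk_size):
        # Each yielded frame is (sampling_rate, samples, layout).
        yield (sampling_rate, samples[:, start:start + chunk_size], "mono")


speech = (24000, np.arange(10000, dtype=np.int16).reshape(1, -1))
frames = list(response(speech))
```

Yielding small chunks rather than one large array lets playback begin before the whole reply has been generated.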

Deployment

When deploying in a cloud environment (like Hugging Face Spaces, EC2, etc), you need to set up a TURN server to relay the WebRTC traffic. The easiest way to do this is to use a service like Twilio.

import os

import gradio as gr
from gradio_webrtc import WebRTC
from twilio.rest import Client

account_sid = os.environ.get("TWILIO_ACCOUNT_SID")
auth_token = os.environ.get("TWILIO_AUTH_TOKEN")

client = Client(account_sid, auth_token)

token = client.tokens.create()

rtc_configuration = {
    "iceServers": token.ice_servers,
    "iceTransportPolicy": "relay",
}

with gr.Blocks() as demo:
    ...
    rtc = WebRTC(rtc_configuration=rtc_configuration)  # add your other WebRTC arguments here
    ...
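
One way to keep a single code path for local development and cloud deployment is to build the configuration in a helper that falls back to None when no Twilio credentials are set. This is a sketch under that assumption: the helper name get_rtc_configuration is hypothetical, and the Twilio calls are the same ones used above.

```python
import os


def get_rtc_configuration():
    """Return a Twilio-backed RTC configuration, or None when no credentials are set."""
    account_sid = os.environ.get("TWILIO_ACCOUNT_SID")
    auth_token = os.environ.get("TWILIO_AUTH_TOKEN")
    if not account_sid or not auth_token:
        # No credentials: fall back to None, for example during local development.
        return None

    # Imported lazily so the twilio package is only needed when deploying.
    from twilio.rest import Client

    token = Client(account_sid, auth_token).tokens.create()
    return {
        "iceServers": token.ice_servers,
        "iceTransportPolicy": "relay",
    }
```

The result can then be passed straight to the component, for example WebRTC(rtc_configuration=get_rtc_configuration()) together with your other arguments.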
