Create a live avatar using Gemini and FastAPI

Gemini Live Avatar

Gemini Live Avatar is an open-source web application that provides a conversational, real-time interface in the browser using voice, text, and an animated 3D avatar. The user interface runs entirely in the browser, but it depends on a backend server that handles WebSocket communication and talks to the Gemini Live API. The Gemini Live API supplies the low-latency streaming that lets the avatar listen, speak, and react in real time, so conversations with the AI feel more natural and engaging.
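
To illustrate the WebSocket exchange described above, the sketch below shows one way browser input could be framed as JSON before the server relays it to the Gemini Live API. The `ClientMessage` type, the `type`, `mime_type`, and `data` field names, and the set of allowed kinds are all assumptions made for this example; they are not taken from the project's code.

```python
from __future__ import annotations

import base64
import json
from dataclasses import dataclass

# Hypothetical message kinds, mirroring the text, voice, and camera/screen inputs.
ALLOWED_KINDS = {"text", "audio", "image"}


@dataclass(frozen=True)
class ClientMessage:
    """One unit of browser input (illustrative, not the project's real schema)."""
    kind: str
    data: bytes
    mime_type: str


def encode_message(msg: ClientMessage) -> str:
    """Serialize a message to JSON, base64-encoding the binary payload."""
    return json.dumps({
        "type": msg.kind,
        "mime_type": msg.mime_type,
        "data": base64.b64encode(msg.data).decode("ascii"),
    })


def decode_message(raw: str) -> ClientMessage:
    """Parse a JSON frame and reject message kinds the server does not expect."""
    payload = json.loads(raw)
    kind = payload.get("type")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported message type: {kind!r}")
    return ClientMessage(
        kind=kind,
        data=base64.b64decode(payload["data"]),
        mime_type=payload.get("mime_type", "application/octet-stream"),
    )
```

Base64 keeps audio chunks and camera frames safe inside a text frame, at the cost of roughly a third more bytes; a binary WebSocket frame would avoid that overhead.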

✨ Features

⚡ Real-time interaction powered by the Gemini Live API
🎤 Speech-to-Text so users can interact with the avatar using voice input
🗣️ Text-to-Speech for the avatar's spoken responses, including lipsync and facial animations
💬 Text prompting with Gemini’s streaming multimodal responses
🧠 Avatar animation using Ready Player Me and Talking Head
🎥 Webcam and screen sharing capabilities for real-time context
📄 Multimodal chat log displaying user prompts and Gemini responses

🧠 How It Works

Gemini Live Avatar provides an interactive loop where the avatar listens, sees, responds, and reasons in real time:

1. The user speaks, types, or shares screen or camera input.
2. The server receives the input streams and processes them through the Gemini Live API, which handles each request with full multimodal context, including what the avatar "sees" through the shared screen or camera.
3. The avatar responds as soon as Gemini-generated responses reach the frontend, where they drive the avatar’s animation:
   - Text responses are displayed in the chat log in real time.
   - Speech responses are synthesized and played back, with the avatar lip-syncing and animating to match the spoken content.
4. Function calling is triggered dynamically when needed:
   - 🔍 Google Search grounding enhances answers with fresh external information.
   - ⚙️ A few custom tools, such as turning the lights on in a given color (for example, green) or turning them off, demonstrate how function calling can be integrated into the system.
5. Screen and camera content can be referenced directly in user queries such as:
   - "What’s in this slide?"
   - "Can you summarize the text on screen?"
   - "Tell me what’s in front of the camera."

This real-time loop enables expressive, grounded, and multimodal conversations with an avatar interface.
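
The function-calling step can be pictured as a small registry that maps a tool name requested by the model to a local Python function. The sketch below is a minimal illustration of that idea for the light-control demo; the tool names `turn_on_lights` and `turn_off_lights`, the in-memory `light_state`, and the `dispatch_tool_call` helper are hypothetical and do not come from the project's source.

```python
from __future__ import annotations

from typing import Any, Callable

# Hypothetical in-memory light state; a real tool would call a device API.
light_state: dict[str, Any] = {"on": False, "color": None}


def turn_on_lights(color: str = "white") -> dict[str, Any]:
    """Turn the lights on in the requested color."""
    light_state.update(on=True, color=color.lower())
    return {"status": "ok", **light_state}


def turn_off_lights() -> dict[str, Any]:
    """Turn the lights off and clear the color."""
    light_state.update(on=False, color=None)
    return {"status": "ok", **light_state}


# Registry of tool names the model is allowed to request.
TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "turn_on_lights": turn_on_lights,
    "turn_off_lights": turn_off_lights,
}


def dispatch_tool_call(name: str, args: dict[str, Any] | None = None) -> dict[str, Any]:
    """Run the named tool and return a result that can be sent back to the model."""
    handler = TOOLS.get(name)
    if handler is None:
        return {"status": "error", "message": f"unknown tool: {name}"}
    try:
        return handler(**(args or {}))
    except TypeError as exc:
        # Raised when the model supplies arguments the tool does not accept.
        return {"status": "error", "message": str(exc)}
```

Returning an error dictionary instead of raising keeps the conversation going: the model can read the message and retry with corrected arguments.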

Roadmap

End-to-end Gemini Live API integration
Speech-to-Text & Text-to-Speech functionality
Text input with streaming responses including multimodal content
Webcam and screen sharing for real-time context
Avatar animation with Ready Player Me
Avatar animation with Mixamo
Function calling by providing the MCP server URL
Integrate Gemini native audio output support
Integrate with ADK
Add interruption support for real-time responses

Prerequisites

Node.js v18 or later
A Google AI Studio project with a Gemini API key
Python 3.11+
(Optional) Ready Player Me avatar URL
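
The version requirements above can be checked with a short script before installing. This is a sketch that uses only the standard library; the helper names are made up for this example. It does not check for the Gemini API key, because this document does not say how the application expects the key to be provided.

```python
from __future__ import annotations

import shutil
import subprocess

MIN_PYTHON = (3, 11)   # "Python 3.11+"
MIN_NODE_MAJOR = 18    # "Node.js v18 or later"


def node_major_version(version_output: str) -> int:
    """Parse the major version from `node --version` output such as 'v18.17.0'."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def installed_node_version() -> str | None:
    """Return the output of `node --version`, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or None


def check_prerequisites(python_version: tuple[int, int], node_version: str | None) -> dict[str, bool]:
    """Report whether each version requirement is met."""
    return {
        "python": python_version >= MIN_PYTHON,
        "node": node_version is not None
        and node_major_version(node_version) >= MIN_NODE_MAJOR,
    }
```

To check the current machine, call `check_prerequisites(sys.version_info[:2], installed_node_version())` after importing `sys`.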

Installation

Development Setup

git clone https://github.com/haruiz/gemini-live-avatar.git
cd gemini-live-avatar
uv sync

pip installation

pip install gemini-live-avatar

Run the App

gemini-live-avatar --google-search-grounding --workers 1 --avatar-path https://models.readyplayer.me/<AvatarID>.glb

Then open your browser at: http://localhost:8080
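
When the app is started from a script, the command above can be assembled as an argument list. The helper below is a hypothetical convenience and not part of the package; it uses only the flags shown in the command above and assumes `--google-search-grounding` is a plain on/off switch, as it appears there.

```python
from __future__ import annotations

READY_PLAYER_ME_BASE = "https://models.readyplayer.me"


def build_launch_command(
    avatar_id: str,
    workers: int = 1,
    google_search_grounding: bool = True,
) -> list[str]:
    """Build the argument list for launching gemini-live-avatar."""
    if not avatar_id or "/" in avatar_id:
        raise ValueError("avatar_id must be a non-empty ID without slashes")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    command = ["gemini-live-avatar"]
    if google_search_grounding:
        command.append("--google-search-grounding")
    command += [
        "--workers", str(workers),
        "--avatar-path", f"{READY_PLAYER_ME_BASE}/{avatar_id}.glb",
    ]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`, which avoids shell quoting problems when the avatar URL contains special characters.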

🧠 Using Ready Player Me

This project integrates avatars from Ready Player Me, which offers fully rigged, customizable 3D characters suited to expressive visual representation. Facial movements, including lip sync, eye tracking, and gestures, are animated in real time using the open-source Talking Head library by Mika Suominen, and are driven by responses from the Gemini Live API. Users can personalize the experience by supplying their own Ready Player Me avatar URL.
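
Because the avatar URL is supplied by the user, it can be worth validating before it is passed to the app. The sketch below checks only the URL shape used in the run command in this document (`https://models.readyplayer.me/<AvatarID>.glb`); the function name is hypothetical, and other hosts or file formats may also be valid in practice.

```python
from urllib.parse import urlparse

EXPECTED_HOST = "models.readyplayer.me"


def is_ready_player_me_avatar_url(url: str) -> bool:
    """Return True if the URL looks like a Ready Player Me .glb model link."""
    parsed = urlparse(url)
    return (
        parsed.scheme == "https"
        and parsed.hostname == EXPECTED_HOST
        and parsed.path.endswith(".glb")
        and len(parsed.path) > len("/.glb")  # require a non-empty avatar ID
    )
```

Query parameters are ignored by this check, since only the path is required to end in `.glb`.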

📦 Built With

Gemini Live API
Vite – Modern dev environment
Ready Player Me – Avatar creation platform
Three.js – 3D rendering engine

🤝 Contributing

Contributions, suggestions, and pull requests are very welcome! If you'd like to contribute, please open an issue or submit a PR.

Release history

0.1.6 (Jun 1, 2025), this version
0.1.5 (Jun 1, 2025)
0.1.4 (May 29, 2025)
0.1.3 (May 29, 2025)
0.1.2 (May 29, 2025)
0.1.1 (May 29, 2025)
0.1.0 (May 29, 2025)

