Open-LLM-VTuber capabilities (hands-free voice, barge-in, Live2D avatar) as a hermes-agent plugin.
omnilimb-face
A standalone, installable hermes-agent plugin that gives your agent a face and a voice: hands-free voice interaction (VAD + STT), real-time barge-in, and a Live2D / Live3D avatar with lip-sync and expression driving — all without modifying any hermes core file. Part of the omnilimb family.
Demo
The avatar renders on top and speaks the agent's reply with lip-sync and expressions; you type (or talk) in the dialog box below.
The animation above is a synthetic preview (python preview.py); a still frame is in docs/assets/screenshot.png. Avatar: Live2D Cubism sample "Mao" © Live2D Inc. (loaded from CDN, not bundled; see Credits).
📖 Full docs: docs/GUIDE.md covers overview, architecture, install, config, commands (CLI · slash · tools), Live2D ↔ Live3D switching, real-time barge-in, troubleshooting and development.
🎭 Avatar deep-dive: docs/AVATAR_INTEGRATION.md explains how Live2D/Live3D expressions, motions and lip-sync are bound, how to import your own model (Cubism / VRM), and the 2D + 3D fusion roadmap.
How it works
The plugin reuses hermes' existing systems instead of carrying its own model
config: transcripts are injected via ctx.inject_message, the host agent's
normal turn produces the reply (using the user's active model, tools and
memory), and the reply text is intercepted through the transform_llm_output /
post_llm_call hooks to drive TTS and the avatar. Speech transcription reuses
the stt config section; speech synthesis reuses the tts section. The plugin
never calls an LLM itself — the avatar always speaks your configured agent's
real answer.
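The flow above can be sketched without hermes installed. This is only an illustration: StubContext, register_hook and run_turn are hypothetical stand-ins, not the hermes-agent API. Only the names inject_message, transform_llm_output and post_llm_call come from this README, and their real signatures may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class StubContext:
    """Hypothetical stand-in for the host plugin context (NOT the real hermes API)."""

    hooks: dict[str, list[Callable[[str], str | None]]] = field(default_factory=dict)
    injected: list[str] = field(default_factory=list)

    def register_hook(self, name: str, fn: Callable[[str], str | None]) -> None:
        # Assumed registration method; the real host may expose hooks differently.
        self.hooks.setdefault(name, []).append(fn)

    def inject_message(self, text: str) -> None:
        # Name taken from the README: a transcript enters the normal agent turn.
        self.injected.append(text)

    def run_turn(self, reply: str) -> str:
        """Simulate the host finishing a turn and firing both hooks in order."""
        for fn in self.hooks.get("transform_llm_output", []):
            reply = fn(reply) or reply  # a hook may return a rewritten reply
        for fn in self.hooks.get("post_llm_call", []):
            fn(reply)  # side effects only, e.g. hand the text to TTS
        return reply


spoken: list[str] = []  # stands in for the TTS / avatar queue


def register(ctx: StubContext) -> None:
    """Plugin entry point shape: only hooks are registered, no LLM is called."""
    ctx.register_hook("transform_llm_output", lambda text: text.strip())
    ctx.register_hook("post_llm_call", spoken.append)


ctx = StubContext()
register(ctx)
ctx.inject_message("hello from the microphone")  # STT transcript goes in
final = ctx.run_turn("  Hi there!  ")            # host reply comes out
```

The point of the sketch is the direction of data flow: speech goes in as an ordinary message, and the avatar only ever consumes the reply the host agent produced.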
Getting started
Requires Python 3.11+. Pick one of the two paths below — each is step by step.
Option A — 1-minute preview (no hermes-agent needed)
# 1) install EVERYTHING in one command (incl. the Edge-TTS voice, so you HEAR it)
pip install -e ".[all]"
# 2) start the preview (it serves the web page AND the gateway)
python preview.py
# 3) open the PAGE in your browser:
# http://127.0.0.1:12394/ <-- the web page
Type in the page and the avatar replies in a Chinese voice with lip-sync + expressions. (Click or type once first; browsers only allow audio after a user gesture.)
⚠️ The page is on port 12394, not 12393. Port 12393 is the WebSocket gateway (ws://…/client-ws); opening http://127.0.0.1:12393/ in a browser will NOT work. The page lives on gateway port + 1 = 12394. (In single-port mode, python preview.py --single-port / --https, the page and gateway share 12393.)
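The port rule is easy to get wrong, so here is a minimal sketch of it as two helper functions. page_url and gateway_url are hypothetical names, not part of the plugin, and using wss:// when HTTPS is on is an assumption rather than something this README states.

```python
def page_url(gateway_port: int = 12393, single_port: bool = False,
             https: bool = False, host: str = "127.0.0.1") -> str:
    """Return the browser URL for the front-end page."""
    # Per the note above, --https runs in single-port mode as well.
    shared = single_port or https
    port = gateway_port if shared else gateway_port + 1
    scheme = "https" if https else "http"
    return f"{scheme}://{host}:{port}/"


def gateway_url(gateway_port: int = 12393, https: bool = False,
                host: str = "127.0.0.1") -> str:
    """Return the WebSocket gateway URL (wss:// under HTTPS is an assumption)."""
    scheme = "wss" if https else "ws"
    return f"{scheme}://{host}:{gateway_port}/client-ws"


if __name__ == "__main__":
    print(page_url())                  # default two-port layout
    print(page_url(single_port=True))  # page shares the gateway port
    print(gateway_url())
```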
Option B — full product (chat in hermes, the avatar speaks the real replies)
# 1) install into the hermes venv, everything included, in ONE command
<hermes-venv>/python -m pip install -e "path/to/omnilimb-face[all]"
# 2) enable it: add `omnilimb-face` to plugins.enabled in ~/.hermes/config.yaml
# 3) run hermes — it starts the gateway (12393) + front-end (12394)
hermes
hermes vtuber status # check the avatar subsystem came up
# 4) open http://127.0.0.1:12394/ , then chat in the hermes terminal —
# the avatar speaks each reply with lip-sync + expressions.
Speech uses the host text_to_speech tool when present, else the keyless Edge-TTS
fallback (Chinese voice by default; set tts.<provider>.voice to an Edge
*Neural voice to change it). The /client-ws gateway works with websockets
12.x and 13–15.x.
Smaller installs (optional)
[all] pulls everything for a working setup. If you want a smaller footprint,
install only the pieces you need — the core install runs in a degraded state
without them (it still registers and its tools stay visible in hermes tools):
pip install -e ".[voice]" # microphone capture + VAD (hands-free)
pip install -e ".[wakeword]" # optional wake-word activation
pip install -e ".[live2d]" # front-end static serving
pip install -e ".[preview]" # Edge-TTS voice + local STT for `preview.py`
pip install -e ".[dev]" # test tooling
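To see which optional pieces are present after a smaller install, a quick check like the following may help. It is a sketch, not part of the plugin: missing_modules is a hypothetical helper, and the import names below are assumed from the packages this README mentions (edge-tts, cryptography, openWakeWord).

```python
import importlib.util

# Assumed top-level import names, mapped to the extra the README lists them under.
OPTIONAL_MODULES = {
    "edge_tts": "[preview]",
    "cryptography": "[preview]",
    "openwakeword": "[wakeword]",
}


def missing_modules(names):
    """Return the top-level module names that cannot be found, in input order."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(OPTIONAL_MODULES):
        print(f"{name} not installed (extra: {OPTIONAL_MODULES[name]})")
```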
Enabling in hermes
The plugin is discovered via the hermes_agent.plugins pip entry point, or by placing the
directory at ~/AppData/Local/hermes/plugins/omnilimb-face/. Enable it by adding
omnilimb-face to plugins.enabled in config.yaml.
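If you script your setup, the enable step amounts to appending one name to a list. The sketch below works on an already-parsed config and assumes plugins.enabled is a list nested under a plugins mapping; enable_plugin is a hypothetical helper. Reading and writing the YAML file itself (for example with PyYAML's yaml.safe_load and yaml.safe_dump) is left out to keep the example dependency-free.

```python
def enable_plugin(config: dict, name: str = "omnilimb-face") -> dict:
    """Add `name` to config["plugins"]["enabled"] without duplicating it."""
    plugins = config.get("plugins") or {}          # tolerate a missing or empty key
    enabled = list(plugins.get("enabled") or [])
    if name not in enabled:
        enabled.append(name)
    plugins["enabled"] = enabled
    config["plugins"] = plugins
    return config


if __name__ == "__main__":
    print(enable_plugin({"plugins": {"enabled": ["some-other-plugin"]}}))
```

Calling it twice leaves the list unchanged, so it is safe to run on every install.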
Mobile (phone) support
The avatar and hands-free voice also run on a phone. Because browsers only allow microphone access from a secure context, the preview can serve over HTTPS with a self-signed certificate on your LAN:
python preview.py --lan --https # HTTPS on your LAN IP (single port 12393)
# or use start.bat options 3 / 4 (LAN HTTPS, optionally with --llm --stt)
Then open https://<your-LAN-IP>:12393/ on the phone (same Wi-Fi) and accept the
self-signed certificate warning once — after that the mobile mic works. Cert
generation needs the [preview] extra (cryptography).
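To find the address to type on the phone, the following sketch prints a likely LAN IP using only the standard library. lan_ip and phone_url are hypothetical helpers, not something the preview exposes; on machines with several network interfaces the result may not be the one your phone can reach.

```python
import socket


def lan_ip(probe=("192.0.2.1", 80)) -> str:
    """Best-effort local IPv4 address; falls back to loopback when offline."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            # Connecting a UDP socket sends no packet; it only picks a route.
            sock.connect(probe)
            return sock.getsockname()[0]
        except OSError:
            return "127.0.0.1"


def phone_url(ip: str, port: int = 12393) -> str:
    """URL to open on the phone in LAN HTTPS (single-port) mode."""
    return f"https://{ip}:{port}/"


if __name__ == "__main__":
    print(phone_url(lan_ip()))
```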
Troubleshooting
- http://127.0.0.1:12393/ won't open. That's the WebSocket gateway, not a web page. Open http://127.0.0.1:12394/ (gateway port + 1).
- No sound. Install the voice engine: pip install edge-tts (or pip install -e ".[all]"). The preview prints voice: on (edge-tts …) when it is active. You also need internet (Edge online voices) and to click/type once in the page first (browser autoplay policy).
- I installed it but nothing happens / no avatar. Installing only adds the code; you still have to START it: python preview.py (Option A) or hermes vtuber start (Option B), then open http://127.0.0.1:12394/.
- Avatar doesn't appear. The Live2D model loads from CDN, so check your internet. Offline, it falls back to a dependency-free canvas placeholder.
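For the first and third items, it can help to confirm that something is actually listening before opening the browser. This is a small standard-library sketch under the default port layout described above; is_listening and diagnose are hypothetical helpers, not commands the plugin ships.

```python
import socket


def is_listening(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable
        return False


def diagnose(gateway_port: int = 12393) -> dict:
    """Check the gateway port and the page port (gateway port + 1)."""
    return {
        "gateway": is_listening(gateway_port),
        "page": is_listening(gateway_port + 1),
    }


if __name__ == "__main__":
    for name, up in diagnose().items():
        print(f"{name}: {'listening' if up else 'not reachable'}")
```

If both report not reachable, the preview or hermes process has probably not been started yet.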
Layout
omnilimb-face/
├── pyproject.toml # packaging, pinned deps, optional extras, entry point
├── plugin.yaml # PluginManifest (name/version/hooks/tools)
├── __init__.py # directory-discovery shim -> re-exports register
├── omnilimb_face/ # plugin package
│ ├── plugin.py # register(ctx) entry point
│ ├── voice/ # capture, VAD, wake-word
│ └── protocol/ # /client-ws event models + gateway
└── tests/ # pytest + Hypothesis (unit / property / integration)
Development
pytest # run the test suite
HYPOTHESIS_PROFILE=ci pytest # heavier property-test run
License
Licensed under the GNU Affero General Public License v3.0 or later
(AGPL-3.0-or-later) — see LICENSE.
In short: you are free to use, study, modify and share this software, including for commercial purposes, but if you distribute it or run a modified version as a network service, you must release your complete corresponding source under the AGPL as well. This keeps every downstream version open.
Commercial / proprietary license available. If you want to use omnilimb-face
in a closed-source or proprietary product without the AGPL's source-disclosure
obligations, a separate commercial license can be purchased — see
COMMERCIAL-LICENSE.md or contact
yase19636404@163.com.
Copyright © 2025 seanyang1983.
Credits & third-party licensing
This plugin does not bundle or redistribute any avatar models, the Live2D
Cubism Core, or any third-party front-end runtime. Everything below is loaded
from CDN at runtime only (with a dependency-free canvas fallback when offline).
Full details in NOTICE.md and THIRD_PARTY_NOTICES.md.
- Open-LLM-VTuber: the /client-ws protocol here is an independent re-implementation compatible with Open-LLM-VTuber (MIT through v1.2.0 as of 2026-06; see NOTICE.md for the license-transition note). No upstream source is copied. Thanks to the project for the protocol design.
- Live2D Cubism sample model: the default avatar (models/model_dict.json) references Live2D Inc.'s official Cubism sample "Mao" (a commit-pinned CDN URL only). The Cubism Core runtime is proprietary to Live2D Inc. and is loaded from Live2D's CDN, never bundled.

  This content uses sample data owned and copyrighted by Live2D Inc. (the "Terms of Use for Live2D Cubism Sample Data" / Live2D Cubism SDK license, https://www.live2d.com/en/).

  Live2D's official samples are free for general users and small-scale enterprises (latest annual sales under 10,000,000 JPY); larger entities are subject to additional Live2D terms. For commercial / large-scale use, swap in a model you own or are licensed to use by editing models/model_dict.json.
- pixi.js / pixi-live2d-display / three.js / @pixiv/three-vrm: all MIT, loaded from CDN.
- Optional Python deps: edge-tts ([preview], GPL-3.0, never bundled) and openWakeWord ([wakeword], Apache-2.0 library but its pre-trained models are CC-BY-NC-SA-4.0 / NonCommercial) need attention for commercial use.
File details
Details for the file omnilimb_face-0.1.0.tar.gz.
File metadata
- Download URL: omnilimb_face-0.1.0.tar.gz
- Upload date:
- Size: 296.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8eef1f176e9f40f215dafefe66ca50e722453c8d29629791c40081f3c1fcfab |
| MD5 | 68bb2f56664337c8f42a1dff5d05b06f |
| BLAKE2b-256 | 6e8a5fd733d801ab3daf9250e98ff9e5fad549579d81188382b6894ff39dccdd |
File details
Details for the file omnilimb_face-0.1.0-py3-none-any.whl.
File metadata
- Download URL: omnilimb_face-0.1.0-py3-none-any.whl
- Upload date:
- Size: 224.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75c2d02e7ebe60685f42d526ed2bae99c28dd95477674c2f26177bac94aad985 |
| MD5 | a4ad82677c9608e6af48b9eb1e088abf |
| BLAKE2b-256 | f32508ebddecf98ee8429070918dc2940fec788f909cdfddef50e4acdaacd702 |