Collection of Zrb additional utilities
# Zrb extras
zrb-extras is a PyPI package.
You can install zrb-extras by invoking the following command:
```bash
pip install zrb-extras
```
# Let your LLMTask speak and listen
## Prerequisites
### Termux
First, make sure Termux has permission to access the microphone and speaker, then install the required packages:
```bash
pkg update && pkg upgrade -y
pkg install pulseaudio termux-api -y
```
Run the following script or add it to `~/.bashrc`:
```bash
# Start the PulseAudio daemon
pulseaudio --start --load="module-native-protocol-tcp auth-ip-acl=127.0.0.1 auth-anonymous=1" --exit-idle-time=-1
# Load the module now (if it errors, check that you gave Termux:API mic permission and restart Termux)
pactl load-module module-sles-source
# Confirm the source exists
pactl list short sources
# Start proot-distro
proot-distro login ubuntu
```
### Proot-distro (Ubuntu)
Inside the Ubuntu environment, install the audio libraries:
```bash
apt install libasound2-dev portaudio19-dev pulseaudio
```
## Create zrb_init.py
```python
import os

from zrb.builtin import llm_ask
from zrb import llm_config
from zrb_extras.llm.tool import create_listen_tool, create_speak_tool

# Valid modes: "google", "openai", "termux", "vosk"
VOICE_MODE = os.getenv("VOICE_MODE", "vosk")
if VOICE_MODE not in ("google", "openai", "termux", "vosk"):
    VOICE_MODE = "vosk"

llm_ask.add_tool(
    create_speak_tool(
        mode=VOICE_MODE,
        genai_tts_model="gemini-2.5-flash-preview-tts",  # Optional
        genai_voice_name="Sulafat",  # Optional
        openai_tts_model="tts-1",  # Optional
        openai_voice_name="alloy",  # Optional
        sample_rate_out=24000,  # Optional
    )
)

llm_ask.add_tool(
    create_listen_tool(
        mode=VOICE_MODE,
        genai_stt_model="gemini-2.5-flash",  # Optional
        openai_stt_model="whisper-1",  # Optional
        sample_rate=16000,  # Optional
        channels=1,  # Optional
        silence_threshold=0.01,  # Optional
        max_silence=4.0,  # Optional
        # Sound Classification (optional)
        use_sound_classifier=True,  # Enable sound classification
        classification_model=None,  # Use default small model
        classification_system_prompt="Classify if the transcript contains actual speech or just background noise/fillers",
        classification_retries=2,  # Retry classification on failure
        fail_safe=True,  # Default to handling as speech if classification fails
    )
)

# Optional: allow LLM to speak or listen without asking for user approval
if not llm_config.default_yolo_mode:
    llm_config.set_default_yolo_mode(["speak", "listen"])
```
## Sound Classification Feature
The `create_listen_tool` now includes an optional sound classification feature that uses an LLM to analyze transcripts and determine if they contain actual speech or just background noise, fillers, or non-speech sounds.
### Key Features:
1. **VAD is always used** for initial speech detection (already implemented in existing listen tools)
2. **When `use_sound_classifier=True`**, transcripts are classified by an LLM using zrb's small model configuration system
3. **Fail-safe default**: If the classifier fails, it assumes the sound should be handled as speech
4. **Structured output**: Uses structured output types, following a pattern similar to `../zrb/src/zrb/task/llm/history_processor.py`
5. **Configurable**: Supports custom models, prompts, retries, and rate limiting
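To give a rough intuition for how energy-based silence detection could relate to the `silence_threshold` and `max_silence` parameters of `create_listen_tool`, here is a minimal, self-contained sketch. It is not the zrb-extras implementation: the function names `frame_rms` and `should_stop_recording`, the `frame_duration` argument, and the reading of the threshold as an RMS level on samples normalized to [-1.0, 1.0] are all assumptions made for illustration.

```python
import math
from typing import Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one audio frame (samples assumed in [-1.0, 1.0])."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_stop_recording(
    frames: Sequence[Sequence[float]],
    frame_duration: float,
    silence_threshold: float = 0.01,
    max_silence: float = 4.0,
) -> bool:
    """Return True once trailing silence has lasted at least max_silence seconds."""
    silent_time = 0.0
    # Walk backwards from the newest frame and accumulate consecutive silence.
    for frame in reversed(frames):
        if frame_rms(frame) >= silence_threshold:
            break  # A loud frame ends the trailing-silence run.
        silent_time += frame_duration
    return silent_time >= max_silence


# Hypothetical data: one loud frame followed by silent 1-second frames.
loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.0, 0.001, -0.001, 0.0]
stop_now = should_stop_recording([loud, quiet, quiet, quiet, quiet], frame_duration=1.0)
keep_going = should_stop_recording([loud, quiet, quiet], frame_duration=1.0)
```

Under these assumptions, a lower `silence_threshold` makes quiet speech less likely to be treated as silence, while a larger `max_silence` tolerates longer pauses before recording stops.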
### Usage Examples:
```python
# Basic usage with sound classification
listen_tool = create_listen_tool(
    mode="vosk",
    use_sound_classifier=True,
    tool_name="smart_listen",
)

# With custom classification settings
listen_tool = create_listen_tool(
    mode="google",
    use_sound_classifier=True,
    classification_model="custom-model",
    classification_model_settings={"temperature": 0.1},
    classification_system_prompt="Classify speech vs noise",
    classification_retries=3,
    fail_safe=False,  # Raise exception on classification failure
    rate_limitter=my_rate_limiter,  # my_rate_limiter must be defined elsewhere
    tool_name="custom_classifier_listen",
)

# Backward compatibility - old code still works
listen_tool = create_listen_tool(
    mode="termux",
    # No use_sound_classifier parameter
    tool_name="basic_listen",
)
```
### How It Works:
- The underlying listen tool (Vosk, Google, OpenAI, or Termux) captures audio and transcribes it
- VAD (Voice Activity Detection) filters out silent periods
- If `use_sound_classifier=True`, the transcript is sent to an LLM classifier
- The classifier returns a structured response indicating:
  - `is_speech`: Boolean indicating if it's actual speech
  - `confidence`: Confidence score (0.0 to 1.0)
  - `category`: Optional category (e.g., "speech", "noise", "filler")
- Based on the classification:
  - If `is_speech=True`: Returns the transcript
  - If `is_speech=False`: Returns empty string (ignores non-speech)
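The flow above can be sketched in plain Python. This is only an illustration under stated assumptions, not the zrb-extras code: `SoundClassification`, `gate_transcript`, `toy_classifier`, and `broken_classifier` are hypothetical names, the classifier is a stand-in function rather than an LLM call, and treating `retries` as the number of additional attempts after the first is a guess.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SoundClassification:
    """Assumed shape of the structured classifier response."""
    is_speech: bool
    confidence: float  # 0.0 to 1.0
    category: Optional[str] = None  # e.g. "speech", "noise", "filler"


def gate_transcript(
    transcript: str,
    classify: Callable[[str], SoundClassification],
    retries: int = 2,
    fail_safe: bool = True,
) -> str:
    """Return the transcript if classified as speech, otherwise an empty string."""
    last_error: Optional[Exception] = None
    # One initial attempt plus `retries` additional attempts (an assumption).
    for _ in range(retries + 1):
        try:
            result = classify(transcript)
        except Exception as error:  # Classifier call failed; try again.
            last_error = error
            continue
        return transcript if result.is_speech else ""
    if fail_safe:
        # Fail-safe: treat the sound as speech when classification keeps failing.
        return transcript
    raise RuntimeError("sound classification failed") from last_error


# Hypothetical stand-in classifier: treats short filler words as non-speech.
def toy_classifier(text: str) -> SoundClassification:
    if text.strip().lower() in {"uh", "um", "hmm"}:
        return SoundClassification(is_speech=False, confidence=0.9, category="filler")
    return SoundClassification(is_speech=True, confidence=0.9, category="speech")


# Hypothetical classifier that always fails, to exercise the fail-safe path.
def broken_classifier(text: str) -> SoundClassification:
    raise ConnectionError("classifier unavailable")


kept = gate_transcript("turn on the lights", toy_classifier)
dropped = gate_transcript("um", toy_classifier)
fallback = gate_transcript("hello", broken_classifier, fail_safe=True)
```

With `fail_safe=False`, the same failing classifier would raise instead of returning the transcript, which mirrors the "Raise exception on classification failure" comment in the usage examples.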
### Benefits:
- **Reduces false positives**: Filters out background noise, coughs, throat clearing, etc.
- **Improves accuracy**: Only processes actual speech content
- **Configurable**: Can be tuned for different environments and use cases
- **Backward compatible**: Existing code continues to work without changes
# For maintainers
## Publish to PyPI
To publish zrb-extras, you need a PyPI account:
- Log in or register at [https://pypi.org/](https://pypi.org/)
- Create an API token
You can also create a TestPyPI account:
- Log in or register at [https://test.pypi.org/](https://test.pypi.org/)
- Create an API token
Once you have your API token, configure poetry to use it (replace the placeholder `<your-api-token>` with the token you created):
```bash
poetry config pypi-token.pypi <your-api-token>
```
To publish zrb-extras, run the following command:
```bash
poetry publish --build
```
## Updating version
You can update the zrb-extras version by modifying the following section in `pyproject.toml`:
```toml
[project]
version = "0.0.2"
```
## Adding dependencies
To add zrb-extras dependencies, edit the following section in `pyproject.toml`:
```toml
[project]
dependencies = [
    "Jinja2==3.1.2",
    "jsons==1.6.3"
]
```
## Adding script
To make zrb-extras executable, edit the following section in `pyproject.toml`:
```toml
[project.scripts]
zrb-extras-hello = "zrb_extras.__main__:hello"
```
Now, whenever you run `zrb-extras-hello`, the `hello` function in your `__main__.py` will be executed.
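For reference, a script entry of the form `zrb_extras.__main__:hello` expects a callable named `hello` in `zrb_extras/__main__.py`. The sketch below shows what such a module might look like; the function body and the printed message are hypothetical and are not taken from the actual package.

```python
# Hypothetical contents of zrb_extras/__main__.py
def hello() -> None:
    """Entry point referenced by `zrb-extras-hello = "zrb_extras.__main__:hello"`."""
    print("Hello from zrb-extras")


if __name__ == "__main__":
    # Allows `python -m zrb_extras` to call the same function.
    hello()
```

After reinstalling the package, the installer generates a `zrb-extras-hello` executable that imports the module and calls `hello()`.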