A collection of additional utilities for Zrb


Zrb extras

zrb-extras is a PyPI package.

You can install zrb-extras with the following command:

```bash
pip install zrb-extras
```

Let your `LLMTask` `speak` and `listen`

Prerequisites

Termux

First, make sure Termux has permission to access the microphone and speaker.

```bash
pkg update && pkg upgrade -y
pkg install pulseaudio termux-api -y
```

Run the following script, or add it to `~/.bashrc`:

```bash
# Start the PulseAudio daemon
pulseaudio --start --load="module-native-protocol-tcp auth-ip-acl=127.0.0.1 auth-anonymous=1" --exit-idle-time=-1

# Load the module now (if it errors, check that you gave Termux:API microphone permission, then restart Termux)
pactl load-module module-sles-source
# Confirm the source exists
pactl list short sources

# Start proot-distro
proot-distro login ubuntu
```

Proot-distro (Ubuntu)

Inside the Ubuntu distro, install the audio dependencies:

```bash
apt install libasound2-dev portaudio19-dev pulseaudio
```

Create `zrb_init.py`

```python
import os
from zrb.builtin import llm_ask
from zrb import llm_config
from zrb_extras.llm.tool import create_listen_tool, create_speak_tool

# Valid modes: "google", "openai", "termux", "vosk"
VOICE_MODE = os.getenv("VOICE_MODE", "vosk")
if VOICE_MODE not in ("google", "openai", "termux", "vosk"):
    VOICE_MODE = "vosk"

llm_ask.add_tool(
    create_speak_tool(
        mode=VOICE_MODE,
        genai_tts_model="gemini-2.5-flash-preview-tts",  # Optional
        genai_voice_name="Sulafat",  # Optional
        openai_tts_model="tts-1",  # Optional
        openai_voice_name="alloy",  # Optional
        sample_rate_out=24000,  # Optional
    )
)
llm_ask.add_tool(
    create_listen_tool(
        mode=VOICE_MODE,
        genai_stt_model="gemini-2.5-flash",  # Optional
        openai_stt_model="whisper-1",  # Optional
        sample_rate=16000,  # Optional
        channels=1,  # Optional
        silence_threshold=0.01,  # Optional
        max_silence=4.0,  # Optional
        # Sound Classification (optional)
        use_sound_classifier=True,  # Enable sound classification
        classification_model=None,  # Use default small model
        classification_system_prompt="Classify if the transcript contains actual speech or just background noise/fillers",
        classification_retries=2,  # Retry classification on failure
        fail_safe=True,  # Default to handling as speech if classification fails
    )
)
```
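The validation above silently falls back to `vosk` for any unrecognized value, so a typo such as `VOICE_MODE=Google` goes unnoticed. A minimal variant, assuming you want case-insensitive matching and a visible warning (the helper `resolve_voice_mode` is hypothetical and not part of zrb-extras):

```python
import os
import warnings
from typing import Optional

VALID_VOICE_MODES = ("google", "openai", "termux", "vosk")


def resolve_voice_mode(raw: Optional[str], default: str = "vosk") -> str:
    """Normalize a VOICE_MODE value and warn when falling back to the default."""
    if raw is None or not raw.strip():
        # Unset or empty: use the default without a warning.
        return default
    mode = raw.strip().lower()
    if mode in VALID_VOICE_MODES:
        return mode
    warnings.warn(
        f"Unknown VOICE_MODE {raw!r}; falling back to {default!r}",
        stacklevel=2,
    )
    return default


VOICE_MODE = resolve_voice_mode(os.getenv("VOICE_MODE"))
```

In `zrb_init.py`, this would replace the two lines that read and validate `VOICE_MODE`; the rest of the file stays the same.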

Sound Classification Feature

`create_listen_tool` now includes an optional sound classification feature: it uses an LLM to analyze each transcript and decide whether it contains actual speech or only background noise, fillers, or other non-speech sounds.

Key Features:

- VAD (Voice Activity Detection) is always used for initial speech detection, as in the existing listen tools.
- When `use_sound_classifier=True`, transcripts are classified by an LLM using zrb's small-model configuration system.
- Fail-safe default: if the classifier fails, the sound is handled as speech.
- Structured output: the classifier returns a structured output type, following the same pattern as zrb's `src/zrb/task/llm/history_processor.py`.
- Configurable: supports custom models, prompts, retries, and rate limiting.

Usage Examples:

```python
from zrb_extras.llm.tool import create_listen_tool

# Basic usage with sound classification
listen_tool = create_listen_tool(
    mode="vosk",
    use_sound_classifier=True,
    tool_name="smart_listen",
)

# With custom classification settings
# `my_rate_limiter` is a placeholder for your own rate limiter instance
listen_tool = create_listen_tool(
    mode="google",
    use_sound_classifier=True,
    classification_model="custom-model",  # Placeholder model name
    classification_model_settings={"temperature": 0.1},
    classification_system_prompt="Classify speech vs noise",
    classification_retries=3,
    fail_safe=False,  # Raise an exception on classification failure
    rate_limitter=my_rate_limiter,
    tool_name="custom_classifier_listen",
)

# Backward compatibility: old code still works
listen_tool = create_listen_tool(
    mode="termux",
    # No use_sound_classifier parameter
    tool_name="basic_listen",
)
```

How It Works:

1. The underlying listen tool (Vosk, Google, OpenAI, or Termux) captures audio and transcribes it.
2. VAD (Voice Activity Detection) filters out silent periods.
3. If `use_sound_classifier=True`, the transcript is sent to an LLM classifier.
4. The classifier returns a structured response containing:
   - `is_speech`: whether the transcript is actual speech
   - `confidence`: a confidence score from 0.0 to 1.0
   - `category`: an optional category (e.g., "speech", "noise", "filler")
5. Based on the classification:
   - If `is_speech=True`, the transcript is returned.
   - If `is_speech=False`, an empty string is returned (the non-speech sound is ignored).
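The decision flow above, together with the `classification_retries` and `fail_safe` options from the earlier examples, can be sketched in plain Python. This is a minimal sketch, not zrb-extras' implementation: `SoundClassification` and `filter_transcript` are hypothetical names, the classifier is any callable (in zrb-extras it is an LLM call), and `retries` is assumed to mean extra attempts after the first one.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SoundClassification:
    """Hypothetical mirror of the fields the classifier is described as returning."""
    is_speech: bool
    confidence: float  # 0.0 to 1.0
    category: Optional[str] = None  # e.g. "speech", "noise", "filler"


def filter_transcript(
    transcript: str,
    classify: Callable[[str], SoundClassification],
    retries: int = 2,
    fail_safe: bool = True,
) -> str:
    """Return the transcript if it is classified as speech, else an empty string."""
    last_error: Optional[Exception] = None
    # One initial attempt plus `retries` retries (assumed semantics).
    for _ in range(retries + 1):
        try:
            result = classify(transcript)
        except Exception as exc:  # e.g. the LLM call failed
            last_error = exc
            continue
        return transcript if result.is_speech else ""
    # Every attempt failed: fail-safe mode treats the sound as speech.
    if fail_safe:
        return transcript
    raise RuntimeError("Sound classification failed") from last_error
```

The sketch ignores `confidence`; rejecting low-confidence results with a threshold would be another plausible design.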

Benefits:

- Reduces false positives: filters out background noise, coughs, throat clearing, etc.
- Improves accuracy: only actual speech content is processed.
- Configurable: can be tuned for different environments and use cases.
- Backward compatible: existing code continues to work without changes.

Improving Voice Quality (Vosk Mode)

When using VOICE_MODE=vosk, speech recognition uses offline Vosk models and text-to-speech uses pyttsx3. Here's how to improve quality:

Vosk Speech Recognition Models

Recommended: For best accuracy, use the larger model. The default small model (~40MB) has limited accuracy.

| Model | Size | Accuracy | Recommended |
|---|---|---|---|
| `vosk-model-en-us-0.22` | ~1.8GB | Best | ✅ Yes |
| `vosk-model-en-us-daanzu-20200905` | ~1GB | Good | Good balance |
| `vosk-model-small-en-us-0.15` | ~40MB | Limited | Default (not recommended) |

Easiest way: Auto-download (recommended)

When you set `vosk_model_name`, Vosk downloads the model to `~/.cache/vosk/` on first use. Just configure it in `zrb_init.py`:

```python
from zrb_extras.llm.tool import create_listen_tool

listen_tool = create_listen_tool(
    mode="vosk",
    vosk_model_name="vosk-model-en-us-0.22",  # Auto-downloads on first use
    # ... other options
)
```

Alternative: Manual download

If you prefer to pre-download (e.g., on a machine with better internet):

```bash
mkdir -p ~/.cache/vosk
cd ~/.cache/vosk
wget https://alphacephei.com/vosk/models/vosk-model-en-us-0.22.zip
unzip vosk-model-en-us-0.22.zip
rm vosk-model-en-us-0.22.zip
```

Alternative: use `vosk_model_path` for custom locations:

```python
from zrb_extras.llm.tool import create_listen_tool

listen_tool = create_listen_tool(
    mode="vosk",
    vosk_model_path="/custom/path/to/vosk-model-en-us-0.22",
)
```
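To check whether a model is already on disk (for example after a manual download), a small helper can look in the same places. A minimal sketch, assuming an explicit `vosk_model_path` takes precedence over a model name and that auto-downloaded models land directly under `~/.cache/vosk/` as described above; `resolve_vosk_model_dir` is a hypothetical name, not a zrb-extras or Vosk function:

```python
from pathlib import Path
from typing import Optional

# Cache directory mentioned above for auto-downloaded Vosk models.
VOSK_CACHE_DIR = Path.home() / ".cache" / "vosk"


def resolve_vosk_model_dir(
    model_path: Optional[str] = None,
    model_name: str = "vosk-model-en-us-0.22",
    cache_dir: Path = VOSK_CACHE_DIR,
) -> Optional[Path]:
    """Return the local model directory if it already exists, else None."""
    if model_path:
        # Assumption: an explicit path takes precedence over a model name.
        candidate = Path(model_path).expanduser()
    else:
        candidate = cache_dir / model_name
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    found = resolve_vosk_model_dir()
    print(found or "Model not found locally.")
```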

pyttsx3 Text-to-Speech Quality

pyttsx3 uses your system's TTS engine. On Linux, it uses espeak/espeak-ng.

Install espeak-ng for better voices:

```bash
# Ubuntu/Debian
sudo apt install espeak-ng

# Fedora
sudo dnf install espeak-ng
```

List available voices:

```python
from zrb_extras.llm.tool.pyttsx3.speak import list_available_voices

for voice in list_available_voices():
    print(f"{voice['id']}: {voice['name']}")
```
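The voice list can be long, so it may help to search it instead of scanning by eye. A minimal sketch, assuming each entry is a dict with `id` and `name` keys as in the loop above; `find_voice_id` is a hypothetical helper, and the sample ids in the test are made up:

```python
from typing import Iterable, Mapping, Optional


def find_voice_id(voices: Iterable[Mapping[str, str]], keyword: str) -> Optional[str]:
    """Return the id of the first voice whose id or name contains `keyword` (case-insensitive)."""
    needle = keyword.lower()
    for voice in voices:
        if needle in voice["id"].lower() or needle in voice["name"].lower():
            return voice["id"]
    return None
```

For example, `find_voice_id(list_available_voices(), "english")` returns the first matching id, or `None` when nothing matches.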

Configure voice via environment variables:

```bash
# Set a specific voice (espeak-ng variants)
export PYTTSX3_VOICE_NAME="english-us+m3"   # Male voice
# export PYTTSX3_VOICE_NAME="english-us+f3" # Female voice

# Adjust speed (words per minute, default 150)
export PYTTSX3_VOICE_RATE="150"

# Adjust volume (0.0 to 1.0, default 1.0)
export PYTTSX3_VOICE_VOLUME="0.9"
```

Or pass the settings directly to `create_speak_tool`:

```python
from zrb_extras.llm.tool import create_speak_tool

speak_tool = create_speak_tool(
    mode="vosk",
    voice_name="english-us+m3",  # Specific voice
    rate=150,                     # Words per minute
    volume=0.9,                   # Volume (0.0-1.0)
)
```
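If you want the environment variables and the explicit arguments to agree, you can read the variables yourself and pass them on. A minimal sketch, assuming `create_speak_tool` accepts the `voice_name`, `rate`, and `volume` keywords shown above; `read_pyttsx3_settings` is hypothetical, and this does not describe how zrb-extras itself prioritizes environment variables over arguments:

```python
import os
from typing import Any, Dict, Mapping, Optional


def read_pyttsx3_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Read the PYTTSX3_* variables documented above into keyword arguments."""
    env = os.environ if env is None else env
    settings: Dict[str, Any] = {
        "voice_name": env.get("PYTTSX3_VOICE_NAME") or None,
        "rate": int(env.get("PYTTSX3_VOICE_RATE", "150")),  # words per minute
        # Clamp volume into the documented 0.0-1.0 range.
        "volume": min(max(float(env.get("PYTTSX3_VOICE_VOLUME", "1.0")), 0.0), 1.0),
    }
    # Drop unset values so the library's own defaults apply.
    return {key: value for key, value in settings.items() if value is not None}
```

Usage: `speak_tool = create_speak_tool(mode="vosk", **read_pyttsx3_settings())`.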

macOS Users

On macOS, pyttsx3 falls back to the native `say` command, which has better quality. You can use any installed macOS voice:

```bash
# List available voices (quote '?' so zsh does not expand it as a glob)
say -v '?'

# Set voice
export PYTTSX3_VOICE_NAME="Samantha"  # Female voice
# export PYTTSX3_VOICE_NAME="Daniel"  # Male voice
```
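To pick a voice by language rather than by name, the output of `say -v '?'` can be parsed. A minimal sketch, assuming each line has the usual `Name   locale   # sample sentence` layout; `parse_say_voices` is a hypothetical helper, and the `say` call only runs on macOS:

```python
import re
import shutil
import subprocess
from typing import List, Tuple

# Each line looks like: "Samantha            en_US    # Hello! My name is Samantha."
_SAY_VOICE_LINE = re.compile(r"^(?P<name>.+?)\s+(?P<locale>[a-z]{2,3}_[A-Za-z0-9]+)\s+#")


def parse_say_voices(output: str) -> List[Tuple[str, str]]:
    """Return (voice name, locale) pairs from the output of `say -v '?'`."""
    voices = []
    for line in output.splitlines():
        match = _SAY_VOICE_LINE.match(line)
        if match:
            voices.append((match.group("name").strip(), match.group("locale")))
    return voices


if __name__ == "__main__" and shutil.which("say"):
    # Passing "?" as a list element avoids shell globbing entirely.
    result = subprocess.run(["say", "-v", "?"], capture_output=True, text=True, check=True)
    for name, locale in parse_say_voices(result.stdout):
        if locale.startswith("en_"):
            print(name, locale)
```

The non-greedy name group lets multi-word voices such as "Bad News" parse correctly.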



# For maintainers

## Publish to PyPI

To publish zrb-extras, you need a PyPI account:

- Log in or register at [https://pypi.org/](https://pypi.org/)
- Create an API token

You can also create a TestPyPI account:

- Log in or register at [https://test.pypi.org/](https://test.pypi.org/)
- Create an API token

Once you have your API token, configure Poetry to use it (replace `<your-api-token>` with the token you created):

```bash
poetry config pypi-token.pypi <your-api-token>
```


To publish zrb-extras, run the following command:

```bash
poetry publish --build
```

Updating version

You can update the zrb-extras version by modifying the following section in `pyproject.toml`:

```toml
[project]
version = "0.0.2"
```

Adding dependencies

To add zrb-extras dependencies, edit the following section in `pyproject.toml`:

```toml
[project]
dependencies = [
    "Jinja2==3.1.2",
    "jsons==1.6.3"
]
```

Adding script

To expose a command-line script from zrb-extras, edit the following section in `pyproject.toml`:

```toml
[project.scripts]
zrb-extras-hello = "zrb_extras.__main__:hello"
```

After reinstalling the package (for example with `poetry install`), running `zrb-extras-hello` executes the `hello` function in `zrb_extras/__main__.py`.
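The entry point `zrb_extras.__main__:hello` means "import the module `zrb_extras.__main__` and call its `hello` attribute". A minimal sketch of such a file, assuming `hello` takes no arguments and just prints a greeting (the message is made up, not the package's actual output):

```python
# zrb_extras/__main__.py (illustrative content, not the package's actual file)


def hello() -> None:
    """Entry point referenced by `zrb_extras.__main__:hello` in [project.scripts]."""
    print("Hello from zrb-extras!")


if __name__ == "__main__":
    # Also lets `python -m zrb_extras` run the same function.
    hello()
```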


Release history

- 2.0.1 (Apr 10, 2026)
- 2.0.0 (Apr 10, 2026)
- 1.0.3 (Mar 8, 2026)
- 1.0.2 (Feb 10, 2026)
- 1.0.1 (Feb 9, 2026)
- 1.0.0 (Jan 24, 2026)
- 1.0.0a1, pre-release (Jan 19, 2026)
- 0.3.3 (Jan 11, 2026)
- 0.3.2 (Jan 11, 2026)
- 0.3.1 (Dec 27, 2025)
- 0.3.0 (Dec 27, 2025)
- 0.2.1 (Nov 24, 2025)
- 0.2.0 (Nov 22, 2025)
- 0.1.9 (Nov 22, 2025)
- 0.1.8 (Oct 10, 2025)
- 0.1.7 (Oct 10, 2025)
- 0.1.6 (Oct 10, 2025)
- 0.1.5 (Oct 10, 2025)
- 0.1.4 (Oct 3, 2025)
- 0.1.2 (Oct 3, 2025)
- 0.1.1 (Oct 2, 2025)
- 0.1.0 (Sep 30, 2025)
- 0.0.9 (Sep 6, 2025)
- 0.0.8 (Sep 6, 2025)
- 0.0.7 (Sep 5, 2025)
- 0.0.6 (Aug 11, 2024)
- 0.0.5 (Jul 20, 2024)
- 0.0.4 (Apr 15, 2024)
- 0.0.3 (Apr 15, 2024)
- 0.0.2 (Apr 14, 2024)
- 0.0.1 (Aug 24, 2023)

Download files


Source Distribution

zrb_extras-2.0.0.tar.gz (24.2 kB)

Uploaded Apr 10, 2026 Source

Built Distribution


zrb_extras-2.0.0-py3-none-any.whl (33.9 kB)

Uploaded Apr 10, 2026 Python 3

File details

Details for the file zrb_extras-2.0.0.tar.gz.

File metadata

Download URL: zrb_extras-2.0.0.tar.gz
Upload date: Apr 10, 2026
Size: 24.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.3 CPython/3.14.0 Linux/5.15.153.1-microsoft-standard-WSL2

File hashes

Hashes for zrb_extras-2.0.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fe918e25f94c8654c0981ee68879a5e3b6c45c03e43cca8c723c4fe622565231` |
| MD5 | `3bdb5b7f0eab3f89662652f788b83897` |
| BLAKE2b-256 | `14fc9199ef869f26c9d08192f5419fd6b8c5469d39dd76ea5c77647f86aad657` |


File details

Details for the file zrb_extras-2.0.0-py3-none-any.whl.

File metadata

Download URL: zrb_extras-2.0.0-py3-none-any.whl
Upload date: Apr 10, 2026
Size: 33.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.3.3 CPython/3.14.0 Linux/5.15.153.1-microsoft-standard-WSL2

File hashes

Hashes for zrb_extras-2.0.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `83c9adff5a7c010162bd23a613afc26a46ec790f0855ed9faa4f6d1dde2f4614` |
| MD5 | `66745ecee343fb9c15bf213af7c34371` |
| BLAKE2b-256 | `87e84ce0d8fc2a09535aaeb138d11845d96af824d87245881d2a4efc1d3dd398` |

