AudioJudge 🎵
A Python wrapper for audio comparison and evaluation using a Large Audio Model as Judge (i.e., LAM-as-a-Judge or AudioJudge) with support for in-context learning and flexible audio concatenation strategies.
Features
- Multi-Model Support: Works with OpenAI and Google audio models (the GPT-4o-audio family and the Gemini 1.5/2.0/2.5 Flash families)
- Flexible Audio Comparison: Support for both pairwise and pointwise audio evaluation
- In-Context Learning: Provide examples to improve model performance
- Audio Concatenation: Multiple strategies for combining audio files
- Smart Caching: Built-in API response caching to reduce costs and latency
Installation
```shell
pip install audiojudge  # requires Python >= 3.10
```
Quick Start
```python
from audiojudge import AudioJudge

# Initialize with API keys
judge = AudioJudge(
    openai_api_key="your-openai-key",
    google_api_key="your-google-key"
)

# Simple pairwise comparison
result = judge.judge_audio(
    audio1_path="audio1.wav",
    audio2_path="audio2.wav",
    system_prompt="Compare these two audio clips for quality.",
    model="gpt-4o-audio-preview"
)
print(result["response"])
```
Configuration
Environment Variables
Set your API keys as environment variables:
```shell
export OPENAI_API_KEY="your-openai-key"
export GOOGLE_API_KEY="your-google-key"
export EVAL_CACHE_DIR=".audio_cache"    # optional
export EVAL_DISABLE_CACHE="false"       # optional
```
AudioJudge Parameters
```python
judge = AudioJudge(
    openai_api_key=None,            # OpenAI API key (optional if env var is set)
    google_api_key=None,            # Google API key (optional if env var is set)
    temp_dir="temp_audio",          # directory for temporary concatenated audio files
    signal_folder="signal_audios",  # TTS signal files used during concatenation;
                                    # defaults ship with the package, and new ones
                                    # are generated via TTS if needed
    cache_dir=None,                 # API cache directory (default: .eval_cache)
    cache_expire_seconds=2592000,   # cache expiration (30 days)
    disable_cache=False             # disable caching
)
```
Core Methods
1. Pairwise Audio Comparison
1.1. Pairwise Comparison without Instruction Audio
Compare two audio files and get a model response directly:
```python
result = judge.judge_audio(
    audio1_path="speaker1.wav",
    audio2_path="speaker2.wav",
    system_prompt="Which speaker sounds more professional?",  # evaluation criteria, placed first
    user_prompt="Analyze both speakers and provide your assessment.",  # optional instructions, placed last
    model="gpt-4o-audio-preview",
    temperature=0.1,  # 0.0 is not supported by some API backends
    max_tokens=500    # maximum response length
)

if result["success"]:
    print(f"Model response: {result['response']}")
else:
    print(f"Error: {result['error']}")
```
1.2. Pairwise Comparison with Instruction Audio
For scenarios where both audio clips are responses to the same instruction (e.g., comparing two speech-in speech-out systems):
```python
result = judge.judge_audio(
    audio1_path="system_a_response.wav",          # response from system A
    audio2_path="system_b_response.wav",          # response from system B
    instruction_path="original_instruction.wav",  # instruction both systems responded to
    system_prompt="Compare which response better follows the given instruction.",
    model="gpt-4o-audio-preview"
)
print(f"Better response: {result['response']}")
```
2. Pointwise Audio Evaluation
Evaluate a single audio file:
```python
result = judge.judge_audio_pointwise(
    audio_path="speech.wav",
    system_prompt="Rate the speech quality from 1-10.",
    model="gpt-4o-audio-preview"
)
print(f"Quality rating: {result['response']}")
```
In-Context Learning
Improve model performance by providing examples:
Pairwise Examples
```python
from audiojudge.utils import AudioExample

# Create examples
examples = [
    AudioExample(
        audio1_path="example1_good.wav",
        audio2_path="example1_bad.wav",
        output="Audio 1 is better quality with clearer speech."
        # Optional: instruction_path="instruction1.wav" for instruction-based evaluation
    ),
    AudioExample(
        audio1_path="example2_noisy.wav",
        audio2_path="example2_clean.wav",
        output="Audio 2 is better due to less background noise."
    )
]

# Use examples in evaluation
result = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Compare audio quality and choose the better one.",
    examples=examples,
    model="gpt-4o-audio-preview"
)
```
Pointwise Examples
```python
from audiojudge.utils import AudioExamplePointwise

examples = [
    AudioExamplePointwise(
        audio_path="high_quality.wav",
        output="9/10 - Excellent clarity and no background noise"
    ),
    AudioExamplePointwise(
        audio_path="medium_quality.wav",
        output="6/10 - Acceptable quality with minor distortions"
    )
]

result = judge.judge_audio_pointwise(
    audio_path="test_audio.wav",
    system_prompt="Rate the audio quality from 1-10 with explanation.",
    examples=examples,
    model="gpt-4o-audio-preview"
)
```
Audio Concatenation Methods
Control how audio files are combined for model input:
Available Methods
For Pairwise Evaluation:
- `no_concatenation`: keep all audio files separate
- `pair_example_concatenation`: concatenate each example pair
- `examples_concatenation`: concatenate all examples into one file
- `test_concatenation`: concatenate the test audio pair
- `examples_and_test_concatenation` (default): concatenate all examples and the test audio; shown to be the most effective prompting strategy
For Pointwise Evaluation:
- `no_concatenation` (default): keep all audio files separate
- `examples_concatenation`: concatenate all examples into one file
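Conceptually, concatenation stitches several clips (separated by spoken signal markers) into one file before sending it to the model. A rough illustration of the idea using only the stdlib `wave` module; AudioJudge's actual implementation, which uses TTS-generated signal audio rather than silence, differs:

```python
import wave

def concatenate_wavs(paths, out_path, gap_seconds=0.5):
    """Join several mono WAV files (same format) into one, with a short
    silent gap between clips as a stand-in for spoken signal markers.
    Illustrative sketch only, not AudioJudge's implementation."""
    frames, params = [], None
    for path in paths:
        with wave.open(path, "rb") as w:
            if params is None:
                params = w.getparams()
            frames.append(w.readframes(w.getnframes()))
    # Silence: gap_seconds worth of zero-valued frames
    gap = b"\x00" * int(params.framerate * gap_seconds) * params.sampwidth
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(gap.join(frames))

# e.g. concatenate_wavs(["audio1.wav", "audio2.wav"], "combined.wav")
```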
Example Usage
```python
# Pairwise: keep everything separate
result = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Compare these audio clips.",
    concatenation_method="no_concatenation"
)

# Pairwise: concatenate all for better context (recommended)
result = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Compare these audio clips.",
    examples=examples,
    concatenation_method="examples_and_test_concatenation"
)

# Pointwise: with example concatenation
result = judge.judge_audio_pointwise(
    audio_path="test.wav",
    system_prompt="Rate the audio quality from 1-10.",
    examples=pointwise_examples,
    concatenation_method="examples_concatenation"
)
```
Instruction Audio
Use audio files as instructions for more complex tasks:
With Examples
```python
# Examples with instruction audio
examples = [
    AudioExample(
        audio1_path="example1.wav",
        audio2_path="example2.wav",
        instruction_path="instruction_example.wav",
        output="Audio 1 follows the instruction better."
    )
]

result = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    instruction_path="instruction.wav",
    system_prompt="Follow the audio instruction to evaluate these clips.",
    examples=examples,
    model="gpt-4o-audio-preview"
)
```
Supported Models
OpenAI Models
- `gpt-4o-audio-preview` (recommended)
- `gpt-4o-mini-audio-preview`
Google Models
- `gemini-1.5-flash`
- `gemini-2.0-flash`
- `gemini-2.5-flash`
```python
# Using different models
result_gpt = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Compare quality.",
    model="gpt-4o-audio-preview"
)

result_gemini = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Compare quality.",
    model="gemini-2.0-flash"
)
```
Caching
AudioJudge includes intelligent caching to reduce API costs and improve performance:
Cache Management
```python
# Clear entire cache
judge.clear_cache()

# Clear only failed (None) responses
valid_entries = judge.clear_none_cache()
print(f"Kept {valid_entries} valid cache entries")

# Get cache statistics
stats = judge.get_cache_stats()
print(f"Cache entries: {stats['total_entries']}")
```
Cache Configuration
```python
# Disable caching
judge = AudioJudge(disable_cache=True)

# Custom cache directory and expiration
judge = AudioJudge(
    cache_dir="my_audio_cache",
    cache_expire_seconds=86400  # 1 day
)
```
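A cache hit requires the request to match exactly, so changing any input (audio content, prompt, or model) triggers a fresh API call. A simplified sketch of content-based cache keying; AudioJudge's real key derivation is internal and may differ:

```python
import hashlib

def cache_key(audio_paths, system_prompt, model, temperature=0.1):
    """Derive a stable cache key from audio file contents and request
    parameters. Illustrative only; not AudioJudge's actual scheme."""
    h = hashlib.sha256()
    for path in audio_paths:
        with open(path, "rb") as f:
            h.update(f.read())  # hash bytes, so edits to the file bust the cache
    h.update(f"{system_prompt}|{model}|{temperature}".encode())
    return h.hexdigest()
```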
Advanced Usage
Error Handling
```python
result = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Compare these audio clips."
)

if result["success"]:
    response = result["response"]
    model_used = result["model"]
    print(f"Success with {model_used}: {response}")
else:
    error_message = result["error"]
    print(f"Evaluation failed: {error_message}")
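Since transient API failures surface as `success=False` rather than exceptions, batch runs benefit from a thin retry wrapper. A sketch; the wrapper and its backoff policy are illustrative, not part of the package:

```python
import time

def judge_with_retry(judge_fn, max_attempts=3, backoff_seconds=1.0, **kwargs):
    """Call a judge method (e.g. judge.judge_audio) and retry on failure.

    judge_fn is any callable returning AudioJudge's result dict with a
    'success' key. Illustrative helper, not part of the package.
    """
    result = {"success": False, "error": "not attempted"}
    for attempt in range(1, max_attempts + 1):
        result = judge_fn(**kwargs)
        if result.get("success"):
            return result
        if attempt < max_attempts:
            time.sleep(backoff_seconds * (2 ** (attempt - 1)))  # exponential backoff
    return result  # last failing result

# e.g. judge_with_retry(judge.judge_audio, audio1_path="a.wav", audio2_path="b.wav",
#                       system_prompt="Compare quality.")
```

Note that cached failures can be purged with `judge.clear_none_cache()` before retrying.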
Temperature and Token Control
```python
# Near-deterministic output (0.0 is not supported by some API backends)
result = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Compare quality.",
    temperature=0.000001,
    max_tokens=100
)

# More creative output
result = judge.judge_audio(
    audio1_path="test1.wav",
    audio2_path="test2.wav",
    system_prompt="Describe these audio clips creatively.",
    temperature=0.8,
    max_tokens=500
)
```
Best Practices
1. System Prompt Design
```python
# Good: specific and clear
system_prompt = """
You are an audio quality expert. Compare two audio clips and determine which has:
1. Better speech clarity
2. Less background noise
3. More natural sound
Respond with: "Audio 1" or "Audio 2" followed by your reasoning.
"""

# Avoid: vague instructions
system_prompt = "Which audio is better?"
```
2. Example Selection
```python
# Use diverse, representative examples
examples = [
    AudioExample(
        audio1_path="clear.wav",
        audio2_path="muffled.wav",
        output="Audio 1 - clearer speech"
    ),
    AudioExample(
        audio1_path="noisy.wav",
        audio2_path="clean.wav",
        output="Audio 2 - less background noise"
    ),
    AudioExample(
        audio1_path="fast.wav",
        audio2_path="normal.wav",
        output="Audio 2 - better pacing"
    )
]
```
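When tuning an example set, it helps to measure agreement with gold labels on a held-out set. A sketch of such a loop; the harness, its label-matching logic, and the dataset layout are all hypothetical:

```python
def evaluate_examples(judge_fn, test_pairs, examples):
    """Fraction of test pairs where the judge's response mentions the
    gold verdict. test_pairs is a list of (audio1, audio2, gold) tuples
    where gold is 'Audio 1' or 'Audio 2'. Illustrative harness only,
    not part of the package."""
    correct = 0
    for audio1, audio2, gold in test_pairs:
        result = judge_fn(
            audio1_path=audio1,
            audio2_path=audio2,
            system_prompt="Compare audio quality and choose the better one.",
            examples=examples,
        )
        # Crude matching: count a hit if the gold verdict appears verbatim
        if result.get("success") and gold in result.get("response", ""):
            correct += 1
    return correct / len(test_pairs)

# e.g. evaluate_examples(judge.judge_audio, labeled_pairs, examples)
```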
3. Concatenation Strategy
- Use `no_concatenation` for simple cases or when preserving individual audio quality is crucial
- Use `examples_and_test_concatenation` when you have examples (recommended for best performance)
- Consider model context limits when choosing a strategy
4. Model Selection
- GPT-4o Audio: Best for complex reasoning and detailed analysis
- Gemini 2.0+: Good for general comparisons, potentially faster and more cost-effective
Research and Experiments
This package is based on research in audio evaluation using large audio models. The experimental code and evaluation scripts used in our research are available in the experiments/ folder for reproducing the results.
Example Usage
Additional usage examples can be found in the examples/ folder, which wraps some of our experiments into the package for demonstration:
- `examples/audiojudge_usage.py`: pairwise comparison without instruction audio (datasets: somos, thaimos, tmhintq, pronunciation, speed, speaker evaluations)
- `examples/audiojudge_usage_with_instruction.py`: pairwise comparison with instruction audio (datasets: system-level comparisons including ChatbotArena and SpeakBench)
- `examples/audiojudge_usage_pointwise.py`: pointwise evaluation (datasets: somos, thaimos, tmhintq)
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For issues and questions:
- GitHub Issues: Create an issue