A Python package for interacting with AI models

Allama - LLM Testing and Benchmarking Suite 🧪

A comprehensive testing and benchmarking suite for Large Language Models (LLMs) focused on Python code generation. The project enables automatic quality assessment of generated code through various metrics and generates detailed HTML reports.

✨ Features

  • Automated Testing of multiple LLM models with configurable prompts
  • Code Quality Assessment - syntax checking, execution, style, and functionality
  • Detailed HTML Reports with metrics, charts, and comparisons
  • Results Export to CSV and JSON for further analysis
  • Highly Configurable - easily add new models and tests
  • Multiple API Support - Ollama, local servers, cloud services
  • Model Ranking based on performance and quality metrics

🚀 Quick Start

1. Installation

Using Poetry (recommended)

# Clone the repository
git clone https://github.com/wronai/allama.git
cd allama

# Install dependencies
pip install poetry
poetry install

# Activate the virtual environment
poetry shell

Using pip

pip install .

2. Model Configuration

Create or edit the models.csv file to configure your models:

model_name,url,auth_header,auth_value,think,description
mistral:latest,http://localhost:11434/api/chat,,,false,Mistral Latest on Ollama
llama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B
gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-...,false,OpenAI GPT-4

CSV Columns:

  • model_name - Name of the model (e.g., mistral:latest, gpt-4)
  • url - API endpoint URL
  • auth_header - Name of the authorization header, if one is required (e.g., Authorization); leave empty otherwise
  • auth_value - Authorization value (e.g., Bearer your-api-key)
  • think - Whether the model supports the "think" parameter (true/false)
  • description - Human-readable description of the model

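For illustration, the rows above can be parsed with the standard library's csv module. This is a hedged sketch, not Allama's actual loader; the `load_models` function name is invented here, but the column names come straight from the table above.

```python
import csv
import io

# Sample rows in the documented models.csv format (auth columns empty for local Ollama).
MODELS_CSV = """\
model_name,url,auth_header,auth_value,think,description
mistral:latest,http://localhost:11434/api/chat,,,false,Mistral Latest on Ollama
gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-...,false,OpenAI GPT-4
"""

def load_models(fp):
    """Parse model rows, converting the 'think' flag from text to a bool."""
    models = []
    for row in csv.DictReader(fp):
        row["think"] = row["think"].strip().lower() == "true"
        models.append(row)
    return models

models = load_models(io.StringIO(MODELS_CSV))
print(models[0]["model_name"])  # mistral:latest
```

In practice you would open `models.csv` from disk instead of the inline string.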
3. Running Tests

Basic Usage

# Run all tests with default configuration
python -m allama.runner

# Run benchmark suite
python -m allama.runner --benchmark

# Test a single model
python -m allama.runner --single-model "mistral:latest"

# Compare specific models
python -m allama.runner --compare "mistral:latest" "llama3:8b"

# Generate HTML report
python -m allama.runner --output benchmark_report.html

๐Ÿ› ๏ธ Usage Examples

Using Makefile (recommended)

# Run tests
make test

# Run benchmark
make benchmark

# Test a single model
make single-model

# Generate HTML report
make report

Advanced Usage

# Run with custom configuration
python -m allama.runner --config custom_config.json

# Test with a specific prompt
python -m allama.runner --single-model "mistral:latest" --prompt-index 0

# Set request timeout (in seconds)
python -m allama.runner --timeout 60

📊 Evaluation Metrics

The system evaluates generated code based on the following criteria:

Basic Metrics (automatic)

  • ✅ Correct Syntax - whether the code compiles without errors
  • ✅ Executability - whether the code runs without runtime errors
  • ✅ Keyword Matching - whether the code contains expected elements from the prompt
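The three basic checks can be sketched with the standard library alone. The function names below are illustrative, not Allama's internal API: syntax via `compile()`, executability via a fresh interpreter subprocess, keyword matching via substring search.

```python
import subprocess
import sys

def check_syntax(code: str) -> bool:
    """Does the code compile without a SyntaxError?"""
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def check_runs(code: str, timeout: float = 5.0) -> bool:
    """Does the code execute cleanly in a fresh interpreter process?"""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, timeout=timeout)
    return result.returncode == 0

def check_keywords(code: str, expected: list[str]) -> bool:
    """Are all expected elements present in the generated source?"""
    return all(kw in code for kw in expected)

sample = "def add(a, b):\n    return a + b\n"
print(check_syntax(sample), check_runs(sample), check_keywords(sample, ["def", "add"]))
```

Running generated code in a subprocess keeps a crash or hang in the candidate code from taking down the test harness.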

Code Quality Metrics

  • ๐Ÿ“ Function/Class Definitions - proper code structure
  • ๐Ÿ›ก๏ธ Error Handling - try/except blocks, input validation
  • ๐Ÿ“š Documentation - docstrings, comments
  • ๐Ÿ“ฆ Imports - proper library usage
  • ๐Ÿ“ Code Length - reasonable number of lines
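Structural checks like these can be approximated by walking the parsed syntax tree with the `ast` module. This is a hedged sketch of plausible heuristics (the `quality_checks` helper is invented here, and Allama's real checks may differ); the code-length check is omitted since it is a simple line count.

```python
import ast

def quality_checks(code: str) -> dict:
    """Detect definitions, error handling, docstrings, and imports via the AST."""
    tree = ast.parse(code)
    nodes = list(ast.walk(tree))
    doc_nodes = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {
        "has_definitions": any(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef,
                                              ast.ClassDef)) for n in nodes),
        "has_error_handling": any(isinstance(n, ast.Try) for n in nodes),
        "has_docs": any(ast.get_docstring(n) for n in nodes
                        if isinstance(n, doc_nodes)),
        "has_imports": any(isinstance(n, (ast.Import, ast.ImportFrom)) for n in nodes),
    }

sample = '''
import math

def safe_sqrt(x):
    """Return sqrt(x), or None for negative input."""
    try:
        return math.sqrt(x)
    except ValueError:
        return None
'''
print(quality_checks(sample))
```

AST inspection is more robust than regex matching here: a `try` inside a string literal, for example, would not count as error handling.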

Scoring System

  • Correct Syntax: 3 points
  • Runs without errors: 2 points
  • Contains expected elements: 2 points
  • Has function/class definitions: 1 point
  • Has error handling: 1 point
  • Has documentation: 1 point
  • Maximum: 10 points
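The scoring table above amounts to a weighted sum of boolean checks. A minimal sketch, with weights taken directly from the table (the dict keys are illustrative names, not Allama's internal field names):

```python
# Point weights from the scoring table; a perfect result sums to 10.
WEIGHTS = {
    "syntax_ok": 3,          # Correct syntax
    "runs_ok": 2,            # Runs without errors
    "keywords_ok": 2,        # Contains expected elements
    "has_definitions": 1,    # Function/class definitions
    "has_error_handling": 1, # try/except, input validation
    "has_docs": 1,           # Docstrings, comments
}

def score(checks: dict) -> int:
    """Sum the weights of every check that passed."""
    return sum(w for name, w in WEIGHTS.items() if checks.get(name))

perfect = score({name: True for name in WEIGHTS})
print(perfect)  # 10
```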

🔧 Configuration

Customizing Prompts

Edit the allama/config.py file to modify test prompts:

TEST_PROMPTS = [
    {
        "name": "Custom Function",
        "prompt": "Write a Python function that...",
        "expected_keywords": ["def", "function_name"],
        "expected_behavior": "function_definition"
    }
]

JSON Configuration

Create a custom_config.json file for advanced configuration:

{
    "test_prompts": [
        {
            "name": "Custom Test",
            "prompt": "Your custom prompt here..."
        }
    ],
    "timeouts": {
        "request_timeout": 30,
        "execution_timeout": 5
    }
}
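A custom config like this would typically be overlaid on the defaults. The sketch below is an assumption about the merge strategy (user values win, with timeouts merged key-by-key so a partial override keeps the other default); the default values mirror the JSON example above, and `load_config` is an invented name, not Allama's documented API.

```python
import json
import tempfile

DEFAULTS = {
    "test_prompts": [],
    "timeouts": {"request_timeout": 30, "execution_timeout": 5},
}

def load_config(path):
    """Load a JSON config file and overlay it on the defaults."""
    with open(path) as f:
        user = json.load(f)
    merged = dict(DEFAULTS, **user)
    # Merge timeouts key-by-key so a partial override keeps the other default.
    merged["timeouts"] = {**DEFAULTS["timeouts"], **user.get("timeouts", {})}
    return merged

# Example: a config that only overrides the request timeout.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"timeouts": {"request_timeout": 60}}, f)
    path = f.name

cfg = load_config(path)
print(cfg["timeouts"])  # {'request_timeout': 60, 'execution_timeout': 5}
```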

🔌 API Integration Examples

Ollama (local)

llama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B

OpenAI API

gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-your-key,false,OpenAI GPT-4

Anthropic Claude

claude-3,https://api.anthropic.com/v1/messages,x-api-key,your-key,false,Claude 3

Local Server

local-model,http://localhost:8080/generate,,,false,Local Model
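Each of these rows describes an HTTP request. As a hedged sketch of what that request looks like: build a JSON POST to the configured URL, attaching the auth header when the row sets one. The payload shape below follows Ollama's /api/chat; other providers expect different bodies, and `build_request` is an illustrative helper, not part of Allama.

```python
import json
import urllib.request

def build_request(url, model_name, prompt, auth_header="", auth_value=""):
    """Build a chat-style POST request from one models.csv row."""
    payload = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if auth_header:  # empty for local endpoints like Ollama
        req.add_header(auth_header, auth_value)
    return req

req = build_request(
    "https://api.openai.com/v1/chat/completions",
    "gpt-4", "Write a Python function...",
    auth_header="Authorization", auth_value="Bearer sk-test",
)
# Send with: urllib.request.urlopen(req, timeout=30)
print(req.get_header("Authorization"))  # Bearer sk-test
```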

๐Ÿ“ Project Structure

allama/
โ”œโ”€โ”€ allama/               # Main package
โ”‚   โ”œโ”€โ”€ __init__.py      # Package initialization
โ”‚   โ”œโ”€โ”€ config.py        # Default configuration and prompts
โ”‚   โ”œโ”€โ”€ main.py          # Main module
โ”‚   โ””โ”€โ”€ runner.py        # Test runner implementation
โ”œโ”€โ”€ tests/               # Test files
โ”‚   โ””โ”€โ”€ test_allama.py   # Unit tests
โ”œโ”€โ”€ models.csv           # Model configurations
โ”œโ”€โ”€ pyproject.toml       # Project metadata and dependencies
โ”œโ”€โ”€ Makefile             # Common tasks
โ””โ”€โ”€ README.md            # This file

📈 Example Output

After running the benchmark, you'll get:

  1. Console Output: Summary of test results
  2. HTML Report: Detailed report with code examples and metrics
  3. CSV/JSON: Raw data for further analysis

🚀 Getting Help

If you encounter any issues or have questions:

  1. Check the issues page
  2. Create a new issue with detailed information about your problem

๐Ÿค Contributing

Contributions are welcome! Please read our Contributing Guidelines for details on how to contribute to this project.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Thanks to all the open-source projects that made this possible
  • Special thanks to the Ollama team for their amazing work

Made with ❤️ by the Allama team
