A Python package for interacting with AI models

Allama - LLM Testing and Benchmarking Suite 🧪

A comprehensive testing and benchmarking suite for Large Language Models (LLMs) focused on Python code generation. The project enables automatic quality assessment of generated code through various metrics and generates detailed HTML reports.

✨ Features

  • Automated Testing of multiple LLM models with configurable prompts
  • Code Quality Assessment - syntax checking, execution, style, and functionality
  • Detailed HTML Reports with metrics, charts, and comparisons
  • Results Export to CSV and JSON for further analysis
  • Highly Configurable - easily add new models and tests
  • Multiple API Support - Ollama, local servers, cloud services
  • Model Ranking based on performance and quality metrics

🚀 Quick Start

1. Installation

Using Poetry (recommended)

# Clone the repository
git clone https://github.com/wronai/allama.git
cd allama

# Install dependencies
pip install poetry
poetry install

# Activate the virtual environment
poetry shell

Using pip

pip install .

2. Model Configuration

Create or edit the models.csv file to configure your models:

model_name,url,auth_header,auth_value,think,description
mistral:latest,http://localhost:11434/api/chat,,,false,Mistral Latest on Ollama
llama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B
gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-...,false,OpenAI GPT-4

CSV Columns:

  • model_name - Name of the model (e.g., mistral:latest, gpt-4)
  • url - API endpoint URL
  • auth_header - Name of the authorization header, if one is required (e.g., Authorization); leave empty otherwise
  • auth_value - Authorization value (e.g., Bearer your-api-key)
  • think - Whether the model supports the "think" parameter (true/false)
  • description - Human-readable description of the model

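For illustration, the rows above can be parsed with the standard library's csv module. This is a hedged sketch, not Allama's actual loader; the `load_models` function name is invented here, but the column names come straight from the table above.

```python
import csv
import io

# Sample rows in the documented models.csv format (auth columns empty for local Ollama).
MODELS_CSV = """\
model_name,url,auth_header,auth_value,think,description
mistral:latest,http://localhost:11434/api/chat,,,false,Mistral Latest on Ollama
gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-...,false,OpenAI GPT-4
"""

def load_models(fp):
    """Parse model rows, converting the 'think' flag from text to a bool."""
    models = []
    for row in csv.DictReader(fp):
        row["think"] = row["think"].strip().lower() == "true"
        models.append(row)
    return models

models = load_models(io.StringIO(MODELS_CSV))
print(models[0]["model_name"])  # mistral:latest
```

In practice you would open `models.csv` from disk instead of the inline string.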
3. Running Tests

Basic Usage

# Run all tests with default configuration
python -m allama.runner

# Run benchmark suite
python -m allama.runner --benchmark

# Test a single model
python -m allama.runner --single-model "mistral:latest"

# Compare specific models
python -m allama.runner --compare "mistral:latest" "llama3:8b"

# Generate HTML report
python -m allama.runner --output benchmark_report.html

๐Ÿ› ๏ธ Usage Examples

Using Makefile (recommended)

# Run tests
make test

# Run benchmark
make benchmark

# Test a single model
make single-model

# Generate HTML report
make report

Advanced Usage

# Run with custom configuration
python -m allama.runner --config custom_config.json

# Test with a specific prompt
python -m allama.runner --single-model "mistral:latest" --prompt-index 0

# Set request timeout (in seconds)
python -m allama.runner --timeout 60

📊 Evaluation Metrics

The system evaluates generated code based on the following criteria:

Basic Metrics (automatic)

  • ✅ Correct Syntax - whether the code compiles without errors
  • ✅ Executability - whether the code runs without runtime errors
  • ✅ Keyword Matching - whether the code contains expected elements from the prompt
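The three basic checks can be sketched with the standard library alone. The function names below are illustrative, not Allama's internal API: syntax via `compile()`, executability via a fresh interpreter subprocess, keyword matching via substring search.

```python
import subprocess
import sys

def check_syntax(code: str) -> bool:
    """Does the code compile without a SyntaxError?"""
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def check_runs(code: str, timeout: float = 5.0) -> bool:
    """Does the code execute cleanly in a fresh interpreter process?"""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, timeout=timeout)
    return result.returncode == 0

def check_keywords(code: str, expected: list[str]) -> bool:
    """Are all expected elements present in the generated source?"""
    return all(kw in code for kw in expected)

sample = "def add(a, b):\n    return a + b\n"
print(check_syntax(sample), check_runs(sample), check_keywords(sample, ["def", "add"]))
```

Running generated code in a subprocess keeps a crash or hang in the candidate code from taking down the test harness.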

Code Quality Metrics

  • ๐Ÿ“ Function/Class Definitions - proper code structure
  • ๐Ÿ›ก๏ธ Error Handling - try/except blocks, input validation
  • ๐Ÿ“š Documentation - docstrings, comments
  • ๐Ÿ“ฆ Imports - proper library usage
  • ๐Ÿ“ Code Length - reasonable number of lines
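Structural checks like these can be approximated by walking the parsed syntax tree with the `ast` module. This is a hedged sketch of plausible heuristics (the `quality_checks` helper is invented here, and Allama's real checks may differ); the code-length check is omitted since it is a simple line count.

```python
import ast

def quality_checks(code: str) -> dict:
    """Detect definitions, error handling, docstrings, and imports via the AST."""
    tree = ast.parse(code)
    nodes = list(ast.walk(tree))
    doc_nodes = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return {
        "has_definitions": any(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef,
                                              ast.ClassDef)) for n in nodes),
        "has_error_handling": any(isinstance(n, ast.Try) for n in nodes),
        "has_docs": any(ast.get_docstring(n) for n in nodes
                        if isinstance(n, doc_nodes)),
        "has_imports": any(isinstance(n, (ast.Import, ast.ImportFrom)) for n in nodes),
    }

sample = '''
import math

def safe_sqrt(x):
    """Return sqrt(x), or None for negative input."""
    try:
        return math.sqrt(x)
    except ValueError:
        return None
'''
print(quality_checks(sample))
```

AST inspection is more robust than regex matching here: a `try` inside a string literal, for example, would not count as error handling.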

Scoring System

  • Correct Syntax: 3 points
  • Runs without errors: 2 points
  • Contains expected elements: 2 points
  • Has function/class definitions: 1 point
  • Has error handling: 1 point
  • Has documentation: 1 point
  • Maximum: 10 points
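The scoring table above amounts to a weighted sum of boolean checks. A minimal sketch, with weights taken directly from the table (the dict keys are illustrative names, not Allama's internal field names):

```python
# Point weights from the scoring table; a perfect result sums to 10.
WEIGHTS = {
    "syntax_ok": 3,          # Correct syntax
    "runs_ok": 2,            # Runs without errors
    "keywords_ok": 2,        # Contains expected elements
    "has_definitions": 1,    # Function/class definitions
    "has_error_handling": 1, # try/except, input validation
    "has_docs": 1,           # Docstrings, comments
}

def score(checks: dict) -> int:
    """Sum the weights of every check that passed."""
    return sum(w for name, w in WEIGHTS.items() if checks.get(name))

perfect = score({name: True for name in WEIGHTS})
print(perfect)  # 10
```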

🔧 Configuration

Customizing Prompts

Edit the allama/config.py file to modify test prompts:

TEST_PROMPTS = [
    {
        "name": "Custom Function",
        "prompt": "Write a Python function that...",
        "expected_keywords": ["def", "function_name"],
        "expected_behavior": "function_definition"
    }
]

JSON Configuration

Create a custom_config.json file for advanced configuration:

{
    "test_prompts": [
        {
            "name": "Custom Test",
            "prompt": "Your custom prompt here..."
        }
    ],
    "timeouts": {
        "request_timeout": 30,
        "execution_timeout": 5
    }
}
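A custom config like this would typically be overlaid on the defaults. The sketch below is an assumption about the merge strategy (user values win, with timeouts merged key-by-key so a partial override keeps the other default); the default values mirror the JSON example above, and `load_config` is an invented name, not Allama's documented API.

```python
import json
import tempfile

DEFAULTS = {
    "test_prompts": [],
    "timeouts": {"request_timeout": 30, "execution_timeout": 5},
}

def load_config(path):
    """Load a JSON config file and overlay it on the defaults."""
    with open(path) as f:
        user = json.load(f)
    merged = dict(DEFAULTS, **user)
    # Merge timeouts key-by-key so a partial override keeps the other default.
    merged["timeouts"] = {**DEFAULTS["timeouts"], **user.get("timeouts", {})}
    return merged

# Example: a config that only overrides the request timeout.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"timeouts": {"request_timeout": 60}}, f)
    path = f.name

cfg = load_config(path)
print(cfg["timeouts"])  # {'request_timeout': 60, 'execution_timeout': 5}
```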

🔌 API Integration Examples

Ollama (local)

llama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B

OpenAI API

gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-your-key,false,OpenAI GPT-4

Anthropic Claude

claude-3,https://api.anthropic.com/v1/messages,x-api-key,your-key,false,Claude 3

Local Server

local-model,http://localhost:8080/generate,,,false,Local Model
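Each of these rows describes an HTTP request. As a hedged sketch of what that request looks like: build a JSON POST to the configured URL, attaching the auth header when the row sets one. The payload shape below follows Ollama's /api/chat; other providers expect different bodies, and `build_request` is an illustrative helper, not part of Allama.

```python
import json
import urllib.request

def build_request(url, model_name, prompt, auth_header="", auth_value=""):
    """Build a chat-style POST request from one models.csv row."""
    payload = {
        "model": model_name,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if auth_header:  # empty for local endpoints like Ollama
        req.add_header(auth_header, auth_value)
    return req

req = build_request(
    "https://api.openai.com/v1/chat/completions",
    "gpt-4", "Write a Python function...",
    auth_header="Authorization", auth_value="Bearer sk-test",
)
# Send with: urllib.request.urlopen(req, timeout=30)
print(req.get_header("Authorization"))  # Bearer sk-test
```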

๐Ÿ“ Project Structure

allama/
โ”œโ”€โ”€ allama/               # Main package
โ”‚   โ”œโ”€โ”€ __init__.py      # Package initialization
โ”‚   โ”œโ”€โ”€ config.py        # Default configuration and prompts
โ”‚   โ”œโ”€โ”€ main.py          # Main module
โ”‚   โ””โ”€โ”€ runner.py        # Test runner implementation
โ”œโ”€โ”€ tests/               # Test files
โ”‚   โ””โ”€โ”€ test_allama.py   # Unit tests
โ”œโ”€โ”€ models.csv           # Model configurations
โ”œโ”€โ”€ pyproject.toml       # Project metadata and dependencies
โ”œโ”€โ”€ Makefile             # Common tasks
โ””โ”€โ”€ README.md            # This file

📈 Example Output

After running the benchmark, you'll get:

  1. Console Output: Summary of test results
  2. HTML Report: Detailed report with code examples and metrics
  3. CSV/JSON: Raw data for further analysis

🚀 Getting Help

If you encounter any issues or have questions:

  1. Check the issues page
  2. Create a new issue with detailed information about your problem

๐Ÿค Contributing

Contributions are welcome! Please read our Contributing Guidelines for details on how to contribute to this project.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Thanks to all the open-source projects that made this possible
  • Special thanks to the Ollama team for their amazing work

Made with ❤️ by the Allama team
