A Python package for interacting with AI models
Allama - LLM Testing and Benchmarking Suite
A comprehensive testing and benchmarking suite for Large Language Models (LLMs) focused on Python code generation. The project enables automatic quality assessment of generated code through various metrics and generates detailed HTML reports.
Features
- Automated Testing of multiple LLM models with configurable prompts
- Code Quality Assessment - syntax checking, execution, style, and functionality
- Detailed HTML Reports with metrics, charts, and comparisons
- Results Export to CSV and JSON for further analysis
- Highly Configurable - easily add new models and tests
- Multiple API Support - Ollama, local servers, cloud services
- Model Ranking based on performance and quality metrics
Quick Start
1. Installation
Using Poetry (recommended)

```bash
# Clone the repository
git clone https://github.com/wronai/allama.git
cd allama

# Install dependencies
pip install poetry
poetry install

# Activate the virtual environment
poetry shell
```

Using pip

```bash
pip install .
```
2. Model Configuration
Create or edit the models.csv file to configure your models:
```csv
model_name,url,auth_header,auth_value,think,description
mistral:latest,http://localhost:11434/api/chat,,,false,Mistral Latest on Ollama
llama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B
gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-...,false,OpenAI GPT-4
```
CSV Columns:
- model_name - Name of the model (e.g., mistral:latest, gpt-4)
- url - API endpoint URL
- auth_header - Authorization header name (if required, e.g., "Authorization")
- auth_value - Authorization value (e.g., "Bearer your-api-key")
- think - Whether the model supports the "think" parameter (true/false)
- description - Description of the model
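A row from this CSV maps straightforwardly onto an HTTP chat request. A minimal sketch, assuming the Ollama/OpenAI-style chat payload implied by the endpoints above (`build_request` is a hypothetical helper, not part of allama's public API):

```python
import csv
from io import StringIO

# Two example rows in the models.csv format described above.
MODELS_CSV = """model_name,url,auth_header,auth_value,think,description
mistral:latest,http://localhost:11434/api/chat,,,false,Mistral Latest on Ollama
gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-...,false,OpenAI GPT-4
"""

def build_request(row: dict, prompt: str):
    """Turn one models.csv row into (url, headers, JSON payload)."""
    headers = {"Content-Type": "application/json"}
    if row["auth_header"]:  # the auth columns may be empty for local models
        headers[row["auth_header"]] = row["auth_value"]
    payload = {
        "model": row["model_name"],
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return row["url"], headers, payload

rows = list(csv.DictReader(StringIO(MODELS_CSV)))
url, headers, payload = build_request(rows[1], "Write a Python function that...")
```

The returned triple can then be passed to any HTTP client; only models with a non-empty `auth_header` get an authorization header attached.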
3. Running Tests
Basic Usage
```bash
# Run all tests with default configuration
python -m allama.runner

# Run benchmark suite
python -m allama.runner --benchmark

# Test a single model
python -m allama.runner --single-model "mistral:latest"

# Compare specific models
python -m allama.runner --compare "mistral:latest" "llama3:8b"

# Generate HTML report
python -m allama.runner --output benchmark_report.html
```
Usage Examples
Using Makefile (recommended)
```bash
# Run tests
make test

# Run benchmark
make benchmark

# Test a single model
make single-model

# Generate HTML report
make report
```
Advanced Usage
```bash
# Run with custom configuration
python -m allama.runner --config custom_config.json

# Test with a specific prompt
python -m allama.runner --single-model "mistral:latest" --prompt-index 0

# Set request timeout (in seconds)
python -m allama.runner --timeout 60
```
Evaluation Metrics
The system evaluates generated code based on the following criteria:
Basic Metrics (automatic)
- Correct Syntax - whether the code compiles without errors
- Executability - whether the code runs without runtime errors
- Keyword Matching - whether the code contains expected elements from the prompt
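All three basic checks can be done with the standard library alone. A minimal sketch of one plausible approach (function names are illustrative, not allama's internals):

```python
import subprocess
import sys
import tempfile

def check_syntax(code: str) -> bool:
    """Does the generated code compile without a SyntaxError?"""
    try:
        compile(code, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def check_runs(code: str, timeout: int = 5) -> bool:
    """Does the code run to completion in an isolated subprocess?"""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path], timeout=timeout,
                                capture_output=True)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def check_keywords(code: str, expected: list[str]) -> bool:
    """Are all expected elements from the prompt present in the output?"""
    return all(kw in code for kw in expected)
```

Running untrusted generated code in a subprocess with a timeout keeps a hanging or crashing sample from taking down the benchmark run.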
Code Quality Metrics
- Function/Class Definitions - proper code structure
- Error Handling - try/except blocks, input validation
- Documentation - docstrings, comments
- Imports - proper library usage
- Code Length - reasonable number of lines
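Structural signals like these can be detected by walking the AST of the generated code. A hedged sketch (the exact heuristics allama uses may differ):

```python
import ast

def quality_flags(code: str) -> dict:
    """Detect structural quality signals in already-valid Python code."""
    tree = ast.parse(code)
    nodes = list(ast.walk(tree))
    defs = [n for n in nodes
            if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
    return {
        "has_definition": bool(defs),
        "has_error_handling": any(isinstance(n, ast.Try) for n in nodes),
        # A docstring on the module or on any function/class counts.
        "has_docstring": ast.get_docstring(tree) is not None
                         or any(ast.get_docstring(d) for d in defs),
        "has_imports": any(isinstance(n, (ast.Import, ast.ImportFrom))
                           for n in nodes),
        "line_count": len(code.splitlines()),
    }
```

Because this inspects the AST rather than raw text, a `try` inside a string literal or a commented-out `import` will not produce false positives.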
Scoring System
- Correct Syntax: 3 points
- Runs without errors: 2 points
- Contains expected elements: 2 points
- Has function/class definitions: 1 point
- Has error handling: 1 point
- Has documentation: 1 point
- Maximum: 10 points
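The point values above sum directly. A minimal sketch of the aggregation, assuming the boolean checks listed in the metrics sections (not allama's exact code):

```python
def score(syntax_ok: bool, runs_ok: bool, keywords_ok: bool,
          has_definition: bool, has_error_handling: bool,
          has_docs: bool) -> int:
    """Sum the point values from the scoring table (maximum 10)."""
    return (3 * syntax_ok          # correct syntax: 3 points
            + 2 * runs_ok          # runs without errors: 2 points
            + 2 * keywords_ok      # contains expected elements: 2 points
            + has_definition       # function/class definitions: 1 point
            + has_error_handling   # error handling: 1 point
            + has_docs)            # documentation: 1 point
```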
Configuration
Customizing Prompts
Edit the allama/config.py file to modify test prompts:
```python
TEST_PROMPTS = [
    {
        "name": "Custom Function",
        "prompt": "Write a Python function that...",
        "expected_keywords": ["def", "function_name"],
        "expected_behavior": "function_definition"
    }
]
```
JSON Configuration
Create a custom_config.json file for advanced configuration:
```json
{
  "test_prompts": [
    {
      "name": "Custom Test",
      "prompt": "Your custom prompt here..."
    }
  ],
  "timeouts": {
    "request_timeout": 30,
    "execution_timeout": 5
  }
}
```
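User settings from such a file would then be overlaid on the defaults. A hedged sketch of a shallow per-section merge (`DEFAULTS` and `merge_config` are illustrative, not allama's actual loader):

```python
import copy

# Illustrative defaults mirroring the JSON structure above.
DEFAULTS = {
    "test_prompts": [],
    "timeouts": {"request_timeout": 30, "execution_timeout": 5},
}

def merge_config(user: dict) -> dict:
    """Overlay a user config (parsed from custom_config.json) onto defaults.

    Dict-valued sections are merged key by key, so a user file that only
    sets request_timeout keeps the default execution_timeout.
    """
    merged = copy.deepcopy(DEFAULTS)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key].update(value)
        else:
            merged[key] = value
    return merged
```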
API Integration Examples
Ollama (local)

```csv
llama3:8b,http://localhost:11434/api/chat,,,false,Llama 3 8B
```

OpenAI API

```csv
gpt-4,https://api.openai.com/v1/chat/completions,Authorization,Bearer sk-your-key,false,OpenAI GPT-4
```

Anthropic Claude

```csv
claude-3,https://api.anthropic.com/v1/messages,x-api-key,your-key,false,Claude 3
```

Local Server

```csv
local-model,http://localhost:8080/generate,,,false,Local Model
```
Project Structure
```
allama/
├── allama/              # Main package
│   ├── __init__.py      # Package initialization
│   ├── config.py        # Default configuration and prompts
│   ├── main.py          # Main module
│   └── runner.py        # Test runner implementation
├── tests/               # Test files
│   └── test_allama.py   # Unit tests
├── models.csv           # Model configurations
├── pyproject.toml       # Project metadata and dependencies
├── Makefile             # Common tasks
└── README.md            # This file
```
Example Output
After running the benchmark, you'll get:
- Console Output: Summary of test results
- HTML Report: Detailed report with code examples and metrics
- CSV/JSON: Raw data for further analysis
Getting Help
If you encounter any issues or have questions:
- Check the issues page
- Create a new issue with detailed information about your problem
Contributing
Contributions are welcome! Please read our Contributing Guidelines for details on how to contribute to this project.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Thanks to all the open-source projects that made this possible
- Special thanks to the Ollama team for their amazing work