ai-benchmark-analyzer
A Python package that analyzes and summarizes AI benchmark results from unstructured text descriptions. It uses an LLM to extract structured insights such as top-performing models, key metrics, and comparative analysis, and checks the LLM's response against a regex pattern so the output format stays consistent.
📦 Installation
Install via pip:
pip install ai_benchmark_analyzer
🚀 Quick Start
Basic Usage (Default LLM7)
from ai_benchmark_analyzer import ai_benchmark_analyzer
user_input = """
Model X achieved 92.3% accuracy on SOTA dataset with 128B parameters.
Model Y scored 89.1% but used only 7B parameters.
"""
response = ai_benchmark_analyzer(user_input)
print(response)
Custom LLM Integration
Replace the default ChatLLM7 with your preferred LLM (e.g., OpenAI, Anthropic, Google):
OpenAI Example
from langchain_openai import ChatOpenAI
from ai_benchmark_analyzer import ai_benchmark_analyzer
llm = ChatOpenAI()
response = ai_benchmark_analyzer(user_input, llm=llm)
Anthropic Example
from langchain_anthropic import ChatAnthropic
from ai_benchmark_analyzer import ai_benchmark_analyzer
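# ChatAnthropic requires a model name; the one below is only an example.
llm = ChatAnthropic(model="claude-3-5-sonnet-latest")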
response = ai_benchmark_analyzer(user_input, llm=llm)
Google Generative AI Example
from langchain_google_genai import ChatGoogleGenerativeAI
from ai_benchmark_analyzer import ai_benchmark_analyzer
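# ChatGoogleGenerativeAI requires a model name; the one below is only an example.
llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash")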
response = ai_benchmark_analyzer(user_input, llm=llm)
🔧 Parameters
| Parameter | Type | Description |
|---|---|---|
| `user_input` | `str` | Raw text describing benchmark results (required). |
| `api_key` | `Optional[str]` | LLM7 API key (auto-fetched from the `LLM7_API_KEY` env var if not provided). |
| `llm` | `Optional[BaseChatModel]` | Custom LLM instance (defaults to `ChatLLM7`). |
🔗 Default LLM: LLM7
The package uses ChatLLM7 by default. Free tier rate limits are sufficient for most use cases. For higher limits:
- Set the `LLM7_API_KEY` environment variable.
- Pass the key directly: `ai_benchmark_analyzer(user_input, api_key="your_key")`.
Get a free API key at LLM7 Token.
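The key lookup described above can be sketched as follows. This is a minimal illustration of the stated precedence (an explicit argument first, then the `LLM7_API_KEY` environment variable), not the package's actual code; `resolve_api_key` is a hypothetical helper name.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer an explicit key, else fall back to the env var."""
    if api_key:
        return api_key
    # Returns None when the variable is unset; how the package handles
    # that case (for example, free-tier access) is not shown here.
    return os.environ.get("LLM7_API_KEY")
```

Resolving the key in one place like this keeps call sites simple: they can pass `api_key` when they have one and omit it otherwise.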
📝 Output Format
The function returns a list of structured strings matching a predefined regex pattern, ensuring consistent and reliable output formatting. Example output:
[
"Model: ModelX, Metric: Accuracy, Value: 92.3%, Dataset: SOTA, Parameters: 128B",
"Model: ModelY, Metric: Accuracy, Value: 89.1%, Dataset: SOTA, Parameters: 7B"
]
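Because each returned string follows a fixed layout, it can be parsed into a dictionary for ranking or further analysis. The sketch below assumes the layout shown in the example output; `LINE_PATTERN` and `parse_results` are illustrative names, and the package's real pattern in `prompts.py` may differ.

```python
import re

# Assumed pattern, inferred from the example output; not taken from prompts.py.
LINE_PATTERN = re.compile(
    r"Model: (?P<model>.+?), Metric: (?P<metric>.+?), "
    r"Value: (?P<value>[\d.]+)%, Dataset: (?P<dataset>.+?), "
    r"Parameters: (?P<parameters>.+)"
)


def parse_results(lines):
    """Turn each matching line into a dict; skip lines that do not match."""
    records = []
    for line in lines:
        match = LINE_PATTERN.fullmatch(line)
        if match is None:
            continue
        record = match.groupdict()
        record["value"] = float(record["value"])  # "92.3" -> 92.3
        records.append(record)
    return records


# The example output from this README, used here in place of a live call.
response = [
    "Model: ModelX, Metric: Accuracy, Value: 92.3%, Dataset: SOTA, Parameters: 128B",
    "Model: ModelY, Metric: Accuracy, Value: 89.1%, Dataset: SOTA, Parameters: 7B",
]
records = parse_results(response)
best = max(records, key=lambda r: r["value"])
print(best["model"], best["value"])  # ModelX 92.3
```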
🛠️ Customization
- Pattern Matching: The output adheres to a regex pattern (defined in `prompts.py`). Modify this file to adjust expected output formats.
- LLM Prompts: System/human prompts are configurable via `prompts.py`.
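To illustrate what adjusting the pattern could involve, the sketch below compares a strict pattern with a relaxed variant that makes the `Parameters` field optional. Both patterns are assumptions modeled on the example output in this README, and `short_line` is a made-up input; the actual pattern and variable names in `prompts.py` may differ.

```python
import re

# Hypothetical strict pattern: every field, including Parameters, is required.
STRICT = re.compile(
    r"Model: [^,]+, Metric: [^,]+, Value: [\d.]+%, "
    r"Dataset: [^,]+, Parameters: [^,]+"
)

# Hypothetical relaxed variant: the trailing Parameters field may be omitted.
RELAXED = re.compile(
    r"Model: [^,]+, Metric: [^,]+, Value: [\d.]+%, "
    r"Dataset: [^,]+(?:, Parameters: [^,]+)?"
)

full_line = "Model: ModelX, Metric: Accuracy, Value: 92.3%, Dataset: SOTA, Parameters: 128B"
short_line = "Model: ModelZ, Metric: F1, Value: 80.0%, Dataset: SOTA"  # illustrative only

for name, pattern in (("strict", STRICT), ("relaxed", RELAXED)):
    # Prints whether each pattern accepts the full line and the short line.
    print(name, bool(pattern.fullmatch(full_line)), bool(pattern.fullmatch(short_line)))
```

The strict pattern accepts only the full line, while the relaxed one accepts both. If you loosen the pattern, the prompts likely need a matching update so the LLM is told which fields are optional.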
📜 License
MIT
📢 Support & Issues
For bugs/feature requests, open an issue on GitHub.
👤 Author
Eugene Evstafev (@chigwell) 📧 hi@euegne.plus