
Analyzes and summarizes AI benchmark results from unstructured text descriptions.


ai-benchmark-analyzer


A Python package that analyzes and summarizes AI benchmark results from unstructured text descriptions. It sends the text to an LLM (ChatLLM7 by default) and checks the response against a regex pattern, returning structured insights such as top-performing models, key metrics, and comparative analysis.


📦 Installation

Install via pip:

pip install ai_benchmark_analyzer

🚀 Quick Start

Basic Usage (Default LLM7)

from ai_benchmark_analyzer import ai_benchmark_analyzer

user_input = """
Model X achieved 92.3% accuracy on SOTA dataset with 128B parameters.
Model Y scored 89.1% but used only 7B parameters.
"""
response = ai_benchmark_analyzer(user_input)
print(response)

Custom LLM Integration

Replace the default ChatLLM7 with your preferred LLM (e.g., OpenAI, Anthropic, Google):

OpenAI Example

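from langchain_openai import ChatOpenAI
from ai_benchmark_analyzer import ai_benchmark_analyzer

# Reads OPENAI_API_KEY from the environment; the model name is only an example.
llm = ChatOpenAI(model="gpt-4o-mini")
# user_input is the benchmark text defined in Basic Usage above.
response = ai_benchmark_analyzer(user_input, llm=llm)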

Anthropic Example

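from langchain_anthropic import ChatAnthropic
from ai_benchmark_analyzer import ai_benchmark_analyzer

# Reads ANTHROPIC_API_KEY from the environment; the model name is only an example.
llm = ChatAnthropic(model="claude-3-5-sonnet-latest")
# user_input is the benchmark text defined in Basic Usage above.
response = ai_benchmark_analyzer(user_input, llm=llm)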

Google Generative AI Example

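from langchain_google_genai import ChatGoogleGenerativeAI
from ai_benchmark_analyzer import ai_benchmark_analyzer

# Reads GOOGLE_API_KEY from the environment; the model name is only an example.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
# user_input is the benchmark text defined in Basic Usage above.
response = ai_benchmark_analyzer(user_input, llm=llm)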

🔧 Parameters

| Parameter | Type | Description |
|---|---|---|
| user_input | str | Raw text describing benchmark results (required). |
| api_key | Optional[str] | LLM7 API key (read from the LLM7_API_KEY environment variable if not provided). |
| llm | Optional[BaseChatModel] | Custom LLM instance (defaults to ChatLLM7). |

🔗 Default LLM: LLM7

The package uses ChatLLM7 by default. Free tier rate limits are sufficient for most use cases. For higher limits:

  • Set the LLM7_API_KEY environment variable.
  • Pass the key directly: ai_benchmark_analyzer(user_input, api_key="your_key").

Get a free API key at LLM7 Token.
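
The two options above can be combined in calling code. The following is a minimal sketch of one way to resolve the key before calling the package; `resolve_api_key` is a hypothetical helper, not part of ai_benchmark_analyzer, and the package's own lookup may differ in detail.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: prefer an explicit key, then LLM7_API_KEY, else None."""
    return api_key or os.environ.get("LLM7_API_KEY")


key = resolve_api_key()
# The call below would hit the LLM7 service, so it is left commented out:
# response = ai_benchmark_analyzer(user_input, api_key=key)
```

Returning None when no key is found leaves the decision to the package, which falls back to the free tier according to the description above.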


📝 Output Format

The function returns a list of structured strings matching a predefined regex pattern, ensuring consistent and reliable output formatting. Example output:

[
    "Model: ModelX, Metric: Accuracy, Value: 92.3%, Dataset: SOTA, Parameters: 128B",
    "Model: ModelY, Metric: Accuracy, Value: 89.1%, Dataset: SOTA, Parameters: 7B"
]
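
Because each entry is a flat "Key: value" string, it can be turned into a dictionary for further processing. The sketch below assumes the exact layout of the example output above (fields separated by ", " and no commas inside values); `parse_result_line` and `top_model` are illustrative names, not functions provided by the package.

```python
def parse_result_line(line: str) -> dict[str, str]:
    """Split 'Key: value, Key: value' into a dict (assumes no commas inside values)."""
    fields = {}
    for part in line.split(", "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def top_model(lines: list[str]) -> dict[str, str]:
    """Return the parsed row with the highest percentage in its 'Value' field."""
    rows = [parse_result_line(line) for line in lines]
    return max(rows, key=lambda row: float(row["Value"].rstrip("%")))


results = [
    "Model: ModelX, Metric: Accuracy, Value: 92.3%, Dataset: SOTA, Parameters: 128B",
    "Model: ModelY, Metric: Accuracy, Value: 89.1%, Dataset: SOTA, Parameters: 7B",
]
best = top_model(results)
print(best["Model"], best["Value"])  # prints "ModelX 92.3%"
```

If the output pattern is changed, the splitting logic has to change with it.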

🛠️ Customization

  • Pattern Matching: The output adheres to a regex pattern (defined in prompts.py). Modify this file to adjust expected output formats.
  • LLM Prompts: System/human prompts are configurable via prompts.py.
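
To give a sense of what such a pattern could look like, here is a sketch of a regex that accepts lines shaped like the example output. It is an assumption written for illustration and is not copied from prompts.py; the real pattern may be stricter or looser.

```python
import re

# Hypothetical pattern: NOT the one shipped in prompts.py.
RESULT_LINE = re.compile(
    r"Model: (?P<model>[^,]+), "
    r"Metric: (?P<metric>[^,]+), "
    r"Value: (?P<value>\d+(?:\.\d+)?%), "
    r"Dataset: (?P<dataset>[^,]+), "
    r"Parameters: (?P<parameters>\d+(?:\.\d+)?[KMBT])"
)


def is_valid_line(line: str) -> bool:
    """True only if the whole line matches the illustrative pattern."""
    return RESULT_LINE.fullmatch(line) is not None


good = "Model: ModelX, Metric: Accuracy, Value: 92.3%, Dataset: SOTA, Parameters: 128B"
print(is_valid_line(good))                # True
print(is_valid_line("ModelX got 92.3%"))  # False
```

Named groups keep the pattern readable and make it easy to add or rename a field when the expected format changes.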

📜 License

MIT


📢 Support & Issues

For bugs/feature requests, open an issue on GitHub.


👤 Author

Eugene Evstafev (@chigwell) 📧 hi@euegne.plus
