
Comprehensive benchmark and evaluation framework for educational AI question generation


InceptBench


Educational content evaluation framework with multiple AI-powered assessment modules.

📖 Documentation

Official Sites

Website • Benchmarks • Glossary • Docs • API Endpoint • API Docs

User Guides

Developer Guides

Resources

🚀 Quick Start

# Install from PyPI (latest published release)
pip install inceptbench

# Or install from source (current repo snapshot)
git clone https://github.com/incept-ai/inceptbench.git
cd inceptbench
python3 -m venv venv && source venv/bin/activate
pip install -e .

# Create .env file (optional - for API-based evaluation)
echo "OPENAI_API_KEY=your_key" >> .env
echo "ANTHROPIC_API_KEY=your_key" >> .env

# Generate example
inceptbench example

# Run evaluation via CLI
inceptbench evaluate qs.json --full

# Or call the CLI module directly (no install needed)
PYTHONPATH="$(pwd)/src:$PYTHONPATH" python -m inceptbench.cli evaluate qs.json --full

✨ Features

  • 6 Specialized Evaluators - Quality assessment across multiple dimensions
  • Automatic Image Evaluation - Context-aware DI rubric scoring
  • Parallel Processing - 47+ tasks running concurrently
  • Multi-language Support - Evaluate content in any language
  • Dual Content Types - Questions (MCQ/fill-in) and text content (passages/explanations)
  • Production-Ready - Full demo in qs.json (~3-4 minutes)

📊 Evaluators

Evaluator                    Type                              Auto
ti_question_qa               Question quality (10 dimensions)  Yes
answer_verification          Answer correctness                Yes
reading_question_qc          MCQ distractor analysis           Yes
math_content_evaluator       Content quality (9 criteria)      Yes
text_content_evaluator       Pedagogical text assessment       Yes
image_quality_di_evaluator   DI rubric image quality           Auto
external_edubench            Educational benchmark (6 tasks)   No

See EVALUATORS.md for details.

📦 Architecture

inceptbench/
├── src/inceptbench/           # Unified package (src/ layout)
│   ├── orchestrator.py        # Main evaluation orchestrator
│   ├── cli.py                 # Command-line interface
│   ├── core/                  # Core evaluators and utilities
│   ├── agents/                # Agent-based evaluators
│   ├── qc/                    # Quality control modules
│   ├── evaluation/            # Evaluation templates
│   └── image/                 # Image quality evaluation
├── submodules/                # External dependencies
│   ├── reading-question-qc/
│   ├── EduBench/
│   ├── agentic-incept-reasoning/
│   └── image_generation_package/
└── pyproject.toml             # Package configuration

🎯 Demo

The qs.json file demonstrates all capabilities:

  • 8 questions (MCQ/fill-in, Arabic/English)
  • 4 text content items
  • 7 images (auto-evaluated)
  • All 6 evaluators active
  • ~3-4 minute runtime

# Using CLI (recommended)
inceptbench evaluate qs.json --full

# Or using Python API
python - <<'EOF'
from inceptbench import universal_unified_benchmark, UniversalEvaluationRequest
import json

data = json.load(open("qs.json"))
request = UniversalEvaluationRequest(**data)
result = universal_unified_benchmark(request)
print(result.model_dump_json(indent=2))
EOF

๐Ÿ“ Example Usage

CLI

inceptbench evaluate qs.json --full
inceptbench evaluate qs.json -o results.json

Python API

from inceptbench import universal_unified_benchmark, UniversalEvaluationRequest

request = UniversalEvaluationRequest(
    submodules_to_run=["ti_question_qa", "answer_verification"],
    generated_questions=[{
        "id": "q1",
        "type": "mcq",
        "question": "What is 2+2?",
        "answer": "4",
        "answer_options": {"A": "3", "B": "4", "C": "5"},
        "answer_explanation": "2+2 equals 4",
        "skill": {
            "title": "Basic Addition",
            "grade": "1",
            "subject": "mathematics",
            "difficulty": "easy"
        }
    }]
)

response = universal_unified_benchmark(request)
print(response.evaluations["q1"].score)

See USAGE.md for complete examples.

๐Ÿ–ผ๏ธ Image Evaluation

Add image_url to any question or content:

{
  "id": "q1",
  "question": "How many apples?",
  "image_url": "https://example.com/apples.png"
}

The image_quality_di_evaluator runs automatically with:

  • Context-aware evaluation (accompaniment vs standalone)
  • DI rubric scoring (0-100, normalized to 0-1)
  • Hard-fail gates (answer leakage, wrong representations)
  • Canonical DI representation checks
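As a rough illustration of the two points above, the sketch below builds a question payload with an image_url (the field that triggers automatic image evaluation) and normalizes a 0-100 DI rubric score to the 0-1 range reported in results. The normalize helper is illustrative only, not part of the inceptbench API:

```python
# Hypothetical sketch: the payload shape follows the "Input Format" section;
# normalize_di_score is an assumed helper, not an inceptbench function.
question = {
    "id": "q1",
    "question": "How many apples?",
    "image_url": "https://example.com/apples.png",  # triggers image_quality_di_evaluator
}

def normalize_di_score(raw: float) -> float:
    """Scale a 0-100 DI rubric score to the 0-1 range used in results."""
    return max(0.0, min(raw / 100.0, 1.0))

print(normalize_di_score(87))  # 0.87
```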

📥 Input Format

Questions:

{
  "submodules_to_run": ["ti_question_qa"],
  "generated_questions": [{
    "id": "q1",
    "type": "mcq",
    "question": "...",
    "answer": "...",
    "image_url": "..."  // Optional
  }]
}

Text Content:

{
  "submodules_to_run": ["text_content_evaluator"],
  "generated_content": [{
    "id": "text1",
    "type": "text",
    "content": "...",
    "image_url": "..."  // Optional
  }]
}

See INPUT_OUTPUT.md for complete schema.

📤 Output Format

Simplified (default):

{
  "evaluations": {
    "q1": {"score": 0.89}
  }
}

Full (verbose=True):

{
  "evaluations": {
    "q1": {
      "ti_question_qa": {
        "overall": 0.95,
        "scores": {...},
        "issues": [...],
        "strengths": [...]
      },
      "score": 0.89
    }
  }
}
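A saved results file in the simplified shape above can be post-processed with nothing but the standard library. The sketch below flags items under a quality threshold; the 0.8 cutoff and the "q2" item are arbitrary examples, not values produced by inceptbench:

```python
import json

# Parse a simplified results payload (shape as documented above) and flag
# items below an example quality threshold of 0.8.
raw = '{"evaluations": {"q1": {"score": 0.89}, "q2": {"score": 0.61}}}'
results = json.loads(raw)

flagged = [qid for qid, ev in results["evaluations"].items() if ev["score"] < 0.8]
print(flagged)  # ['q2']
```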

🔄 Module Selection

Automatic (if submodules_to_run not specified):

  • Questions → ti_question_qa, answer_verification, math_content_evaluator, reading_question_qc
  • Text → text_content_evaluator, math_content_evaluator
  • Images → image_quality_di_evaluator (auto-added)

Manual:

request = UniversalEvaluationRequest(
    submodules_to_run=["ti_question_qa", "answer_verification"],  # Only these
    generated_questions=[...]
)
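The automatic routing rules above can be sketched as a plain function for illustration; the real selection logic lives inside inceptbench's orchestrator and may differ in detail:

```python
# Illustrative only: mirrors the documented defaults, not the actual orchestrator code.
def default_modules(has_questions: bool, has_text: bool, has_images: bool) -> list[str]:
    modules = []
    if has_questions:
        modules += ["ti_question_qa", "answer_verification",
                    "math_content_evaluator", "reading_question_qc"]
    if has_text:
        modules += ["text_content_evaluator", "math_content_evaluator"]
    if has_images:
        modules.append("image_quality_di_evaluator")  # auto-added when images are present
    return sorted(set(modules))

print(default_modules(True, False, True))
```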

📜 License

Proprietary - Copyright Trilogy Education Services
