# InceptBench

Comprehensive benchmark and evaluation framework for educational AI question generation.

Educational content evaluation framework with multiple AI-powered assessment modules.
## Documentation

### Official Sites

Website • Benchmarks • Glossary • Docs • API Endpoint • API Docs
### User Guides
- USAGE.md - Installation, configuration, CLI & Python API
- INPUT_OUTPUT.md - Input schemas and output formats
- EVALUATORS.md - Complete evaluator reference
### Developer Guides
- WIKI.md - Documentation hub and workflows
- MAINTAINERS.md - Submodule maintainer guide
- PUBLISHING.md - Package publishing workflow
- VERSION_LOCATIONS.md - Version file reference
### Resources
- Google Drive - Test data and examples
- GitHub Repo - Source code
## Quick Start
```bash
# Install from PyPI (latest published release)
pip install inceptbench

# Or install from source (current repo snapshot)
git clone https://github.com/incept-ai/inceptbench.git
cd inceptbench
python3 -m venv venv && source venv/bin/activate
pip install -e .

# Create .env file (optional - for API-based evaluation)
echo "OPENAI_API_KEY=your_key" >> .env
echo "ANTHROPIC_API_KEY=your_key" >> .env

# Generate example
inceptbench example

# Run evaluation via CLI
inceptbench evaluate qs.json --full

# Or call the CLI module directly (no install needed)
PYTHONPATH="$(pwd)/src:$PYTHONPATH" python -m inceptbench.cli evaluate qs.json --full
```
## Features
- 6 Specialized Evaluators - Quality assessment across multiple dimensions
- Automatic Image Evaluation - Context-aware DI rubric scoring
- Parallel Processing - 47+ tasks running concurrently
- Multi-language Support - Evaluate content in any language
- Dual Content Types - Questions (MCQ/fill-in) and text content (passages/explanations)
- Production-Ready - Full demo in `qs.json` (~3-4 minutes)
## Evaluators
| Evaluator | Type | Auto |
|---|---|---|
| ti_question_qa | Question quality (10 dimensions) | Yes |
| answer_verification | Answer correctness | Yes |
| reading_question_qc | MCQ distractor analysis | Yes |
| math_content_evaluator | Content quality (9 criteria) | Yes |
| text_content_evaluator | Pedagogical text assessment | Yes |
| image_quality_di_evaluator | DI rubric image quality | Auto |
| external_edubench | Educational benchmark (6 tasks) | No |
See EVALUATORS.md for details.
## Architecture
```text
inceptbench/
├── src/inceptbench/          # Unified package (src/ layout)
│   ├── orchestrator.py       # Main evaluation orchestrator
│   ├── cli.py                # Command-line interface
│   ├── core/                 # Core evaluators and utilities
│   ├── agents/               # Agent-based evaluators
│   ├── qc/                   # Quality control modules
│   ├── evaluation/           # Evaluation templates
│   └── image/                # Image quality evaluation
├── submodules/               # External dependencies
│   ├── reading-question-qc/
│   ├── EduBench/
│   ├── agentic-incept-reasoning/
│   └── image_generation_package/
└── pyproject.toml            # Package configuration
```
## Demo

The `qs.json` file demonstrates all capabilities:
- 8 questions (MCQ/fill-in, Arabic/English)
- 4 text content items
- 7 images (auto-evaluated)
- All 6 evaluators active
- ~3-4 minute runtime
```bash
# Using CLI (recommended)
inceptbench evaluate qs.json --full

# Or using Python API
python -c "from inceptbench import universal_unified_benchmark, UniversalEvaluationRequest; import json; data = json.load(open('qs.json')); request = UniversalEvaluationRequest(**data); result = universal_unified_benchmark(request); print(result.model_dump_json(indent=2))"
```
## Example Usage

### CLI

```bash
inceptbench evaluate qs.json --full
inceptbench evaluate qs.json -o results.json
```
### Python API

```python
from inceptbench import universal_unified_benchmark, UniversalEvaluationRequest

request = UniversalEvaluationRequest(
    submodules_to_run=["ti_question_qa", "answer_verification"],
    generated_questions=[{
        "id": "q1",
        "type": "mcq",
        "question": "What is 2+2?",
        "answer": "4",
        "answer_options": {"A": "3", "B": "4", "C": "5"},
        "answer_explanation": "2+2 equals 4",
        "skill": {
            "title": "Basic Addition",
            "grade": "1",
            "subject": "mathematics",
            "difficulty": "easy"
        }
    }]
)

response = universal_unified_benchmark(request)
print(response.evaluations["q1"].score)
```
See USAGE.md for complete examples.
## Image Evaluation
Add an `image_url` to any question or content item:

```json
{
  "id": "q1",
  "question": "How many apples?",
  "image_url": "https://example.com/apples.png"
}
```
The `image_quality_di_evaluator` runs automatically with:
- Context-aware evaluation (accompaniment vs standalone)
- DI rubric scoring (0-100, normalized to 0-1)
- Hard-fail gates (answer leakage, wrong representations)
- Canonical DI representation checks
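The "0-100, normalized to 0-1" scoring with hard-fail gates can be sketched as follows. This is an illustration of the rule as described above, not the library's actual implementation; the function name and gate labels are hypothetical.

```python
def normalize_di_score(raw_score: float, hard_fails: list[str]) -> float:
    """Map a 0-100 DI rubric score onto 0-1; any hard-fail gate forces 0.

    Hypothetical sketch: gate labels like "answer_leakage" are illustrative.
    """
    if hard_fails:  # e.g. ["answer_leakage", "wrong_representation"]
        return 0.0
    # Clamp to the rubric's 0-100 range, then scale to 0-1.
    return max(0.0, min(raw_score, 100.0)) / 100.0
```

The clamp keeps out-of-range inputs from producing scores outside 0-1, and the gate check runs first so a hard failure always dominates the rubric score.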
## Input Format

Questions:

```jsonc
{
  "submodules_to_run": ["ti_question_qa"],
  "generated_questions": [{
    "id": "q1",
    "type": "mcq",
    "question": "...",
    "answer": "...",
    "image_url": "..."  // Optional
  }]
}
```
Text Content:

```jsonc
{
  "submodules_to_run": ["text_content_evaluator"],
  "generated_content": [{
    "id": "text1",
    "type": "text",
    "content": "...",
    "image_url": "..."  // Optional
  }]
}
```
See INPUT_OUTPUT.md for complete schema.
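For reference, a payload matching the questions schema above can be assembled and serialized in a few lines of plain Python. This is a sketch: the field values are placeholders, and only the field names shown in the schema are taken from the documentation.

```python
import json

# Assemble a minimal questions payload (values are placeholders).
payload = {
    "submodules_to_run": ["ti_question_qa"],
    "generated_questions": [
        {
            "id": "q1",
            "type": "mcq",
            "question": "What is 2+2?",
            "answer": "4",
            # "image_url" is optional and can be omitted entirely
        }
    ],
}

# Serialize it, e.g. to save as a qs.json-style input file.
serialized = json.dumps(payload, indent=2, ensure_ascii=False)
```

`ensure_ascii=False` keeps non-Latin text (such as Arabic questions) readable in the serialized file rather than escaping it.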
## Output Format

Simplified (default):

```json
{
  "evaluations": {
    "q1": {"score": 0.89}
  }
}
```
Full (`verbose=True`):

```json
{
  "evaluations": {
    "q1": {
      "ti_question_qa": {
        "overall": 0.95,
        "scores": {...},
        "issues": [...],
        "strengths": [...]
      },
      "score": 0.89
    }
  }
}
```
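When consuming the simplified output programmatically, collecting per-item scores is a one-line dictionary comprehension. The `simplified` dict below is illustrative data in the shape shown above, not real evaluator output.

```python
# Illustrative simplified output (shape as documented, made-up values).
simplified = {"evaluations": {"q1": {"score": 0.89}, "text1": {"score": 0.92}}}

# Collect each item's overall score, keyed by item id.
scores = {item_id: ev["score"] for item_id, ev in simplified["evaluations"].items()}
```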
## Module Selection

Automatic (if `submodules_to_run` is not specified):

- Questions → `ti_question_qa`, `answer_verification`, `math_content_evaluator`, `reading_question_qc`
- Text → `text_content_evaluator`, `math_content_evaluator`
- Images → `image_quality_di_evaluator` (auto-added)
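The automatic rules above amount to a simple content-type-to-module mapping. A hypothetical re-implementation for illustration (not the library's actual selection code):

```python
def select_modules(has_questions: bool, has_text: bool, has_images: bool) -> list[str]:
    """Sketch of the documented automatic module selection rules."""
    modules: list[str] = []
    if has_questions:
        modules += ["ti_question_qa", "answer_verification",
                    "math_content_evaluator", "reading_question_qc"]
    if has_text:
        modules += ["text_content_evaluator", "math_content_evaluator"]
    if has_images:
        modules.append("image_quality_di_evaluator")
    # Deduplicate (questions and text both add math_content_evaluator),
    # preserving first-seen order.
    return list(dict.fromkeys(modules))
```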
Manual:

```python
request = UniversalEvaluationRequest(
    submodules_to_run=["ti_question_qa", "answer_verification"],  # Only these
    generated_questions=[...]
)
```
## License
Proprietary - Copyright Trilogy Education Services
## File details

### Source distribution: `inceptbench-1.5.0.tar.gz`

- Size: 177.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.13.7 Darwin/24.3.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | `5a67038d918794ce9fe7648112ff47637326c469c0d83f64bd0e111289b44799` |
| MD5 | `c5199b3a6872ce84f33f108e6a1b7202` |
| BLAKE2b-256 | `cf96bd47ee9476d9e178d71e5454378f24b9844eb36f53cba1e798df6d6a871c` |
### Built distribution: `inceptbench-1.5.0-py3-none-any.whl`

- Size: 197.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.1 CPython/3.13.7 Darwin/24.3.0

| Algorithm | Hash digest |
|---|---|
| SHA256 | `6529d9c5cb35e50a25e67f5b123b8c6cbe24dfaad671736ede5e4526d9f69a84` |
| MD5 | `9698a02e2f4e1f380ffcfc4c0dc14a80` |
| BLAKE2b-256 | `1be5e5fcc98caa63c020615b55ba78b0425b3ff8133017038d655a3195d4afc8` |