Comprehensive benchmark and evaluation framework for educational AI question generation
InceptBench
Educational content evaluation framework using LLM-based analysis.
Overview
InceptBench evaluates educational content across multiple quality dimensions:
- Automated Classification - Determines content type (question, quiz, reading, etc.)
- Hierarchical Evaluation - Decomposes complex content and evaluates bottom-up
- Comprehensive Metrics - 8-11 metrics per content type with scores and reasoning
- Curriculum-Aware - Integrates curriculum standards via vector store search
Installation
pip install inceptbench
System Dependencies
InceptBench requires system-level Cairo libraries for inline SVG image analysis. Without these, SVG images in educational content will not be analyzed.
macOS:
brew install cairo pango gdk-pixbuf libffi
Ubuntu/Debian:
sudo apt-get install -y libcairo2 libpango-1.0-0 libpangocairo-1.0-0 libgdk-pixbuf2.0-0 libffi-dev
Windows: Use the GTK3 installer for Windows, or install via MSYS2:
pacman -S mingw-w64-x86_64-cairo mingw-w64-x86_64-pango mingw-w64-x86_64-gdk-pixbuf2
CLI Usage
# Create sample input file
inceptbench example
# Evaluate from JSON file
inceptbench evaluate content.json
# Evaluate raw content
inceptbench evaluate --raw "What is 2+2? A) 3 B) 4 C) 5 D) 6"
# Save results to file
inceptbench evaluate content.json -o results.json
# Check version
inceptbench --version
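To evaluate several input files in one go, the CLI can be driven from Python. The following is a minimal sketch that uses only the `evaluate` subcommand and the `-o` flag shown above. The helper names `evaluate_command` and `evaluate_directory` are hypothetical and not part of InceptBench. The sketch also assumes that the CLI exits with a non-zero status on failure.

```python
import subprocess
from pathlib import Path


def evaluate_command(input_path, output_path=None):
    """Build the argument list for `inceptbench evaluate` (hypothetical helper)."""
    cmd = ["inceptbench", "evaluate", str(input_path)]
    if output_path is not None:
        cmd += ["-o", str(output_path)]
    return cmd


def evaluate_directory(input_dir, output_dir):
    """Run the CLI once per *.json file in input_dir, saving results to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(input_dir).glob("*.json")):
        # check=True raises CalledProcessError on a non-zero exit status
        # (assumption: the CLI signals failure that way).
        subprocess.run(evaluate_command(path, out / path.name), check=True)
```

Calling `evaluate_directory("inputs", "results")` would then produce one results file per input file, with matching file names.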
REST API
The production API is available at https://api.inceptbench.com
# Health check
curl https://api.inceptbench.com/health
# Interactive docs
# Visit: https://api.inceptbench.com/docs
# Evaluate content
curl -X POST https://api.inceptbench.com/evaluate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"generated_content": [{"content": "What is 2+2? A) 3 B) 4 C) 5 D) 6"}]}'
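The same request can be assembled in Python with only the standard library. This is a minimal sketch, assuming the endpoint, headers, and payload shape from the curl example above. `build_evaluate_request` is a hypothetical helper, not part of InceptBench. The response schema is not described here, so the sketch builds the request without interpreting a response.

```python
import json
import urllib.request

API_URL = "https://api.inceptbench.com/evaluate"  # endpoint from the curl example


def build_evaluate_request(content: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper for illustration; not part of the inceptbench package.
    """
    payload = {"generated_content": [{"content": content}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_evaluate_request("What is 2+2? A) 3 B) 4 C) 5 D) 6", "your-api-key")
# To send it (requires a valid API key):
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```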
Programmatic Usage (Python)
import asyncio

from inceptbench_new import EvaluationService


async def main():
    service = EvaluationService()
    result = await service.evaluate(
        content="What is 2+2? A) 3 B) 4 C) 5 D) 6",
        curriculum="common_core"
    )
    print(f"Score: {result.overall.score:.2f}")


asyncio.run(main())
Input Format
{
  "generated_content": [
    {
      "id": "q1",
      "curriculum": "common_core",
      "request": {
        "grade": "7",
        "subject": "mathematics",
        "type": "mcq",
        "difficulty": "medium",
        "locale": "en-US",
        "skills": {
          "lesson_title": "Congruent and Similar Triangles",
          "substandard_id": "CCSS.MATH.CONTENT.7.G.A.1+3"
        },
        "instruction": "A real-world problem involving congruent and similar triangles"
      },
      "content": "Triangle ABC is similar to triangle DEF. If AB = 6 cm and DE = 9 cm, what is the ratio of their corresponding sides?"
    }
  ]
}
Content Item Fields
| Field | Required | Default | Description |
|---|---|---|---|
| `content` | Yes | - | Content to evaluate (string or JSON) |
| `id` | No | Auto-generated | Unique identifier |
| `curriculum` | No | `common_core` | Curriculum for alignment |
| `request` | No | `null` | Generation metadata (see below) |
Request Metadata Fields (all optional)
| Field | Description |
|---|---|
| `grade` | Grade level (e.g., "7", "K", "12") |
| `subject` | Subject area (e.g., "mathematics", "english") |
| `type` | Content type (e.g., "mcq", "fill-in", "article") |
| `difficulty` | Difficulty level (e.g., "easy", "medium", "hard") |
| `locale` | Locale/language code (e.g., "en-US", "es-MX") |
| `skills` | Skills info (JSON object or string) |
| `instruction` | Generation instruction/prompt |
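An input file in this format can also be generated from Python. The following is a minimal sketch, assuming only the field names listed in the two tables above. `make_item` and `make_input` are hypothetical helpers, not part of InceptBench. Rejecting unknown `request` keys is a strictness choice made for this example, not documented behavior of the evaluator.

```python
import json

# Optional request metadata fields, taken from the table above.
REQUEST_FIELDS = {
    "grade", "subject", "type", "difficulty", "locale", "skills", "instruction",
}


def make_item(content, id=None, curriculum=None, request=None):
    """Return one entry for the "generated_content" list (hypothetical helper).

    Only `content` is required. Optional fields that are not given are left
    out, so the evaluator can apply its own defaults.
    """
    if not content:
        raise ValueError("content is required")
    item = {"content": content}
    if id is not None:
        item["id"] = id
    if curriculum is not None:
        item["curriculum"] = curriculum
    if request is not None:
        # Assumption for this sketch: treat unlisted keys as mistakes.
        unknown = set(request) - REQUEST_FIELDS
        if unknown:
            raise ValueError(f"unknown request fields: {sorted(unknown)}")
        item["request"] = request
    return item


def make_input(items):
    """Wrap content items in the top-level input object."""
    return {"generated_content": list(items)}


doc = make_input([
    make_item(
        "Triangle ABC is similar to triangle DEF. If AB = 6 cm and DE = 9 cm, "
        "what is the ratio of their corresponding sides?",
        id="q1",
        request={"grade": "7", "subject": "mathematics", "type": "mcq"},
    )
])
print(json.dumps(doc, indent=2))
```

Writing the printed JSON to a file such as `content.json` gives an input suitable for `inceptbench evaluate content.json`.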
Images
Images are automatically detected from content. Include as:
- Direct URLs: `https://example.com/image.png`
- Markdown: `![alt text](https://example.com/image.png)`
- HTML: `<img src="https://example.com/image.png">`
- Inline SVG: `<svg>...</svg>` (requires Cairo system libraries, see System Dependencies)
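To check ahead of time which image references a piece of content contains, the four forms above can be approximated with regular expressions. This is an illustrative sketch only. InceptBench's actual detection logic is not documented here. `find_images` and the patterns, including the list of file extensions for direct URLs, are assumptions made for this example.

```python
import re

# Illustrative patterns; they approximate the four forms listed above and are
# not InceptBench's own detection rules.
MARKDOWN_IMG = re.compile(r"!\[[^\]]*\]\((\S+?)\)")
HTML_IMG = re.compile(r"<img\b[^>]*\bsrc=[\"']([^\"']+)[\"']", re.IGNORECASE)
INLINE_SVG = re.compile(r"<svg\b.*?</svg>", re.IGNORECASE | re.DOTALL)
DIRECT_URL = re.compile(
    r"https?://[^\s\"'<>)]+\.(?:png|jpe?g|gif|webp|svg)", re.IGNORECASE
)


def find_images(content: str) -> dict:
    """Group image references in `content` by the form they appear in."""
    markdown = MARKDOWN_IMG.findall(content)
    html = HTML_IMG.findall(content)
    svg = INLINE_SVG.findall(content)
    # A URL embedded in Markdown or HTML also matches DIRECT_URL, so drop
    # those values to avoid counting them twice.
    embedded = set(markdown) | set(html)
    urls = [u for u in DIRECT_URL.findall(content) if u not in embedded]
    return {"url": urls, "markdown": markdown, "html": html, "svg": svg}
```

If the `svg` list is non-empty, the content relies on the Cairo libraries described under System Dependencies.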
Content Types
The evaluator automatically classifies content into:
| Type | Description |
|---|---|
| `question` | Single educational question |
| `quiz` | Multiple questions together |
| `fiction_reading` | Fictional narrative passages |
| `nonfiction_reading` | Informational passages |
| `other` | General educational content |
Documentation
For complete documentation, input format details, and developer guides:
View Full Documentation on GitHub
License
Proprietary - Copyright Trilogy Education Services