# Incept Eval

CLI tool for evaluating educational questions via the Incept API. Supports comprehensive evaluation, including V3 scaffolding assessment, answer verification, and EduBench task evaluation.
## Features

### 🎯 Comprehensive Evaluation

- **V3 Evaluation** - Scaffolding quality and DI compliance scoring
- **Answer Verification** - GPT-4o powered correctness checking
- **EduBench Tasks** - Educational benchmarks (QA, EC, IP, AG)

### 📊 Flexible Output

- Pretty mode for quick score viewing
- Full detailed results with all metrics
- Append mode for collecting multiple evaluations
- JSON output for easy integration

### 🚀 Easy to Use

- Simple CLI interface
- Works with a local or production API
- Multiple API key configuration methods
- Batch processing support
## Installation

```bash
pip install incept-eval
```
## Quick Start

### 1. Install

```bash
pip install incept-eval
```

### 2. Configure API Key

```bash
incept-eval configure YOUR_API_KEY
```

### 3. Generate a Sample File

```bash
incept-eval example -o test.json
```

### 4. Evaluate

```bash
incept-eval evaluate test.json --verbose
```
## Usage

### Commands

#### `evaluate` - Evaluate questions from a JSON file

```bash
# Basic evaluation (pretty mode by default)
incept-eval evaluate questions.json

# Verbose output with progress messages
incept-eval evaluate questions.json --verbose

# Save results to a file (overwrite)
incept-eval evaluate questions.json -o results.json

# Append results to a file (created if it does not exist)
incept-eval evaluate questions.json -a all_evaluations.json --verbose

# Use a local API server
incept-eval evaluate questions.json --api-url http://localhost:8000

# Full results without pretty formatting
incept-eval evaluate questions.json --no-pretty
```

#### `example` - Generate a sample input file

```bash
# Print to stdout
incept-eval example

# Save to a file
incept-eval example -o sample.json
```

#### `configure` - Save the API key

```bash
incept-eval configure YOUR_API_KEY
```

#### `help` - Show detailed help

```bash
incept-eval help
```
## Input Format

The input JSON file must contain:

- `request`: Question generation request metadata (grade, subject, instructions, etc.)
- `questions`: Array of 1-5 questions to evaluate
Example:

```json
{
  "request": {
    "grade": 3,
    "count": 2,
    "subject": "mathematics",
    "instructions": "Generate multiplication word problems that involve equal groups.",
    "language": "arabic"
  },
  "questions": [
    {
      "type": "mcq",
      "question": "إذا كان لديك 4 علب من القلم وكل علبة تحتوي على 7 أقلام، كم عدد الأقلام لديك إجمالاً؟",
      "answer": "28",
      "difficulty": "medium",
      "explanation": "استخدام ضرب لحساب مجموع الأقلام في جميع العلب.",
      "options": {
        "A": "21",
        "B": "32",
        "C": "35",
        "D": "28"
      },
      "answer_choice": "D",
      "detailed_explanation": { ... },
      "voiceover_script": { ... },
      "skill": null,
      "image_url": null,
      "di_formats_used": [ ... ]
    }
  ]
}
```
Run `incept-eval example` to see a complete example with all fields.
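If you generate input files programmatically, the required shape above can be assembled with a small script. A minimal sketch in Python; the `build_input` helper and its validation rules are illustrative assumptions based on the README example, not the official API schema:

```python
import json

def build_input(request: dict, questions: list[dict]) -> dict:
    """Assemble an incept-eval input payload with basic sanity checks.

    Note: these checks only mirror the README example; they are not
    the official API schema.
    """
    if not 1 <= len(questions) <= 5:
        raise ValueError("questions must contain 1-5 items")
    for q in questions:
        for field in ("type", "question", "answer", "difficulty"):
            if field not in q:
                raise ValueError(f"question missing required field: {field}")
    return {"request": request, "questions": questions}

payload = build_input(
    request={
        "grade": 3,
        "count": 1,
        "subject": "mathematics",
        "instructions": "Generate multiplication word problems.",
        "language": "english",
    },
    questions=[{
        "type": "mcq",
        "question": "You have 4 boxes of 7 pens each. How many pens in total?",
        "answer": "28",
        "difficulty": "medium",
        "options": {"A": "21", "B": "24", "C": "28", "D": "32"},
        "answer_choice": "C",
    }],
)

# Write the payload ready for `incept-eval evaluate questions.json`.
with open("questions.json", "w", encoding="utf-8") as f:
    json.dump(payload, f, ensure_ascii=False, indent=2)
```

`ensure_ascii=False` keeps non-Latin question text (e.g. Arabic) readable in the output file.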
## Authentication

There are three ways to provide your API key:

1. **Config file (recommended)**

   ```bash
   incept-eval configure YOUR_API_KEY
   ```

2. **Environment variable**

   ```bash
   export INCEPT_API_KEY=YOUR_API_KEY
   ```

3. **Command line**

   ```bash
   incept-eval evaluate questions.json --api-key YOUR_API_KEY
   ```
## Output Format

### Pretty Mode (default)

Shows only the scores:

```json
{
  "overall_scores": {
    "total_questions": 1.0,
    "v3_average": 0.9555555555555555,
    "answer_correctness_rate": 1.0,
    "total_edubench_tasks": 3.0
  },
  "v3_scores": [
    {
      "correctness": 1.0,
      "grade_alignment": 1.0,
      "difficulty_alignment": 1.0,
      "language_quality": 0.9,
      "pedagogical_value": 0.9,
      "explanation_quality": 0.8,
      "instruction_adherence": 1.0,
      "format_compliance": 1.0,
      "query_relevance": 1.0,
      "di_compliance": 0.9,
      "overall": 0.9555555555555555,
      "recommendation": "accept"
    }
  ],
  "answer_verification": [
    {
      "is_correct": true,
      "confidence": 10
    }
  ]
}
```
### Full Mode (--no-pretty)

Includes all evaluation details:

- `overall_scores`: Aggregate metrics
- `v3_scores`: Per-question scaffolding scores
- `answer_verification`: Answer correctness checks
- `edubench_results`: Full task evaluation responses
- `summary`: Evaluation metadata and timing
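Because the output is plain JSON, downstream scripts can pull scores out directly. A short Python sketch; the inline sample mirrors the pretty-mode example above (in a real script the dict would come from `json.load()` on a saved results file):

```python
import json

# Sample pretty-mode output (scores abbreviated from the example above).
results = json.loads("""
{
  "overall_scores": {"total_questions": 1.0, "v3_average": 0.9556,
                     "answer_correctness_rate": 1.0, "total_edubench_tasks": 3.0},
  "v3_scores": [{"overall": 0.9556, "recommendation": "accept"}],
  "answer_verification": [{"is_correct": true, "confidence": 10}]
}
""")

overall = results["overall_scores"]
print(f"v3 average: {overall['v3_average']:.3f}")
print(f"correctness rate: {overall['answer_correctness_rate']:.0%}")

# Flag questions whose V3 recommendation is anything other than "accept".
flagged = [i for i, s in enumerate(results["v3_scores"])
           if s["recommendation"] != "accept"]
print(f"questions needing review: {flagged}")
```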
## Command Reference

| Command | Description |
|---|---|
| `evaluate` | Evaluate questions from a JSON file |
| `example` | Generate a sample input file |
| `configure` | Save the API key to the config file |
| `help` | Show detailed help and usage examples |
### Evaluate Options

| Option | Short | Description |
|---|---|---|
| `--output PATH` | `-o` | Save results to a file (overwrites) |
| `--append PATH` | `-a` | Append results to a file (created if it does not exist) |
| `--api-key KEY` | `-k` | API key (or use the INCEPT_API_KEY env var) |
| `--api-url URL` | | API endpoint (default: production) |
| `--pretty` | | Show only scores (default: true) |
| `--no-pretty` | | Show full results, including EduBench details |
| `--verbose` | `-v` | Show progress messages |
## Examples

### Basic Evaluation

```bash
# Evaluate with default settings (pretty mode)
incept-eval evaluate questions.json --verbose
```

### Collecting Multiple Evaluations

```bash
# Append multiple evaluations to one file
incept-eval evaluate test1.json -a all_results.json
incept-eval evaluate test2.json -a all_results.json
incept-eval evaluate test3.json -a all_results.json
```

The result: `all_results.json` contains an array of all 3 evaluations.
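Since append mode accumulates a JSON array, a short script can summarize the collected runs. A Python sketch; the file layout is assumed to be an array of pretty-mode evaluation objects as described above, and the sample numbers are purely illustrative:

```python
import json
from statistics import mean

# Illustrative stand-in for: evaluations = json.load(open("all_results.json"))
# Each entry is assumed to be shaped like the pretty-mode output.
evaluations = [
    {"overall_scores": {"v3_average": 0.96, "answer_correctness_rate": 1.0}},
    {"overall_scores": {"v3_average": 0.88, "answer_correctness_rate": 0.5}},
    {"overall_scores": {"v3_average": 0.92, "answer_correctness_rate": 1.0}},
]

# Average the headline metrics across all collected evaluations.
avg_v3 = mean(e["overall_scores"]["v3_average"] for e in evaluations)
avg_correct = mean(e["overall_scores"]["answer_correctness_rate"] for e in evaluations)

print(f"runs: {len(evaluations)}")
print(f"mean v3 score: {avg_v3:.3f}")        # mean v3 score: 0.920
print(f"mean correctness: {avg_correct:.0%}")  # mean correctness: 83%
```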
### Batch Processing

```bash
# Evaluate all files and append to one results file
for file in questions/*.json; do
  incept-eval evaluate "$file" -a batch_results.json --verbose
done
```

### Local Development

```bash
# Test against a local API server
incept-eval evaluate test.json --api-url http://localhost:8000 --verbose
```

### Full Results

```bash
# Get the complete evaluation with EduBench details
incept-eval evaluate questions.json --no-pretty -o full_results.json
```
## Evaluation Modules

The API evaluates questions using three main modules:

### V3 Evaluation

- Scaffolding quality assessment (`detailed_explanation` steps)
- Direct Instruction (DI) compliance checking
- Pedagogical structure validation
- Language quality scoring
- Grade and difficulty alignment

### Answer Verification

- GPT-4o powered correctness checking
- Mathematical accuracy validation
- Confidence scoring (0-10)

### EduBench Tasks

- **QA: Question Answering** - Can the model answer the question?
- **EC: Error Correction** - Can the model identify and correct errors?
- **IP: Instructional Planning** - Can the model provide step-by-step solutions?

All modules run by default. Future versions will support configurable module selection.
## Requirements

- Python >= 3.11
- Incept API key

## Support

- Issues: GitHub Issues
- Help: Run `incept-eval help` for detailed documentation

## License

MIT License - see the LICENSE file for details.

Made by the Incept Team