MiroFish Simulator
Agentic student simulation using misconception matching - produces realistic wrong answers without LLM "cheating".
The Problem
Traditional LLM-based student simulation doesn't work: LLMs know everything. When you ask GPT-4 to "act like a 5th grader who doesn't know about electoral votes", it still picks 270 because it can't actually "not know" things.
The Solution: Misconception Matching
Instead of trying to suppress LLM knowledge (impossible), we use a multi-agent pipeline that matches student misconceptions to wrong answers:
┌─────────────────────────────────────────────────────────────────────┐
│ AgenticOrchestrator │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ DISTRACTOR │ │ STUDENT │ │ SELECTOR │ │
│ │ AGENT │──▶│ MODEL │──▶│ AGENT │ │
│ │ │ │ AGENT │ │ │ │
│ │ "What error │ │ "What does │ │ "Match │ │
│ │ leads to │ │ this student│ │ misconception│ │
│ │ each wrong │ │ believe?" │ │ to answer" │ │
│ │ answer?" │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
How It Works
1. DistractorAgent - Analyzes each wrong answer to identify the misconception that leads to it
   - "Option A (218) catches students who confuse electoral votes with House majority"
   - "Option D (435) catches students who confuse it with the total number of Congress members"
2. StudentModelAgent - Models what a specific student believes, both correctly and incorrectly
   - Grade 5 class_clown: low familiarity, vague beliefs, common misconceptions
   - Grade 11 honors: high familiarity, specific knowledge, few misconceptions
3. SelectorAgent - Matches the student's misconceptions to an answer
   - Student holds a misconception matching a distractor → pick that distractor
   - Student holds the correct belief with high familiarity → pick the correct answer
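Conceptually, the SelectorAgent's decision reduces to a matching rule. The sketch below is a deterministic simplification, not the library's implementation (the real agents are LLM-backed; the class fields mirror those used in the examples further down, and the 0.7 familiarity threshold is an assumption):

```python
from dataclasses import dataclass
import random

@dataclass
class DistractorMapping:
    option: str                    # e.g. "A"
    leads_from_misconception: str  # the misconception that makes this option attractive

@dataclass
class StudentModel:
    misconceptions: list       # misconception statements the student holds
    topic_familiarity: float   # 0.0 - 1.0
    guesses_when_unsure: bool = True

def select_answer(correct: str, distractors: list, student: StudentModel) -> str:
    # 1. A held misconception that matches a distractor wins: the student
    #    confidently picks the wrong answer their belief points to.
    for mapping in distractors:
        if mapping.leads_from_misconception in student.misconceptions:
            return mapping.option
    # 2. No matching misconception and high familiarity -> correct answer.
    if student.topic_familiarity >= 0.7:
        return correct
    # 3. Otherwise the student guesses among all options.
    if student.guesses_when_unsure:
        return random.choice([correct] + [m.option for m in distractors])
    return correct
```

The key property: the student never needs to "suppress" knowledge, because wrong answers flow from affirmative (wrong) beliefs rather than from acting.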
Quick Start
import asyncio
from mirofish_simulator import AgenticOrchestrator

orchestrator = AgenticOrchestrator()

question = {
    "text": "How many electoral votes are needed to win the presidency?",
    "options": ["218", "270", "300", "435"],
}

async def main():
    # Single student
    result = await orchestrator.simulate(
        question=question,
        correct_answer="B",
        grade=5,
        archetype="class_clown",
    )
    print(f"Selected: {result.selected}")        # "C" (wrong!)
    print(f"Correct: {result.is_correct}")       # False
    print(f"Familiarity: {result.student_model.topic_familiarity:.0%}")  # 40%
    print(f"Misconception: {result.selection_result.misconception_matched}")

asyncio.run(main())
Batch Simulation
# Efficient - distractor analysis is done once and reused for all students
# (run inside an async function, as in the Quick Start)
results = await orchestrator.simulate_batch(
    question=question,
    correct_answer="B",
    students=[
        {"grade": 5, "archetype": "class_clown"},
        {"grade": 8, "archetype": "average_student"},
        {"grade": 11, "archetype": "honors_overachiever"},
    ],
)

for r in results:
    status = "✓" if r.is_correct else "✗"
    print(f"Grade {r.grade} {r.archetype}: {r.selected} {status}")
Realistic Results
The system produces realistic differentiation:
| Student | Electoral Votes (hard) | Branches of Gov (easy) |
|---|---|---|
| Grade 5 class_clown | ✗ | ✓ |
| Grade 8 average | ✗ | ✓ |
| Grade 11 honors | ✓ | ✓ |
- Easy questions: All students get them right (basic civics)
- Hard factual questions: Only students with specific knowledge get them right
- Archetypes matter: class_clown (low familiarity) gets more wrong than honors
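A table like the one above can be tallied straight from batch results. A small helper sketch (illustrative; it assumes only the grade, archetype, and is_correct fields shown in the batch example, passed here as plain dicts):

```python
def correctness_by_student(results):
    """Tally correct/total per (grade, archetype) across many questions."""
    tally = {}
    for r in results:
        key = (r["grade"], r["archetype"])
        correct, total = tally.get(key, (0, 0))
        tally[key] = (correct + int(r["is_correct"]), total + 1)
    # Render as "correct/total" strings for a quick report
    return {k: f"{c}/{t}" for k, (c, t) in tally.items()}
```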
Agent Details
DistractorAgent
Maps each wrong answer to the misconception that leads to it.
from mirofish_simulator import DistractorAgent

agent = DistractorAgent()
analysis = await agent.analyze(question, correct_answer="B")

for mapping in analysis.mappings:
    if not mapping.is_correct:
        print(f"{mapping.option}) {mapping.option_text}")
        print(f"   Misconception: {mapping.leads_from_misconception}")
        print(f"   Grade appeal: {mapping.grade_level_appeal}")
StudentModelAgent
Models what a student believes (correct and incorrect).
from mirofish_simulator import StudentModelAgent
agent = StudentModelAgent()
student = await agent.model_student(question, grade=5, archetype="class_clown")
print(f"Beliefs: {student.beliefs}")
print(f"Misconceptions: {student.misconceptions}")
print(f"Topic familiarity: {student.topic_familiarity:.0%}")
print(f"Guesses when unsure: {student.guesses_when_unsure}")
SelectorAgent
Matches student misconceptions to answers.
from mirofish_simulator import SelectorAgent
agent = SelectorAgent()
selection = await agent.select(question, distractor_analysis, student_model)
print(f"Selected: {selection.selected}")
print(f"Reason: {selection.selection_reason}")
print(f"Misconception matched: {selection.misconception_matched}")
Archetypes
| Archetype | Familiarity | Behavior |
|---|---|---|
| honors_overachiever | High (80%+) | Specific knowledge, confident |
| average_student | Medium (60-70%) | Taught content, some gaps |
| class_clown | Low (40%) | Minimal attention, guesses |
| esl_student | Medium | Core concepts solid, vocabulary issues |
| disengaged_but_smart | Variable | Has ability, inconsistent |
| quiet_thinker | Medium | Second-guesses self |
| debate_club_kid | High in areas of interest | Good at arguments |
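The familiarity column can be read as the prior the StudentModelAgent starts from. A minimal sketch of that idea (the numeric values and the grade adjustment here are illustrative assumptions, not the package's actual calibration):

```python
# Illustrative familiarity priors per archetype (assumed values,
# not the package's internal calibration).
ARCHETYPE_FAMILIARITY = {
    "honors_overachiever": 0.85,
    "average_student": 0.65,
    "class_clown": 0.40,
    "esl_student": 0.60,
    "disengaged_but_smart": 0.55,
    "quiet_thinker": 0.60,
    "debate_club_kid": 0.75,
}

def familiarity_prior(archetype: str, grade: int) -> float:
    """Base familiarity for an archetype, nudged slightly upward per grade above 5."""
    base = ARCHETYPE_FAMILIARITY.get(archetype, 0.5)
    return min(1.0, base + 0.01 * max(0, grade - 5))
```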
Installation
pip install mirofish-simulator
Or from source:
cd packages/mirofish-simulator
pip install -e .
Environment
export OPENAI_API_KEY="sk-..."
API Reference
AgenticOrchestrator
orchestrator = AgenticOrchestrator(
    api_key: str = None,    # Uses OPENAI_API_KEY env var if not provided
    base_url: str = None,   # Custom API base URL
    model: str = "gpt-4o-mini",
)

# Single simulation
result = await orchestrator.simulate(
    question: dict,         # {"text": "...", "options": [...]}
    correct_answer: str,    # "A", "B", "C", or "D"
    grade: int,             # 1-12
    archetype: str,         # See archetypes above
)

# Batch simulation (efficient)
results = await orchestrator.simulate_batch(
    question: dict,
    correct_answer: str,
    students: list,         # [{"grade": 5, "archetype": "..."}, ...]
)
AgenticSimulationResult
result.selected # "A", "B", "C", "D"
result.selected_text # The full answer text
result.is_correct # True/False
result.grade # Student grade
result.archetype # Student archetype
# Agent outputs
result.distractor_analysis # DistractorAnalysis
result.student_model # StudentModel
result.selection_result # SelectionResult
# Methods
result.to_dict() # Full dict representation
result.summary() # Human-readable summary
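Since to_dict() yields plain dicts, batch results are easy to persist for later analysis; for example, one JSON object per line (a sketch that assumes only the to_dict() method listed above):

```python
import json

def save_results_jsonl(results, path):
    """Write one JSON object per line, one line per simulation result."""
    with open(path, "w", encoding="utf-8") as f:
        for r in results:
            f.write(json.dumps(r.to_dict()) + "\n")
```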
Batch Health Analysis (Cost-Efficient)
For batch-level swarm intelligence that analyzes patterns across questions:
from mirofish_simulator import BatchHealthAnalyzer
analyzer = BatchHealthAnalyzer()
# Analyze a batch of questions
report = await analyzer.analyze(
    questions=[q1, q2, q3, ...],
    curriculum_context={
        "standards": ["CCSS.MATH.3.OA.A.1", ...],
        "grade": "5",
    },
)
# Which questions need expensive evaluation?
print(report.questions_needing_attention) # ["q3", "q7"]
# Actionable feedback for the generator
print(report.generator_feedback)
# ["50% of questions have longest answer correct (expect ~25%)",
# "Position bias detected: {'D': 4, 'B': 1}"]
# Routing hints for evaluators
print(report.get_routing_hints())
# {"q3": ["reading_question_qc"], "q7": ["ti_question_qa"]}
How BatchHealthAnalyzer Works
Phase 1: FREE heuristics (no LLM calls)
- Longest answer correct rate (expect ~25%)
- Grammar cues (a/an article agreement)
- Position bias in correct answers
- Absolute terms in correct answers
- Similar stems (redundancy)
- Option length variance
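The first of these heuristics is cheap to reproduce. A sketch of the longest-answer check (my own illustration, assuming questions shaped like the Quick Start example plus a "correct" letter field):

```python
def longest_answer_correct_rate(questions):
    """Fraction of questions whose correct option is also the unique longest one.

    With four options and no test-wiseness leak, this should sit near 0.25.
    Each question is assumed to look like:
        {"options": ["218", "270", ...], "correct": "B"}
    """
    hits = 0
    for q in questions:
        lengths = [len(opt) for opt in q["options"]]
        correct_idx = "ABCD".index(q["correct"])
        # Count a hit only when the correct option is strictly the longest
        if lengths[correct_idx] == max(lengths) and lengths.count(max(lengths)) == 1:
            hits += 1
    return hits / len(questions) if questions else 0.0
```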
Phase 2: ONE LLM call (optional, auto-triggered)
- Coverage gap analysis
- Concept redundancy detection
- Generator feedback synthesis
Cost Model
Traditional: N questions × 5 LLM calls = 5N calls
BatchHealth: Heuristics (free) + 1 batch call + selective deep dives
Savings: 30-70% reduction in API costs
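The savings claim follows from simple arithmetic. For example (the per-question call counts are the ones stated above; the flagged fraction is an assumed parameter, not a measurement):

```python
def call_counts(n_questions, deep_dive_fraction=0.2, calls_per_deep_dive=5):
    """Compare per-question evaluation (5 LLM calls each) with BatchHealth routing.

    BatchHealth: free heuristics + 1 batch LLM call + the full pipeline only
    on the flagged fraction of questions.
    """
    traditional = 5 * n_questions
    batch = 1 + int(deep_dive_fraction * n_questions) * calls_per_deep_dive
    savings = 1 - batch / traditional
    return traditional, batch, savings
```

With 20 questions and 20% flagged, this gives 100 calls vs 21, roughly a 79% reduction; higher flag rates move the savings toward the low end of the stated range.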
Integration with Evaluators
# Route expensive evaluators only where needed
report = await analyzer.analyze(questions)

for question in questions:
    if question["id"] in report.questions_needing_attention:
        run_full_pipeline(question)   # Expensive
    else:
        run_light_check(question)     # Cheap
Accessibility Analysis (Static)
For deterministic content analysis with no LLM calls:
from mirofish_simulator import AccessibilityAnalyzer
analyzer = AccessibilityAnalyzer()
result = await analyzer.analyze(content, target_grade=5)
print(f"Reading Level: Grade {result.reading_level.flesch_kincaid_grade}")
print(f"Vocabulary Issues: {len(result.vocabulary.issues)}")
Version History
v0.10.0 (Current)
- BatchHealthAnalyzer - Batch-level swarm intelligence
- FREE heuristics detect test-wiseness exploits
- ONE LLM call for batch-level analysis
- Generator feedback and evaluator routing hints
- 30-70% cost reduction vs per-question evaluation
v0.9.0
- AdversarialSwarm for ambiguity detection
- AgentMemory for calibration
v0.8.0
- Agentic misconception-matching architecture
- DistractorAgent, StudentModelAgent, SelectorAgent
- Factual vs conceptual question handling
- Realistic wrong answers without LLM cheating
v0.7.0
- Multi-agent with verification (deprecated approach)
v0.6.0
- Single agent simulation
License
MIT