Quizard Generator
A domain-based quiz generation system using LlamaIndex and LLMs.
Generate multiple-choice quizzes from documents using local or cloud LLMs via CLI or Python API.
What It Does
Quizard Generator indexes your documents (PDFs, text files, etc.) and generates contextually relevant MCQ quizzes using large language models. It extracts key concepts during indexing and uses them to create targeted questions with plausible distractors and explanations.
Key features:
- Domain-based organisation (separate indices per subject)
- Concept extraction and question seed generation at index time
- Batch question generation for efficiency
- Multi-provider support: Local (Ollama) or cloud (OpenAI, Anthropic, Google Gemini)
- Hybrid configurations: Mix local and cloud providers for cost optimisation
- Automatic quiz metadata generation (title, description)
- Comprehensive explanations for each question
How It Works
This project adapts the ConQuer framework (Fu et al., 2025) for concept-based quiz generation using retrieval-augmented generation.
Documents → Indexing → Concept Extraction → Question Generation → Quiz
- Index: Documents are chunked and embedded into a vector store. Concepts and question seeds are extracted.
- Retrieve: Relevant content is retrieved based on the request.
- Generate: Questions with correct answers and 3 distractors are generated in batches per concept.
Installation
Requirements:
- Python 3.9+
- For local LLMs (free, recommended for indexing):
- Ollama with models:
  - gemma3:4b (general tasks)
  - embeddinggemma (embeddings)
  - Optional: qwen2-math:7b-instruct-q4_K_M (specialist question generation)
- For cloud LLMs (optional, better quality):
- OpenAI API key (for GPT-4, GPT-4o, etc.)
- Anthropic API key (for Claude models)
- Google API key (for Gemini models)
Setup:
# Clone and install
git clone https://github.com/timothyckl/quizard-generator.git
cd quizard-generator
pip install -e .
# For local-only setup (Ollama)
ollama pull gemma3:4b
ollama pull qwen2-math:7b-instruct-q4_K_M
ollama pull embeddinggemma
# For cloud providers (optional)
pip install llama-index-llms-openai llama-index-embeddings-openai
# OR
pip install llama-index-llms-anthropic
# OR
pip install llama-index-llms-gemini
Usage
Method 1: CLI
Prepare your content:
Place documents in data/<domain>/:
data/
├── math/
│ ├── algebra.pdf
│ └── calculus.pdf
└── biology/
└── cell_biology.pdf
Supported formats: .txt, .pdf, .docx, .pptx
Index a domain:
# Full index (first time or rebuild)
quizard index --domain math
# Update index (only new files)
quizard index-update --domain math
# Refresh index (re-index modified files)
quizard index-refresh --domain math
Generate a quiz:
# Generate 5 questions (uses default Ollama provider)
quizard generate --domain math --num-questions 5
# With difficulty level
quizard generate --domain math --num-questions 10 --difficulty hard
# With custom instruction
quizard generate --domain math --num-questions 5 --instruction "Focus on algebra"
# Using cloud provider (requires config.yaml or environment variables)
export OPENAI_API_KEY=sk-...
quizard generate --domain math --num-questions 5 --config config.yaml
Check available domains:
quizard list-domains
See CLI Reference below for all commands.
Method 2: Python API
```python
from llama_index.core import load_index_from_storage, StorageContext
from quizard_generator import (
    QuizardConfig,
    QuizardContext,
    QuizGenerationPipeline,
    Difficulty,
)

def main():
    # configure - using Ollama (default)
    config = QuizardConfig(
        data_dir="data",
        storage_dir="storage",
        llm_provider="ollama",  # or "openai", "anthropic", "google"
        indexing_llm_model="gemma3:4b",
    )

    # use context manager for LlamaIndex Settings
    with QuizardContext(config):
        # load existing index (assumes domain is already indexed)
        storage_context = StorageContext.from_defaults(
            persist_dir="storage/math"
        )
        index = load_index_from_storage(storage_context)

        # generate quiz
        pipeline = QuizGenerationPipeline(index=index)
        quiz = pipeline.generate_quiz(
            instruction="Focus on algebra",
            num_questions=5,
            difficulty=Difficulty.MEDIUM,
        )

        # access quiz metadata
        print(f"Quiz: {quiz.title}")
        print(f"Description: {quiz.description}")

        # use quiz
        for idx, q in enumerate(quiz.questions, 1):
            print(f"\n{idx}. {q.question}")
            for i, opt in enumerate(q.options):
                print(f"  {chr(65 + i)}. {opt}")
            print(f"  Answer: {chr(65 + q.correct_answer_index)}")
            print(f"  Explanation: {q.explanation}")

main()
```
Load configuration from YAML:
config = QuizardConfig.from_yaml("config.yaml")
See test_library_generation.py for a complete working example with indexing.
CLI Reference
# Available commands
quizard list-domains # Show all available domains
quizard index --domain NAME # Full rebuild index
quizard index-update --domain NAME # Index new files only
quizard index-refresh --domain NAME # Re-index modified files
quizard index-all # Index all domains sequentially
quizard generate --domain NAME [options] # Generate quiz
quizard validate-index --domain NAME # Check index health
# Generate options
--num-questions N # Number of questions (default: 5)
--difficulty LEVEL # easy, medium, hard (default: medium)
--instruction "text" # Custom instruction (optional)
# Global options
--verbose, -v # Enable verbose output
--config FILE # Path to YAML configuration file
Example Output
CLI Output:
Question 1 [Arithmetic progression patterns]
What is the common difference in the arithmetic progression: 5, 9, 13, 17, ...?
A. 4
B. 8
C. 12
D. 6
Correct Answer: A
Explanation: The common difference is found by subtracting consecutive terms: 9-5=4, 13-9=4, 17-13=4.
JSON Output (saved to generated_quiz_<domain>.json):
{
"metadata": {
"quiz_id": 1735545600,
"title": "Mathematics Practice Quiz",
"description": "A collection of fundamental maths questions covering arithmetic progressions.",
"tags": ["Arithmetic progression patterns", "Series and sequences"],
"difficulty": "medium",
"date_created": "30/12/2025"
},
"questions": [
{
"question": "What is the common difference in the arithmetic progression: 5, 9, 13, 17, ...?",
"type": "single_choice",
"options": ["4", "8", "12", "6"],
"answer": 0,
"explanation": "The common difference is found by subtracting consecutive terms: 9-5=4, 13-9=4, 17-13=4."
}
]
}
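The saved JSON can be consumed directly with the standard library. A minimal sketch; the inline string mirrors the schema shown above (abbreviated), whereas in practice you would read generated_quiz_<domain>.json from disk:

```python
import json

# Minimal sketch of consuming the saved quiz JSON. The inline string
# stands in for the contents of generated_quiz_<domain>.json.
raw = """
{
  "metadata": {"title": "Mathematics Practice Quiz", "difficulty": "medium"},
  "questions": [
    {
      "question": "What is the common difference in the arithmetic progression: 5, 9, 13, 17, ...?",
      "type": "single_choice",
      "options": ["4", "8", "12", "6"],
      "answer": 0,
      "explanation": "Subtracting consecutive terms gives 4."
    }
  ]
}
"""
quiz = json.loads(raw)
for q in quiz["questions"]:
    correct = q["options"][q["answer"]]  # "answer" is a 0-based index into options
    print(f"{q['question']} -> {correct}")
```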
Configuration
Configuration can be specified via:
- YAML file (recommended for repeated use) - use --config FILE
- Programmatically (Python API usage)
- Default values (see QuizardConfig in config.py)
Provider Configuration
Using Ollama (Local, Free):
llm_provider: ollama
embedding_provider: ollama
indexing_llm_model: gemma3:4b
indexing_embedding_model: embeddinggemma
generation_general_llm_model: gemma3:4b
generation_specialist_llm_model: qwen2-math:7b-instruct-q4_K_M
Using OpenAI:
llm_provider: openai
embedding_provider: openai
indexing_llm_model: gpt-4o-mini
indexing_embedding_model: text-embedding-3-small
generation_general_llm_model: gpt-4o-mini
generation_specialist_llm_model: gpt-4o
# Set API key via environment variable (recommended)
# export OPENAI_API_KEY=sk-...
# OR in config (less secure):
# openai_api_key: sk-...
Using Anthropic:
llm_provider: anthropic
generation_general_llm_model: claude-3-5-haiku-20241022
generation_specialist_llm_model: claude-3-5-sonnet-20241022
# Anthropic doesn't provide embeddings - use OpenAI or Ollama
embedding_provider: openai
indexing_embedding_model: text-embedding-3-small
# Set API keys via environment variables
# export ANTHROPIC_API_KEY=sk-ant-...
# export OPENAI_API_KEY=sk-... # for embeddings
Other configuration options:
chunk_size: 2048
max_concepts: 5
seeds_per_concept: 2
llm_request_timeout: 300.0
See config.yaml for comprehensive examples with all supported providers.
Project Structure
quizard-generator/
├── src/quizard_generator/
│ ├── commands/ # CLI command implementations
│ ├── extractors/ # Concept and seed extraction
│ ├── generators/ # Question generation
│ ├── indexing/ # Domain and index management
│ ├── knowledge/ # Retrieval and summarisation
│ ├── models/ # Data models (Quiz, Question, etc.)
│ ├── pipeline/ # Main orchestrator
│ ├── providers/ # LLM provider factory (NEW)
│ ├── cli.py # CLI entry point
│ ├── __main__.py # python -m support
│ ├── config.py # Configuration management
│ └── exceptions.py # Exception hierarchy
├── data/ # Your documents (by domain)
├── storage/ # Vector indices (by domain)
├── config.yaml # Configuration with provider examples
└── test_library_generation.py # API usage example
Architecture
Generation pipeline:
- Retrieval: Retrieve top-k nodes using semantic search
- Grouping: Group nodes by extracted concepts (from metadata)
- Distribution: Distribute questions equally across concepts
- Generation: For each concept:
- Summarise nodes once (1 LLM call)
- Generate batch of questions with Q+A+D integrated (1 LLM call)
- Shuffling: Shuffle answer options to avoid position bias
LLM call efficiency: ~13 calls for 10 questions
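The shuffling step can be sketched as follows; shuffle_options is a hypothetical helper for illustration, not the library's actual function:

```python
import random

def shuffle_options(options, correct_index, rng=None):
    # Shuffle answer options while tracking where the correct answer lands,
    # so its final position is uniform rather than biased towards slot A.
    rng = rng or random.Random()
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(correct_index)

opts, ans = shuffle_options(["4", "8", "12", "6"], correct_index=0,
                            rng=random.Random(7))
assert opts[ans] == "4"  # the correct answer is tracked through the shuffle
```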
Key Design Decisions
- Batch generation: Questions for a concept are generated together to amortise summarisation cost
- Integrated Q+A+D: Single LLM call generates question, correct answer, and 3 distractors simultaneously
- Seed randomisation: Seeds are selected randomly with replacement (never depletes)
- Domain-agnostic: Works with any subject matter (not specialised for maths despite using qwen2-math)
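Sampling with replacement is what keeps the seed pool from depleting. A one-line sketch with hypothetical seed strings:

```python
import random

# Hypothetical seed strings; random.choices samples WITH replacement,
# so requesting more questions than there are seeds never exhausts the pool.
seeds = ["common difference", "nth term formula", "sum of a series"]
picked = random.choices(seeds, k=5)  # k may exceed len(seeds)
assert len(picked) == 5 and all(s in seeds for s in picked)
```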
Troubleshooting
"Failed to connect to Ollama"
- Ensure Ollama is running: ollama serve
- Check models are pulled: ollama list
"No concepts found"
- Re-index the domain (quizard index --domain NAME) to extract concepts
- Check indexing logs for extraction failures
Poor question quality
- Try different LLM models
- Increase chunk size for better context
- Ensure source documents are high quality
Limitations
- Question quality depends on source content quality
- Cloud providers require API keys and incur costs
- Limited to MCQ questions for now
- Supports English language only
Acknowledgements
This project adapts the ConQuer framework for concept-based quiz generation:
- Fu, Y., Wang, Z., Yang, L., Huo, M., & Dai, Z. (2025). ConQuer: A Framework for Concept-Based Quiz Generation. arXiv:2503.14662
Built with:
- LlamaIndex for RAG infrastructure
- Ollama for local LLM inference
- Pydantic for structured outputs
License
MIT License - see LICENSE file for details.
Copyright (c) 2025 Timothy Chia Kai Lun
Project details
File details
Details for the file quizard_generator-0.1.0.tar.gz.
File metadata
- Download URL: quizard_generator-0.1.0.tar.gz
- Upload date:
- Size: 50.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e3cd839ef3bb20d923a8ae4ec50c0cc87b7887481648d5056a6177eba21cf40c |
| MD5 | e85460cee9881b01b533991aacef7cdc |
| BLAKE2b-256 | 8454fc60847638a6a9fcf60b3603fd2f546105258d1279fae70450f011515df7 |
File details
Details for the file quizard_generator-0.1.0-py3-none-any.whl.
File metadata
- Download URL: quizard_generator-0.1.0-py3-none-any.whl
- Upload date:
- Size: 59.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8a244ca3c3c4ff0cefa9e72ecfb906d0f713a6a14064a1eb55c4c69ed18eab4 |
| MD5 | db1d3a8aa550395f497f882be97e62ac |
| BLAKE2b-256 | d53951bf2a31b74f5ed4ec9a07bd175ae735ff4bf88aa4e326940ca0ebadbba6 |