Quizard Generator

A domain-based quiz generation system using LlamaIndex and LLMs.

Generate multiple-choice quizzes from documents using local or cloud LLMs via CLI or Python API.

What It Does

Quizard Generator indexes your documents (PDFs, text files, etc.) and generates contextually relevant MCQ quizzes using large language models. It extracts key concepts during indexing and uses them to create targeted questions with plausible distractors and explanations.

Key features:

  • Domain-based organisation (separate indices per subject)
  • Concept extraction and question seed generation at index time
  • Batch question generation for efficiency
  • Multi-provider support: Local (Ollama) or cloud (OpenAI, Anthropic, Google Gemini)
  • Hybrid configurations: Mix local and cloud providers for cost optimisation
  • Automatic quiz metadata generation (title, description)
  • Comprehensive explanations for each question

How It Works

This project adapts the ConQuer framework (Fu et al., 2025) for concept-based quiz generation using retrieval-augmented generation.

Documents → Indexing → Concept Extraction → Question Generation → Quiz
  1. Index: Documents are chunked and embedded into a vector store. Concepts and question seeds are extracted.
  2. Retrieve: Relevant content is retrieved based on the request.
  3. Generate: Questions with correct answers and 3 distractors are generated in batches per concept.

Installation

Requirements:

  • Python 3.9+
  • For local LLMs (free, recommended for indexing):
    • Ollama with models:
      • gemma3:4b (general tasks)
      • embeddinggemma (embeddings)
      • Optional: qwen2-math:7b-instruct-q4_K_M (specialist question generation)
  • For cloud LLMs (optional, better quality):
    • OpenAI API key (for GPT-4, GPT-4o, etc.)
    • Anthropic API key (for Claude models)
    • Google API key (for Gemini models)

Setup:

# Clone and install
git clone https://github.com/timothyckl/quizard-generator.git
cd quizard-generator
pip install -e .

# For local-only setup (Ollama)
ollama pull gemma3:4b
ollama pull qwen2-math:7b-instruct-q4_K_M
ollama pull embeddinggemma

# For cloud providers (optional)
pip install llama-index-llms-openai llama-index-embeddings-openai
# OR
pip install llama-index-llms-anthropic
# OR
pip install llama-index-llms-gemini

Usage

Method 1: CLI

Prepare your content:

Place documents in data/<domain>/:

data/
├── math/
│   ├── algebra.pdf
│   └── calculus.pdf
└── biology/
    └── cell_biology.pdf

Supported formats: .txt, .pdf, .docx, .pptx

Index a domain:

# Full index (first time or rebuild)
quizard index --domain math

# Update index (only new files)
quizard index-update --domain math

# Refresh index (re-index modified files)
quizard index-refresh --domain math

Generate a quiz:

# Generate 5 questions (uses default Ollama provider)
quizard generate --domain math --num-questions 5

# With difficulty level
quizard generate --domain math --num-questions 10 --difficulty hard

# With custom instruction
quizard generate --domain math --num-questions 5 --instruction "Focus on algebra"

# Using cloud provider (requires config.yaml or environment variables)
export OPENAI_API_KEY=sk-...
quizard generate --domain math --num-questions 5 --config config.yaml

Check available domains:

quizard list-domains

See CLI Reference below for all commands.

Method 2: Python API

from llama_index.core import load_index_from_storage, StorageContext
from quizard_generator import (
    QuizardConfig,
    QuizardContext,
    QuizGenerationPipeline,
    Difficulty,
)

def main():
    # configure - using Ollama (default)
    config = QuizardConfig(
        data_dir="data",
        storage_dir="storage",
        llm_provider="ollama",  # or "openai", "anthropic", "google"
        indexing_llm_model="gemma3:4b",
    )

    # use context manager for LlamaIndex Settings
    with QuizardContext(config):
        # load existing index (assumes domain is already indexed)
        storage_context = StorageContext.from_defaults(
            persist_dir="storage/math"
        )
        index = load_index_from_storage(storage_context)

        # generate quiz
        pipeline = QuizGenerationPipeline(index=index)
        quiz = pipeline.generate_quiz(
            instruction="Focus on algebra",
            num_questions=5,
            difficulty=Difficulty.MEDIUM,
        )

        # access quiz metadata
        print(f"Quiz: {quiz.title}")
        print(f"Description: {quiz.description}")

        # use quiz
        for idx, q in enumerate(quiz.questions, 1):
            print(f"\n{idx}. {q.question}")
            for i, opt in enumerate(q.options):
                print(f"   {chr(65+i)}. {opt}")
            print(f"   Answer: {chr(65 + q.correct_answer_index)}")
            print(f"   Explanation: {q.explanation}")

if __name__ == "__main__":
    main()

Load configuration from YAML:

config = QuizardConfig.from_yaml("config.yaml")

See test_library_generation.py for a complete working example with indexing.

CLI Reference

# Available commands
quizard list-domains                    # Show all available domains
quizard index --domain NAME             # Full rebuild index
quizard index-update --domain NAME      # Index new files only
quizard index-refresh --domain NAME     # Re-index modified files
quizard index-all                       # Index all domains sequentially
quizard generate --domain NAME [options]  # Generate quiz
quizard validate-index --domain NAME    # Check index health

# Generate options
--num-questions N       # Number of questions (default: 5)
--difficulty LEVEL      # easy, medium, hard (default: medium)
--instruction "text"    # Custom instruction (optional)

# Global options
--verbose, -v          # Enable verbose output
--config FILE          # Path to YAML configuration file

Example Output

CLI Output:

Question 1 [Arithmetic progression patterns]
What is the common difference in the arithmetic progression: 5, 9, 13, 17, ...?

  A. 4
  B. 8
  C. 12
  D. 6

  Correct Answer: A
  Explanation: The common difference is found by subtracting consecutive terms: 9-5=4, 13-9=4, 17-13=4.
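The arithmetic in the explanation can be checked mechanically:

```python
# Terms from the sample question above
terms = [5, 9, 13, 17]
diffs = [b - a for a, b in zip(terms, terms[1:])]
assert diffs == [4, 4, 4]  # a constant difference confirms an arithmetic progression
```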

JSON Output (saved to generated_quiz_<domain>.json):

{
  "metadata": {
    "quiz_id": 1735545600,
    "title": "Mathematics Practice Quiz",
    "description": "A collection of fundamental maths questions covering arithmetic progressions.",
    "tags": ["Arithmetic progression patterns", "Series and sequences"],
    "difficulty": "medium",
    "date_created": "30/12/2025"
  },
  "questions": [
    {
      "question": "What is the common difference in the arithmetic progression: 5, 9, 13, 17, ...?",
      "type": "single_choice",
      "options": ["4", "8", "12", "6"],
      "answer": 0,
      "explanation": "The common difference is found by subtracting consecutive terms: 9-5=4, 13-9=4, 17-13=4."
    }
  ]
}
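The saved file is plain JSON, so it can be consumed with the standard library alone. A minimal sketch using the fields shown above (`answer` is a 0-based index into `options`):

```python
import json

# A quiz document in the schema shown above (trimmed)
raw = """{
  "metadata": {"title": "Mathematics Practice Quiz", "difficulty": "medium"},
  "questions": [{
    "question": "What is the common difference in the arithmetic progression: 5, 9, 13, 17, ...?",
    "type": "single_choice",
    "options": ["4", "8", "12", "6"],
    "answer": 0,
    "explanation": "Subtract consecutive terms: 9-5=4, 13-9=4, 17-13=4."
  }]
}"""

quiz = json.loads(raw)
for q in quiz["questions"]:
    correct = q["options"][q["answer"]]  # "answer" is a 0-based index into "options"
    assert correct == "4"
```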

Configuration

Configuration can be specified via:

  1. YAML file (recommended for repeated use) - use --config FILE
  2. Programmatically (Python API usage)
  3. Default values (see QuizardConfig in config.py)

Provider Configuration

Using Ollama (Local, Free):

llm_provider: ollama
embedding_provider: ollama
indexing_llm_model: gemma3:4b
indexing_embedding_model: embeddinggemma
generation_general_llm_model: gemma3:4b
generation_specialist_llm_model: qwen2-math:7b-instruct-q4_K_M

Using OpenAI:

llm_provider: openai
embedding_provider: openai
indexing_llm_model: gpt-4o-mini
indexing_embedding_model: text-embedding-3-small
generation_general_llm_model: gpt-4o-mini
generation_specialist_llm_model: gpt-4o
# Set API key via environment variable (recommended)
# export OPENAI_API_KEY=sk-...
# OR in config (less secure):
# openai_api_key: sk-...

Using Anthropic:

llm_provider: anthropic
generation_general_llm_model: claude-3-5-haiku-20241022
generation_specialist_llm_model: claude-3-5-sonnet-20241022

# Anthropic doesn't provide embeddings - use OpenAI or Ollama
embedding_provider: openai
indexing_embedding_model: text-embedding-3-small
# Set API keys via environment variables
# export ANTHROPIC_API_KEY=sk-ant-...
# export OPENAI_API_KEY=sk-...  # for embeddings

Other configuration options:

chunk_size: 2048
max_concepts: 5
seeds_per_concept: 2
llm_request_timeout: 300.0

See config.yaml for comprehensive examples with all supported providers.

Project Structure

quizard-generator/
├── src/quizard_generator/
│   ├── commands/            # CLI command implementations
│   ├── extractors/          # Concept and seed extraction
│   ├── generators/          # Question generation
│   ├── indexing/            # Domain and index management
│   ├── knowledge/           # Retrieval and summarisation
│   ├── models/              # Data models (Quiz, Question, etc.)
│   ├── pipeline/            # Main orchestrator
│   ├── providers/           # LLM provider factory
│   ├── cli.py               # CLI entry point
│   ├── __main__.py          # python -m support
│   ├── config.py            # Configuration management
│   └── exceptions.py        # Exception hierarchy
├── data/                    # Your documents (by domain)
├── storage/                 # Vector indices (by domain)
├── config.yaml              # Configuration with provider examples
└── test_library_generation.py  # API usage example

Architecture

Generation pipeline:

  1. Retrieval: Retrieve top-k nodes using semantic search
  2. Grouping: Group nodes by extracted concepts (from metadata)
  3. Distribution: Distribute questions equally across concepts
  4. Generation: For each concept:
    • Summarise nodes once (1 LLM call)
    • Generate batch of questions with Q+A+D integrated (1 LLM call)
  5. Shuffling: Shuffle answer options to avoid position bias
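Step 3 (distribution) amounts to an even split with a remainder. An illustrative helper, not the library's actual implementation:

```python
def distribute_questions(num_questions, concepts):
    """Spread questions as evenly as possible across concepts;
    earlier concepts absorb any remainder."""
    base, extra = divmod(num_questions, len(concepts))
    return {c: base + (1 if i < extra else 0) for i, c in enumerate(concepts)}

# 10 questions over 3 concepts: 4 + 3 + 3
plan = distribute_questions(10, ["sequences", "limits", "derivatives"])
assert plan == {"sequences": 4, "limits": 3, "derivatives": 3}
```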

LLM call efficiency: ~13 calls for 10 questions
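Step 5 (shuffling) needs to track where the correct option lands after the reorder. A sketch of one way to do that, not the library's actual code:

```python
import random

def shuffle_options(options, correct_index, rng=random):
    """Shuffle answer options and return the correct answer's new index."""
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    return shuffled, order.index(correct_index)

opts, ans = shuffle_options(["4", "8", "12", "6"], correct_index=0)
assert opts[ans] == "4"  # the correct option survives the shuffle
```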

Key Design Decisions

  • Batch generation: Questions for a concept are generated together to amortise summarisation cost
  • Integrated Q+A+D: Single LLM call generates question, correct answer, and 3 distractors simultaneously
  • Seed randomisation: Seeds are selected randomly with replacement (never depletes)
  • Domain-agnostic: Works with any subject matter (not specialised for maths despite using qwen2-math)
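"Selected randomly with replacement" means the same seed may be drawn more than once and the pool never shrinks. A sketch under that reading (the seed strings are made up for illustration):

```python
import random

def pick_seeds(seeds, k, rng=random):
    """Draw k question seeds uniformly at random, with replacement."""
    return [rng.choice(seeds) for _ in range(k)]

seeds = ["define arithmetic progression", "compute a common difference"]
picked = pick_seeds(seeds, 5)  # 5 draws from a pool of 2 never exhausts it
assert len(picked) == 5 and set(picked) <= set(seeds)
```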

Troubleshooting

"Failed to connect to Ollama"

  • Ensure Ollama is running: ollama serve
  • Check models are pulled: ollama list

"No concepts found"

  • Re-index the domain with quizard index --domain NAME to extract concepts
  • Check indexing logs for extraction failures

Poor question quality

  • Try different LLM models
  • Increase chunk size for better context
  • Ensure source documents are high quality

Limitations

  • Question quality depends on source content quality
  • Cloud providers require API keys and incur costs
  • Limited to MCQ questions for now
  • Supports English language only

Acknowledgements

This project adapts the ConQuer framework for concept-based quiz generation:

  • Fu, Y., Wang, Z., Yang, L., Huo, M., & Dai, Z. (2025). ConQuer: A Framework for Concept-Based Quiz Generation. arXiv:2503.14662

Built with LlamaIndex.

License

MIT License - see LICENSE file for details.

Copyright (c) 2025 Timothy Chia Kai Lun
