AI-powered quiz generator for regulatory, certification, and educational documentation

quiz-gen

Python 3.10+ | License: MIT

AI-powered quiz generator for regulatory, certification, and educational documentation. Extract structured content from complex legal and technical documents to create comprehensive learning materials.

Features

  • Multi-Agent Quiz Generation: Generate, validate, and judge questions using configurable providers/models
  • EUR-Lex Document Parser: Parse and structure EU legal documents with full table of contents extraction
  • Hierarchical Document Analysis: Identify structure including chapters, sections, articles, recitals, annexes, and appendices
  • Intelligent Chunking: Extract meaningful content chunks for articles, recitals, annexes, and appendices

Installation

pip install quiz-gen

Quick Start

Parsing EUR-Lex Documents

from quiz_gen import EURLexParser

# Parse a regulation document
url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L_202401689"
parser = EURLexParser(url=url)
chunks, toc = parser.parse()

# Access structured content
print(f"Extracted {len(chunks)} content chunks")
print(f"Document has {len(toc['sections'])} major sections")

# Save results
parser.save_chunks('output_chunks.json')
parser.save_toc('output_toc.json')

Working with Chunks

# Iterate through extracted chunks
for chunk in chunks:
    print(f"{chunk.title}")
    print(f"Type: {chunk.section_type.value}")
    print(f"Number: {chunk.number}")
    print(f"Content: {chunk.content[:200]}...")
    print(f"Hierarchy: {' > '.join(chunk.hierarchy_path)}")
    print()

Displaying Table of Contents

# Print formatted TOC
parser.print_toc()

# Output:
# PREAMBLE
#   Citation 
#   Recital 1
#   Recital 2
#   ...
# 
# ENACTING TERMS
#   CHAPTER I - PRINCIPLES
#     Article 1 - Subject matter and objectives
#     Article 2 - Scope

Multi-Agent Quiz Generation

Quiz generation uses four specialized agents: conceptual, practical, validator, and judge. The provider is configurable per agent; supported providers are Anthropic, Google, Mistral, and OpenAI. Any text-generation model name from these providers can be passed directly. The package does not set generation parameters itself and relies on each provider's defaults.

Multi-Agent Architecture and Configuration

from quiz_gen.agents.workflow import QuizGenerationWorkflow
from quiz_gen.agents.config import AgentConfig

config = AgentConfig(
    conceptual_provider="openai",
    practical_provider="anthropic",
    validator_provider="google",
    judge_provider="mistral",
    conceptual_model="gpt-4o",
    practical_model="claude-sonnet-4-20250514",
    validator_model="gemini-2.5-flash",
    judge_model="mistral-large-latest",
)

workflow = QuizGenerationWorkflow(config)
# `chunk` is assumed to be one item from the `chunks` list returned by EURLexParser.parse()
result = workflow.run(chunk)
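
To make the generate, validate, judge flow concrete, here is a minimal provider-free sketch. It is not the package's implementation: the Question class, run_pipeline, and the stub agent functions are all hypothetical stand-ins, and no model is called.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Question:
    """Hypothetical container for one generated question."""
    kind: str                      # "conceptual" or "practical"
    text: str
    issues: list[str] = field(default_factory=list)
    accepted: bool = False


def run_pipeline(
    chunk_text: str,
    generators: dict[str, Callable[[str], str]],
    validator: Callable[[str], list[str]],
    judge: Callable[[Question], bool],
) -> list[Question]:
    """Generate one question per generator, validate it, then let the judge decide."""
    questions = []
    for kind, generate in generators.items():
        question = Question(kind=kind, text=generate(chunk_text))
        question.issues = validator(question.text)
        question.accepted = judge(question)
        questions.append(question)
    return questions


# Stub agents standing in for real model calls (assumptions for illustration only).
def conceptual_agent(text: str) -> str:
    return f"What is the purpose of '{text}'?"


def practical_agent(text: str) -> str:
    return f"Describe how to apply '{text}'"  # deliberately not phrased as a question


def validator_agent(question_text: str) -> list[str]:
    return [] if question_text.endswith("?") else ["missing question mark"]


def judge_agent(question: Question) -> bool:
    return not question.issues  # accept only questions with no validation issues


results = run_pipeline(
    "Article 2 - Scope",
    {"conceptual": conceptual_agent, "practical": practical_agent},
    validator_agent,
    judge_agent,
)
for q in results:
    print(q.kind, q.accepted, q.issues)
```

Passing the agents in as plain callables mirrors the idea that each role can be backed by a different provider, without assuming anything about how QuizGenerationWorkflow wires them together.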

Advanced Usage

Custom Parsing Workflows

from quiz_gen import EURLexParser

document_url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L_202401689"
parser = EURLexParser(url=document_url)

# Parse specific sections
# Note: underscore-prefixed methods are internal and may change between releases
parser._parse_preamble()  # Extract citations and recitals
parser._parse_enacting_terms()  # Extract chapters and articles
parser._parse_annexes()  # Extract annexes

# Access intermediate results
toc = parser.toc  # Full table of contents
chunks = parser.chunks  # Content chunks only

Filtering Chunks by Type

from quiz_gen import SectionType

# Get only recitals
recitals = [c for c in chunks if c.section_type == SectionType.RECITAL]

# Get only articles
articles = [c for c in chunks if c.section_type == SectionType.ARTICLE]

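# Filter by chapter (compare whole path entries so 'CHAPTER I'
# does not also match 'CHAPTER II', 'CHAPTER III', or 'CHAPTER IV')
chapter_1_articles = [
    c for c in articles
    if any(p == 'CHAPTER I' or p.startswith('CHAPTER I ') for p in c.hierarchy_path)
]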
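
When several types are needed at once, grouping in a single pass avoids repeating the list comprehension per type. The sketch below uses hypothetical stand-ins for SectionType and RegulationChunk so it runs without the package installed; the enum values and the reduced field set are assumptions, not the library's definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SectionType(Enum):  # stand-in; the real enum has more members
    RECITAL = "recital"
    ARTICLE = "article"


@dataclass
class Chunk:  # stand-in for RegulationChunk with only the fields used here
    section_type: SectionType
    number: str


def group_by_type(chunks):
    """Return a dict mapping each section type to its chunks, preserving order."""
    grouped = defaultdict(list)
    for chunk in chunks:
        grouped[chunk.section_type].append(chunk)
    return dict(grouped)


sample = [
    Chunk(SectionType.RECITAL, "1"),
    Chunk(SectionType.RECITAL, "2"),
    Chunk(SectionType.ARTICLE, "1"),
]
grouped = group_by_type(sample)
counts = {section.name: len(items) for section, items in grouped.items()}
print(counts)  # {'RECITAL': 2, 'ARTICLE': 1}
```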

Accessing Metadata

for chunk in chunks:
    # Access structured metadata
    print(chunk.metadata)  # {'id': 'art_1', 'subtitle': '...'}
    
    # Navigate hierarchy
    print(chunk.hierarchy_path)  # ['CHAPTER I - PRINCIPLES', 'Article 1']
    
    # Identify parent sections
    print(chunk.parent_section)
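
If chunks need to be looked up repeatedly, an index keyed on the metadata id can help. This is a sketch under assumptions: Chunk is a hypothetical stand-in for RegulationChunk, and it assumes metadata is a dict that may or may not contain an 'id' key, as the example output above suggests.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:  # hypothetical stand-in for RegulationChunk
    title: str
    metadata: dict = field(default_factory=dict)


def index_by_id(chunks):
    """Map metadata['id'] to its chunk, skipping chunks that have no id."""
    index = {}
    for chunk in chunks:
        chunk_id = chunk.metadata.get("id")
        if chunk_id is not None:
            index[chunk_id] = chunk
    return index


sample = [
    Chunk("Article 1", {"id": "art_1"}),
    Chunk("Article 2", {"id": "art_2"}),
    Chunk("Citation"),  # no id, so it is left out of the index
]
index = index_by_id(sample)
print(index["art_2"].title)  # Article 2
```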

Development

Setting up Development Environment

# Clone the repository
git clone https://github.com/yauheniya-ai/quiz-gen.git
cd quiz-gen

# Install with development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run linting
ruff check .
black .

Project Structure

quiz-gen/
├── data/             
│   ├── raw/
│   ├── processed/
│   └── quizzes/
├── src/
│   └── quiz_gen/          # Module code here
│       ├── agents/
│       ├── parsers/
│       └── ...
├── examples/              # Example scripts
│   ├── eur_lex_html_url.py
│   └── quiz_gen_multi_model.py
├── pyproject.toml
└── .env

API Reference

EURLexParser

Main parser class for EUR-Lex documents.

Methods:

  • parse() -> tuple[List[RegulationChunk], Dict]: Parse document and return chunks and TOC
  • fetch() -> str: Fetch HTML content from URL
  • save_chunks(filepath: str): Save chunks to JSON file
  • save_toc(filepath: str): Save table of contents to JSON file
  • print_toc(): Display formatted table of contents

RegulationChunk

Represents a parsed content chunk (for example, an article, recital, or annex).

Attributes:

  • section_type: Type of section (ARTICLE, RECITAL, etc.)
  • number: Section number (e.g., "1", "42")
  • title: Full title including subtitle
  • content: Text content
  • hierarchy_path: List of parent sections
  • metadata: Additional structured data
  • parent_section: Parent section of the chunk

SectionType

Enumeration of document section types.

Values:

  • PREAMBLE: Preamble section
  • ENACTING_TERMS: Main regulatory content
  • CITATION: Citation in preamble
  • RECITAL: Recital in preamble
  • CHAPTER: Chapter division
  • SECTION: Section within chapter
  • ARTICLE: Article (main content unit)
  • ANNEX: Annex section

Use Cases

Compliance and Legal

  • Analyze regulatory requirements systematically
  • Track changes across document versions
  • Build searchable knowledge bases from legal texts

Documentation Processing

  • Convert unstructured documents into structured data
  • Build citation networks and cross-references
  • Support automated document analysis workflows

Education and Training

  • Generate study materials from regulatory documents
  • Create structured learning paths for certification programs
  • Extract key concepts for examination preparation

Supported Document Types

Currently supports:

  • EUR-Lex HTML Documents: European Union regulations, directives, and decisions
  • Legislative Acts: Structured legal documents with formal hierarchies

Document Format Requirements

  • Documents must use the EUR-Lex HTML format
  • Documents must contain eli-subdivision elements so that structure can be identified
  • Multi-level hierarchies with chapters, sections, and articles are supported
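
Before parsing, it can be useful to check whether a page has the expected markers. The sketch below uses only the standard library and assumes that eli-subdivision appears as a CSS class on elements that carry an id attribute; the sample HTML is made up for illustration and is not taken from a real document.

```python
from html.parser import HTMLParser


class SubdivisionFinder(HTMLParser):
    """Collect the id of every element whose class list contains 'eli-subdivision'."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A valueless attribute is reported as None, so fall back to an empty string.
        classes = (attr_map.get("class") or "").split()
        if "eli-subdivision" in classes:
            self.ids.append(attr_map.get("id"))


def find_subdivisions(html: str) -> list:
    finder = SubdivisionFinder()
    finder.feed(html)
    return finder.ids


sample_html = (
    '<div class="eli-subdivision" id="rct_1"><p>(1) ...</p></div>'
    '<div class="eli-subdivision" id="art_1"><p>Article 1</p></div>'
    '<div class="other"><p>unrelated</p></div>'
)
subdivision_ids = find_subdivisions(sample_html)
print(subdivision_ids)  # ['rct_1', 'art_1']
```

An empty result suggests the page is not in the structured EUR-Lex HTML layout the parser expects.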

Roadmap

Planned enhancements:

  • Support for additional document formats (PDF, DOCX, PPTX)
  • Multi-language support
  • Integration with learning management systems

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citation

If you use this software in academic work, please cite:

Varabyova, Y. (2026). Quiz Gen AI: AI-powered quiz generator for professional certification.
GitHub repository: https://github.com/yauheniya-ai/quiz-gen

Contributing

Contributions are welcome! Please ensure:

  1. Code follows PEP 8 style guidelines
  2. All tests pass
  3. New features include appropriate tests
  4. Documentation is updated
