AI-powered quiz generator for regulatory, certification, and educational documentation
quiz-gen
AI-powered quiz generator for regulatory, certification, and educational documentation. Extract structured content from complex legal and technical documents to create comprehensive learning materials.
Features
- EUR-Lex Document Parser: Parse and structure European Union legal documents with full table of contents extraction
- Hierarchical Document Analysis: Automatically identify document structure including chapters, sections, articles, and recitals
- Intelligent Chunking: Extract meaningful content chunks at appropriate granularity levels (articles and recitals)
- Table of Contents Generation: Build complete document navigation structure with 3-level hierarchy
- Regulatory Document Support: Specialized parsing for aviation regulations, directives, and other technical documentation
Installation
pip install quiz-gen
Quick Start
Parsing EUR-Lex Documents
from quiz_gen import EURLexParser
# Parse a regulation document
url = "https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=OJ:L_202401689"
parser = EURLexParser(url=url)
chunks, toc = parser.parse()
# Access structured content
print(f"Extracted {len(chunks)} content chunks")
print(f"Document has {len(toc['sections'])} major sections")
# Save results
parser.save_chunks('output_chunks.json')
parser.save_toc('output_toc.json')
Document Structure
The parser extracts documents into a multi-level hierarchy:
Level 1: Major Sections
- Preamble
- Enacting Terms
Level 2/3: Structural Divisions
- Chapters
- Sections
Level 1/2/3/4: Content Elements
- Title
- Citation
- Recitals
- Articles
- Concluding formulas
- Annex
- Appendix
Working with Chunks
# Iterate through extracted chunks
for chunk in chunks:
    print(f"{chunk.title}")
    print(f"Type: {chunk.section_type.value}")
    print(f"Number: {chunk.number}")
    print(f"Content: {chunk.content[:200]}...")
    print(f"Hierarchy: {' > '.join(chunk.hierarchy_path)}")
    print()
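A quick way to sanity-check a parse is to count chunks per section type. The sketch below is self-contained and does not import quiz_gen: `Chunk` and `SectionType` are hypothetical stand-ins that mirror only a few of the attributes used in this README, and the enum string values are assumptions. The same function should work on real chunks if they expose `section_type` as an enum.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class SectionType(Enum):
    # Stand-in for quiz_gen.SectionType; the string values are assumptions.
    RECITAL = "recital"
    ARTICLE = "article"


@dataclass
class Chunk:
    # Stand-in exposing a subset of the chunk attributes shown in this README.
    section_type: SectionType
    number: str
    title: str


def count_by_type(chunks):
    """Return a mapping of section type name -> number of chunks."""
    return dict(Counter(c.section_type.name for c in chunks))


sample = [
    Chunk(SectionType.RECITAL, "1", "Recital 1"),
    Chunk(SectionType.RECITAL, "2", "Recital 2"),
    Chunk(SectionType.ARTICLE, "1", "Article 1 - Subject matter and objectives"),
]
print(count_by_type(sample))
```

A count that is unexpectedly zero for a type is usually the first hint that a document does not match the expected layout.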
Displaying Table of Contents
# Print formatted TOC
parser.print_toc()
# Output:
# PREAMBLE
# Citation
# Recital 1
# Recital 2
# ...
#
# ENACTING TERMS
# CHAPTER I - PRINCIPLES
# Article 1 - Subject matter and objectives
# Article 2 - Scope
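If a custom rendering is needed, for example for a report or a web page, a nested TOC can be walked recursively. This is a minimal sketch under a loud assumption: each node is taken to be a dict with a `title` key and an optional `children` list. Only the top-level `sections` key appears in this README, so the real layout of `toc` may differ and the node keys here are hypothetical.

```python
def render_toc(nodes, depth=0):
    """Return the TOC as a list of indented lines, one per node."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node["title"])
        # Recurse into child nodes, indenting one level deeper.
        lines.extend(render_toc(node.get("children", []), depth + 1))
    return lines


# Hypothetical nested layout; the real structure of `toc` may differ.
toc = {
    "sections": [
        {
            "title": "PREAMBLE",
            "children": [{"title": "Citation"}, {"title": "Recital 1"}],
        },
        {
            "title": "ENACTING TERMS",
            "children": [
                {
                    "title": "CHAPTER I - PRINCIPLES",
                    "children": [
                        {"title": "Article 1 - Subject matter and objectives"}
                    ],
                }
            ],
        },
    ]
}

print("\n".join(render_toc(toc["sections"])))
```

Returning a list of lines rather than printing directly keeps the function easy to test and to redirect into a file.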
Use Cases
Compliance and Legal
- Analyze regulatory requirements systematically
- Track changes across document versions
- Build searchable knowledge bases from legal texts
Documentation Processing
- Convert unstructured documents into structured data
- Build citation networks and cross-references
- Support automated document analysis workflows
Education and Training
- Generate study materials from regulatory documents
- Create structured learning paths for certification programs
- Extract key concepts for examination preparation
Supported Document Types
Currently supports:
- EUR-Lex HTML Documents: European Union regulations, directives, decisions
- Legislative Acts: Structured legal documents with formal hierarchies
Document Format Requirements
- Documents must use EUR-Lex HTML format
- Must contain eli-subdivision elements for proper structure identification
- Supports multi-level hierarchies with chapters, sections, and articles
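Before parsing, it can be useful to check whether a page meets these requirements. The sketch below uses only the standard library and assumes that `eli-subdivision` appears as a value of the HTML `class` attribute; the README states only that such elements are required, so treat that detail, the `SubdivisionCounter` helper, and the sample markup as illustrative.

```python
from html.parser import HTMLParser


class SubdivisionCounter(HTMLParser):
    """Count start tags whose class attribute includes 'eli-subdivision'."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "eli-subdivision" in classes:
            self.count += 1


def count_subdivisions(html):
    """Return how many elements carry the eli-subdivision class."""
    counter = SubdivisionCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Made-up markup for illustration only.
sample_html = """
<div class="eli-subdivision" id="art_1"><p>Article 1</p></div>
<div class="eli-subdivision" id="art_2"><p>Article 2</p></div>
<div class="other"><p>Not a subdivision</p></div>
"""
print(count_subdivisions(sample_html))
```

A result of zero suggests the document is not in the expected EUR-Lex HTML format.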
Advanced Usage
Custom Parsing Workflows
from quiz_gen import EURLexParser
parser = EURLexParser(url=document_url)
# Parse specific sections
parser._parse_preamble() # Extract citations and recitals
parser._parse_enacting_terms() # Extract chapters and articles
parser._parse_annexes() # Extract annexes
# Access intermediate results
toc = parser.toc # Full table of contents
chunks = parser.chunks # Content chunks only
Filtering Chunks by Type
from quiz_gen import SectionType
# Get only recitals
recitals = [c for c in chunks if c.section_type == SectionType.RECITAL]
# Get only articles
articles = [c for c in chunks if c.section_type == SectionType.ARTICLE]
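# Filter by chapter (compare the exact chapter label so that
# 'CHAPTER I' does not also match 'CHAPTER II' or 'CHAPTER IV')
chapter_1_articles = [
    c for c in articles
    if any(p.split(' - ')[0] == 'CHAPTER I' for p in c.hierarchy_path)
]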
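The same idea extends to grouping every article by its chapter in one pass. This is a sketch, not part of the library: `Chunk` is a hypothetical stand-in, `group_by_chapter` is an illustrative helper, and the code assumes hierarchy path entries look like 'CHAPTER I - PRINCIPLES'. The chapter titles in the sample data are made up.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Stand-in with only the attributes needed for this example.
    number: str
    hierarchy_path: list = field(default_factory=list)


def group_by_chapter(chunks):
    """Map each chapter label (e.g. 'CHAPTER I') to its article numbers."""
    groups = defaultdict(list)
    for chunk in chunks:
        for entry in chunk.hierarchy_path:
            # Keep the part before ' - ', e.g. 'CHAPTER I'.
            label = entry.split(" - ")[0]
            if label.startswith("CHAPTER "):
                groups[label].append(chunk.number)
                break
    return dict(groups)


sample = [
    Chunk("1", ["CHAPTER I - PRINCIPLES", "Article 1"]),
    Chunk("2", ["CHAPTER I - PRINCIPLES", "Article 2"]),
    Chunk("3", ["CHAPTER II - DEFINITIONS", "Article 3"]),
]
print(group_by_chapter(sample))
```

Chunks with no chapter in their path, such as recitals, are simply left out of the result.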
Accessing Metadata
for chunk in chunks:
    # Access structured metadata
    print(chunk.metadata)  # {'id': 'art_1', 'subtitle': '...'}
    # Navigate hierarchy
    print(chunk.hierarchy_path)  # ['CHAPTER I - PRINCIPLES', 'Article 1']
    # Identify parent sections
    print(chunk.parent_section)
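When chunks need to be looked up repeatedly, for instance to resolve cross-references, an index keyed by the metadata id avoids rescanning the list. The sketch below assumes each chunk's `metadata` is a dict that may contain an `id` key, as in the comment above. `Chunk` is a hypothetical stand-in, `index_by_id` is an illustrative helper, and the id 'art_2' is extrapolated from 'art_1' rather than confirmed.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Stand-in with only the attributes needed for this example.
    title: str
    metadata: dict = field(default_factory=dict)


def index_by_id(chunks):
    """Build {metadata['id']: chunk}, skipping chunks without an id."""
    return {c.metadata["id"]: c for c in chunks if "id" in c.metadata}


sample = [
    Chunk("Article 1 - Subject matter and objectives", {"id": "art_1"}),
    Chunk("Article 2 - Scope", {"id": "art_2"}),
    Chunk("Citation"),  # no id, so it is left out of the index
]
index = index_by_id(sample)
print(index["art_1"].title)
```

If two chunks share an id, the later one wins; check for duplicates first if that matters for your documents.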
Project Structure
quiz-gen/
├── src/
│   └── quiz_gen/
│       ├── parsers/
│       │   └── html/
│       │       └── eu_lex_parser.py
│       ├── models/
│       │   ├── chunk.py
│       │   ├── document.py
│       │   └── quiz.py
│       └── utils/
├── examples/
│   └── eu_lex_toc_chunks.py
├── tests/
├── data/
│   ├── processed/
│   └── raw/
└── docs/
Development
Setting up Development Environment
# Clone the repository
git clone https://github.com/yauheniya-ai/quiz-gen.git
cd quiz-gen
# Install with development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run linting
ruff check .
black .
Project Structure
quiz-gen/
├── src/
│   └── quiz_gen/           # Module code here
│       ├── agents/
│       ├── parsers/
│       └── ...
├── examples/               # Example scripts
│   ├── easa_example.py
│   ├── test_article_47.py
│   └── run_workflow.py
├── pyproject.toml
└── .env
Contributing
Contributions are welcome! Please ensure:
- Code follows PEP 8 style guidelines
- All tests pass
- New features include appropriate tests
- Documentation is updated
API Reference
EURLexParser
Main parser class for EUR-Lex documents.
Methods:
- parse() -> tuple[List[RegulationChunk], Dict]: Parse document and return chunks and TOC
- fetch() -> str: Fetch HTML content from URL
- save_chunks(filepath: str): Save chunks to JSON file
- save_toc(filepath: str): Save table of contents to JSON file
- print_toc(): Display formatted table of contents
RegulationChunk
Represents a parsed content chunk (article or recital).
Attributes:
- section_type: Type of section (ARTICLE, RECITAL, etc.)
- number: Section number (e.g., "1", "42")
- title: Full title including subtitle
- content: Text content
- hierarchy_path: List of parent sections
- metadata: Additional structured data
SectionType
Enumeration of document section types.
Values:
- PREAMBLE: Preamble section
- ENACTING_TERMS: Main regulatory content
- CITATION: Citation in preamble
- RECITAL: Recital in preamble
- CHAPTER: Chapter division
- SECTION: Section within chapter
- ARTICLE: Article (main content unit)
- ANNEX: Annex section
Roadmap
Future enhancements planned:
- AI-powered quiz generation from extracted content
- Support for additional document formats (PDF, DOCX, PPTX)
- Multi-language support
- Question validation and quality metrics
- Integration with learning management systems
- Version comparison and diff analysis
License
This project is licensed under the MIT License. See the LICENSE file for details.
Citation
If you use this software in academic work, please cite:
Varabyova, Y. (2026). Quiz Gen AI: AI-powered quiz generator for regulatory documentation.
GitHub repository: https://github.com/yauheniya-ai/quiz-gen
Support
- Documentation: https://quiz-gen.readthedocs.io
- Issue Tracker: https://github.com/yauheniya-ai/quiz-gen/issues
Acknowledgments
Built with:
- BeautifulSoup4 for HTML parsing
- lxml for XML processing
- EUR-Lex for providing structured legal documents