Convert PDF files (text, scanned, mixed) into MCQ questions using AI
pdf2mcq
Convert PDF files — text PDFs, scanned books, mixed documents — into high-quality MCQ questions using AI.
Built on html2mcq's PDF pipeline, which is extracted here as a standalone library focused purely on PDF-to-MCQ generation.
Features
- Smart PDF detection — automatically detects text PDFs, scanned PDFs, and mixed documents
- Text PDFs — fast extraction via PyMuPDF with chunking at sentence boundaries
- Scanned PDFs — renders pages as images → vision API OCR (or pytesseract fallback)
- Mixed PDFs — text pages via PyMuPDF + scanned pages via OCR, combined intelligently
- Multiple AI providers: OpenRouter, Anthropic, OpenAI, Ollama
- Auto model failover for MCQ generation
- CLI & Python API
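The detection and chunking features above can be illustrated with a small standard-library sketch. It is not pdf2mcq's actual implementation: the function names `classify_pdf` and `chunk_at_sentences`, the `min_chars` threshold, and the `max_chars` limit are all assumptions made for illustration. The classifier takes the number of extractable characters per page (which a PDF library such as PyMuPDF could supply) and the chunker splits on a simple punctuation rule.

```python
import re


def classify_pdf(page_char_counts, min_chars=50):
    """Label a document from the amount of extractable text on each page.

    min_chars is an assumed threshold: pages with fewer extractable
    characters are treated as scanned images.
    """
    if not page_char_counts:
        raise ValueError("document has no pages")
    text_pages = sum(1 for n in page_char_counts if n >= min_chars)
    if text_pages == len(page_char_counts):
        return "text"
    if text_pages == 0:
        return "scanned"
    return "mixed"


def chunk_at_sentences(text, max_chars=1000):
    """Group whole sentences into chunks, never cutting inside a sentence.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Under these assumptions, a document whose pages all carry text is labelled `text`, one with no extractable text is labelled `scanned`, and anything in between is `mixed`, so only the scanned pages would need OCR.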
Quick Start
CLI
# Single PDF
pdf2mcq --pdf-path textbook.pdf -n 10
# Multiple PDF URLs
pdf2mcq --pdf-url https://example.com/chapter1.pdf --pdf-url https://example.com/chapter2.pdf
# Scan a folder of PDFs
pdf2mcq --pdf-folder ./textbooks/
# Output as JSON
pdf2mcq --pdf-path notes.pdf -o questions.json --format json
Python API
from pdf2mcq import PDFMCQGenerator
gen = PDFMCQGenerator(
    api_key="sk-or-v1-...",
    provider="openrouter",
    mcq_model="google/gemini-2.5-flash-lite",
)
# From local PDF
mcq = gen.from_pdf_paths("textbook.pdf", n=5)
print(mcq.to_pretty_str())
# From URL
mcq = gen.from_pdf_urls("https://example.com/notes.pdf", n=3)
print(mcq.to_json())
# Multiple PDFs
mcq = gen.from_pdf_paths(["chapter1.pdf", "chapter2.pdf", "chapter3.pdf"])
Custom Instructions
mcq = gen.from_pdf_paths(
    "lecture-notes.pdf",
    n=10,
    difficulty_mix="50% easy, 50% hard",
    focus_topics=["machine learning", "neural networks"],
    custom_instructions="Focus on mathematical derivations",
)
Auto Model Selection
gen = PDFMCQGenerator(
    api_key="sk-or-v1-...",
    mcq_model="auto",
    mcq_model_list=[
        "nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free",
        "google/gemma-4-31b-it:free",
    ],
)
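As a rough picture of what model failover means, the sketch below tries each model in priority order and returns the first successful result. It does not reproduce pdf2mcq's internal logic: `generate_with_failover` and the `generate` callable are hypothetical names, and a real client would likely catch narrower error types.

```python
def generate_with_failover(models, generate):
    """Try each model in priority order.

    `generate` is a hypothetical callable that takes a model name and
    either returns a result or raises an exception.
    Returns a (model, result) pair from the first model that succeeds.
    """
    errors = []
    for model in models:
        try:
            return model, generate(model)
        except Exception as exc:  # assumption: any error triggers failover
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Collecting the per-model errors before raising keeps the final message useful when every model in the list is unavailable.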
Environment Variables
| Variable | Purpose |
|---|---|
| `OPENROUTER_API_KEY` | Default API key for OpenRouter |
| `ANTHROPIC_API_KEY` | API key for Anthropic |
| `OPENAI_API_KEY` | API key for OpenAI |
| `PDF2MCQ_MCQ_MODELS` | Comma-separated MCQ model priority list for `mcq_model="auto"` |
| `PDF2MCQ_OCR_MODELS` | Comma-separated OCR model priority list for scanned PDFs |
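To show what a comma-separated priority list looks like in practice, here is a minimal sketch of reading one from the environment. The helper `models_from_env` is hypothetical and the parsing rules (trimming whitespace, dropping empty entries) are assumptions; the library may parse these variables differently.

```python
import os


def models_from_env(name, default=()):
    """Read a comma-separated model list from an environment variable.

    Whitespace around entries is trimmed and empty entries are dropped.
    Falls back to `default` when the variable is unset or empty.
    """
    raw = os.environ.get(name, "")
    models = [entry.strip() for entry in raw.split(",") if entry.strip()]
    return models or list(default)
```

For instance, with `PDF2MCQ_MCQ_MODELS` set to two model names separated by a comma, `models_from_env("PDF2MCQ_MCQ_MODELS")` would return them in the order given, which is the priority order.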
Output Format
# Pretty-print
print(mcq.to_pretty_str())
# JSON
print(mcq.to_json())
# {
#   "total_exam_time": 20,
#   "questions": [
#     {
#       "question_html": "What is gradient descent?",
#       "options": ["...", "...", "...", "..."],
#       "answers": [0],
#       "multi": false,
#       "marks": 1.0,
#       "negative_marks": 0.25,
#       "difficulty": "easy",
#       "explaination": "..."
#     }
#   ]
# }
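The JSON shown above can be consumed with the standard `json` module. The sketch below scores a set of responses against it. The function `score_exam` is hypothetical, and two things are assumptions rather than documented behaviour: that `answers` holds zero-based indices into `options`, and that a wrong attempt costs `negative_marks` while a skipped question scores zero.

```python
import json


def score_exam(exam_json, responses):
    """Score responses against an exam in the JSON shape shown above.

    `responses` has one entry per question: a list of chosen option
    indices, or None for a skipped question.
    """
    exam = json.loads(exam_json)
    total = 0.0
    for question, chosen in zip(exam["questions"], responses):
        if chosen is None:
            continue  # assumption: skipped questions score zero
        if sorted(chosen) == sorted(question["answers"]):
            total += question["marks"]
        else:
            total -= question["negative_marks"]
    return total
```

Comparing sorted index lists lets the same check cover both single-answer questions and those with `"multi": true`.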
Installation
pip install pdf2mcq
Requires PyMuPDF (fitz), which is installed automatically as a dependency.
For the pytesseract OCR fallback on scanned PDFs, also install the Tesseract engine.
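Since Tesseract is a separate system program rather than a Python package, it can help to check that it is on the `PATH` before processing scanned PDFs. The following is a small sketch using only the standard library; the helper name `tesseract_available` is hypothetical and is not part of pdf2mcq.

```python
import shutil


def tesseract_available(executable="tesseract"):
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if tesseract_available():
        print("Tesseract found")
    else:
        print("Tesseract not found on PATH")
```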
License
MIT