Agent-safe PDF extraction SDK with structured output and quality reports.
Psyduck
Psyduck is a small Python SDK for turning PDFs into agent-ready structured documents.
It is intentionally SDK-only: import it from Python, run Psyduck().process(...), and inspect the returned document, profile, quality report, and exported artifacts.
Install
pip install psyduck
Development install:
pip install -e ".[dev]"
Python API
from psyduck import Psyduck

duck = Psyduck()
result = duck.process("report.pdf", goal="rag", mode="balanced", return_content=True)

if result.quality.needs_review:
    for warning in result.quality.warnings:
        print(warning.code, warning.message)

for block in result.document.blocks:
    print(block.page, block.text)
Output Directory
output/report-<timestamp>-<id>/
  run.json
  profile.json
  document.md
  document.json
  quality.json
  tables/
  assets/
  pages/
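As a quick sanity check on a run directory, the sketch below lists which of the artifacts named above are missing. It relies only on the file and folder names shown in the layout and makes no assumption about what the JSON files contain. `missing_artifacts` is a hypothetical helper, not part of Psyduck, and because the README does not say whether `tables/`, `assets/`, and `pages/` are always created, folders are checked only on request.

```python
from pathlib import Path

# File names taken from the layout above.
EXPECTED_FILES = ("run.json", "profile.json", "document.md", "document.json", "quality.json")
# Whether these folders are always created is not stated, so they are optional here.
EXPECTED_DIRS = ("tables", "assets", "pages")


def missing_artifacts(run_dir, include_dirs=False):
    """Return the names of expected artifacts that are absent from run_dir."""
    root = Path(run_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    if include_dirs:
        missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means every expected file is present; anything else names what to look for before trusting the run.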
SDK Contract
- Default extraction uses PyMuPDF.
- `process()` always writes Markdown, JSON, profile, quality, and run metadata.
- `return_content=False` keeps large document content out of the immediate result.
- `load_result(output_dir)` reloads a previous SDK run.
- Custom extractors can be supplied through `extractor_registry` and requested with `process(..., extractors=[...])`.
- `tables="force"` and `ocr="force"` report `needs_table_adapter` / `needs_ocr_adapter` warnings unless a caller-provided extractor handles that work.
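The last rule implies that a caller can detect when forced table or OCR handling was requested but no extractor covered it. A minimal sketch, assuming warnings expose a `code` attribute as in the Python API example and that the codes are exactly the two strings named above. `unhandled_adapters` is a hypothetical helper and accepts any object shaped like a result.

```python
# Warning codes named in the SDK contract.
ADAPTER_CODES = {"needs_table_adapter", "needs_ocr_adapter"}


def unhandled_adapters(result):
    """Return the sorted adapter warning codes found in result.quality.warnings."""
    return sorted({w.code for w in result.quality.warnings if w.code in ADAPTER_CODES})
```

A non-empty return value suggests registering a custom extractor for that work before relying on the output.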
Custom Extractors
from psyduck import Psyduck
from psyduck.extractors.base import ExtractorOutput
from psyduck.extractors.pymupdf import PyMuPDFExtractor


class MyExtractor:
    def extract(self, file_path, pages=None):
        return ExtractorOutput(source="my_extractor")


duck = Psyduck(
    extractor_registry={
        "pymupdf": PyMuPDFExtractor,
        "my_extractor": lambda: MyExtractor(),
    }
)
result = duck.process("report.pdf", extractors=["pymupdf", "my_extractor"])
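The registry in the example maps each name to a zero-argument callable. A class (`PyMuPDFExtractor`) and a lambda both qualify, because calling either one returns an extractor instance. The sketch below illustrates that factory pattern in isolation. `resolve_extractors` and `StubExtractor` are hypothetical stand-ins, and Psyduck's own lookup may work differently.

```python
def resolve_extractors(registry, names):
    """Instantiate the requested extractors, in order, from a name -> factory mapping."""
    unknown = [name for name in names if name not in registry]
    if unknown:
        raise KeyError(f"unregistered extractors: {unknown}")
    # Each registry value is called with no arguments to produce a fresh instance.
    return [registry[name]() for name in names]


class StubExtractor:
    """Stand-in with the same extract() signature as MyExtractor above."""

    def extract(self, file_path, pages=None):
        return {"source": "stub", "file": file_path, "pages": pages}


# A class and a lambda are interchangeable as factories.
registry = {"stub_class": StubExtractor, "stub_lambda": lambda: StubExtractor()}
extractors = resolve_extractors(registry, ["stub_class", "stub_lambda"])
```

Storing factories rather than instances means each run can get its own extractor object, which avoids sharing state between calls.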
Agent Policy
- Always call `process()` before answering questions about PDF contents.
- Check `result.quality` before using extracted content.
- Use Markdown for summaries and JSON for page-aware answers.
- Do not claim complete extraction when `needs_review` is true.
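One way an agent could enforce this policy is to route every PDF question through a single gate. The sketch below uses only attributes shown in the Python API example (`process`, `result.quality.needs_review`, `result.quality.warnings`, `warning.code`, `result.document`). `gated_answer` and `answer_fn` are hypothetical names, and the caveat wording is illustrative.

```python
def gated_answer(duck, pdf_path, answer_fn):
    """Process the PDF, check quality, then answer, adding a caveat when review is needed."""
    # Always process before answering.
    result = duck.process(pdf_path, return_content=True)
    # Check quality before using extracted content.
    quality = result.quality
    answer = answer_fn(result.document)
    if quality.needs_review:
        # Do not claim complete extraction: surface the warning codes alongside the answer.
        codes = ", ".join(w.code for w in quality.warnings) or "unspecified"
        return f"{answer}\n\nNote: extraction needs review ({codes}); content may be incomplete."
    return answer
```

Here `duck` would be a `Psyduck()` instance and `answer_fn` any callable that turns the extracted document into a reply.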