Open-source GWG 2022 conformance assay for PDF preflight engines.

AssayPDF

Python 3.12+ | License: AGPL v3 | Spec: GWG 2022

What this is

AssayPDF is a benchmark kit that:

  1. Generates a deterministic PDF test corpus (~175 files) derived from the Ghent Workgroup 2022 Specification. Each negative file targets exactly one of the 39 rules in the spec, each positive file passes every applicable rule, and together they cover all 23 GWG 2022 variants.
  2. Runs that corpus against any preflight engine — lintPDF, Enfocus PitStop Server, callas pdfToolbox — through a uniform harness.
  3. Scores TP / FP / FN / TN per rule, per variant, per engine, and produces reproducible markdown + HTML accuracy reports.
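
As a rough illustration of the scoring step, the sketch below tallies confusion-matrix cells per rule. The `Observation` type and its field names are assumptions made for this example, not AssayPDF's actual data model; the real harness also breaks results down per variant and per engine.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One (file, rule) check. All field names here are hypothetical."""
    rule_id: str         # e.g. "R0014"
    expected_fail: bool  # the corpus says this rule should be flagged
    reported_fail: bool  # the engine actually flagged this rule


def classify(obs: Observation) -> str:
    """Map one observation to a confusion-matrix cell."""
    if obs.expected_fail and obs.reported_fail:
        return "TP"
    if not obs.expected_fail and obs.reported_fail:
        return "FP"
    if obs.expected_fail and not obs.reported_fail:
        return "FN"
    return "TN"


def score(observations: list[Observation]) -> dict[str, Counter]:
    """Tally TP / FP / FN / TN counts per rule ID."""
    table: dict[str, Counter] = {}
    for obs in observations:
        table.setdefault(obs.rule_id, Counter())[classify(obs)] += 1
    return table


if __name__ == "__main__":
    sample = [
        Observation("R0014", expected_fail=True, reported_fail=True),
        Observation("R0014", expected_fail=True, reported_fail=False),
        Observation("R0035", expected_fail=False, reported_fail=True),
    ]
    for rule_id, counts in sorted(score(sample).items()):
        print(rule_id, dict(counts))
```

A "positive" here means the engine flagged the rule, so a false positive is a clean file that the engine rejects and a false negative is a defect that the engine misses.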

Why this exists

The GWG 2015 Compliancy Test Suite is gated to GWG vendor members. The GWG 2022 spec ships with no public test corpus at all. AssayPDF closes that gap so anyone can self-benchmark a preflight engine without paying for vendor membership.

It also doubles as the credibility layer for lintPDF (Think Neverland's open-source PDF preflight engine, hosted at lintpdf.com): it publishes accuracy comparisons against incumbent engines, which the incumbents themselves do not publish.

Quick start

git clone https://github.com/printwithsynergy/assay-pdf.git
cd assay-pdf
uv sync --all-extras                                      # install deps + Python 3.12
uv run assay fetch                                        # download GWG vendor assets (~183 MB)
uv run assay generate                                     # build the 175-file PDF/X-4 corpus
uv run assay validate                                     # verify every PDF passes verapdf
uv run assay benchmark --engine pdftoolbox --profile sheetcmyk-cmyk
uv run assay report --format md > REPORT.md

What you get

corpus/
├── manifest.json          # every file's expected outcome, rule mapping, sha256
├── positive/              # 23 PDFs — one per GWG 2022 variant — pass every applicable rule
└── negative/              # 152 PDFs — each targeting one rule's failure mode cleanly

Every PDF passes veraPDF PDF/X-4 validation (or has a documented exception in the manifest). Every PDF is generated deterministically: same code, same seed, byte-identical output.
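
The determinism claim can be illustrated with a toy generator. This is a minimal sketch, not AssayPDF's generator: `build_fixture` is a hypothetical stand-in that draws every byte from a privately seeded RNG, so two runs with the same seed yield identical bytes and therefore identical SHA-256 digests.

```python
import hashlib
import random


def build_fixture(seed: int, size: int = 64) -> bytes:
    """Hypothetical stand-in for a corpus generator.

    Every byte comes from a private, seeded RNG, so nothing depends on
    global state, wall-clock time, or the environment.
    """
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))


def digest(data: bytes) -> str:
    """Hex SHA-256 of a byte string, as a manifest might record it."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    first = build_fixture(seed=2022)
    second = build_fixture(seed=2022)
    print("byte-identical:", first == second)
    print("same digest:   ", digest(first) == digest(second))
```

Real PDF writers typically also need fixed timestamps and document IDs to stay byte-identical; how AssayPDF handles those details is not covered here.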

Coverage

Spec area                 Rule IDs            Negatives
Page geometry             R0001–R0006         13
Overprint                 R0007–R0013         7
Fonts                     R0014               3
Black, registration       R0015–R0019         6
Spot colors               R0020–R0024         7
Total ink coverage        R0025–R0026         6
Color space binding       R0027–R0030         9
Image resolution          R0031–R0033         6
Optional content          R0034, R0036        3
Output intent             R0035               3
Sign/display scaling      R0037               2
Processing steps          R1001–R1002         2
Boundary stress (v0.1.0)  (across all rules)  +85

Plus 23 positive baselines, one per variant.
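
As a quick sanity check, the figures in the coverage table add up to the corpus sizes quoted elsewhere in this README. The snippet below only restates numbers from the table; the variable names are illustrative.

```python
# Negative-file counts per spec area, copied from the coverage table.
negatives_by_area = {
    "Page geometry": 13,
    "Overprint": 7,
    "Fonts": 3,
    "Black, registration": 6,
    "Spot colors": 7,
    "Total ink coverage": 6,
    "Color space binding": 9,
    "Image resolution": 6,
    "Optional content": 3,
    "Output intent": 3,
    "Sign/display scaling": 2,
    "Processing steps": 2,
}

rule_targeted = sum(negatives_by_area.values())  # per-area negatives
boundary_stress = 85                             # added in v0.1.0
positives = 23                                   # one per GWG 2022 variant

negatives = rule_targeted + boundary_stress
total = negatives + positives

print(f"rule-targeted negatives: {rule_targeted}")
print(f"all negatives:           {negatives}")
print(f"corpus size:             {total}")
```

This gives 67 rule-targeted negatives, 152 negatives in total, and 175 files overall, matching the counts shown for corpus/negative/ and the full corpus.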

Engine support

Engine                  Status   Notes
callas pdfToolbox       working  Trial license; CLI invocation
Enfocus PitStop Server  working  Trial license; CLI invocation
lintPDF                 working  HTTP API at lintpdf.com; runner wired

Adding an engine means implementing one Runner subclass and adding a rule_maps/<engine>.json mapping. See docs/methodology.md.
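
The shape of such an integration might look roughly like the sketch below. Everything in it is an assumption for illustration: the `Runner` base class, its method names, and the direction of the rule map (engine-native check ID to GWG rule ID) are hypothetical, and docs/methodology.md remains the authoritative description of the real interface.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Runner(ABC):
    """Hypothetical base class; the real AssayPDF interface may differ."""

    name: str = "base"

    def __init__(self, rule_map: dict[str, str]) -> None:
        # Assumed mapping: engine-native check ID -> GWG rule ID.
        self.rule_map = rule_map

    @abstractmethod
    def raw_findings(self, pdf: Path) -> list[str]:
        """Return the engine-native check IDs that fired for one PDF."""

    def run(self, pdf: Path) -> set[str]:
        """Translate native findings to GWG rule IDs, dropping unmapped ones."""
        return {
            self.rule_map[native_id]
            for native_id in self.raw_findings(pdf)
            if native_id in self.rule_map
        }


class EchoRunner(Runner):
    """Toy engine that reports a canned list of native check IDs."""

    name = "echo"

    def __init__(self, rule_map: dict[str, str], canned: list[str]) -> None:
        super().__init__(rule_map)
        self.canned = canned

    def raw_findings(self, pdf: Path) -> list[str]:
        # A real runner would invoke a CLI or HTTP API here instead.
        return list(self.canned)


if __name__ == "__main__":
    runner = EchoRunner({"native-font-check": "R0014"},
                        ["native-font-check", "native-unmapped-check"])
    print(sorted(runner.run(Path("example.pdf"))))
```

Keeping the translation in the base class means a new engine only has to supply its native findings plus a mapping file, which is consistent with the "one subclass and one JSON file" description above.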

Reproducibility

This is not a one-off study. Every claim AssayPDF makes is reproducible:

  • Spec assets fetched from GWG canonical URLs with SHA-256 verification (vendor/checksums.json)
  • Corpus generated deterministically from a seed; manifest records expected SHA-256 per file
  • CI runs assay validate on every commit
  • A weekly cron job verifies all upstream URLs are still alive

Anyone with the same engine versions and licenses can run AssayPDF and reproduce the published accuracy numbers byte-for-byte.
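
A checksum check of the kind described above could be sketched as follows. The JSON schema assumed here (a flat object mapping relative paths to hex SHA-256 digests) is a guess for illustration; the actual layout of vendor/checksums.json and corpus/manifest.json may differ.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large assets are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root: Path, checksum_file: Path) -> list[str]:
    """Return relative paths that are missing or whose digest differs."""
    # Assumed schema: {"relative/path": "<hex sha256>", ...}
    expected = json.loads(checksum_file.read_text(encoding="utf-8"))
    return sorted(
        rel
        for rel, want in expected.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != want.lower()
    )
```

An empty list means every listed file is present and matches its recorded digest; any entry in the result points at a file that is missing or has changed.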

Legal posture

AssayPDF never redistributes GWG copyrighted materials. Vendor assets (GOS 5.0 suites, processing-steps test suite) are fetched from the official GWG endpoints. The corpus AssayPDF generates is original work derived from spec rules, not copies of the GWG 2015 test suite.

See docs/legal-positioning.md for the comparative-advertising / nominative-fair-use stance.

Contributing

See CONTRIBUTING.md. New rule generators, new engine runners, and new boundary-case test files are all welcome.

By participating in this project you agree to abide by the Code of Conduct.

Support and security

  • Usage questions or bug reports: see SUPPORT.md.
  • Security vulnerabilities: see SECURITY.md — please do not open a public issue.

License

AGPL-3.0-or-later — see LICENSE.

ICC profiles bundled under src/assay_pdf/generator/icc/ are redistributed under their respective upstream terms; see src/assay_pdf/generator/icc/README.md.

Sister projects

Print With Synergy's PDF tooling family:

  • lint-pdf — open-source PDF preflight engine (500+ checks). This is the primary engine AssayPDF benchmarks. Hosted at lintpdf.com.
  • lens-pdf — open-source embeddable PDF viewer (React/TypeScript). Renders PDFs with preflight overlays.
  • codex-pdf — central PDF extraction engine; source of truth for fonts, images, color spaces, annotations, and findings.


Source Distribution

assay_pdf-0.1.0b3.tar.gz (353.7 kB)

Uploaded Source

Built Distribution

assay_pdf-0.1.0b3-py3-none-any.whl (83.0 kB)

Uploaded Python 3

File details

Details for the file assay_pdf-0.1.0b3.tar.gz.

File metadata

  • Download URL: assay_pdf-0.1.0b3.tar.gz
  • Size: 353.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for assay_pdf-0.1.0b3.tar.gz
Algorithm    Hash digest
SHA256       a20c32ed2fe9bec1ded8d9eba57f3ef02a7ac664bcb9097cecb4dd08a3764f0b
MD5          a595a7b958ff11585716deef80013981
BLAKE2b-256  c6a467961230c35a9ee5d360b61e7752eafa81dee7aeaa160305448db0c2a4fa

File details

Details for the file assay_pdf-0.1.0b3-py3-none-any.whl.

File hashes

Hashes for assay_pdf-0.1.0b3-py3-none-any.whl
Algorithm    Hash digest
SHA256       c91ef3123af17dbd16da8d0cb0c248116204710091397883b1afb255efe5e22c
MD5          8ced5047d48fc72294225cf4d9de1e83
BLAKE2b-256  193312afeb99977aac1797c247dfd3d63134b3aae4a503cc44c60ea05262c4c0
