Authoritative, versioned PDF facts contract for Think Neverland tools. v1.21.x adds codex_pdf.errors (RFC 7807 Problem Details) as the cross-stack HTTP error envelope.
title: "Overview"
description: "Authoritative read-only PDF facts + render engine for the Print with Synergy tool family. Versioned contract, schema-validated output, deployed as three services."
group: "Getting started"
order: 1
slug: "overview"
codexPDF
codexPDF is the authoritative, read-only PDF facts + render reference
for the Print with Synergy tool family.
Other engines consult codexPDF for canonical document facts instead
of re-parsing PDFs independently. The contract is versioned and
schema-validated.
Status
codex-pdf 1.39.0. Current surface includes:
- Python package (codex_pdf) with typed pydantic models.
- CLI (codex-pdf extract|schema|contract|validate|probe|parity|render|serve).
- HTTP API (/v1/extract, /v1/probe, /v1/extract/stream, /v1/render/{page,separations,heatmap,layer}, /v1/coverage, /v1/plates/extract, /v1/sample/{color,density}, /v1/walk/{type4,content-stream}, /v1/color/{resolve,match-pantone,neutral-density,inkbook}, /v1/geom/{tile,intersect,union,difference,offset}, /v1/retention/delete).
- Unified input (1.36.0). POST /v1/extract + POST /v1/probe accept PDFs, Adobe Illustrator .ai files (sliced to their embedded PDF; legacy .ai converts via Ghostscript when available), EPS / composite PostScript (.eps/.ps; always normalized to PDF via Ghostscript; clean 422 when gs is absent), and raster plate / tooling files (1-bit TIFF + Esko LEN, single or repeatable set; multi-page TIFF where each page is a separation; TIFF/IT ISO 12639; DCS / DCS 2.0 + copydot; CIP3 PPF ink-coverage). Every input returns a normal CodexDocument: summary.source_format records pdf / ai / eps / plate_tiff / plate_len / plate_tiff_it / plate_dcs / plate_cip3 / the structural die families cff2 / ddes / dxf (with a human-friendly summary.source_format_label, e.g. "1-bit TIFF plate"), and plate inputs additionally carry summary.{ink_coverage,embellishments,plate_set}. On the default full PDF extract (Ghostscript present) summary.ink_coverage also carries compact base64 PNG previews: a tac_heatmap_png plus a per-separation preview_png (downsampled, size-capped; disable with CODEX_EXTRACT_INK_COVERAGE=false). Plate coverage / screen ruling / min-dot are computed directly from the rasters (gs-free, deterministic). Structural die / CAD files (CFF2, DDES, DXF) are parsed gs-free into summary.dieline (authoritative die size + cut/crease/score/perf candidates). Encrypted .lenx, proprietary Scitex CT/LW (.ct/.lw CEPS), and proprietary binary structural CAD (.ard ArtiosCAD / .dwg AutoCAD) are detected and rejected cleanly with a precise remediation note.
Supported input formats
Every accepted input flows through POST /v1/extract + POST /v1/probe and
returns a normal CodexDocument; summary.source_format records the family.
Tier-1 = fully decoded; Tier-2 = detected and rejected with a remediation note
(never a silent zero-finding result).
| Format | Ext | Tier | source_format | Notes |
|---|---|---|---|---|
| PDF | .pdf | 1 | pdf | the canonical path |
| Adobe Illustrator | .ai | 1 | ai | embedded-PDF slice; legacy → Ghostscript |
| EPS / composite PostScript | .eps/.ps | 1 | eps | always normalized to PDF via Ghostscript |
| 1-bit TIFF / Esko LEN | .tif/.tiff/.len | 1 | plate_tiff / plate_len | gs-free raster facts |
| Multi-page 1-bit TIFF | .tif | 1 | plate_tiff | one separation per page |
| TIFF/IT (ISO 12639) | .ct/.lw/.fp/.tif | 1 | plate_tiff_it | contone CT split into CMYK channels |
| DCS / DCS 2.0 + copydot | .dcs/.eps | 1 | plate_dcs | embedded/sidecar separations |
| CIP3 PPF | .ppf | 1 | plate_cip3 | per-separation ink coverage + sheet geometry |
| CFF2 / CF2 (structural die) | .cf2/.cff2 | 1 | cff2 | ASCII vector die: cut/crease/score/perf geometry + die size |
| DDES2 / DDES3 (structural die) | .dd3/.ds2/.ddes | 1 | ddes | ASCII die-cutting exchange: line-type → subtype + die size |
| DXF (structural / CAD) | .dxf | 1 | dxf | ASCII ENTITIES (LINE/LWPOLYLINE/POLYLINE/ARC) grouped by layer → subtype |
| Encrypted Esko LEN | .lenx | 2 | - | closed CDI-Crystal; clean 422 |
| Scitex CT / LW (CEPS) | .ct/.lw | 2 | - | proprietary; clean 415, convert to TIFF/IT |
| ArtiosCAD ARD | .ard | 2 | - | proprietary Esko binary; clean 415, export CFF2/DDES/DXF |
| AutoCAD DWG | .dwg | 2 | - | proprietary binary CAD; clean 415, export ASCII DXF |
DCS / CIP3 separation rasters decode gs-free when the embedded data is TIFF;
EPS / legacy .ai always need Ghostscript and return a clean 422 when it is
absent. Structural die / CAD files (CFF2, DDES, DXF) are parsed gs-free
into a CodexDocument carrying summary.dieline — an authoritative die
size (high confidence) plus one candidate per cutting sub-type
(cut / crease / score / perf / kiss_cut / fold / glue / bleed) with
source="structural". CFF2 / DDES sub-types come from the file's numeric
line-type code; DXF sub-types come from the entity's layer name (mapped via the
shared dieline vocabulary). The proprietary binary structural formats (ARD,
DWG) are detected and rejected with a remediation note (export CFF2/DDES/DXF).
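The Tier-1 / Tier-2 split above can be handled on the client side with a small status check. The following is a minimal sketch, not the official client: the function name describe_extract_result is hypothetical, and it assumes that rejections arrive as an RFC 7807 Problem Details body whose standard detail (or title) member carries the remediation note. Where the service actually places that note may differ.

```python
from typing import Any, Mapping


def describe_extract_result(status: int, body: Mapping[str, Any]) -> str:
    """Summarize an extract/probe response in one line (hypothetical helper)."""
    if 200 <= status < 300:
        # Tier-1 path: a normal CodexDocument with summary.source_format.
        summary = body.get("summary", {})
        fmt = summary.get("source_format", "unknown")
        label = summary.get("source_format_label")
        return f"decoded as {fmt}" + (f" ({label})" if label else "")
    if status in (415, 422):
        # Tier-2 path (or missing Ghostscript): assumed RFC 7807 body.
        # "detail" and "title" are standard Problem Details members.
        note = body.get("detail") or body.get("title") or "no remediation note"
        return f"rejected ({status}): {note}"
    return f"unexpected status {status}"
```

Treating 415 and 422 together keeps the caller's logic simple: both mean "the input was recognized but cannot be decoded as sent", and the note says what to export or install instead.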
- TypeScript client (@printwithsynergy/codex-client) mirroring the Python codex_pdf.client surface, with SSE streaming for probe and extract, plus computeCoverage / platesExtract.
- Versioned schemas in schemas/v1/ (document, color, geom, embellishment, ink-coverage, plate-set).
- Cloudflare Worker (codex-edge) providing a KV-backed write-through cache layer in front of the API.
- Redis-Streams speculator (codex-speculator) that pre-warms Phase 1 + Phase 2 caches.
- Opt-in retention to Cloudflare R2 for the marketing demo: retain_for_training=true on POST /v1/extract persists the PDF + extract + metadata under a hive-partitioned key; the default remains "delete bytes on response". See CLAUDE.md for the deployed bucket layout.
See CLAUDE.md for the full deployed-service map
(URLs, account IDs, version-bump checklist).
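The ink-coverage previews mentioned under Unified input are base64-encoded PNGs, so a consumer can turn them back into image bytes with the standard library. This is a sketch under assumptions: decode_tac_heatmap is a hypothetical helper, the document is treated as a plain parsed-JSON dict, and the field is assumed to hold bare base64 (no data: URI prefix).

```python
import base64
from typing import Any, Mapping, Optional

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_tac_heatmap(document: Mapping[str, Any]) -> Optional[bytes]:
    """Return the decoded summary.ink_coverage.tac_heatmap_png bytes, if present."""
    ink = document.get("summary", {}).get("ink_coverage") or {}
    encoded = ink.get("tac_heatmap_png")
    if not encoded:
        # Previews are optional (for example when ink coverage is disabled).
        return None
    raw = base64.b64decode(encoded)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("tac_heatmap_png is not PNG data")
    return raw
```

The returned bytes can be written straight to a .png file; returning None rather than raising keeps documents without previews on the normal path.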
Quickstart
uv sync
uv run codex-pdf probe input.pdf --json
uv run codex-pdf extract input.pdf --pretty > out.json
uv run codex-pdf validate out.json
uv run codex-pdf parity --fixtures-root tests/fixtures --profile summary --max-files 5
Run the HTTP API locally:
uv run codex-pdf serve --host 0.0.0.0 --port 8080
curl localhost:8080/v1/version
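The same check can be made from Python with only the standard library. This is a minimal sketch: endpoint_url and fetch_json are hypothetical helper names, the base URL assumes the local serve command above, and the shape of the /v1/version response is not assumed (the parsed JSON is returned as-is).

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def endpoint_url(base: str, path: str) -> str:
    """Join a base URL and an API path, tolerating stray slashes on either side."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def fetch_json(base: str, path: str, timeout: float = 5.0):
    """GET an endpoint and return the parsed JSON body."""
    with urlopen(endpoint_url(base, path), timeout=timeout) as response:
        return json.load(response)
```

With the server running, fetch_json("http://localhost:8080", "/v1/version") performs the same request as the curl line.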
Contract
The public surface is the JSON contract rooted at CodexDocument,
plus the per-section contracts under color and geom.
- Document schema: schemas/v1/codex-document.schema.json
- Runtime model: codex_pdf.models.v1.CodexDocument
- Stability policy: SemVer (major for breaking contract changes; field additions are minor bumps).
- Live contract endpoint: GET /v1/contract returns the endpoint inventory plus section_schema_versions.
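One way a consumer might apply the stability policy is a version gate before trusting a response. The sketch below is illustrative only: is_contract_compatible is a hypothetical helper, it assumes plain MAJOR.MINOR.PATCH version strings, and it reads the policy as "same major, and the served minor is at least the one the consumer was built against" (since field additions arrive in minor bumps).

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_contract_compatible(built_against: str, served: str) -> bool:
    """True when a consumer built against one version can read what `served` emits."""
    built = parse_version(built_against)
    live = parse_version(served)
    # A different major may carry breaking contract changes.
    if live[0] != built[0]:
        return False
    # An older minor may lack fields the consumer expects; patch is ignored.
    return live[1] >= built[1]
```

For example, a consumer built against 1.36.0 would accept a 1.39.0 service but refuse a 2.0.0 one.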
Documentation
| Topic | Doc |
|---|---|
| Standard data-request pattern (requestAsset) | docs/data-requests.md |
| Architecture and boundaries | docs/architecture.md |
| CLI commands and usage | docs/cli.md |
| Contract and schema versioning | docs/contract.md |
| Determinism + transparency posture (model disclosure, version-pinned cache) | docs/determinism.md |
| Accuracy methodology (deterministic lanes) | docs/accuracy.md |
| Deploying the API + speculator + edge | docs/deploy.md |
| Parity profiles and baselines | docs/parity.md |
| Preflight ingest adapters | docs/preflight-ingest.md |
| Codex change ripple rule | docs/operations/codex-change-ripple.md |
| Marketing deploy template | docs/operations/marketing-deploy-template.md |
Contributing
We welcome PRs that fit codex's lane (extraction, normalization, detection signals). Display concerns belong in Lens; file comparison (two-file difference facts) belongs in collate; rule pass/fail logic belongs in Lint.
Read CONTRIBUTING.md for the dev setup, test
commands, schema-bump rules, and release checklist.
Security
Please report vulnerabilities privately to
security@thinkneverland.com — do not open a public issue.
The full disclosure policy, supported-version matrix, and scope
(including the read-only PDF invariant) live in
SECURITY.md.
License
codexPDF is distributed under the GNU Affero General Public
License v3.0 or later (SPDX-License-Identifier: AGPL-3.0-or-later). The full license text is in
LICENSE.
AGPL applies in particular when codex is reachable over a network — modifications served to remote users must be made available to those users under the same terms.