
Authoritative, versioned PDF facts contract for Think Neverland tools. v1.21.x adds codex_pdf.errors (RFC 7807 Problem Details) as the cross-stack HTTP error envelope.



title: "Overview"
description: "Authoritative read-only PDF facts + render engine for the Print with Synergy tool family. Versioned contract, schema-validated output, deployed as three services."
group: "Getting started"
order: 1
slug: "overview"

codexPDF


codexPDF is the authoritative, read-only PDF facts + render reference for the Print with Synergy tool family.

Other engines consult codexPDF for canonical document facts instead of re-parsing PDFs independently. The contract is versioned and schema-validated.

Status

codex-pdf 1.39.0. Current surface includes:

  • Python package (codex_pdf) with typed pydantic models.
  • CLI (codex-pdf extract|schema|contract|validate|probe|parity|render|serve).
  • HTTP API (/v1/extract, /v1/probe, /v1/extract/stream, /v1/render/{page,separations,heatmap,layer}, /v1/coverage, /v1/plates/extract, /v1/sample/{color,density}, /v1/walk/{type4,content-stream}, /v1/color/{resolve,match-pantone,neutral-density,inkbook}, /v1/geom/{tile,intersect,union,difference,offset}, /v1/retention/delete).
  • Unified input (1.36.0). POST /v1/extract and POST /v1/probe accept PDFs, Adobe Illustrator .ai files (sliced to their embedded PDF; legacy .ai converts via Ghostscript when available), EPS / composite PostScript (.eps/.ps, always normalized to PDF via Ghostscript, with a clean 422 when gs is absent), and raster plate / tooling files (1-bit TIFF + Esko LEN, single or repeatable set; multi-page TIFF where each page is a separation; TIFF/IT ISO 12639; DCS / DCS 2.0 + copydot; CIP3 PPF ink-coverage).
    Every input returns a normal CodexDocument. summary.source_format records pdf / ai / eps / plate_tiff / plate_len / plate_tiff_it / plate_dcs / plate_cip3 or one of the structural die families cff2 / ddes / dxf, with a human-friendly summary.source_format_label (e.g. "1-bit TIFF plate"). Plate inputs additionally carry summary.{ink_coverage,embellishments,plate_set}.
    On the default full PDF extract (Ghostscript present), summary.ink_coverage also carries compact base64 PNG previews: a tac_heatmap_png plus a per-separation preview_png (downsampled, size-capped; disable with CODEX_EXTRACT_INK_COVERAGE=false). Plate coverage, screen ruling, and min-dot are computed directly from the rasters (gs-free, deterministic).
    Structural die / CAD files (CFF2, DDES, DXF) are parsed gs-free into summary.dieline (authoritative die size + cut/crease/score/perf candidates). Encrypted .lenx, proprietary Scitex CT/LW (.ct/.lw CEPS), and proprietary binary structural CAD (.ard ArtiosCAD / .dwg AutoCAD) are detected and rejected cleanly with a precise remediation note.

Supported input formats

Every accepted input flows through POST /v1/extract + POST /v1/probe and returns a normal CodexDocument; summary.source_format records the family. Tier-1 = fully decoded; Tier-2 = detected and rejected with a remediation note (never a silent zero-finding result).

| Format | Ext | Tier | source_format | Notes |
|---|---|---|---|---|
| PDF | .pdf | 1 | pdf | the canonical path |
| Adobe Illustrator | .ai | 1 | ai | embedded-PDF slice; legacy → Ghostscript |
| EPS / composite PostScript | .eps/.ps | 1 | eps | always normalized to PDF via Ghostscript |
| 1-bit TIFF / Esko LEN | .tif/.tiff/.len | 1 | plate_tiff / plate_len | gs-free raster facts |
| Multi-page 1-bit TIFF | .tif | 1 | plate_tiff | one separation per page |
| TIFF/IT (ISO 12639) | .ct/.lw/.fp/.tif | 1 | plate_tiff_it | contone CT split into CMYK channels |
| DCS / DCS 2.0 + copydot | .dcs/.eps | 1 | plate_dcs | embedded/sidecar separations |
| CIP3 PPF | .ppf | 1 | plate_cip3 | per-separation ink coverage + sheet geometry |
| CFF2 / CF2 (structural die) | .cf2/.cff2 | 1 | cff2 | ASCII vector die: cut/crease/score/perf geometry + die size |
| DDES2 / DDES3 (structural die) | .dd3/.ds2/.ddes | 1 | ddes | ASCII die-cutting exchange: line-type → subtype + die size |
| DXF (structural / CAD) | .dxf | 1 | dxf | ASCII ENTITIES (LINE/LWPOLYLINE/POLYLINE/ARC) grouped by layer → subtype |
| Encrypted Esko LEN | .lenx | 2 | | closed CDI-Crystal; clean 422 |
| Scitex CT / LW (CEPS) | .ct/.lw | 2 | | proprietary; clean 415, convert to TIFF/IT |
| ArtiosCAD ARD | .ard | 2 | | proprietary Esko binary; clean 415, export CFF2/DDES/DXF |
| AutoCAD DWG | .dwg | 2 | | proprietary binary CAD; clean 415, export ASCII DXF |
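
As a rough client-side illustration of the table, the sketch below maps a file extension to the candidate (tier, source_format) pairs listed above. The names FORMATS and candidates are hypothetical and not part of codex_pdf. Several extensions (.tif, .eps, .ct, .lw) appear in more than one row, so an extension alone cannot decide the family; this is only a pre-flight hint, presumably superseded by whatever the server detects from the file content.

```python
from pathlib import PurePath

# Extension -> list of (tier, source_format) candidates, copied from the table.
# Tier-2 rows have no source_format, so None is used as a placeholder.
# FORMATS and candidates() are illustrative names, not codex_pdf API.
FORMATS = {
    ".pdf": [(1, "pdf")],
    ".ai": [(1, "ai")],
    ".eps": [(1, "eps"), (1, "plate_dcs")],
    ".ps": [(1, "eps")],
    ".tif": [(1, "plate_tiff"), (1, "plate_tiff_it")],
    ".tiff": [(1, "plate_tiff")],
    ".len": [(1, "plate_len")],
    ".ct": [(1, "plate_tiff_it"), (2, None)],
    ".lw": [(1, "plate_tiff_it"), (2, None)],
    ".fp": [(1, "plate_tiff_it")],
    ".dcs": [(1, "plate_dcs")],
    ".ppf": [(1, "plate_cip3")],
    ".cf2": [(1, "cff2")],
    ".cff2": [(1, "cff2")],
    ".dd3": [(1, "ddes")],
    ".ds2": [(1, "ddes")],
    ".ddes": [(1, "ddes")],
    ".dxf": [(1, "dxf")],
    ".lenx": [(2, None)],
    ".ard": [(2, None)],
    ".dwg": [(2, None)],
}


def candidates(filename):
    """Return every (tier, source_format) pair the extension could map to."""
    suffix = PurePath(filename).suffix.lower()
    return FORMATS.get(suffix, [])
```

An empty list means the extension is not in the table at all; a list containing only tier-2 entries means the upload is expected to be rejected with a remediation note.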

DCS / CIP3 separation rasters decode gs-free when the embedded data is TIFF; EPS / legacy .ai always need Ghostscript and return a clean 422 when it is absent. Structural die / CAD files (CFF2, DDES, DXF) are parsed gs-free into a CodexDocument carrying summary.dieline — an authoritative die size (high confidence) plus one candidate per cutting sub-type (cut / crease / score / perf / kiss_cut / fold / glue / bleed) with source="structural". CFF2 / DDES sub-types come from the file's numeric line-type code; DXF sub-types come from the entity's layer name (mapped via the shared dieline vocabulary). The proprietary binary structural formats (ARD, DWG) are detected and rejected with a remediation note (export CFF2/DDES/DXF).
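
The 415 and 422 rejections described above could be handled on the client roughly as follows. This sketch assumes the rejection body uses the RFC 7807 Problem Details envelope mentioned in the package summary, and that the remediation note is carried in the standard detail member; neither is confirmed here. Problem, parse_problem, advise, and the SAMPLE body are hypothetical illustrations, not the codex_pdf.errors API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    """The five standard RFC 7807 members; extension members are ignored."""
    type: str
    title: Optional[str]
    status: Optional[int]
    detail: Optional[str]
    instance: Optional[str]


def parse_problem(body: str) -> Problem:
    data = json.loads(body)
    return Problem(
        # RFC 7807: an absent "type" is treated as "about:blank".
        type=data.get("type", "about:blank"),
        title=data.get("title"),
        status=data.get("status"),
        detail=data.get("detail"),
        instance=data.get("instance"),
    )


def advise(problem: Problem) -> str:
    """Map the two rejection statuses from the format table to a next step."""
    if problem.status == 415:
        return "proprietary format: convert or export to a supported one"
    if problem.status == 422:
        return "recognized input that could not be processed"
    return "unhandled status"


# Hypothetical response body, made up for illustration only.
SAMPLE = (
    '{"title": "Unsupported Media Type", "status": 415, '
    '"detail": "Proprietary binary CAD; export ASCII DXF."}'
)
```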

  • TypeScript client (@printwithsynergy/codex-client) mirroring the Python codex_pdf.client surface, with SSE streaming for probe and extract, plus computeCoverage / platesExtract.
  • Versioned schemas in schemas/v1/ (document, color, geom, embellishment, ink-coverage, plate-set).
  • Cloudflare Worker (codex-edge) providing a KV-backed write-through cache layer in front of the API.
  • Redis-Streams speculator (codex-speculator) that pre-warms Phase 1 + Phase 2 caches.
  • Opt-in retention to Cloudflare R2 for the marketing demo: retain_for_training=true on POST /v1/extract persists the PDF + extract + metadata under a hive-partitioned key; the default remains "delete bytes on response". See CLAUDE.md for the deployed bucket layout.
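
The SSE streaming mentioned for the TypeScript client can be consumed from Python with a small parser for the text/event-stream wire format. This is a minimal sketch of the standard framing only (event and data fields, comment lines, blank-line dispatch). It is not the codex_pdf.client implementation, and the event names in any real stream are not documented here.

```python
def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event, if it has data.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        # Other fields (id, retry) are ignored in this sketch.
```

Feeding it the decoded lines of a streaming HTTP response yields one pair per server event; JSON payloads would still need json.loads on the data string.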

See CLAUDE.md for the full deployed-service map (URLs, account IDs, version-bump checklist).

Quickstart

uv sync
uv run codex-pdf probe input.pdf --json
uv run codex-pdf extract input.pdf --pretty > out.json
uv run codex-pdf validate out.json
uv run codex-pdf parity --fixtures-root tests/fixtures --profile summary --max-files 5
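
The probe command above can also be driven from a script. The sketch below shells out to the same command line and parses the result. It assumes that --json writes a single JSON document to stdout, which is not spelled out here. probe_argv and probe are hypothetical helper names.

```python
import json
import subprocess


def probe_argv(pdf_path):
    """Build the same command line as the Quickstart probe step."""
    return ["uv", "run", "codex-pdf", "probe", str(pdf_path), "--json"]


def probe(pdf_path):
    """Run the probe and parse stdout as JSON (assumed output shape)."""
    result = subprocess.run(
        probe_argv(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(result.stdout)
```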

Run the HTTP API locally:

uv run codex-pdf serve --host 0.0.0.0 --port 8080
curl localhost:8080/v1/version

Contract

The public surface is the JSON contract rooted at CodexDocument, plus the per-section contracts under color and geom.

  • Document schema: schemas/v1/codex-document.schema.json
  • Runtime model: codex_pdf.models.v1.CodexDocument
  • Stability policy: SemVer (major for breaking contract changes; field additions are minor bumps).
  • Live contract endpoint: GET /v1/contract returns the endpoint inventory plus section_schema_versions.
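
One way a consumer might apply the stability policy is a version guard: accept a server whose major version matches the one the client was built against and whose minor/patch is at least as new, since field additions arrive as minor bumps. This is a sketch of that reading of the policy, not a check shipped with codex_pdf. The function names are hypothetical and pre-release suffixes are not handled.

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def is_compatible(built_against, server):
    """True when the server keeps the same major and is not older than the client."""
    b = parse_version(built_against)
    s = parse_version(server)
    return s[0] == b[0] and s[1:] >= b[1:]
```

For example, a client built against 1.36.0 would accept a 1.39.0 server, while a client built against 1.39.0 would refuse both 1.36.0 (fields it relies on may be missing) and 2.0.0 (breaking contract change).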

Documentation

| Topic | Doc |
|---|---|
| Standard data-request pattern (requestAsset) | docs/data-requests.md |
| Architecture and boundaries | docs/architecture.md |
| CLI commands and usage | docs/cli.md |
| Contract and schema versioning | docs/contract.md |
| Determinism + transparency posture (model disclosure, version-pinned cache) | docs/determinism.md |
| Accuracy methodology (deterministic lanes) | docs/accuracy.md |
| Deploying the API + speculator + edge | docs/deploy.md |
| Parity profiles and baselines | docs/parity.md |
| Preflight ingest adapters | docs/preflight-ingest.md |
| Codex change ripple rule | docs/operations/codex-change-ripple.md |
| Marketing deploy template | docs/operations/marketing-deploy-template.md |

Contributing

We welcome PRs that fit codex's lane (extraction, normalization, detection signals). Display concerns belong in Lens; file comparison (two-file difference facts) belongs in collate; rule pass/fail logic belongs in Lint.

Read CONTRIBUTING.md for the dev setup, test commands, schema-bump rules, and release checklist.

Security

Please report vulnerabilities privately to security@thinkneverland.com — do not open a public issue.

The full disclosure policy, supported-version matrix, and scope (including the read-only PDF invariant) live in SECURITY.md.

License

codexPDF is distributed under the GNU Affero General Public License v3.0 or later (SPDX-License-Identifier: AGPL-3.0-or-later). The full license text is in LICENSE.

AGPL applies in particular when codex is reachable over a network — modifications served to remote users must be made available to those users under the same terms.
