
Authoritative, versioned PDF facts contract for Think Neverland tools. v1.21.x adds codex_pdf.errors (RFC 7807 Problem Details) as the cross-stack HTTP error envelope.



title: "Overview"
description: "Authoritative read-only PDF facts + render engine for the Print with Synergy tool family. Versioned contract, schema-validated output, deployed as three services."
group: "Getting started"
order: 1
slug: "overview"

codexPDF


codexPDF is the authoritative, read-only PDF facts + render reference for the Print with Synergy tool family.

Other engines consult codexPDF for canonical document facts instead of re-parsing PDFs independently. The contract is versioned and schema-validated.

Status

codex-pdf 1.23.0. Current surface includes:

  • Python package (codex_pdf) with typed pydantic models.
  • CLI (codex-pdf extract|schema|contract|validate|probe|parity|render|serve).
  • HTTP API (/v1/extract, /v1/probe, /v1/extract/stream, /v1/render/{page,separations,heatmap,layer}, /v1/sample/{color,density}, /v1/walk/{type4,content-stream}, /v1/color/{resolve,match-pantone,neutral-density,inkbook}, /v1/geom/{tile,intersect,union,difference,offset}, /v1/retention/delete).
  • TypeScript client (@printwithsynergy/codex-client) mirroring the Python codex_pdf.client surface, with SSE streaming for probe and extract.
  • Versioned schemas in schemas/v1/ (document, color, geom).
  • Cloudflare Worker (codex-edge) providing a KV-backed write-through cache layer in front of the API.
  • Redis-Streams speculator (codex-speculator) that pre-warms Phase 1 + Phase 2 caches.
  • Opt-in retention to Cloudflare R2 for the marketing demo: setting retain_for_training=true on POST /v1/extract persists the PDF, the extract, and its metadata under a hive-partitioned key. The default remains "delete bytes on response". See CLAUDE.md for the deployed bucket layout.
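
The HTTP API bullet above uses brace shorthand for grouped routes. As a reading aid, the following minimal sketch expands that shorthand into one path per route. The helper name expand and the SHORTHAND list are illustrative only and are not part of codex_pdf; the sketch says nothing about which HTTP method each route accepts.

```python
import re

# Endpoint shorthand copied verbatim from the HTTP API bullet above.
SHORTHAND = [
    "/v1/extract",
    "/v1/probe",
    "/v1/extract/stream",
    "/v1/render/{page,separations,heatmap,layer}",
    "/v1/sample/{color,density}",
    "/v1/walk/{type4,content-stream}",
    "/v1/color/{resolve,match-pantone,neutral-density,inkbook}",
    "/v1/geom/{tile,intersect,union,difference,offset}",
    "/v1/retention/delete",
]

_BRACES = re.compile(r"\{([^{}]*)\}")


def expand(pattern: str) -> list[str]:
    """Expand shell-style {a,b,c} groups into one path per alternative."""
    match = _BRACES.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    paths: list[str] = []
    for option in match.group(1).split(","):
        # Recurse so that a pattern with several brace groups also works.
        paths.extend(expand(head + option + tail))
    return paths


# Flat list of every route named in the bullet.
ALL_PATHS = [path for item in SHORTHAND for path in expand(item)]

if __name__ == "__main__":
    for path in ALL_PATHS:
        print(path)
```

Run as a script, this prints each route on its own line, which can be handy when grepping logs or building a smoke-test list.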

See CLAUDE.md for the full deployed-service map (URLs, account IDs, version-bump checklist).

Quickstart

uv sync
uv run codex-pdf probe input.pdf --json
uv run codex-pdf extract input.pdf --pretty > out.json
uv run codex-pdf validate out.json
uv run codex-pdf parity --fixtures-root tests/fixtures --profile summary --max-files 5

Run the HTTP API locally:

uv run codex-pdf serve --host 0.0.0.0 --port 8080
curl localhost:8080/v1/version
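
The same check can be made from Python with only the standard library. This is a minimal sketch, not the codex_pdf.client API: the helper names endpoint_url and get_json are hypothetical, and it assumes the endpoint answers with a JSON body, which this page does not spell out.

```python
import json
import urllib.request


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an absolute path without doubling the slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def get_json(base_url: str, path: str, timeout: float = 5.0):
    """GET the endpoint and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(endpoint_url(base_url, path), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Assumes `uv run codex-pdf serve --port 8080` is already running locally.
    print(get_json("http://localhost:8080", "/v1/version"))
```

For real integrations, the codex_pdf.client surface mentioned under Status is the better starting point; the sketch only mirrors the curl call.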

Contract

The public surface is the JSON contract rooted at CodexDocument, plus the per-section contracts under color and geom.

  • Document schema: schemas/v1/codex-document.schema.json
  • Runtime model: codex_pdf.models.v1.CodexDocument
  • Stability policy: SemVer (major for breaking contract changes; field additions are minor bumps).
  • Live contract endpoint: GET /v1/contract returns the endpoint inventory plus section_schema_versions.
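
Under the stability policy above, a consumer built against one contract version can keep reading output from a later minor release, but not from a different major release. A minimal sketch of that check follows. The function names are illustrative and not part of codex_pdf, and the parser handles plain MAJOR.MINOR.PATCH strings only (no pre-release tags or wildcards such as 1.21.x).

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_compatible(built_against: str, running: str) -> bool:
    """Return True when a consumer built against one version can read `running`.

    Same major version: no breaking contract changes.
    Equal or newer minor version: every field the consumer expects exists,
    because field additions are minor bumps.
    Patch level is ignored, since it does not change the contract.
    """
    built = parse_version(built_against)
    live = parse_version(running)
    return live[0] == built[0] and live[1] >= built[1]


if __name__ == "__main__":
    print(is_compatible("1.21.0", "1.23.0"))  # newer minor, same major
    print(is_compatible("1.23.0", "2.0.0"))   # major bump, breaking
```

The same comparison could be applied to each entry of section_schema_versions returned by GET /v1/contract, although the exact shape of that payload is not described here.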

Documentation

Topic | Doc
Standard data-request pattern (requestAsset) | docs/data-requests.md
Architecture and boundaries | docs/architecture.md
CLI commands and usage | docs/cli.md
Contract and schema versioning | docs/contract.md
Determinism + transparency posture (model disclosure, version-pinned cache) | docs/determinism.md
Accuracy methodology (deterministic lanes) | docs/accuracy.md
Deploying the API + speculator + edge | docs/deploy.md
Parity profiles and baselines | docs/parity.md
Preflight ingest adapters | docs/preflight-ingest.md
Codex change ripple rule | docs/operations/codex-change-ripple.md
Marketing deploy template | docs/operations/marketing-deploy-template.md

Contributing

We welcome PRs that fit codex's lane (extraction, normalization, detection signals). Display concerns belong in Lens; rule pass/fail logic belongs in Lint.

Read CONTRIBUTING.md for the dev setup, test commands, schema-bump rules, and release checklist.

Security

Please report vulnerabilities privately to security@thinkneverland.com — do not open a public issue.

The full disclosure policy, supported-version matrix, and scope (including the read-only PDF invariant) live in SECURITY.md.

License

codexPDF is distributed under the GNU Affero General Public License v3.0 or later (SPDX-License-Identifier: AGPL-3.0-or-later). The full license text is in LICENSE.

AGPL applies in particular when codex is reachable over a network — modifications served to remote users must be made available to those users under the same terms.
