acroforge

Turn flat PDFs into real, fillable AcroForms. Permissive (Apache-2.0), deterministic, zero-copyleft.

CI | License: Apache 2.0 | Python 3.11+ | Checked with mypy | Ruff

[Image: flat PDF turned into a fillable PDF by acroforge]

Left: a flat PDF - just printed lines and an empty box. Right: the same PDF after acroforge - real, fillable form fields, filled and rendered correctly. No Adobe, no cloud, no AGPL.


What it does

acroforge takes any PDF - vector or scanned - and injects real AcroForm fields at positions you specify. The result is a standards-compliant fillable PDF that renders correctly in Chrome's pdfium and Firefox's pdf.js.

Three operations:

| Operation | What it does |
| --- | --- |
| build | Inject interactive AcroForm fields into a flat PDF |
| fill | Set field values by name on a fillable PDF |
| flatten | Bake field appearances into page content; remove interactive fields |

All three functions accept and return plain bytes, making them easy to compose in any pipeline.
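Because each operation maps bytes to bytes, stages can be chained with ordinary function composition. The sketch below is illustrative only: `pipeline` is a hypothetical helper, not part of acroforge, and the two stand-in stages are plain byte transforms so the snippet runs without a PDF. In real use a stage could be a callable such as `lambda b: af.build(b, fields)`.

```python
from functools import reduce
from typing import Callable

# A stage is any callable that takes bytes and returns bytes.
Stage = Callable[[bytes], bytes]


def pipeline(*stages: Stage) -> Stage:
    """Compose bytes -> bytes stages, applied left to right."""
    def run(data: bytes) -> bytes:
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


# Stand-in stages (NOT acroforge calls) so the sketch runs anywhere.
def add_header(data: bytes) -> bytes:
    return b"HEAD:" + data


def add_footer(data: bytes) -> bytes:
    return data + b":FOOT"


process = pipeline(add_header, add_footer)
result = process(b"body")
```

An empty pipeline returns its input unchanged, which makes optional stages (for example, skipping flatten) easy to express.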


Install

pip install acroforge

Or from source:

git clone https://github.com/san64777/acroforge
cd acroforge
pip install -e .   # or: uv pip install -e .

Python usage

import io
from reportlab.pdfgen import canvas  # any PDF source works
import acroforge as af
from acroforge import FieldSpec, FieldType

# --- Step 0: obtain a flat PDF (bytes) any way you like ---
buf = io.BytesIO()
c = canvas.Canvas(buf, pagesize=(612, 792))
c.drawString(72, 720, "Name:")
c.drawString(72, 680, "Agree to terms:")
c.save()
flat_pdf: bytes = buf.getvalue()

# --- Step 1: describe the fields you want ---
fields = [
    FieldSpec(
        type=FieldType.TEXT,
        page=0,
        rect=(200, 700, 450, 730),  # (x0, y0, x1, y1) in PDF points
        name="full_name",
    ),
    FieldSpec(
        type=FieldType.CHECKBOX,
        page=0,
        rect=(200, 660, 220, 680),
        name="agree",
        export_value="Yes",
    ),
]

# --- Step 2: inject the fields ---
fillable: bytes = af.build(flat_pdf, fields)

# --- Step 3: fill values ---
filled: bytes = af.fill(fillable, {"full_name": "Jane Doe", "agree": True})

# --- Step 4: flatten (optional - locks the form) ---
final: bytes = af.flatten(filled)

# Write to disk
with open("output.pdf", "wb") as f:
    f.write(final)
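The rect values in the example line up with reportlab's drawing coordinates, which suggests they use PDF default user space: points (1/72 inch) measured from the bottom-left corner of the page. Many layout and image tools measure from the top-left instead. The helper below is a hypothetical convenience, not part of acroforge, that converts a top-left box into an `(x0, y0, x1, y1)` tuple under that assumption.

```python
POINTS_PER_INCH = 72.0


def inches_to_points(inches: float) -> float:
    """Convert inches to PDF points (72 points per inch)."""
    return inches * POINTS_PER_INCH


def top_left_box_to_pdf_rect(
    left: float, top: float, width: float, height: float, page_height: float
) -> tuple[float, float, float, float]:
    """Convert a box measured from the page's top-left corner into
    (x0, y0, x1, y1) with the origin at the bottom-left.

    Assumes all inputs are already in points.
    """
    x0 = left
    x1 = left + width
    y1 = page_height - top   # top edge of the box, measured from the bottom
    y0 = y1 - height         # bottom edge of the box
    return (x0, y0, x1, y1)


# A 250 x 30 pt box whose top edge sits 62 pt below the top of a Letter page.
rect = top_left_box_to_pdf_rect(left=200, top=62, width=250, height=30, page_height=792)
```

With these inputs the result matches the `full_name` rect used in the example above.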

CLI usage

# 1. Inject fields described in a JSON manifest
acroforge build in.pdf manifest.json fillable.pdf

# 2. Fill fields from a JSON object {name: value}
acroforge fill fillable.pdf data.json filled.pdf

# 3. Flatten (bake and lock)
acroforge flatten filled.pdf final.pdf

Example manifest.json:

[
  {
    "type": "text",
    "page": 0,
    "rect": [200, 700, 450, 730],
    "name": "full_name"
  },
  {
    "type": "checkbox",
    "page": 0,
    "rect": [200, 660, 220, 680],
    "name": "agree",
    "export_value": "Yes"
  },
  {
    "type": "radio",
    "page": 0,
    "rect": [200, 620, 220, 640],
    "name": "plan",
    "options": ["basic", "pro", "enterprise"],
    "export_value": "pro"
  }
]

Example data.json:

{"full_name": "Jane Doe", "agree": true, "plan": "pro"}
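Since fields are filled by name, a typo in data.json is easy to miss. The following is a minimal pre-flight sketch using only the standard library. `check_data_against_manifest` is a hypothetical helper, and how acroforge itself treats unknown or missing names is not covered here.

```python
import json


def check_data_against_manifest(manifest: list[dict], data: dict) -> dict[str, list[str]]:
    """Compare data keys with manifest field names.

    Returns the keys in `data` that match no field ("unknown") and the
    field names that receive no value ("unfilled").
    """
    names = {spec["name"] for spec in manifest}
    return {
        "unknown": sorted(key for key in data if key not in names),
        "unfilled": sorted(name for name in names if name not in data),
    }


# Inline copies of the example files above, trimmed to the relevant keys.
manifest = json.loads("""
[
  {"type": "text", "page": 0, "rect": [200, 700, 450, 730], "name": "full_name"},
  {"type": "checkbox", "page": 0, "rect": [200, 660, 220, 680], "name": "agree"},
  {"type": "radio", "page": 0, "rect": [200, 620, 220, 640], "name": "plan"}
]
""")
data = json.loads('{"full_name": "Jane Doe", "agree": true, "plan": "pro"}')

report = check_data_against_manifest(manifest, data)
typo_report = check_data_against_manifest(manifest, {"fullname": "Jane Doe"})
```

Running a check like this before `acroforge fill` surfaces mismatched names without opening the PDF.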

Field types

| Type | FieldType | Notes |
| --- | --- | --- |
| Single-line text | FieldType.TEXT | Optional maxlen to cap character count |
| Multi-cell comb | FieldType.COMB | maxlen sets the number of cells (e.g. SSN = 9) |
| Checkbox | FieldType.CHECKBOX | export_value is the on-state value (default "Yes") |
| Radio button | FieldType.RADIO | One FieldSpec per button; share name, set export_value per button |
| Signature | FieldType.SIGNATURE | Placeholder widget - renders a blank sig box |
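Following the radio-button note above (one spec per button, shared name, distinct export_value), a group can be generated rather than typed by hand. This sketch emits plain manifest-style dicts. `radio_group` and its layout parameters (`x`, `top_y`, `size`, `gap`) are hypothetical names, and the vertical stacking is only one possible layout.

```python
def radio_group(
    name: str,
    options: list[str],
    page: int,
    x: float,
    top_y: float,
    size: float = 20.0,
    gap: float = 10.0,
) -> list[dict]:
    """Build one manifest entry per radio button, stacked top to bottom.

    Every entry shares `name`; `export_value` differs per button.
    """
    specs = []
    for index, option in enumerate(options):
        y1 = top_y - index * (size + gap)  # top edge of this button
        specs.append(
            {
                "type": "radio",
                "page": page,
                "rect": [x, y1 - size, x + size, y1],
                "name": name,
                "export_value": option,
            }
        )
    return specs


plan_buttons = radio_group("plan", ["basic", "pro", "enterprise"], page=0, x=200, top_y=640)
```

The first generated button occupies the same rect as the `plan` entry in the example manifest; the others follow below it at 30 pt intervals.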

FieldSpec reference

class FieldSpec(BaseModel):
    type: FieldType
    page: int                                    # 0-indexed
    rect: tuple[float, float, float, float]      # (x0, y0, x1, y1) in PDF points
    name: str                                    # AcroForm field name
    options: list[str] | None = None             # radio group member labels
    maxlen: int | None = None                    # TEXT cap / COMB cell count
    export_value: str | None = None              # radio/checkbox on-value
    confidence: float = 1.0                      # 1.0 = explicit; <1.0 = best-effort guess
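For COMB fields, maxlen fixes the number of cells, and PDF comb fields divide the field width into equally spaced positions. When a form already has printed cell dividers, it can help to compute where the cell edges will fall before choosing a rect. The helper below is a hypothetical planning aid, not an acroforge function.

```python
def comb_cell_edges(
    rect: tuple[float, float, float, float], maxlen: int
) -> list[float]:
    """Return the x coordinates of the cell boundaries for a comb field.

    Assumes equal-width cells across the rect, so the result has
    maxlen + 1 edges running from x0 to x1.
    """
    x0, _y0, x1, _y1 = rect
    if maxlen <= 0:
        raise ValueError("maxlen must be a positive cell count")
    if x1 <= x0:
        raise ValueError("rect must satisfy x0 < x1")
    cell_width = (x1 - x0) / maxlen
    return [x0 + index * cell_width for index in range(maxlen + 1)]


# A 9-cell comb (for example an SSN) across a 180 pt wide rect.
edges = comb_cell_edges((200, 600, 380, 620), maxlen=9)
```

If the computed edges do not line up with the printed dividers, adjusting x0 and x1 in the spec is usually simpler than changing maxlen.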

Detection (best-effort)

In addition to the deterministic engine, acroforge ships an optional, best-effort detector that guesses where fields belong on a flat vector PDF by reading its vector geometry and nearby text labels. It handles two common form archetypes, plus checkboxes:

  • Underline forms - write-on rules become text fields.
  • Table/grid forms - bordered table cells become text fields (label-aware: the field is placed in the writable area below the label, multi-column cells are split, and section-header rows are skipped).
  • Checkboxes - both vector squares and font glyphs (☐ / ☑ / ☒).

import acroforge as af

# Read the input PDF as bytes; the context manager closes the file handle
with open("form.pdf", "rb") as fh:
    pdf = fh.read()

# Inspect candidate fields (a FormManifest); every field has confidence < 1.0
manifest = af.detect(pdf)
for f in manifest.fields:
    print(f.type, f.name, f.rect, f.confidence)

# Or go straight to a fillable PDF (detect() then build())
fillable: bytes = af.make_fillable(pdf)

CLI:

# Print the detected manifest as JSON (review it!)
acroforge detect form.pdf

# Detect and write a fillable PDF in one step
acroforge make-fillable form.pdf fillable.pdf

Read this before relying on it:

  • Heuristic. Detection guesses from vector shapes and text proximity. It will miss fields and invent spurious ones.
  • Vector-only. It reads the PDF's vector content stream. Scanned (image-only) PDFs are refused with ScannedPDFError - there is no OCR.
  • Confidence-scored. Every detected FieldSpec carries confidence < 1.0 to flag it as a guess. Explicitly authored specs use confidence = 1.0.
  • Meant to be reviewed. Treat the output of detect() / make-fillable as a draft manifest to inspect and correct, not a finished form.
  • No accuracy claims. We make no promise about detection precision or recall on any form. Quality varies wildly by document.
  • No AI. There are no models, no inference, no network calls - just deterministic geometry heuristics over the PDF's own vectors.
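One way to organize the review the list above calls for is to sort detected fields by confidence so the least certain guesses are inspected first. The sketch below works on plain dicts carrying a `confidence` key. That shape is an assumption about the JSON printed by `acroforge detect`, `triage` is a hypothetical helper, and the 0.8 threshold is arbitrary.

```python
def triage(fields: list[dict], threshold: float = 0.8) -> tuple[list[dict], list[dict]]:
    """Split fields into (likely, doubtful) by confidence.

    Doubtful fields are returned lowest confidence first. Both groups
    still need human review; this only sets the order of inspection.
    """
    likely: list[dict] = []
    doubtful: list[dict] = []
    for field in fields:
        # A missing confidence is treated as 1.0, i.e. an explicit spec.
        bucket = likely if field.get("confidence", 1.0) >= threshold else doubtful
        bucket.append(field)
    doubtful.sort(key=lambda field: field.get("confidence", 1.0))
    return likely, doubtful


# Made-up sample data for illustration, not real detector output.
detected = [
    {"name": "full_name", "confidence": 0.9},
    {"name": "box_17", "confidence": 0.4},
    {"name": "date", "confidence": 0.6},
]
likely, doubtful = triage(detected)
```

What a given confidence value means in practice is not documented here, so any threshold should be tuned against your own forms.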

Scope and honest limits

The reliable part is the deterministic build / fill / flatten engine. You supply field positions via FieldSpecs, and acroforge injects, fills, and flattens fields at exactly the coordinates you give it, on any PDF (vector or scanned).

detect() / make_fillable() are the best-effort layer described above: use them to bootstrap a manifest, then review and hand off the corrected specs to the engine.

There is no AI in this package, and no copyrighted form templates are bundled - bring your own PDFs.


Engine and dependencies

Runtime dependencies are strictly permissive:

| Package | License | Role |
| --- | --- | --- |
| reportlab | BSD | Field widget rendering |
| pypdf | BSD-3-Clause | PDF read / merge / flatten |
| pdfplumber | MIT | PDF geometry utilities |
| PyPDFForm | MIT | Fill helpers |
| pydantic | MIT | FieldSpec / FormManifest validation |

Optional extras:

  • [fallback] - adds pikepdf (MPL-2.0) as a fallback PDF writer; not required for the default engine path.
  • [harness] - adds pypdfium2 + Pillow for cross-viewer visual regression tests.

No GPL, AGPL, LGPL, or SSPL in the runtime tree. CI enforces this on every push via pip-licenses --fail-on='GPL;AGPL;LGPL;SSPL'.


License

Apache-2.0. See LICENSE.

No copyrighted form templates are included or bundled. Bring your own PDFs.
