Turn flat PDFs into real, fillable AcroForms - permissive, deterministic, zero-copyleft.

Maintainers

san64777

acroforge

Turn flat PDFs into real, fillable AcroForms. Permissive (Apache-2.0), deterministic, zero-copyleft.

[Image: a flat PDF turned into a fillable PDF by acroforge]

Left: a flat PDF - just printed lines and an empty box. Right: the same PDF after acroforge - real, fillable form fields, filled and rendered correctly. No Adobe, no cloud, no AGPL.

What it does

acroforge takes any PDF - vector or scanned - and injects real AcroForm fields at positions you specify. The result is a standards-compliant fillable PDF that renders correctly in Chrome's pdfium and Firefox's pdf.js.

Four operations:

Operation	What it does
`build`	Inject interactive AcroForm fields into a flat PDF
`fill`	Set field values by name on a fillable PDF
`remove`	Delete specific fields by name (raises if a name is missing)
`flatten`	Bake field appearances into page content; remove interactive fields

Each takes a PDF as plain bytes and returns the result as plain bytes, so they compose easily in any pipeline.
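
Because every operation maps bytes to bytes, a pipeline is plain function composition. The following is a minimal sketch of that idea, not part of acroforge: the `pipeline` helper and the two stand-in steps are hypothetical, and are used only so the example runs without the package installed.

```python
from functools import reduce
from typing import Callable

Step = Callable[[bytes], bytes]


def pipeline(*steps: Step) -> Step:
    """Chain bytes -> bytes steps, left to right, into a single callable."""
    def run(data: bytes) -> bytes:
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


# Hypothetical stand-in steps (not acroforge calls) so the sketch is runnable.
def add_marker(data: bytes) -> bytes:
    return data + b"[fields]"


def upper(data: bytes) -> bytes:
    return data.upper()


process = pipeline(add_marker, upper)
result = process(b"pdf")
print(result)
```

With acroforge, the steps would be small wrappers such as `lambda b: af.build(b, fields)` followed by `af.flatten`, using the calls shown under Python usage.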

Tested on real-world forms

The deterministic core (build / fill / flatten / read_fields) is validated against 125 real public PDF forms - IRS and other government forms (VA, OPM, GSA), CMS / Medicare and hospital healthcare forms, federal- and state-court legal forms, and SBA / USPTO / vendor business forms. It reads every one of them, and fills, flattens, and round-trips every fillable one (over 11,000 real fields) without a single crash. Every field type renders correctly in Chrome's pdfium and Firefox's pdf.js, golden-image tested in CI.

The detect() layer below is separate and clearly labeled best-effort.

Install

pip install acroforge

Or from source:

git clone https://github.com/san64777/acroforge
cd acroforge
pip install -e .   # or: uv pip install -e .

Python usage

import io
from reportlab.pdfgen import canvas  # any PDF source works
import acroforge as af
from acroforge import FieldSpec, FieldType

# --- Step 0: obtain a flat PDF (bytes) any way you like ---
buf = io.BytesIO()
c = canvas.Canvas(buf, pagesize=(612, 792))
c.drawString(72, 720, "Name:")
c.drawString(72, 680, "Agree to terms:")
c.save()
flat_pdf: bytes = buf.getvalue()

# --- Step 1: describe the fields you want ---
fields = [
    FieldSpec(
        type=FieldType.TEXT,
        page=0,
        rect=(200, 700, 450, 730),  # (x0, y0, x1, y1) in PDF points
        name="full_name",
    ),
    FieldSpec(
        type=FieldType.CHECKBOX,
        page=0,
        rect=(200, 660, 220, 680),
        name="agree",
        export_value="Yes",
    ),
]

# --- Step 2: inject the fields ---
fillable: bytes = af.build(flat_pdf, fields)

# --- Step 3: fill values ---
filled: bytes = af.fill(fillable, {"full_name": "Jane Doe", "agree": True})

# --- Step 4: flatten (optional - locks the form) ---
final: bytes = af.flatten(filled)

# Write to disk
with open("output.pdf", "wb") as f:
    f.write(final)
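
Field rects are given in PDF points, and the example above places them in the same coordinate space reportlab draws in (origin at the bottom-left, y increasing upward). If your measurements come from a ruler or a design tool that counts from the top-left corner, a small conversion helps. This is a sketch under stated assumptions: `rect_from_top_left` is a hypothetical helper, not an acroforge function, and it assumes an unrotated page whose origin sits at the bottom-left corner.

```python
POINTS_PER_INCH = 72.0  # 1 inch = 72 PDF points


def rect_from_top_left(left_in, top_in, width_in, height_in, page_height_pt=792.0):
    """Convert a box measured in inches from the page's top-left corner
    into an (x0, y0, x1, y1) rect in points with a bottom-left origin."""
    x0 = left_in * POINTS_PER_INCH
    x1 = (left_in + width_in) * POINTS_PER_INCH
    y1 = page_height_pt - top_in * POINTS_PER_INCH  # top edge of the box
    y0 = y1 - height_in * POINTS_PER_INCH           # bottom edge of the box
    return (x0, y0, x1, y1)


# A 3 x 0.5 inch box, 2 inches from the left and 1 inch from the top of a Letter page.
rect = rect_from_top_left(2.0, 1.0, 3.0, 0.5)
print(rect)
```

The resulting tuple can be passed as the `rect` of a `FieldSpec`.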

CLI usage

# 1. Inject fields described in a JSON manifest
acroforge build in.pdf manifest.json fillable.pdf

# 2. Fill fields from a JSON object {name: value}
acroforge fill fillable.pdf data.json filled.pdf

# 3. Flatten (bake and lock)
acroforge flatten filled.pdf final.pdf

Example manifest.json:

[
  {
    "type": "text",
    "page": 0,
    "rect": [200, 700, 450, 730],
    "name": "full_name"
  },
  {
    "type": "checkbox",
    "page": 0,
    "rect": [200, 660, 220, 680],
    "name": "agree",
    "export_value": "Yes"
  },
  {
    "type": "radio",
    "page": 0,
    "rect": [200, 620, 220, 640],
    "name": "plan",
    "options": ["basic", "pro", "enterprise"],
    "export_value": "pro"
  }
]

Example data.json:

{"full_name": "Jane Doe", "agree": true, "plan": "pro"}
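
Manifests are ordinary JSON, so repetitive layouts can be generated rather than typed by hand. The sketch below builds a column of text fields using only the keys shown in the example manifest; `text_column` and its spacing defaults are hypothetical choices for illustration.

```python
import json


def text_column(names, x0=200.0, x1=450.0, top=730.0, height=30.0, gap=10.0, page=0):
    """Return one "text" manifest entry per name, stacked downward from `top`."""
    entries = []
    for i, name in enumerate(names):
        y1 = top - i * (height + gap)  # top edge of the i-th field
        entries.append({
            "type": "text",
            "page": page,
            "rect": [x0, y1 - height, x1, y1],
            "name": name,
        })
    return entries


manifest = text_column(["full_name", "email", "phone"])
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Writing `manifest_json` to `manifest.json` gives a file in the same shape as the example above, ready for `acroforge build`.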

Field types

Type	`FieldType`	Notes
Single-line text	`FieldType.TEXT`	Optional `maxlen` to cap character count
Multi-cell comb	`FieldType.COMB`	`maxlen` sets the number of cells (e.g. SSN = 9)
Checkbox	`FieldType.CHECKBOX`	`export_value` is the on-state value (default `"Yes"`)
Radio button	`FieldType.RADIO`	One `FieldSpec` per button; share `name`, set `export_value` per button
Signature	`FieldType.SIGNATURE`	Placeholder widget - renders a blank sig box
Dropdown / list box	`FieldType.CHOICE`	`options` lists the choices; `list_box`, `multi_select`, `editable` flags (see note)
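
The radio row in the table says each button gets its own spec, with a shared `name` and a distinct `export_value`. A minimal sketch of generating such a group as manifest-style dicts follows; `radio_group` and its layout defaults are hypothetical, and whether a radio entry may also carry an `options` list is not covered here.

```python
def radio_group(name, export_values, x0=200.0, y_top=640.0, size=20.0, gap=10.0, page=0):
    """Build one manifest-style dict per radio button, all sharing `name`."""
    buttons = []
    for i, value in enumerate(export_values):
        y1 = y_top - i * (size + gap)  # top edge of the i-th button
        buttons.append({
            "type": "radio",
            "page": page,
            "rect": [x0, y1 - size, x0 + size, y1],
            "name": name,            # shared: makes the buttons one group
            "export_value": value,   # distinct: the value this button selects
        })
    return buttons


plan_buttons = radio_group("plan", ["basic", "pro", "enterprise"])
for button in plan_buttons:
    print(button["export_value"], button["rect"])
```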

`FieldSpec` reference

class FieldSpec(BaseModel):
    type: FieldType
    page: int                                    # 0-indexed
    rect: tuple[float, float, float, float]      # (x0, y0, x1, y1) in PDF points
    name: str                                    # AcroForm field name
    options: list[str] | list[tuple[str, str]] | None = None  # choice options (str or (export, label))
    maxlen: int | None = None                    # TEXT cap / COMB cell count
    export_value: str | None = None              # radio/checkbox on-value
    list_box: bool = False                       # CHOICE: False=dropdown, True=list box
    multi_select: bool = False                   # CHOICE list box: allow multiple selections
    editable: bool = False                       # CHOICE combo: accept free-typed text
    confidence: float = 1.0                      # 1.0 = explicit; <1.0 = best-effort guess
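
Since `rect` is a bare tuple of four floats, it can be worth sanity-checking coordinates before building. The sketch below is a hypothetical pre-flight check, not acroforge's own validation; whether the library rejects, normalizes, or accepts inverted or off-page rects is not stated in this document. The default page size is US Letter, as in the Python usage example.

```python
def rect_problems(rect, page_width=612.0, page_height=792.0):
    """Return a list of human-readable problems with an (x0, y0, x1, y1) rect."""
    x0, y0, x1, y1 = rect
    problems = []
    if x0 >= x1:
        problems.append("x0 must be less than x1")
    if y0 >= y1:
        problems.append("y0 must be less than y1")
    if min(x0, y0) < 0 or x1 > page_width or y1 > page_height:
        problems.append("rect extends beyond the page")
    return problems


print(rect_problems((200, 700, 450, 730)))  # a well-formed rect
print(rect_problems((450, 700, 200, 730)))  # x coordinates swapped
```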

Dropdowns and list boxes (`FieldType.CHOICE`)

# dropdown (combo box)
FieldSpec(type=FieldType.CHOICE, page=0, rect=(200, 620, 360, 640),
          name="state", options=["CA", "NY", "TX"])

# (export, label) pairs: store "CA", display "California"
FieldSpec(type=FieldType.CHOICE, page=0, rect=(200, 580, 360, 600),
          name="st", options=[("CA", "California"), ("NY", "New York")])

# scrolling list box, multi-select
FieldSpec(type=FieldType.CHOICE, page=0, rect=(200, 500, 360, 570),
          name="langs", options=["en", "fr", "de"], list_box=True, multi_select=True)

All four variants - dropdown, single-select list box, editable dropdown, and multi-select list box - are cross-viewer verified: the selected value renders in both pdfium and pdf.js. read_fields recovers a choice field's structure (its options and the list_box / multi_select / editable flags); it does not recover the current selection, since a FieldSpec describes the field, not its filled value.

Detection (best-effort)

In addition to the deterministic engine, acroforge ships an optional, best-effort detector that guesses where fields belong on a flat vector PDF by reading its vector geometry and nearby text labels. It handles the two common form archetypes, plus checkboxes:

- Underline forms - write-on rules become text fields.
- Table/grid forms - bordered table cells become text fields (label-aware: the field is placed in the writable area below the label, multi-column cells are split, and section-header rows are skipped).
- Checkboxes - both vector squares and font glyphs (☐ / ☑ / ☒).

import acroforge as af

pdf = open("form.pdf", "rb").read()

# Inspect candidate fields (a FormManifest); every field has confidence < 1.0
manifest = af.detect(pdf)
for f in manifest.fields:
    print(f.type, f.name, f.rect, f.confidence)

# Or go straight to a fillable PDF (detect() then build())
fillable: bytes = af.make_fillable(pdf)

CLI:

# Print the detected manifest as JSON (review it!)
acroforge detect form.pdf

# Detect and write a fillable PDF in one step
acroforge make-fillable form.pdf fillable.pdf

Read this before relying on it:

- Heuristic. Detection guesses from vector shapes and text proximity. It will miss fields and invent spurious ones.
- Vector-only. It reads the PDF's vector content stream. Scanned (image-only) PDFs are refused with ScannedPDFError - there is no OCR.
- Confidence-scored. Every detected FieldSpec carries confidence < 1.0 to flag it as a guess. Explicitly authored specs use confidence = 1.0.
- Meant to be reviewed. Treat the output of detect() / make-fillable as a draft manifest to inspect and correct, not a finished form.
- No accuracy claims. We make no promise about detection precision or recall on any form. Quality varies wildly by document.
- No AI. There are no models, no inference, no network calls - just deterministic geometry heuristics over the PDF's own vectors.
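
When reviewing a detected manifest, one way to spend attention is to look at the least confident candidates first. The sketch below works on plain dicts with a `confidence` key; the sample candidates and the 0.8 cutoff are made up for illustration, and because no accuracy claims are made, the score is only a prioritization hint. Every candidate still needs review.

```python
def review_order(fields, threshold=0.8):
    """Sort candidates least-confident first and flag those below `threshold`."""
    ordered = sorted(fields, key=lambda f: f["confidence"])
    return [(f["name"], f["confidence"], f["confidence"] < threshold) for f in ordered]


# Made-up sample candidates; real ones would come from the detected manifest.
candidates = [
    {"name": "field_a", "type": "text", "confidence": 0.9},
    {"name": "field_b", "type": "checkbox", "confidence": 0.55},
    {"name": "field_c", "type": "text", "confidence": 0.7},
]

for name, confidence, check_first in review_order(candidates):
    print(name, confidence, "check first" if check_first else "check later")
```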

Reading existing fields

read_fields(pdf) ingests the AcroForm fields already present in a fillable PDF as FieldSpecs (real registered fields, so confidence = 1.0). It is the inverse of build, so the two round-trip:

import acroforge as af

specs = af.read_fields(open("fillable.pdf", "rb").read())   # -> list[FieldSpec]
for s in specs:
    print(s.type.value, s.name, s.rect)

# copy one form's field layout onto another PDF
af.build(other_pdf, af.read_fields(template_pdf))

(One FieldSpec per widget, with coordinates, type, name, and checkbox/radio on-states recovered. Choice fields come back with their options and list_box / multi_select / editable flags, but not the current selection. Pushbuttons are skipped.)

Removing fields

remove(pdf, names) deletes specific fields by the name read_fields reports, so the two compose. Handy when make_fillable over-detects, or to strip a field before sending a form:

import acroforge as af

# pdf: the bytes of a fillable PDF
specs = af.read_fields(pdf)
junk = [s.name for s in specs if s.type == af.FieldType.SIGNATURE]
clean = af.remove(pdf, junk)        # raises ValueError if any name is missing

Naming a radio group removes the whole group; removing the last field leaves an empty, re-usable /AcroForm.
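
Because remove raises when a name is missing, it can help to partition a wish-list of names against the names actually present before calling it. This is a sketch with a hypothetical helper, `split_known`, operating on plain strings; in practice `existing` would be the names reported by read_fields.

```python
def split_known(names, existing):
    """Partition `names` into (present, missing) relative to `existing` field names."""
    known = set(existing)  # radio buttons share a name, so duplicates collapse here
    present = [n for n in names if n in known]
    missing = [n for n in names if n not in known]
    return present, missing


# Example names only; "plan" appears once per radio button in the group.
existing_names = ["full_name", "agree", "plan", "plan", "plan"]
present, missing = split_known(["agree", "plan", "signature_1"], existing_names)
print(present, missing)
```

Passing only `present` to remove avoids the ValueError, and `missing` can be logged or reported.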

Serializing a manifest

detect() returns a FormManifest and read_fields() returns list[FieldSpec]. FormManifest and FieldSpec are both pydantic models, so you can store them, send them to a UI, or round-trip them with pydantic's built-ins (no extra API to learn):

data = manifest.model_dump_json()                  # -> JSON string
manifest = FormManifest.model_validate_json(data)  # -> back to a FormManifest
af.build(pdf, manifest.fields)                      # build from the (edited) specs

(export, label) option pairs round-trip as [export, label] arrays and back to tuples; generate a TypeScript type from FormManifest.model_json_schema().
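
The array form comes from JSON itself, which has no tuple type. The stdlib-only sketch below shows the same round trip by hand; it illustrates the underlying JSON behavior and is not how acroforge or pydantic implements it.

```python
import json

options = [("CA", "California"), ("NY", "New York")]

# Tuples are written out as JSON arrays...
encoded = json.dumps(options)
print(encoded)

# ...and come back as lists, so they are converted to tuples again after parsing.
decoded = [tuple(pair) for pair in json.loads(encoded)]
print(decoded == options)
```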

Scope and honest limits

The reliable part is the deterministic build / fill / flatten engine. You supply field positions via FieldSpecs, and acroforge injects, fills, and flattens them at exactly the coordinates you give it, on any PDF (vector or scanned).

detect() / make_fillable() are the best-effort layer described above: use them to bootstrap a manifest, then review and hand off the corrected specs to the engine.

XFA / dynamic forms: some PDFs (many government forms) carry a dynamic XFA layer over the standard AcroForm. acroforge operates on the AcroForm layer - which is what most viewers render - and drops the XFA layer on output. Flattened output is unambiguous everywhere; for interactive output, an XFA-first viewer (some Adobe configurations) may prefer the dropped layer, so flatten the result if you need cross-Adobe fidelity.

There is no AI in this package, and no copyrighted form templates are bundled - bring your own PDFs.

Engine and dependencies

Runtime dependencies are strictly permissive:

Package	License	Role
`reportlab`	BSD	Field widget rendering
`pypdf`	BSD-3-Clause	PDF read / merge / flatten
`pdfplumber`	MIT	PDF geometry utilities
`PyPDFForm`	MIT	Fill helpers
`pydantic`	MIT	`FieldSpec` / `FormManifest` validation

Optional extras:

- [fallback] - adds pikepdf (MPL-2.0) as a fallback PDF writer; not required for the default engine path.
- [harness] - adds pypdfium2 + Pillow for cross-viewer visual regression tests.

No GPL, AGPL, LGPL, or SSPL in the runtime tree. CI enforces this on every push via pip-licenses --fail-on='GPL;AGPL;LGPL;SSPL'.

License

Apache-2.0. See LICENSE.

No copyrighted form templates are included or bundled. Bring your own PDFs.


Release history

Version	Released
0.4.0 (this version)	Jun 8, 2026
0.3.4	Jun 8, 2026
0.3.3	Jun 8, 2026
0.3.2	Jun 8, 2026
0.3.0	Jun 8, 2026
0.2.0	Jun 7, 2026
0.1.1	Jun 7, 2026
0.1.0	Jun 7, 2026

Download files

Source Distribution

acroforge-0.4.0.tar.gz (782.4 kB)

Uploaded Jun 8, 2026 Source

Built Distribution

acroforge-0.4.0-py3-none-any.whl (36.5 kB)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file acroforge-0.4.0.tar.gz.

File metadata

Download URL: acroforge-0.4.0.tar.gz
Upload date: Jun 8, 2026
Size: 782.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for acroforge-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`aa87313de4c3e7ff33a62fb3fb72a4b19c7d61e5f03bf850dc6bd363cf14ed49`
MD5	`e62ab8467e76b8e4d9e2e05f0b63c3be`
BLAKE2b-256	`09e5b754d125f6b4c0b70e1434fbb225981a43b4db79a9bdc0472c5490aeeb58`
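
A downloaded file can be checked against the published SHA256 digest with the standard library. This is a minimal sketch; `sha256_matches` is a hypothetical helper name, and the demonstration uses the well-known digest of empty input rather than the real archive.

```python
import hashlib
import hmac


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA256 digest of `data` equals `expected_hex`."""
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected_hex.lower())


# Demonstration on empty input, whose SHA256 digest is a standard test vector.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(sha256_matches(b"", EMPTY_SHA256))
print(sha256_matches(b"tampered", EMPTY_SHA256))
```

For the real check, read the downloaded archive's bytes and pass the SHA256 value from the table above as `expected_hex`.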


Provenance

The following attestation bundles were made for acroforge-0.4.0.tar.gz:

Publisher: publish.yml on san64777/acroforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: acroforge-0.4.0.tar.gz
- Subject digest: aa87313de4c3e7ff33a62fb3fb72a4b19c7d61e5f03bf850dc6bd363cf14ed49
- Sigstore transparency entry: 1759425091
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: san64777/acroforge@cce1dcff57b170ccd26853aed5dde2a3c41a99a1
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/san64777
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cce1dcff57b170ccd26853aed5dde2a3c41a99a1
- Trigger Event: release

File details

Details for the file acroforge-0.4.0-py3-none-any.whl.

File metadata

Download URL: acroforge-0.4.0-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 36.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for acroforge-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d883ee8ac891d05f6f471f73c6c9b24a4a8ca4b21049a86d6dcfca00924b7c90`
MD5	`d2037c38178b627aac71db99cb33924a`
BLAKE2b-256	`47924082bb5149cc40393a0dd764c08fde7f77d29c3b0e058db62b0251b6937a`


Provenance

The following attestation bundles were made for acroforge-0.4.0-py3-none-any.whl:

Publisher: publish.yml on san64777/acroforge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: acroforge-0.4.0-py3-none-any.whl
- Subject digest: d883ee8ac891d05f6f471f73c6c9b24a4a8ca4b21049a86d6dcfca00924b7c90
- Sigstore transparency entry: 1759425196
- Sigstore integration time: Jun 8, 2026
Source repository:
- Permalink: san64777/acroforge@cce1dcff57b170ccd26853aed5dde2a3c41a99a1
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/san64777
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cce1dcff57b170ccd26853aed5dde2a3c41a99a1
- Trigger Event: release
