PDF AcroForm field parser for Swarmauri using PyPDFTK and the native pdftk toolchain.

Swarmauri Parser PyPDFTK

swarmauri_parser_pypdftk is the Swarmauri PDF form-field parser for extracting AcroForm data through PyPDFTK and the native pdftk toolchain. It converts structured PDF form fields into a single Swarmauri Document so downstream workflows can index, validate, or route filled forms.

Why Use Swarmauri Parser PyPDFTK

  • Extract structured PDF field data instead of only free-form text.
  • Normalize AcroForm output into a Swarmauri Document for ingestion, automation, and analysis pipelines.
  • Keep form parsing aligned with the same Swarmauri parser interface used by other document components.
  • Pair form-field extraction with other PDF parsers when both structured fields and page text matter.

FAQ

What does this parser extract?
PDF form fields returned by pypdftk.dump_data_fields, such as AcroForm names and values.

Does it parse ordinary PDF text?
No. This package is for structured PDF form fields. Use another parser for general page text.

Does it need a system binary?
Yes. It depends on the pdftk or pdftk-java executable being installed and available on PATH.

What happens when the PDF has no form fields?
The parser returns an empty list.

Features

  • Extracts PDF AcroForm fields through PyPDFTK.
  • Returns one Swarmauri Document with newline-delimited key: value content.
  • Preserves the input source path in metadata.
  • Useful for form ingestion, validation, compliance workflows, and automation.
  • Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Installation

# with uv
uv add swarmauri_parser_pypdftk

# or with pip
pip install swarmauri_parser_pypdftk

System requirement:

  • Install pdftk or pdftk-java and make sure the executable is available on PATH.
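
A deployment can check for the binary before any parsing starts. The following is a minimal sketch using only the standard library; find_pdftk is a hypothetical helper (not part of this package), and the executable name "pdftk" is an assumption that may differ on some systems.

```python
import shutil


def find_pdftk(candidates=("pdftk",)):
    """Return the path of the first candidate executable found on PATH, or None.

    The candidate names are assumptions; pass the name your platform uses.
    """
    for name in candidates:
        path = shutil.which(name)  # None when the name is not on PATH
        if path:
            return path
    return None


if __name__ == "__main__":
    location = find_pdftk()
    if location is None:
        print("pdftk was not found on PATH; install pdftk or pdftk-java first.")
    else:
        print(f"Found pdftk at {location}")
```

Running this check at startup turns a missing binary into a clear message instead of a failure in the middle of a pipeline.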

Usage

from swarmauri_parser_pypdftk import PyPDFTKParser

parser = PyPDFTKParser()
documents = parser.parse("forms/enrollment.pdf")

for document in documents:
    print(document.metadata["source"])
    print(document.content)

Examples

Extract form fields from a filled PDF

from swarmauri_parser_pypdftk import PyPDFTKParser

parser = PyPDFTKParser()
docs = parser.parse("forms/application.pdf")

if docs:
    print(docs[0].content)

Example output:

GivenName: John
FamilyName: Doe
BirthDate: 1990-01-01
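
The key: value content above can be turned back into a dictionary for validation or routing. This is a minimal sketch, assuming each field occupies exactly one line and that field names contain no colon; fields_from_content and the required-field list are hypothetical and not part of the package.

```python
def fields_from_content(content: str) -> dict[str, str]:
    """Split newline-delimited "key: value" text into a dict.

    Splits on the first colon only, so values such as "12:30" stay intact.
    Lines without a colon are skipped.
    """
    fields: dict[str, str] = {}
    for line in content.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or text that is not a key: value pair
        fields[key.strip()] = value.strip()
    return fields


# Sample content copied from the example output above.
sample = "GivenName: John\nFamilyName: Doe\nBirthDate: 1990-01-01"
fields = fields_from_content(sample)

# Hypothetical validation step: report required fields that are absent or empty.
required = ("GivenName", "FamilyName", "Email")
missing = [name for name in required if not fields.get(name)]
print(missing)  # ['Email']
```

In a real pipeline, the string passed to fields_from_content would be docs[0].content from the parser.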

Detect forms without field data

from swarmauri_parser_pypdftk import PyPDFTKParser

parser = PyPDFTKParser()
docs = parser.parse("forms/plain.pdf")

if not docs:
    print("No PDF form fields were detected.")

Best Practices

  • Use this parser for PDFs with real AcroForm fields, not for generic PDF page text.
  • Validate that the pdftk binary is installed in deployment targets before running pipelines that depend on this package.
  • Pair this package with swarmauri_parser_pypdf2 or swarmauri_parser_fitzpdf if you also need free-form page text.
  • Route scanned, image-only PDFs through OCR; they carry no form structure for this parser to extract.
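
The pairing advice above can be sketched as a small fallback routine. It relies only on what the Usage section shows: a parser exposes parse(path) and returns a list that is empty when no form fields exist. parse_with_fallback and SupportsParse are hypothetical names, and the text parser is passed in because its class name is not given here.

```python
from typing import Any, Protocol, Sequence


class SupportsParse(Protocol):
    """Anything with a parse(data) method that returns a sequence of documents."""

    def parse(self, data: Any) -> Sequence[Any]: ...


def parse_with_fallback(
    path: str,
    form_parser: SupportsParse,
    text_parser: SupportsParse,
) -> tuple[str, list[Any]]:
    """Try form-field extraction first; fall back to page text when it yields nothing.

    Returns a (route, documents) pair so callers can tell which parser produced
    the result.
    """
    docs = form_parser.parse(path)
    if docs:
        return "form", list(docs)
    return "text", list(text_parser.parse(path))
```

A caller would pass PyPDFTKParser() as form_parser and an instance from one of the text-oriented packages as text_parser. If both routes come back empty, the file is a likely candidate for OCR.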

License

This project is licensed under the Apache-2.0 License.
