PDF AcroForm field parser for Swarmauri using PyPDFTK and the native pdftk toolchain.
Swarmauri Parser PyPDFTK
swarmauri_parser_pypdftk is the Swarmauri PDF form-field parser for
extracting AcroForm data through PyPDFTK
and the native pdftk toolchain. It converts structured PDF form fields into a
single Swarmauri Document so downstream workflows can index, validate, or
route filled forms.
Why Use Swarmauri Parser PyPDFTK
- Extract structured PDF field data instead of only free-form text.
- Normalize AcroForm output into a Swarmauri Document for ingestion, automation, and analysis pipelines.
- Keep form parsing aligned with the same Swarmauri parser interface used by other document components.
- Pair form-field extraction with other PDF parsers when both structured fields and page text matter.
FAQ
What does this parser extract?
PDF form fields returned by pypdftk.dump_data_fields, such as AcroForm names and values.
Does it parse ordinary PDF text?
No. This package is for structured PDF form fields. Use another parser for general page text.
Does it need a system binary?
Yes. It depends on the pdftk or pdftk-java executable being installed and available on PATH.
What happens when the PDF has no form fields?
The parser returns an empty list.
Features
- Extracts PDF AcroForm fields through PyPDFTK.
- Returns one Swarmauri Document with newline-delimited key: value content.
- Preserves the input source path in metadata.
- Useful for form ingestion, validation, compliance workflows, and automation.
- Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.
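As a rough illustration of the newline-delimited key: value content described above, the sketch below formats a list of field dictionaries similar to what pypdftk.dump_data_fields reports. The FieldName and FieldValue keys are assumed to follow pdftk's field dump output. The helper name fields_to_content and the sample data are hypothetical, and the parser's actual implementation may differ.

```python
def fields_to_content(fields):
    """Join field dicts into newline-delimited "key: value" lines."""
    lines = []
    for field in fields:
        name = field.get("FieldName")
        if name is None:
            continue  # skip entries that carry no field name
        value = field.get("FieldValue", "")  # unfilled fields become empty values
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Hypothetical sample data shaped like a pdftk field dump (assumption).
sample_fields = [
    {"FieldType": "Text", "FieldName": "GivenName", "FieldValue": "John"},
    {"FieldType": "Text", "FieldName": "FamilyName", "FieldValue": "Doe"},
]
content = fields_to_content(sample_fields)
print(content)
```

Fields without a value are kept with an empty value here so that downstream validation can still see that the field exists; that choice is an assumption of this sketch.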
Installation
uv add swarmauri_parser_pypdftk
pip install swarmauri_parser_pypdftk
System requirement:
- Install pdftk or pdftk-java and make sure the executable is available on PATH.
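One way to confirm the system requirement before running a pipeline is to look the executable up on PATH with the standard library. This is a minimal sketch, not part of the package: the helper name find_pdftk is hypothetical, and the candidate executable names are an assumption that may need adjusting for your platform.

```python
import shutil


def find_pdftk(candidates=("pdftk", "pdftk-java")):
    """Return the path of the first candidate executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


pdftk_path = find_pdftk()
if pdftk_path is None:
    print("pdftk was not found on PATH; install pdftk or pdftk-java first.")
else:
    print(f"Using pdftk at {pdftk_path}")
```

Running a check like this at deployment time surfaces a missing binary early instead of at the first parse call.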
Usage
from swarmauri_parser_pypdftk import PyPDFTKParser
parser = PyPDFTKParser()
documents = parser.parse("forms/enrollment.pdf")
for document in documents:
    print(document.metadata["source"])
    print(document.content)
Examples
Extract form fields from a filled PDF
from swarmauri_parser_pypdftk import PyPDFTKParser
parser = PyPDFTKParser()
docs = parser.parse("forms/application.pdf")
if docs:
    print(docs[0].content)
Example output:
GivenName: John
FamilyName: Doe
BirthDate: 1990-01-01
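Downstream code often wants a mapping rather than text. The following sketch turns content shaped like the example output back into a dictionary. The helper name content_to_dict is hypothetical, and the approach assumes one field per line with no colon inside field names.

```python
def content_to_dict(content):
    """Split "key: value" lines into a dict, splitting on the first colon only."""
    fields = {}
    for line in content.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key: value" line
        fields[key.strip()] = value.strip()
    return fields


example = "GivenName: John\nFamilyName: Doe\nBirthDate: 1990-01-01"
fields = content_to_dict(example)
print(fields["BirthDate"])  # prints 1990-01-01
```

Splitting on the first colon keeps values such as times intact, but multi-line field values would not survive this round trip.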
Detect forms without field data
from swarmauri_parser_pypdftk import PyPDFTKParser
parser = PyPDFTKParser()
docs = parser.parse("forms/plain.pdf")
if not docs:
    print("No PDF form fields were detected.")
Best Practices
- Use this parser for PDFs with real AcroForm fields, not for generic PDF page text.
- Validate that the pdftk binary is installed in deployment targets before running pipelines that depend on this package.
- Pair this package with swarmauri_parser_pypdf2 or swarmauri_parser_fitzpdf if you also need free-form page text.
- Route scan-only documents through OCR if they are image-based and contain no useful form structure.
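The pairing and routing advice above could be wired together as a simple fallback chain. This sketch relies only on the documented behavior that the form-field parser returns an empty list when no fields exist. The function route_pdf and the three handler callables are hypothetical stand-ins, not APIs of this package or of the other parsers.

```python
def route_pdf(path, parse_fields, parse_text, run_ocr):
    """Try form fields first, then page text, then OCR; return (route, documents)."""
    handlers = (("fields", parse_fields), ("text", parse_text), ("ocr", run_ocr))
    for route, handler in handlers:
        documents = handler(path)
        if documents:
            return route, documents
    return "none", []


# Stand-in handlers for demonstration only; real parsers would go here.
route, docs = route_pdf(
    "forms/plain.pdf",
    parse_fields=lambda path: [],            # no AcroForm fields found
    parse_text=lambda path: ["page text"],   # text parser succeeds
    run_ocr=lambda path: ["ocr text"],       # not reached in this run
)
print(route)  # prints text
```

If both structured fields and page text are needed for the same file, call both parsers instead of stopping at the first non-empty result.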
License
This project is licensed under the Apache-2.0 License.