Python wrappers for pdfcpu to extract and fill PDF forms

privacyforms-pdf

Features

Extract form data from PDF files using pdfcpu
Programmatic API via PDFFormExtractor class
Command-line interface with multiple commands
Full type hints and comprehensive test coverage
Support for all form field types (text, date, checkbox, radio button groups, etc.)

Requirements

Python 3.14+
pdfcpu must be installed on your system

Installation

# Clone the repository
git clone <repo-url>
cd privacyforms-pdf

# Install with uv
uv sync

Quick Start

Check if pdfcpu is installed

pdf-forms check

Command Line Usage

# Check if a PDF contains a form
pdf-forms info form.pdf

# List all form fields
pdf-forms list-fields form.pdf

# Get a specific field value
pdf-forms get-value form.pdf "Field Name"

# Extract form data to JSON
pdf-forms extract form.pdf -o output.json

# Extract form data to stdout
pdf-forms extract form.pdf

# Fill a form from JSON (validates before filling)
pdf-forms fill-form form.pdf data.json -o filled.pdf

# Fill a form without validation
pdf-forms fill-form form.pdf data.json -o filled.pdf --no-validate

# Fill a form in-place (modifies original)
pdf-forms fill-form form.pdf data.json

# Fill with strict mode (requires all form fields)
pdf-forms fill-form form.pdf data.json -o filled.pdf --strict
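
When the CLI is driven from a script, the fill-form invocations above can be assembled as an argument list. The following is a minimal sketch and not part of the package: build_fill_command is a hypothetical helper, and it uses only the flags shown above (-o, --no-validate, --strict).

```python
from __future__ import annotations

import shlex
from pathlib import Path


def build_fill_command(
    pdf: str | Path,
    data: str | Path,
    output: str | Path | None = None,
    *,
    validate: bool = True,
    strict: bool = False,
) -> list[str]:
    """Assemble a `pdf-forms fill-form` argument list (hypothetical helper)."""
    cmd = ["pdf-forms", "fill-form", str(pdf), str(data)]
    if output is not None:
        # Without -o the CLI fills the form in-place, per the usage above.
        cmd += ["-o", str(output)]
    if not validate:
        cmd.append("--no-validate")
    if strict:
        cmd.append("--strict")
    return cmd


# Build the command only; nothing is executed here.
cmd = build_fill_command("form.pdf", "data.json", "filled.pdf", strict=True)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where pdf-forms and pdfcpu are installed.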

JSON Format

The fill-form command accepts a flat JSON object that maps field names to the values to fill in:

{
  "Candidate Name": "John Smith",
  "Position": "Software Engineer",
  "Start date": "2025-06-01",
  "Full time": true,
  "Diploma or GED": "Yes"
}
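
A data file in this shape can be produced with Python's standard json module. The sketch below only writes and re-reads such a file; the field names are those from the example above, and the temporary path is illustrative.

```python
import json
import tempfile
from pathlib import Path

# Field names are taken from the example above; a real form defines its own.
form_values = {
    "Candidate Name": "John Smith",
    "Position": "Software Engineer",
    "Start date": "2025-06-01",
    "Full time": True,  # Python True is serialized as JSON true
    "Diploma or GED": "Yes",
}

# Write to a temporary directory so the sketch does not touch real files.
data_path = Path(tempfile.mkdtemp()) / "data.json"
data_path.write_text(json.dumps(form_values, indent=2), encoding="utf-8")

# Round-trip check: the file loads back to the same mapping.
loaded = json.loads(data_path.read_text(encoding="utf-8"))
print(loaded == form_values)
```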

Python API

from privacyforms_pdf import PDFFormExtractor

# Initialize the extractor
extractor = PDFFormExtractor()

# Extract form data
form_data = extractor.extract("form.pdf")

# Access form information
print(f"PDF Version: {form_data.pdf_version}")
print(f"Has Form: {form_data.has_form}")
print(f"Total Fields: {len(form_data.fields)}")

# Iterate over fields
for field in form_data.fields:
    print(f"{field.name}: {field.value}")

# Get specific field value
value = extractor.get_field_value("form.pdf", "Field Name")

# Check if PDF has a form
has_form = extractor.has_form("form.pdf")

# Export to JSON file
extractor.extract_to_json("form.pdf", "output.json")

# Fill a form using simple key:value format
# (named `values` so it does not shadow the extracted `form_data` above)
values = {
    "Candidate Name": "John Smith",
    "Position": "Software Engineer",
    "Full time": True,
    "Start date": "2025-06-01",
}
extractor.fill_form("form.pdf", values, "filled.pdf")

# Or fill from a JSON file
extractor.fill_form_from_json("form.pdf", "data.json", "filled.pdf")

# Validate data before filling (returns a list of errors)
errors = extractor.validate_form_data("form.pdf", values)
if errors:
    print("Validation errors:", errors)

API Reference

`PDFFormExtractor`

The main class for extracting, validating, and filling PDF form data.

Constructor

PDFFormExtractor(pdfcpu_path: str | None = None)

pdfcpu_path: Optional path to the pdfcpu executable. If not provided, searches in system PATH.
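
How the package performs the PATH lookup is not documented here. As an illustration only, a lookup of this kind can be written with the standard library's shutil.which; resolve_pdfcpu is a hypothetical helper and not part of privacyforms-pdf.

```python
from __future__ import annotations

import shutil


def resolve_pdfcpu(pdfcpu_path: str | None = None) -> str | None:
    """Return the path to use for pdfcpu, or None if it cannot be found.

    Hypothetical helper: an explicit path wins; otherwise PATH is searched.
    """
    if pdfcpu_path is not None:
        return pdfcpu_path
    # shutil.which returns None when no matching executable is on PATH.
    return shutil.which("pdfcpu")


found = resolve_pdfcpu()
print(found if found else "pdfcpu not found on PATH")
```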

Methods

check_pdfcpu() -> bool: Check if pdfcpu is available and working.
get_pdfcpu_version() -> str: Get the installed pdfcpu version.
has_form(pdf_path: str | Path) -> bool: Check if a PDF contains a form.
extract(pdf_path: str | Path) -> PDFFormData: Extract form data from a PDF.
extract_to_json(pdf_path: str | Path, output_path: str | Path) -> None: Export form data to a JSON file.
list_fields(pdf_path: str | Path) -> list[FormField]: List all form fields in a PDF.
get_field_value(pdf_path: str | Path, field_name: str) -> str | bool | None: Get the value of a specific form field.
get_field_by_id(pdf_path: str | Path, field_id: str) -> FormField | None: Get a form field by its ID.
get_field_by_name(pdf_path: str | Path, field_name: str) -> FormField | None: Get a form field by its name.
validate_form_data(pdf_path: str | Path, form_data: dict, *, strict: bool = False, allow_extra_fields: bool = False) -> list[str]: Validate form data (simple key:value format).
fill_form(pdf_path: str | Path, form_data: dict, output_path: str | Path | None = None, *, validate: bool = True) -> Path: Fill a PDF form with data.
fill_form_from_json(pdf_path: str | Path, json_path: str | Path, output_path: str | Path | None = None, *, validate: bool = True) -> Path: Fill a PDF form with data from a JSON file.
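
The package's actual validation rules are not spelled out here. The sketch below is one plausible reading of the strict and allow_extra_fields flags, based on the CLI's description of strict mode as requiring all form fields. The function validate_keys and its error messages are hypothetical, and it checks field names only, not value types.

```python
from __future__ import annotations


def validate_keys(
    field_names: set[str],
    form_data: dict[str, object],
    *,
    strict: bool = False,
    allow_extra_fields: bool = False,
) -> list[str]:
    """Return error messages for a key:value mapping (illustrative only)."""
    errors: list[str] = []
    supplied = set(form_data)
    if not allow_extra_fields:
        # Keys that do not correspond to any field in the form.
        for name in sorted(supplied - field_names):
            errors.append(f"Unknown field: {name}")
    if strict:
        # Assumed strict semantics: every form field must be supplied.
        for name in sorted(field_names - supplied):
            errors.append(f"Missing field: {name}")
    return errors


fields = {"Candidate Name", "Position", "Full time"}
data = {"Candidate Name": "John Smith", "Nickname": "JS"}
print(validate_keys(fields, data))
print(validate_keys(fields, data, strict=True, allow_extra_fields=True))
```

An empty list means the data passed these name checks, which mirrors the documented list[str] return type.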

Data Classes

`PDFFormData`

Represents extracted PDF form data.

source: Path: Path to the source PDF file.
pdf_version: str: Version of the PDF.
has_form: bool: Whether the PDF contains a form.
fields: list[FormField]: List of form fields.
raw_data: dict[str, Any]: The raw JSON data from pdfcpu.

`FormField`

Represents a single form field.

field_type: str: The type of the form field (e.g., 'textfield', 'checkbox').
pages: list[int]: List of pages where this field appears.
id: str: The unique identifier of the field.
name: str: The name of the field.
value: str | bool: The current value of the field.
locked: bool: Whether the field is locked.
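
To make the shape of these two classes concrete, the sketch below mirrors the documented attributes as plain dataclasses. These are local stand-ins written for illustration, not the package's own definitions, and to_key_value is a hypothetical helper that turns extracted fields into the simple key:value format used for filling.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class FormField:
    """Stand-in mirroring the documented FormField attributes."""

    field_type: str
    pages: list[int]
    id: str
    name: str
    value: str | bool
    locked: bool


@dataclass
class PDFFormData:
    """Stand-in mirroring the documented PDFFormData attributes."""

    source: Path
    pdf_version: str
    has_form: bool
    fields: list[FormField]
    raw_data: dict[str, Any]


def to_key_value(data: PDFFormData) -> dict[str, str | bool]:
    """Map each field name to its current value (hypothetical helper)."""
    return {f.name: f.value for f in data.fields}


# Sample values are made up for the demonstration.
sample = PDFFormData(
    source=Path("form.pdf"),
    pdf_version="1.7",
    has_form=True,
    fields=[
        FormField("textfield", [1], "1", "Candidate Name", "John Smith", False),
        FormField("checkbox", [1], "2", "Full time", True, False),
    ],
    raw_data={},
)
print(to_key_value(sample))
```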

Exceptions

PDFCPUError: Base exception for pdfcpu related errors.
PDFCPUNotFoundError: Raised when pdfcpu is not found on the system.
PDFCPUExecutionError: Raised when pdfcpu execution fails.
PDFFormNotFoundError: Raised when the PDF does not contain any forms.
FormValidationError: Raised when form data validation fails.
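
The list above does not state how these exceptions relate beyond PDFCPUError being the base for pdfcpu related errors. The sketch below defines local stand-ins so that it runs without the package. It assumes the two PDFCPU-prefixed exceptions subclass PDFCPUError and, as a further assumption, treats the other two as independent. In real code the classes would be imported from the package; the import path for the exceptions is not shown in this document. The helper run_safely is hypothetical.

```python
from __future__ import annotations

from typing import Callable


# Local stand-ins; the hierarchy below is an assumption, not the package's code.
class PDFCPUError(Exception):
    """Base exception for pdfcpu related errors."""


class PDFCPUNotFoundError(PDFCPUError):
    """pdfcpu is not found on the system."""


class PDFCPUExecutionError(PDFCPUError):
    """pdfcpu execution failed."""


class PDFFormNotFoundError(Exception):
    """The PDF does not contain any forms."""


class FormValidationError(Exception):
    """Form data validation failed."""


def run_safely(action: Callable[[], object]) -> str:
    """Run an action and translate known failures into a short message."""
    try:
        action()
    except PDFCPUNotFoundError:
        # Most specific pdfcpu error first, then the base class.
        return "pdfcpu is not installed or not on PATH"
    except PDFCPUError as exc:
        return f"pdfcpu failed: {exc}"
    except PDFFormNotFoundError:
        return "the PDF does not contain a form"
    except FormValidationError as exc:
        return f"invalid form data: {exc}"
    return "ok"


def failing_action() -> None:
    # Simulates a failed pdfcpu run for the demonstration.
    raise PDFCPUExecutionError("exit status 1")


print(run_safely(failing_action))
```

Ordering the except clauses from most to least specific keeps the base-class handler from swallowing the more informative cases.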

Development

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov

# Run linting
uv run ruff check .

# Run type checking
uv run pyright

Project Structure

privacyforms-pdf/
├── privacyforms_pdf/       # Main package
│   ├── __init__.py         # Package exports
│   ├── extractor.py        # PDFFormExtractor implementation
│   └── cli.py              # Command-line interface
├── tests/                  # Test suite
│   ├── test_extractor.py   # Tests for extractor
│   └── test_cli.py         # Tests for CLI
├── pyproject.toml          # Project configuration
└── README.md               # This file

License

Project details

Release history

0.1.3: Mar 7, 2026
0.1.2: Mar 6, 2026
0.1.1: Mar 6, 2026
0.1.0 (this version): Mar 6, 2026
