pdfa-parser

Convert PDFs to PDF/A using GhostScript and validate compliance with VeraPDF.

pdfa-parser is a lightweight Python library (Python ≥ 3.14) that wraps GhostScript for PDF → PDF/A conversion and VeraPDF for conformance validation. Both synchronous and asynchronous APIs are provided out of the box.

Features

PDF → PDF/A conversion via GhostScript (levels 1, 2, 3).
PDF/A validation via VeraPDF (flavours 1a/1b, 2a/2b, 3a/3b, …).
Sync & async – every public method has an async counterpart with an a_ prefix.
Factory function (create_parser()) for zero-config quick start.
Adapter pattern – swap GhostScript/VeraPDF for any CLI tool by implementing IBaseAdapter.
CLI – python -m pdfa_parser input.pdf output.pdf.

Project structure

pdfa-parser/
├── scripts/
│   └── setup_binaries.py          # Download & install GhostScript + VeraPDF
├── src/
│   ├── bin/
│   │   ├── ghostscript/            # GhostScript binary (gswin64c.exe / gs)
│   │   └── verapdf/                # VeraPDF CLI (verapdf.bat / verapdf)
│   └── pdfa_parser/
│       ├── __init__.py             # Public API + create_parser()
│       ├── __main__.py             # python -m pdfa_parser
│       ├── main.py                 # CLI entry-point
│       ├── pdf_parser.py           # PdfParser – convert / validate
│       ├── settings.py             # Binary path resolution
│       ├── interfaces/
│       │   ├── base_adapter.py     # IBaseAdapter (ABC)
│       │   └── binary_executer.py  # BinaryExecuter (facade)
│       └── implementations/
│           ├── ghostscript_adapter.py
│           └── verapdf_adapter.py
├── tests/
│   ├── conftest.py                 # Fixtures: PDF generation, skip markers
│   ├── test_unit.py                # 26 unit tests (no binaries needed)
│   └── test_integration.py         # 20 integration tests (need binaries)
├── pyproject.toml
└── README.md

Requirements

Dependency	Required for	Notes
Python ≥ 3.14	Everything	Uses `match/case`, `type | union`, etc.
GhostScript	Conversion	`gswin64c.exe` (Win) or `gs` (Unix)
VeraPDF	Validation	Requires Java ≥ 11 on `PATH`
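
The executable names in the table differ by platform. As a minimal sketch of how a resolver could choose between them, the snippet below checks a bundled directory first and then falls back to `PATH`. The function names `binary_name` and `find_binary` and the search order are assumptions made for illustration; they are not taken from settings.py.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional

# Executable names as listed in this README; anything not "win32" uses the Unix name.
WINDOWS_NAMES = {"ghostscript": "gswin64c.exe", "verapdf": "verapdf.bat"}
UNIX_NAMES = {"ghostscript": "gs", "verapdf": "verapdf"}


def binary_name(tool: str, platform: str = sys.platform) -> str:
    """Return the expected executable name for 'ghostscript' or 'verapdf'."""
    names = WINDOWS_NAMES if platform == "win32" else UNIX_NAMES
    if tool not in names:
        raise ValueError(f"unknown tool: {tool!r}")
    return names[tool]


def find_binary(tool: str, bundled_dir: Path, platform: str = sys.platform) -> Optional[Path]:
    """Look in a bundled directory first, then on PATH (hypothetical search order)."""
    name = binary_name(tool, platform)
    candidate = bundled_dir / name
    if candidate.is_file():
        return candidate
    found = shutil.which(name)
    return Path(found) if found else None
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing binary is an error or simply a reason to skip validation.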

Installation

1. Clone & create virtual environment

git clone <repo-url> pdfa-parser
cd pdfa-parser
python -m venv .venv
# Windows
.venv\Scripts\activate
# Linux / macOS
source .venv/bin/activate

2. Install the package (editable + dev dependencies)

# Using uv (recommended)
uv pip install -e ".[dev]"

# Or plain pip
pip install -e ".[dev]"

3. Install binaries

The automated setup script downloads GhostScript and VeraPDF into src/bin/:

python scripts/setup_binaries.py          # both
python scripts/setup_binaries.py --gs      # GhostScript only
python scripts/setup_binaries.py --verapdf # VeraPDF only

Prerequisites for the script:

Binary	Windows requirement	Unix requirement
GhostScript	7-Zip on `PATH` (`7z`)	`tar` (pre-installed)
VeraPDF	Java ≥ 11 on `PATH`	Java ≥ 11 on `PATH`
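
Before running the setup script it can help to confirm that the tools in the table are reachable. The following is a small preflight sketch built on `shutil.which`; the function names are hypothetical, and it only checks that the commands exist on `PATH`, not that Java is version 11 or newer.

```python
import shutil
import sys


def required_tools(platform: str = sys.platform) -> list[str]:
    """Command names the prerequisites table lists for the given platform."""
    extractor = "7z" if platform == "win32" else "tar"
    return [extractor, "java"]


def missing_prerequisites(platform: str = sys.platform, which=shutil.which) -> list[str]:
    """Return the required commands that cannot be found on PATH."""
    return [tool for tool in required_tools(platform) if which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing on PATH:", ", ".join(missing))
    else:
        print("All prerequisites found (Java version not checked).")
```

The `which` parameter exists only so the lookup can be replaced in tests.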

Tip: You can also install GhostScript / VeraPDF manually and copy (or symlink) the executables into src/bin/ghostscript/ and src/bin/verapdf/.

Quick start

Python API

from pdfa_parser import create_parser

# Create a parser with default adapters (GhostScript + VeraPDF)
parser = create_parser()

# Alternative: GhostScript only, for conversion without validation
# parser = create_parser(with_verapdf=False)

# Convert PDF to PDF/A-2
parser.convert("input.pdf", "output_pdfa.pdf")

# Validate a file
result = parser.validate("output_pdfa.pdf", flavour="2b")
print(result.compliant)   # True / False
print(result.profile)     # "PDF/A-2B validation profile"

# One-shot: convert then validate
result = parser.convert_and_validate("input.pdf", "output_pdfa.pdf")
assert result.compliant
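
The `compliant` and `profile` fields above are presumably derived from VeraPDF's XML report. The sketch below shows one way such a report could be reduced to those two values. The sample XML is hand-written and only modelled on veraPDF's machine-readable report, so the element and attribute names should be checked against real output from the installed version. `SketchResult` and `parse_report` are hypothetical names, not the library's API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hand-written sample modelled on veraPDF's XML report (an assumption;
# compare it with real output before relying on these names).
SAMPLE_REPORT = """\
<report>
  <jobs>
    <job>
      <validationReport profileName="PDF/A-2B validation profile"
                        statement="PDF file is compliant with Validation Profile requirements."
                        isCompliant="true"/>
    </job>
  </jobs>
</report>
"""


@dataclass(frozen=True)
class SketchResult:
    """Hypothetical stand-in for the object returned by validate()."""
    compliant: bool
    profile: str


def parse_report(xml_text: str) -> SketchResult:
    """Extract compliance flag and profile name from the first validationReport."""
    root = ET.fromstring(xml_text)
    node = root.find(".//validationReport")
    if node is None:
        raise ValueError("no validationReport element found")
    return SketchResult(
        compliant=node.get("isCompliant", "false").lower() == "true",
        profile=node.get("profileName", ""),
    )


result = parse_report(SAMPLE_REPORT)
print(result.compliant, result.profile)
```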

Async API

Every public method has an async twin with an a_ prefix:

import asyncio
from pdfa_parser import create_parser

async def main():
    parser = create_parser()
    await parser.a_convert("input.pdf", "output.pdf")
    result = await parser.a_validate("output.pdf")
    print(result.compliant)

asyncio.run(main())
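
The README does not say how the a_ methods are implemented. One common approach for wrapping a CLI tool is `asyncio.create_subprocess_exec`, sketched below. The Python interpreter stands in for GhostScript so the example runs anywhere, and `a_run` is a hypothetical helper, not part of pdfa-parser.

```python
import asyncio
import sys


async def a_run(binary: str, *args: str) -> tuple[int, str]:
    """Run a CLI tool without blocking the event loop; return (exit code, stdout)."""
    proc = await asyncio.create_subprocess_exec(
        binary,
        *args,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _stderr = await proc.communicate()
    return proc.returncode, stdout.decode().strip()


async def run_jobs() -> list[tuple[int, str]]:
    # Three stand-in "conversions" run concurrently; gather keeps their order.
    jobs = [a_run(sys.executable, "-c", f"print('job {i}')") for i in range(3)]
    return await asyncio.gather(*jobs)


results = asyncio.run(run_jobs())
print(results)
```

The practical benefit is the same as with the real API: several files can be processed concurrently with `asyncio.gather` instead of one after another.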

CLI

# Basic conversion
python -m pdfa_parser input.pdf output.pdf

# With validation
python -m pdfa_parser input.pdf output.pdf --validate

# PDF/A level 1, flavour 1b
python -m pdfa_parser input.pdf output.pdf --level 1 --validate --flavour 1b
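
For readers who want to script around the CLI, the options shown above could be modelled with `argparse` roughly as follows. This is a sketch of the documented flags only: the default level of 2 is inferred from the "PDF/A-2" comment in the quick start, the absence of a default flavour is an assumption, and the real main.py may define its options differently.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="pdfa_parser")
    parser.add_argument("input", help="source PDF")
    parser.add_argument("output", help="destination PDF/A file")
    # Default of 2 is an assumption, not confirmed by the README.
    parser.add_argument("--level", type=int, choices=(1, 2, 3), default=2)
    parser.add_argument("--validate", action="store_true")
    # No default flavour is assumed here; the real CLI may pick one.
    parser.add_argument("--flavour", default=None)
    return parser


args = build_arg_parser().parse_args(
    ["input.pdf", "output.pdf", "--level", "1", "--validate", "--flavour", "1b"]
)
print(args.level, args.validate, args.flavour)
```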

Advanced usage

Custom adapters

from pdfa_parser import IBaseAdapter, BinaryExecuter, PdfParser
from pathlib import Path

class MyGSAdapter(IBaseAdapter):
    def get_binary_path(self) -> Path:
        return Path("/usr/local/bin/gs")

parser = PdfParser(
    gs_executer=BinaryExecuter(MyGSAdapter()),
    pdfa_level=3,
    extra_gs_args=("-dQUIET",),
)
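
To make the adapter idea concrete without depending on the library, here is a self-contained sketch of the same shape: an abstract adapter that only knows where a binary lives, and an executer that runs whatever the adapter points at. `SketchAdapter` and `SketchExecuter` are hypothetical analogues of IBaseAdapter and BinaryExecuter; the real classes may expose more methods than `get_binary_path`.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from pathlib import Path


class SketchAdapter(ABC):
    """Hypothetical analogue of IBaseAdapter: locates a binary."""

    @abstractmethod
    def get_binary_path(self) -> Path:
        ...


class SketchExecuter:
    """Hypothetical analogue of BinaryExecuter: runs the adapter's binary."""

    def __init__(self, adapter: SketchAdapter) -> None:
        self._adapter = adapter

    def run(self, *args: str) -> subprocess.CompletedProcess:
        cmd = [str(self._adapter.get_binary_path()), *args]
        # check=True raises CalledProcessError on a non-zero exit code.
        return subprocess.run(cmd, capture_output=True, text=True, check=True)


class PythonAdapter(SketchAdapter):
    # The Python interpreter stands in for a real CLI tool so the sketch is runnable.
    def get_binary_path(self) -> Path:
        return Path(sys.executable)


completed = SketchExecuter(PythonAdapter()).run("-c", "print('hello from the binary')")
print(completed.stdout.strip())
```

Because the executer depends only on the abstract method, swapping tools means writing a new adapter class and nothing else.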

Custom PDF/A level & extra GhostScript flags

from pdfa_parser import create_parser
parser = create_parser(pdfa_level=3, extra_gs_args=("-dQUIET", "-r300"))
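
To illustrate where `pdfa_level` and `extra_gs_args` might end up, the sketch below assembles a GhostScript argument list from them. The switches are standard pdfwrite options, but the exact set and order pdfa-parser uses are not documented here, so treat `build_gs_args` as a hypothetical helper. A real PDF/A conversion typically also needs a PDF/A definition file and an ICC profile, which this sketch leaves out.

```python
from pathlib import Path
from typing import Sequence, Union


def build_gs_args(
    input_pdf: Union[str, Path],
    output_pdf: Union[str, Path],
    pdfa_level: int = 2,
    extra_gs_args: Sequence[str] = (),
) -> list[str]:
    """Assemble GhostScript arguments for a PDF/A conversion (illustrative only)."""
    if pdfa_level not in (1, 2, 3):
        raise ValueError("pdfa_level must be 1, 2 or 3")
    return [
        f"-dPDFA={pdfa_level}",           # requested PDF/A level
        "-dBATCH",                         # exit after processing
        "-dNOPAUSE",                       # do not prompt between pages
        "-sDEVICE=pdfwrite",               # write PDF output
        "-sColorConversionStrategy=RGB",   # convert colors to a single space
        "-dPDFACompatibilityPolicy=1",     # drop features that violate PDF/A
        *extra_gs_args,                    # caller-supplied flags, e.g. -dQUIET
        f"-sOutputFile={output_pdf}",
        str(input_pdf),
    ]


gs_args = build_gs_args("input.pdf", "output.pdf", pdfa_level=3, extra_gs_args=("-dQUIET", "-r300"))
print(" ".join(gs_args))
```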

Testing

# Unit tests (no binaries required) – always runnable
pytest tests/test_unit.py -v

# Integration tests (require GhostScript + VeraPDF in src/bin/)
pytest tests/test_integration.py -v

# Everything
pytest -v

Test coverage summary

Suite	Tests	Requires binaries	What it covers
`test_unit.py`	26	No	Helpers, XML parsing, arg building, mocked convert/validate, async, factory
`test_integration.py`	20	Yes	Real conversion (portrait, landscape, color, multi-page, text-heavy), VeraPDF validation, round-trip, async, PDF/A-1b

Integration tests generate various PDF types using reportlab (portrait, landscape, coloured shapes, multi-page, text-heavy) and run them through the full GhostScript → VeraPDF pipeline. Tests are auto-skipped when binaries are missing.

License

See LICENSE.

Release history

Version	Released
1.1.1	Apr 14, 2026
1.1.0	Apr 14, 2026
1.0.1	Apr 14, 2026
1.0.0b1 (pre-release)	Apr 14, 2026
0.1.0 (this version)	Apr 14, 2026
