Convert arbitrary PDF files to PDF/A (1b, 2b, 3b)

pdf2pdfa

Convert ordinary PDF documents into fully compliant PDF/A files (1b, 2b, 3b).

Installation

pip install pdf2pdfa

Requires Python 3.9+.

CLI Usage

Single file

pdf2pdfa convert input.pdf output.pdf

Choose PDF/A level

pdf2pdfa convert input.pdf output.pdf --level 2b
pdf2pdfa batch *.pdf --level 3b

Supported levels: 1b (default), 2b, 3b.

Batch conversion

pdf2pdfa batch *.pdf

Options

Flag            Description
--level LEVEL   PDF/A conformance level: 1b, 2b, 3b (default: 1b)
--icc PATH      Custom ICC profile
--font PATH     Custom TrueType font for embedding
--validate      Run verapdf validation after conversion
-v, --verbose   Enable debug logging
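
When the CLI is driven from a script, the flags above can be assembled programmatically. The following is a minimal sketch, not part of pdf2pdfa itself: `build_convert_command` and `run_convert` are hypothetical helper names, and the sketch assumes that every flag in the table is accepted by the `convert` subcommand and that the CLI exits with a non-zero status on failure.

```python
import subprocess
from typing import List, Optional

SUPPORTED_LEVELS = ("1b", "2b", "3b")


def build_convert_command(
    src: str,
    dst: str,
    level: str = "1b",
    icc: Optional[str] = None,
    font: Optional[str] = None,
    validate: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build the argument list for `pdf2pdfa convert` (hypothetical helper)."""
    if level not in SUPPORTED_LEVELS:
        raise ValueError(f"unsupported PDF/A level: {level!r}")
    cmd = ["pdf2pdfa", "convert", src, dst, "--level", level]
    if icc:
        cmd += ["--icc", icc]
    if font:
        cmd += ["--font", font]
    if validate:
        cmd.append("--validate")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_convert(src: str, dst: str, **options) -> None:
    """Run the CLI; raises CalledProcessError if it exits non-zero (assumed behavior)."""
    subprocess.run(build_convert_command(src, dst, **options), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.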

Python API

from pdf2pdfa import Converter

conv = Converter()                    # PDF/A-1b (default)
conv.convert("input.pdf", "output.pdf")

conv = Converter(level="2b")          # PDF/A-2b
conv.convert("input.pdf", "output_2b.pdf")
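
The same `Converter` object can be reused for several files. Below is a small sketch of a batch loop built on the `convert(input, output)` call shown above. The helper name `convert_all` and the `_pdfa.pdf` output naming are illustrative assumptions, and since the exceptions raised by the library are not documented here, the sketch catches `Exception` broadly.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def convert_all(converter, sources: Iterable[str], out_dir: str) -> List[Tuple[str, str]]:
    """Convert each source PDF into out_dir.

    `converter` is any object with a convert(input_path, output_path) method,
    such as pdf2pdfa.Converter. Returns (source, error message) pairs for the
    files that failed, so one bad file does not stop the rest.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failures: List[Tuple[str, str]] = []
    for src in sources:
        # Hypothetical naming scheme: report.pdf -> report_pdfa.pdf
        dst = out / (Path(src).stem + "_pdfa.pdf")
        try:
            converter.convert(str(src), str(dst))
        except Exception as exc:  # library-specific exception types are unknown here
            failures.append((str(src), str(exc)))
    return failures
```

With pdf2pdfa installed, a call such as `convert_all(Converter(level="2b"), ["a.pdf", "b.pdf"], "converted")` would write the results into `converted/` and return the list of failures.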

What it does

  • Smart font matching: resolves each PDF font to the best system substitute by family (serif/sans/mono), weight (bold/normal), and style (italic/roman)
  • Embeds missing fonts with correct WinAnsiEncoding width metrics
  • Attaches an sRGB ICC profile with the /N entry (number of color components) set on the stream dictionary
  • Replaces DeviceRGB/DeviceCMYK color spaces with ICC-based equivalents
  • Sets PDF/A conformance in XMP metadata (1b, 2b, or 3b)
  • Synchronizes XMP and document info dictionary
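
To illustrate the XMP conformance entry mentioned above: PDF/A identifies itself in XMP through the `pdfaid:part` and `pdfaid:conformance` properties, so level `2b` becomes part `2` with conformance `B`. The sketch below shows that mapping. The function names are hypothetical and this is not necessarily how pdf2pdfa builds its metadata.

```python
from typing import Tuple

# Namespace of the PDF/A identification schema used in XMP.
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"


def pdfaid_fields(level: str) -> Tuple[str, str]:
    """Split a level such as "2b" into (part, conformance), e.g. ("2", "B")."""
    level = level.strip().lower()
    if level not in ("1b", "2b", "3b"):
        raise ValueError(f"unsupported PDF/A level: {level!r}")
    return level[0], level[1].upper()


def pdfaid_description(level: str) -> str:
    """Return an rdf:Description fragment carrying the PDF/A identification.

    The fragment is meant to sit inside an rdf:RDF element that declares
    the rdf prefix; it is not a complete XMP packet on its own.
    """
    part, conformance = pdfaid_fields(level)
    return (
        f'<rdf:Description rdf:about="" xmlns:pdfaid="{PDFAID_NS}">\n'
        f"  <pdfaid:part>{part}</pdfaid:part>\n"
        f"  <pdfaid:conformance>{conformance}</pdfaid:conformance>\n"
        "</rdf:Description>"
    )
```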

v3.1.0 Highlights

  • New: Multi-level PDF/A support — --level 1b (default), --level 2b, --level 3b
  • Python API: Converter(level="2b")
  • OutputIntent /S correctly uses /GTS_PDFA1 for all PDF/A levels per ISO 19005

v3.0.0 Highlights

  • New: Font matching system — each unembedded font is resolved individually instead of using a single fallback
    • Times-Roman → times.ttf (serif), Courier → cour.ttf (mono), Helvetica → arial.ttf (sans)
    • Bold, italic, and bold-italic variants are matched to the correct system font files
    • Graceful degradation: if an exact match isn't found, falls back through style → weight → category
    • Cross-platform support: Windows, macOS, and Linux font paths
    • --font flag still works as a user override for all fonts
  • Refactored: converter.py no longer contains platform-specific font search logic
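
The matching and fallback order described above can be pictured roughly as follows. This is only a sketch of one reading of "style → weight → category" (drop the style first, then the weight, then accept any font in the same category); it is not the project's actual code. `FontKey`, `classify`, `resolve`, and the `AVAILABLE` table are hypothetical, and `arialbd.ttf` is an assumed file name added to show a bold match.

```python
from typing import Dict, NamedTuple, Optional


class FontKey(NamedTuple):
    category: str  # "serif", "sans", or "mono"
    bold: bool
    italic: bool


def classify(font_name: str) -> FontKey:
    """Guess category, weight, and style from a PostScript-style font name."""
    name = font_name.lower()
    if "courier" in name or "mono" in name:
        category = "mono"
    elif "times" in name or "georgia" in name:
        category = "serif"
    else:
        category = "sans"  # Helvetica, Arial, and anything unrecognized
    bold = "bold" in name
    italic = "italic" in name or "oblique" in name
    return FontKey(category, bold, italic)


# Deliberately incomplete table so that the fallback path is exercised.
AVAILABLE: Dict[FontKey, str] = {
    FontKey("serif", False, False): "times.ttf",
    FontKey("mono", False, False): "cour.ttf",
    FontKey("sans", False, False): "arial.ttf",
    FontKey("sans", True, False): "arialbd.ttf",  # assumed file name
}


def resolve(font_name: str, available: Dict[FontKey, str] = AVAILABLE) -> Optional[str]:
    """Return the best available font file, relaxing the match step by step."""
    key = classify(font_name)
    candidates = [
        key,                                      # exact match
        key._replace(italic=False),               # give up the style
        key._replace(bold=False, italic=False),   # give up the weight as well
    ]
    for candidate in candidates:
        if candidate in available:
            return available[candidate]
    # Last resort: any font from the same category.
    for other, path in available.items():
        if other.category == key.category:
            return path
    return None
```

With this table, `Helvetica-BoldOblique` has no exact match and resolves to the bold sans file, while `Courier-Bold` falls all the way back to the regular mono file.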

v2.0.0 Highlights

  • Fixed: ICC profile /N entry now correctly placed on stream dictionary (verapdf validation pass)
  • Fixed: Font glyph width mismatch for WinAnsiEncoding codes 128-159
  • Fixed: DeviceCMYK images now properly covered by CMYK OutputIntent
  • New: batch command for multi-file conversion
  • New: --validate flag for post-conversion verapdf check
  • New: --font flag for custom font embedding
  • Removed: reportlab dependency (no longer needed)
  • Removed: Python 3.7/3.8 support (minimum 3.9)
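
As background on the WinAnsiEncoding fix listed above: codes 128-159 are the range where the encoding departs from Latin-1, so a width table built by treating the code as a Unicode code point would look up the wrong glyph. The sketch below demonstrates the mismatch using Python's `cp1252` codec as an approximation of WinAnsiEncoding. The function names are hypothetical, and the release notes do not confirm that this was the cause of the bug.

```python
from typing import Dict, Optional


def winansi_to_unicode(code: int) -> Optional[str]:
    """Map a one-byte code to a character via cp1252 (approximates WinAnsiEncoding).

    Returns None for the few codes that cp1252 leaves undefined.
    """
    if not 0 <= code <= 255:
        raise ValueError(f"code out of range: {code}")
    try:
        return bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        return None


def remapped_codes() -> Dict[int, str]:
    """Codes in 128-159 whose character is not the code point of the same number."""
    result: Dict[int, str] = {}
    for code in range(128, 160):
        ch = winansi_to_unicode(code)
        if ch is not None and ord(ch) != code:
            result[code] = ch
    return result
```

For example, code 128 maps to the euro sign (U+20AC), so its advance width has to come from the euro glyph rather than from whatever glyph the font assigns to code point 128.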

Development

git clone https://github.com/nks1990/pdf2pdfa.git
cd pdf2pdfa
pip install -e ".[test]"
pytest -v

License

MIT - see LICENSE.
