Convert arbitrary PDF files to PDF/A (1b, 2b, 3b)
pdf2pdfa
Convert ordinary PDF documents into fully compliant PDF/A files (1b, 2b, 3b).
Installation
pip install pdf2pdfa
Requires Python 3.9+.
CLI Usage
Single file
pdf2pdfa convert input.pdf output.pdf
Choose PDF/A level
pdf2pdfa convert input.pdf output.pdf --level 2b
pdf2pdfa batch *.pdf --level 3b
Supported levels: 1b (default), 2b, 3b.
Batch conversion
pdf2pdfa batch *.pdf
Options
| Flag | Description |
|---|---|
| --level LEVEL | PDF/A conformance level: 1b, 2b, 3b (default: 1b) |
| --icc PATH | Custom ICC profile |
| --font PATH | Custom TrueType font for embedding |
| --validate | Run verapdf validation after conversion |
| -v, --verbose | Enable debug logging |
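When the CLI is driven from another program, it can help to assemble the argument list in one place. The sketch below is a hypothetical helper (`build_pdf2pdfa_cmd` is not part of pdf2pdfa) that maps keyword arguments onto the flags documented in the table; it only builds the list and never runs anything.

```python
from typing import List, Optional, Sequence

# Levels listed in the README; anything else is rejected before calling the CLI.
VALID_LEVELS = ("1b", "2b", "3b")


def build_pdf2pdfa_cmd(
    inputs: Sequence[str],
    output: Optional[str] = None,
    level: str = "1b",
    icc: Optional[str] = None,
    font: Optional[str] = None,
    validate: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Return an argv list for `pdf2pdfa convert` (with output) or `pdf2pdfa batch`.

    Hypothetical helper: it only assembles arguments from the documented flags.
    """
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported level {level!r}; expected one of {VALID_LEVELS}")
    if output is not None:
        if len(inputs) != 1:
            raise ValueError("convert takes exactly one input file")
        cmd = ["pdf2pdfa", "convert", inputs[0], output]
    else:
        cmd = ["pdf2pdfa", "batch", *inputs]
    cmd += ["--level", level]  # passed explicitly even for the 1b default
    if icc is not None:
        cmd += ["--icc", icc]
    if font is not None:
        cmd += ["--font", font]
    if validate:
        cmd.append("--validate")
    if verbose:
        cmd.append("-v")
    return cmd
```

For example, `subprocess.run(build_pdf2pdfa_cmd(["input.pdf"], "output.pdf", level="2b", validate=True), check=True)` runs one conversion. A list passed to `subprocess.run` bypasses the shell, so a wildcard like `*.pdf` is not expanded; pass `sorted(glob.glob("*.pdf"))` as the inputs instead. The README does not say whether a failed `--validate` check changes the exit status, so `check=True` may not catch validation failures.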
Python API
from pdf2pdfa import Converter
conv = Converter() # PDF/A-1b (default)
conv.convert("input.pdf", "output.pdf")
conv = Converter(level="2b") # PDF/A-2b
conv.convert("input.pdf", "output_2b.pdf")
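The Python API above converts one file per call. A minimal sketch of batch conversion on top of it, assuming a single `Converter` instance can be reused for several files: the hypothetical `convert_many` helper takes the convert callable as a parameter, so it can be exercised without pdf2pdfa installed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def convert_many(
    inputs: Iterable[str],
    out_dir: str,
    convert: Callable[[str, str], None],
    suffix: str = "_pdfa",
) -> Dict[str, str]:
    """Convert each input to <out_dir>/<stem><suffix>.pdf and return {input: output}.

    `convert` is any callable taking (input_path, output_path), for example the
    bound method Converter(level="2b").convert.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # make sure the target folder exists
    results: Dict[str, str] = {}
    for src in inputs:
        dst = out / f"{Path(src).stem}{suffix}.pdf"
        convert(src, str(dst))  # one call per file; errors propagate to the caller
        results[src] = str(dst)
    return results
```

With the library installed, `convert_many(["a.pdf", "b.pdf"], "out", Converter(level="2b").convert)` would write `out/a_pdfa.pdf` and `out/b_pdfa.pdf`. Inputs with the same file name in different folders would map to the same output path, so a real script may want to keep the source directory structure.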
What it does
- Smart font matching: resolves each PDF font to the best system substitute by family (serif/sans/mono), weight (bold/normal), and style (italic/roman)
- Embeds missing fonts with correct WinAnsiEncoding width metrics
- Attaches an sRGB ICC profile with the correct /N entry in its stream dictionary
- Replaces DeviceRGB/DeviceCMYK color spaces with ICC-based equivalents
- Sets PDF/A conformance in XMP metadata (1b, 2b, or 3b)
- Synchronizes XMP and document info dictionary
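The last two items concern metadata. As an illustration only (this is not pdf2pdfa's code, and all helper names are hypothetical), the sketch below maps a level such as "2b" to the pdfaid:part and pdfaid:conformance values of the XMP PDF/A identification schema, lists the usual Info-dictionary to XMP property pairs, and converts an Info date such as D:20240101120000+01'00' to the ISO 8601 form XMP uses.

```python
import re
from typing import Tuple

# Info-dictionary keys and the XMP properties PDF/A expects them to agree with.
INFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}

# PDF date string: (D:)YYYY[MM[DD[HH[mm[SS]]]]] followed by Z, +HH'mm' or -HH'mm'.
_PDF_DATE = re.compile(
    r"(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Z+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?"
)


def pdfaid_for_level(level: str) -> Tuple[int, str]:
    """Map "1b"/"2b"/"3b" to the (pdfaid:part, pdfaid:conformance) pair."""
    m = re.fullmatch(r"([123])([bB])", level)
    if m is None:
        raise ValueError(f"unsupported PDF/A level: {level!r}")
    return int(m.group(1)), m.group(2).upper()


def pdfaid_description(level: str) -> str:
    """rdf:Description fragment for the pdfaid schema (meant to sit inside rdf:RDF)."""
    part, conformance = pdfaid_for_level(level)
    return (
        '<rdf:Description rdf:about=""'
        ' xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
        f"<pdfaid:part>{part}</pdfaid:part>"
        f"<pdfaid:conformance>{conformance}</pdfaid:conformance>"
        "</rdf:Description>"
    )


def pdf_date_to_xmp(value: str) -> str:
    """Convert an Info-dictionary date string to the ISO 8601 form used in XMP."""
    m = _PDF_DATE.fullmatch(value.strip())
    if m is None:
        raise ValueError(f"not a PDF date: {value!r}")
    year, mon, day, hh, mm, ss, tz, tzh, tzm = m.groups()
    # Missing month/day default to 01 and missing time fields to 00, as in the PDF spec.
    out = f"{year}-{mon or '01'}-{day or '01'}T{hh or '00'}:{mm or '00'}:{ss or '00'}"
    if tz == "Z":
        out += "Z"
    elif tz:
        out += f"{tz}{tzh or '00'}:{tzm or '00'}"
    return out
```

Filling in missing date fields keeps the output simple but loses the original precision; a stricter synchronizer might emit only the fields that were present.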
v3.1.0 Highlights
- New: Multi-level PDF/A support: --level 1b (default), --level 2b, --level 3b
- Python API: Converter(level="2b")
- OutputIntent /S correctly uses /GTS_PDFA1 for all PDF/A levels, per ISO 19005
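Two details mentioned so far, the /N entry on the ICC profile stream and the /GTS_PDFA1 output-intent subtype, can be illustrated with raw PDF object syntax. This is a minimal stdlib sketch, not pdf2pdfa's implementation: the helper names are hypothetical, object references such as "5 0 R" are placeholders, and /N is derived from the colour-space signature stored in the ICC profile header.

```python
# Number of colour components per ICC data colour space signature.
# The signature occupies bytes 16-19 of the 128-byte ICC profile header.
_ICC_COMPONENTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}


def icc_components(profile: bytes) -> int:
    """Read /N (number of colour components) from an ICC profile header."""
    if len(profile) < 128:
        raise ValueError("ICC profile is shorter than its 128-byte header")
    signature = profile[16:20]
    if signature not in _ICC_COMPONENTS:
        raise ValueError(f"unsupported ICC colour space {signature!r}")
    return _ICC_COMPONENTS[signature]


def icc_stream_dict(profile: bytes) -> str:
    """Dictionary of the ICC profile stream: /N belongs here, next to /Length."""
    return f"<< /N {icc_components(profile)} /Length {len(profile)} >>"


def iccbased_colorspace(profile_ref: str) -> str:
    """Colour space array that can stand in for /DeviceRGB or /DeviceCMYK."""
    return f"[/ICCBased {profile_ref}]"


def output_intent_dict(profile_ref: str, identifier: str = "sRGB IEC61966-2.1") -> str:
    """OutputIntent dictionary; /S stays /GTS_PDFA1 for every PDF/A part.

    `identifier` is written as a literal string without escaping, so it must
    not contain parentheses or backslashes.
    """
    return (
        "<< /Type /OutputIntent /S /GTS_PDFA1 "
        f"/OutputConditionIdentifier ({identifier}) "
        f"/DestOutputProfile {profile_ref} >>"
    )
```

Reading /N from the profile itself, rather than hard-coding 3, means a CMYK profile passed via --icc would get /N 4 automatically.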
v3.0.0 Highlights
- New: Font matching system: each unembedded font is resolved individually instead of using a single fallback
- Times-Roman → times.ttf (serif), Courier → cour.ttf (mono), Helvetica → arial.ttf (sans)
- Bold, italic, and bold-italic variants are matched to the correct system font files
- Graceful degradation: if no exact match is found, matching falls back through style → weight → category
- Cross-platform support: Windows, macOS, and Linux font paths
- The --font flag still works as a user override for all fonts
- Refactored: converter.py no longer contains platform-specific font search logic
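The matching and fallback behaviour described above can be sketched as a small lookup. This is one possible reading of "style → weight → category", not pdf2pdfa's code: the file table (modelled on Windows core-font names), the keyword heuristics in the hypothetical `classify`, and passing the installed fonts in as a set are all assumptions.

```python
import re
from typing import Dict, Optional, Set, Tuple

Key = Tuple[str, bool, bool]  # (category, bold, italic)

# Hypothetical table modelled on Windows core-font file names.
FONT_FILES: Dict[Key, str] = {
    ("serif", False, False): "times.ttf",
    ("serif", True, False): "timesbd.ttf",
    ("serif", False, True): "timesi.ttf",
    ("serif", True, True): "timesbi.ttf",
    ("mono", False, False): "cour.ttf",
    ("mono", True, False): "courbd.ttf",
    ("mono", False, True): "couri.ttf",
    ("mono", True, True): "courbi.ttf",
    ("sans", False, False): "arial.ttf",
    ("sans", True, False): "arialbd.ttf",
    ("sans", False, True): "ariali.ttf",
    ("sans", True, True): "arialbi.ttf",
}


def classify(base_font: str) -> Key:
    """Guess (category, bold, italic) from a PDF /BaseFont name."""
    name = re.sub(r"^[A-Z]{6}\+", "", base_font).lower()  # drop a subset tag like ABCDEF+
    if "courier" in name or "mono" in name:
        category = "mono"
    elif ("times" in name or "serif" in name) and "sans" not in name:
        category = "serif"
    else:
        category = "sans"  # Helvetica, Arial and unknown names
    bold = any(word in name for word in ("bold", "black", "heavy"))
    italic = "italic" in name or "oblique" in name
    return category, bold, italic


def match_font(base_font: str, available: Set[str]) -> Optional[str]:
    """Return the best available font file, relaxing style, then weight, then category."""
    category, bold, italic = classify(base_font)
    candidates = [
        (category, bold, italic),  # exact match
        (category, bold, False),   # drop style
        (category, False, False),  # drop weight
        ("sans", False, False),    # drop category: generic fallback
    ]
    for key in candidates:
        if FONT_FILES[key] in available:
            return FONT_FILES[key]
    return None
```

Passing `available` in keeps the matching logic separate from the platform-specific directory scanning, which mirrors the refactoring note above.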
v2.0.0 Highlights
- Fixed: ICC profile /N entry is now placed on the stream dictionary (verapdf validation now passes)
- Fixed: Font glyph width mismatch for WinAnsiEncoding codes 128-159
- Fixed: DeviceCMYK images are now properly covered by a CMYK OutputIntent
- New: batch command for multi-file conversion
- New: --validate flag for post-conversion verapdf check
- New: --font flag for custom font embedding
- Removed: reportlab dependency (no longer needed)
- Removed: Python 3.7/3.8 support (minimum 3.9)
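Codes 128-159 are where WinAnsiEncoding differs from Latin-1, so a width table built by treating each byte as a Unicode code point picks up the C1 control characters U+0080-U+009F instead of glyphs such as the euro sign or curly quotes. The sketch below shows one way to build a correct /Widths array; it is an assumption-laden illustration, not pdf2pdfa's code: Python's cp1252 codec stands in for WinAnsiEncoding, and `advance` is a hypothetical stand-in for a real font-metrics lookup.

```python
from typing import Callable, List, Optional

FIRST_CHAR, LAST_CHAR = 32, 255  # /FirstChar and /LastChar of the font dictionary


def winansi_char(code: int) -> Optional[str]:
    """Unicode character for a WinAnsiEncoding code, or None if the code is unused.

    Python's cp1252 codec is used as a close approximation of WinAnsiEncoding;
    in the 128-159 range it leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
    """
    try:
        return bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        return None


def widths_array(advance: Callable[[str], int], missing: int = 0) -> List[int]:
    """/Widths entries for codes FIRST_CHAR..LAST_CHAR under WinAnsiEncoding.

    `advance(ch)` must return the glyph advance for Unicode character `ch` in
    PDF glyph units (1/1000 em). Using chr(code) here instead of winansi_char
    would look up U+0080-U+009F for codes 128-159 rather than the intended glyphs.
    """
    widths = []
    for code in range(FIRST_CHAR, LAST_CHAR + 1):
        ch = winansi_char(code)
        widths.append(missing if ch is None else advance(ch))
    return widths
```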
Development
git clone https://github.com/nks1990/pdf2pdfa.git
cd pdf2pdfa
pip install -e ".[test]"
pytest -v
License
MIT - see LICENSE.
File details
Details for the file pdf2pdfa-3.1.0.tar.gz.
File metadata
- Download URL: pdf2pdfa-3.1.0.tar.gz
- Upload date:
- Size: 25.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4111d5e729d2b75d1eb29988e1cc9e4d33cc9e6d41934643da7f743065a3103 |
| MD5 | 401e03b53ef6fd6f54ea9abc2c62d799 |
| BLAKE2b-256 | 8eae7753fa6dc755ff275af0fc1a60fca691b67a2d336554fe44cfbf65772bc4 |
Provenance
The following attestation bundles were made for pdf2pdfa-3.1.0.tar.gz:
Publisher: release.yml on nks1990/pdf2pdfa
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdf2pdfa-3.1.0.tar.gz
- Subject digest: d4111d5e729d2b75d1eb29988e1cc9e4d33cc9e6d41934643da7f743065a3103
- Sigstore transparency entry: 1006403175
- Sigstore integration time:
- Permalink: nks1990/pdf2pdfa@2b60eb7b4f12f2c37f254dfb49623d6bcfaa9faa
- Branch / Tag: refs/tags/v3.1.0
- Owner: https://github.com/nks1990
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2b60eb7b4f12f2c37f254dfb49623d6bcfaa9faa
- Trigger Event: push
File details
Details for the file pdf2pdfa-3.1.0-py3-none-any.whl.
File metadata
- Download URL: pdf2pdfa-3.1.0-py3-none-any.whl
- Upload date:
- Size: 21.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 76df8dfeea1fce7e8f41dd1f4ce798b3e58c8e8f31eee0a2d238f40408918a42 |
| MD5 | 12751ae3a5f801cb2ef1ef9742b779ff |
| BLAKE2b-256 | 36ec41e0efb29de5992de6bfa21632a0cc30accfcc1d03afc02a3bb4fa255eaa |
Provenance
The following attestation bundles were made for pdf2pdfa-3.1.0-py3-none-any.whl:
Publisher: release.yml on nks1990/pdf2pdfa
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdf2pdfa-3.1.0-py3-none-any.whl
- Subject digest: 76df8dfeea1fce7e8f41dd1f4ce798b3e58c8e8f31eee0a2d238f40408918a42
- Sigstore transparency entry: 1006403177
- Sigstore integration time:
- Permalink: nks1990/pdf2pdfa@2b60eb7b4f12f2c37f254dfb49623d6bcfaa9faa
- Branch / Tag: refs/tags/v3.1.0
- Owner: https://github.com/nks1990
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2b60eb7b4f12f2c37f254dfb49623d6bcfaa9faa
- Trigger Event: push