pdf-tools
A Swiss-army knife for everyday PDF workflows – pure-Python, cross-platform, and fully typed.
Features
| Capability | Sub-command / API | Notes |
|---|---|---|
| Convert images (.png, .jpg, …) or Word (.docx) to PDF | pdf-tools convert … / pdf_tools.convert.service.convert_file_to_pdf() | Uses Pillow + img2pdf for images and LibreOffice / unoserver for Word files. |
| Merge multiple PDFs (or whole folders) | pdf-tools merge … / pdf_tools.merge.service.merge_pdfs() | Preserves bookmarks; skips non-PDF inputs with a warning. |
| Process = Convert → Merge in one go | pdf-tools process convert-and-merge … | Handy for ad-hoc batches of mixed file types. |
| Watermark (text stamp) | pdf-tools watermark add-text … / pdf_tools.watermark.service.add_text_watermark() | PyMuPDF in-place editing; configurable font, colour, opacity, rotation, position. |
| Async-friendly CLI | Built on Typer + custom AsyncTyper | Callbacks can be async def – future-proof for parallel work. |
| Pydantic v2 models | File, Files, WatermarkOptions, … | JSON-serialisable contracts for easy automation. |
| Fully typed + Ruff + Mypy + pytest + hypothesis | | CI fails on lint, type, docs, or test issues. |
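The "skips non-PDF inputs with a warning" behaviour in the Merge row can be pictured with a small standalone sketch. The helper name `split_pdf_inputs` is hypothetical and is not part of pdf-tools; it only illustrates the idea of filtering by file suffix and warning about the rest, and the library's real check may differ.

```python
import warnings
from pathlib import Path


def split_pdf_inputs(paths: list[Path]) -> tuple[list[Path], list[Path]]:
    """Separate PDF paths from everything else, warning about the latter.

    Hypothetical helper for illustration only; it looks at the file suffix
    and does not inspect file contents.
    """
    pdfs: list[Path] = []
    skipped: list[Path] = []
    for path in paths:
        if path.suffix.lower() == ".pdf":
            pdfs.append(path)
        else:
            warnings.warn(f"Skipping non-PDF input: {path}", stacklevel=2)
            skipped.append(path)
    return pdfs, skipped


if __name__ == "__main__":
    pdfs, skipped = split_pdf_inputs(
        [Path("a.pdf"), Path("notes.txt"), Path("B.PDF")]
    )
    print("to merge:", pdfs)
    print("skipped:", skipped)
```

The list of PDFs returned here is the kind of value one would then pass on to a merge step.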
Installation
pipx install unoserver --system-site-packages
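pip install pdf-toolchest # published on PyPI as pdf_toolchest; the import package and CLI are still pdf_tools / pdf-tools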
External dependency: LibreOffice must be installed and on your $PATH for Word→PDF conversion.
- Install unoserver globally using pipx install --system-site-packages (preferred) or sudo -H pip install,
- use the bundled Python that ships with LibreOffice, or
- call soffice --headless directly (see Batch LibreOffice Listener below).
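Before attempting a Word→PDF conversion, it can help to confirm that a LibreOffice executable is actually reachable on the PATH. The following is a minimal sketch using only the standard library; the function name `find_libreoffice` and the candidate names `soffice` and `libreoffice` are assumptions for illustration, not something pdf-tools provides.

```python
import shutil
from typing import Optional


def find_libreoffice(
    candidates: tuple[str, ...] = ("soffice", "libreoffice"),
) -> Optional[str]:
    """Return the full path of the first candidate executable found on PATH.

    The candidate names are assumptions; adjust them for your platform.
    """
    for name in candidates:
        location = shutil.which(name)
        if location is not None:
            return location
    return None


if __name__ == "__main__":
    path = find_libreoffice()
    if path is None:
        print("LibreOffice not found on PATH; Word-to-PDF conversion will likely fail.")
    else:
        print(f"Found LibreOffice at {path}")
```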
CLI Quick Start
# 1. Convert a single Word file
pdf-tools convert file-to-pdf draft.docx
# 2. Convert every image in a folder → PDFs in ./out
pdf-tools convert folder-to-pdfs assets/ --output-dir out/
# 3. Merge selected PDFs
pdf-tools merge pdf-files a.pdf b.pdf c.pdf -o merged.pdf
# 4. Merge *all* PDFs in a folder
pdf-tools merge pdfs-in-folder scans/ -o merged.pdf
# 5. One-liner: convert images + docs → merge
pdf-tools process convert-and-merge-pdfs image1.jpg doc1.docx doc2.docx -o final.pdf
# 6. Add a diagonal red DRAFT watermark on every page
pdf-tools watermark add-text src.pdf stamped.pdf \
--text "DRAFT" --color "#FF0000" --font-size 72 --opacity 0.2 --rotation 45
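For scripted batches, the watermark command above can also be driven from Python through `subprocess`. This is a rough sketch that only uses the sub-command and flags shown in example 6; the function names `watermark_command` and `stamp_all` are hypothetical, and the sketch assumes the `pdf-tools` executable is on the PATH.

```python
import subprocess
from pathlib import Path


def watermark_command(
    src: Path,
    dst: Path,
    text: str,
    *,
    color: str = "#FF0000",
    font_size: int = 72,
    opacity: float = 0.2,
    rotation: int = 45,
) -> list[str]:
    """Build the argv list for `pdf-tools watermark add-text` (flags from example 6)."""
    return [
        "pdf-tools", "watermark", "add-text", str(src), str(dst),
        "--text", text,
        "--color", color,
        "--font-size", str(font_size),
        "--opacity", str(opacity),
        "--rotation", str(rotation),
    ]


def stamp_all(folder: Path, out_dir: Path, text: str) -> None:
    """Watermark every PDF in `folder`, writing the results to `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(folder.glob("*.pdf")):
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(watermark_command(src, out_dir / src.name, text), check=True)
```

Passing the arguments as a list avoids shell quoting problems with texts such as "DRAFT v2".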
Batch LibreOffice Listener (faster)
# spin up a listener for the whole session (Linux)
unoserver --interface 127.0.0.1 --port 2002 &
export LIBRE_PORT=2002 # used by convert helpers
pdf-tools process convert-and-merge-pdfs ...
kill %1 # when done
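A listener started in the background may need a moment before it accepts connections. One way to avoid racing it is to poll the port first. The sketch below is a generic TCP readiness check built on the standard library; `wait_for_port` is a hypothetical helper, not part of pdf-tools or unoserver, and an open port only suggests that the listener is up.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 10.0, interval: float = 0.2
) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    if wait_for_port("127.0.0.1", 2002, timeout=5.0):
        print("Listener is accepting connections on port 2002.")
    else:
        print("Nothing answered on port 2002 within 5 seconds.")
```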
Using the Python API
from pathlib import Path
from pdf_tools.convert.service import convert_file_to_pdf
from pdf_tools.merge.service import merge_pdfs
from pdf_tools.watermark.models import WatermarkOptions
from pdf_tools.watermark.service import add_text_watermark
# 1. Convert
img_pdf = convert_file_to_pdf(
    input_path=Path("diagram.png"),
    output_dir=Path("out"),
)
# 2. Merge two PDFs
merge_pdfs(
    input_paths=[Path("intro.pdf"), img_pdf.path],
    output_path=Path("bundle.pdf"),
)
# 3. Watermark (first page only)
opts = WatermarkOptions(text="CONFIDENTIAL", font_size=36, all_pages=False)
add_text_watermark(src=Path("bundle.pdf"), dst=Path("bundle_wm.pdf"), opts=opts)
Spinning up a transient LibreOffice listener
from pathlib import Path
from pdf_tools.convert.unoserver_ctx import unoserver_listener
from pdf_tools.process.service import convert_and_merge_pdfs
with unoserver_listener(port=2002):  # starts & auto-kills unoserver
    convert_and_merge_pdfs(
        input_paths=[Path("doc1.docx"), Path("pic.jpg")],
        output_path=Path("package.pdf"),
    )
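To give a feel for the "starts & auto-kills" pattern, here is a generic sketch of a context manager that launches a child process and stops it on exit. It is not the implementation of `unoserver_listener`; the name `background_process` and its parameters are assumptions for illustration.

```python
import subprocess
import sys
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def background_process(
    argv: list[str], stop_timeout: float = 5.0
) -> Iterator[subprocess.Popen]:
    """Start `argv` as a child process and make sure it is stopped on exit."""
    proc = subprocess.Popen(argv)
    try:
        yield proc
    finally:
        # Ask politely first, then force-kill if the child ignores the request.
        proc.terminate()
        try:
            proc.wait(timeout=stop_timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


if __name__ == "__main__":
    # A stand-in child process; with unoserver installed one could pass
    # ["unoserver", "--interface", "127.0.0.1", "--port", "2002"] instead.
    child = [sys.executable, "-c", "import time; time.sleep(60)"]
    with background_process(child) as proc:
        print("running:", proc.poll() is None)
    print("stopped:", proc.poll() is not None)
```

Because the cleanup sits in a `finally` clause, the child is stopped even if the body of the `with` block raises.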
Development setup
pipx install unoserver --system-site-packages
git clone https://github.com/your-org/pdf-tools.git
cd pdf-tools
poetry install --with dev # include dev/test dependencies
# Run checks
ruff check .
mypy .
pytest -q