Your pages, your way — PDF, DOCX, images and more

PageFuse

Your pages, your way. Combine pages from PDFs, Word docs, PowerPoint slides, images, Markdown, and more into a single output.

Supported Input Formats

| Format | Extensions | Requires LibreOffice |
| --- | --- | --- |
| PDF | .pdf | No |
| Images | .png, .jpg, .jpeg, .tiff, .tif | No |
| Markdown | .md, .markdown | No |
| Word | .docx, .doc | Yes |
| PowerPoint | .pptx, .ppt | Yes |
| OpenDocument | .odt, .odp | Yes |
| Web | .html | Yes |

Supported Output Formats

Output format is determined by the file extension in your config or command:

| Format | Extension | Requires LibreOffice | Notes |
| --- | --- | --- | --- |
| PDF | .pdf | No | Default; fast, lossless |
| Image | .png, .jpg, .tiff | No | Single page → file; multi-page → .png.zip |
| HTML | .html | No | Self-contained; pages rendered as embedded images |
| Word | .docx, .odt | Yes | Not valid output from presentation sources |
| Presentation | .odp | Yes | Not valid output from word-processor sources |
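
As a rough illustration of how an output extension maps to a format, the table above can be transcribed into a lookup. This is only a sketch: `OUTPUT_FORMATS` and `describe_output` are hypothetical names chosen for this example and are not part of PageFuse's API.

```python
from pathlib import Path

# Transcribed from the output format table: extension -> (format, needs LibreOffice).
OUTPUT_FORMATS = {
    ".pdf": ("PDF", False),
    ".png": ("Image", False),
    ".jpg": ("Image", False),
    ".tiff": ("Image", False),
    ".html": ("HTML", False),
    ".docx": ("Word", True),
    ".odt": ("Word", True),
    ".odp": ("Presentation", True),
}


def describe_output(filename: str) -> tuple[str, bool]:
    """Return (format name, requires LibreOffice) for an output filename."""
    ext = Path(filename).suffix.lower()
    if ext not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output extension: {ext!r}")
    return OUTPUT_FORMATS[ext]


print(describe_output("board_pack.docx"))
```

For an authoritative answer about a specific file, `pagefuse info <file>` remains the source of truth.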

Installation

# Linux (recommended — avoids system Python restrictions)
pipx install pagefuse

# macOS
pip install pagefuse

# Windows
pip install pagefuse

# Or inside a virtual environment (any platform)
# On Windows, run: python -m venv venv, then venv\Scripts\activate
python3 -m venv venv && source venv/bin/activate
pip install pagefuse

# Via Cargo (requires Python 3.9+ on PATH)
cargo install pagefuse

Linux note: If you see error: externally-managed-environment, use pipx instead of pip. Install pipx with: sudo apt install pipx && pipx ensurepath

Uninstall

pipx uninstall pagefuse   # if installed via pipx
pip uninstall pagefuse    # if installed via pip

LibreOffice is required only for Office and OpenDocument formats and for HTML input. PDF, image, and Markdown input, and PDF, image, and HTML output, all work without it.

# Ubuntu / Debian
sudo apt install libreoffice

# macOS
brew install --cask libreoffice

# Windows
# Download from https://www.libreoffice.org/download and add soffice.exe to PATH

Usage

Global options

These options apply to all subcommands and must be placed before the subcommand name:

pagefuse [OPTIONS] COMMAND [ARGS]...

| Option | Default | Description |
| --- | --- | --- |
| --lo-timeout SECS | 300 | LibreOffice conversion timeout in seconds. There is no hard limit; increase it for large or complex files. |
| --version | | Print version and exit. |

Example:

pagefuse --lo-timeout 600 assemble output.docx big_report.pdf:all
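
When driving PageFuse from a script, the ordering rule (global options before the subcommand) is easy to get wrong. The sketch below builds the argument list in the required order; `build_command` is a hypothetical helper written for this example, and only the `--lo-timeout` option documented above is used.

```python
import shlex


def build_command(subcommand, args, lo_timeout=None):
    """Build a pagefuse argv with global options placed before the subcommand."""
    cmd = ["pagefuse"]
    if lo_timeout is not None:
        cmd += ["--lo-timeout", str(lo_timeout)]
    cmd.append(subcommand)
    cmd += list(args)
    return cmd


cmd = build_command("assemble", ["output.docx", "big_report.pdf:all"], lo_timeout=600)
print(shlex.join(cmd))
# prints: pagefuse --lo-timeout 600 assemble output.docx big_report.pdf:all
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once PageFuse is installed and on PATH.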

Assemble documents

Combine pages from multiple documents into one output. Pass a .fuse config file, or use inline arguments:

# Inline — output first, then sources
pagefuse assemble output.pdf cover.pdf:1 terms.docx:all pricing.pdf:1-3 slides.pptx:2,4,6

# Export as Word document
pagefuse assemble output.docx cover.pdf:1 terms.docx:all

# Export as self-contained HTML
pagefuse assemble output.html report.pdf:1-5

# Export as images (multi-page → output.png.zip)
pagefuse assemble output.png report.pdf:1-3

# From a config file
pagefuse assemble board_pack.fuse

# Preview without writing any files
pagefuse assemble --dry-run board_pack.fuse
pagefuse assemble --dry-run output.pdf cover.pdf:1 terms.docx:all

Each source is file:pages. Omit :pages to include all pages.
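
To make the `file:pages` convention concrete, here is a minimal sketch of how such an argument could be split. It is not PageFuse's actual parser; `split_source` is a hypothetical name, and the rule of treating the text after the last colon as a page spec only when it looks like one is an assumption (it keeps Windows drive letters such as `C:` intact).

```python
import re

# A page spec is "all" or comma-separated pages/ranges such as "1-3,5".
_PAGE_SPEC = re.compile(r"all|\d+(-\d+)?(,\d+(-\d+)?)*")


def split_source(arg: str) -> tuple[str, str]:
    """Split 'file:pages' into (file, pages); pages defaults to 'all'."""
    path, sep, spec = arg.rpartition(":")
    if sep and path and _PAGE_SPEC.fullmatch(spec):
        return path, spec
    return arg, "all"


for arg in ["cover.pdf:1", "terms.docx", "slides.pptx:2,4,6"]:
    print(split_source(arg))
```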

Example board_pack.fuse:

# Output format is determined by the extension (.pdf, .docx, .html, .png, …)
# Add multiple output: lines to export to several formats in one run.
output: board_pack.pdf
output: board_pack.docx
output: board_pack_preview.png

# Metadata (title defaults to output filename if omitted)
title:   Q4 Board Pack
author:  Finance Team
subject: Board meeting materials

file: templates/cover_letter.pdf       1
file: reports/financial_data.docx      all
file: slides/main_deck.pptx            1-4
file: reports/charts.pdf               3,5,7
file: templates/signature_page.pdf     1
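
The structure of the config above can be read with a few lines of Python. This is a sketch under assumptions, not the real PageFuse loader: `parse_assemble_config` is a hypothetical function, and it assumes that `#` starts a comment line, that every other line is `key: value`, and that the page spec on a `file:` line is the last whitespace-separated token.

```python
def parse_assemble_config(text: str) -> dict:
    """Parse an assemble-style .fuse text into outputs, files, and metadata."""
    config = {"outputs": [], "files": [], "meta": {}}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        key, value = key.strip(), value.strip()
        if key == "output":
            config["outputs"].append(value)
        elif key == "file":
            # Assumption: the page spec is the last whitespace-separated token.
            parts = value.rsplit(None, 1)
            path, spec = parts if len(parts) == 2 else (value, "all")
            config["files"].append((path, spec))
        else:
            config["meta"][key] = value  # title, author, subject, ...
    return config


sample = """\
output: board_pack.pdf
title:   Q4 Board Pack
file: templates/cover_letter.pdf       1
file: slides/main_deck.pptx            1-4
"""
print(parse_assemble_config(sample))
```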

Split a document into parts

Extract pages from one document into multiple outputs. Pass a .fuse config file, or use inline arguments:

# Inline — source first, then outputs with page specs
pagefuse split report.pdf cover.pdf:1 body.pdf:2-10 appendix.pdf:11-20

# Each output can be a different format
pagefuse split report.pdf summary.pdf:1 full.docx:all preview.png:1

# From a config file
pagefuse split split.fuse

# Preview without writing any files
pagefuse split --dry-run report.pdf cover.pdf:1 body.pdf:2-10
pagefuse split --dry-run split.fuse

Note: Images (.png, .jpg, etc.) cannot be used as split sources. Presentation sources (.pptx, .ppt, .odp) cannot produce word-processor outputs (.docx, .odt), and word-processor sources cannot produce .odp. Run pagefuse info <file> to see what is supported for a given file.

Example split.fuse:

source: annual_report.pdf

# Metadata (optional — defaults to source file metadata)
title:   Annual Report
author:  Finance Team

output: cover.pdf              1
output: executive_summary.pdf  2-5
output: financials.pdf         6-20
output: appendix.docx          21-30
output: cover_preview.png      1

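On the command line, each output is file:pages. Omit :pages to copy all pages. In a .fuse file, as in the example above, the page spec follows the filename, separated by whitespace.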

Generate a config template

Use pagefuse init to generate a starter .fuse file:

pagefuse init                            # assemble config → config.fuse
pagefuse init --output board_pack.fuse   # custom filename

pagefuse init --split                    # split config → config.fuse
pagefuse init --split --output split.fuse

The --split flag generates a split-style template (with source: and output: lines) instead of the default assemble-style template (with file: and output: lines).

Inspect a document

Show page count, metadata, and format support for one or more files:

pagefuse info report.pdf
pagefuse info report.pdf slides.pptx photo.png

Output includes a Format Support table showing which commands accept the file as input and what output formats are available:

  File    slides.pptx
  Format  PPTX
  Pages   12

              Format Support
 ┌──────────┬───────────────┬─────────────────────────────────────┐
 │ Command  │ Input support │ Output support                      │
 ├──────────┼───────────────┼─────────────────────────────────────┤
 │ assemble │ yes           │ .html  .jpg  .jpeg  .odp  .pdf  ... │
 │ split    │ yes           │ .html  .jpg  .jpeg  .odp  .pdf  ... │
 └──────────┴───────────────┴─────────────────────────────────────┘

Version

pagefuse --version

Page Specification Syntax

| Spec | Meaning |
| --- | --- |
| all | Every page |
| 5 | Page 5 only |
| 1-3 | Pages 1 through 3 |
| 1,3,5 | Pages 1, 3, and 5 |
| 1-3,5,7-9 | Mixed ranges and singles |

Page numbers are 1-based.
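
The syntax above can be expanded into explicit page numbers with a short function. This is a minimal sketch of the semantics described in the table, not PageFuse's own implementation; `expand_pages` is a hypothetical name.

```python
def expand_pages(spec: str, page_count: int) -> list[int]:
    """Expand a page spec such as '1-3,5' into a list of 1-based page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))
    pages = []
    for part in spec.split(","):
        start_text, dash, end_text = part.strip().partition("-")
        start = int(start_text)
        end = int(end_text) if dash else start  # a single page is a one-page range
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"'{part}' is invalid for a {page_count}-page document")
        pages.extend(range(start, end + 1))
    return pages


print(expand_pages("1-3,5,7-9", page_count=10))
# prints: [1, 2, 3, 5, 7, 8, 9]
```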

Error Handling

PageFuse validates all inputs before any work starts:

  • File not found — all missing files are reported together
  • Wrong format — unsupported extensions are caught early
  • Invalid page spec — space instead of colon (e.g. file.pdf 1) is detected and corrected
  • Invalid page range — all out-of-range specs across all files are reported at once, showing the filename and its actual page count
  • Output format — unsupported output extensions are caught before conversion begins
  • Format incompatibility — presentation sources cannot produce word-processor outputs and vice versa; pagefuse info <file> shows what is allowed

Example:

Error: Page specification errors:
  range '2-50' is invalid in 'report.pdf' (12 pages total)
  page 15 does not exist in 'cover.pdf' (3 pages total)
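
The "report everything at once" behaviour can be sketched as a validation pass that accumulates problems instead of stopping at the first one. `collect_errors` and its inputs are hypothetical; in particular, the `page_counts` dictionary stands in for whatever PageFuse does to discover files and count their pages, and the message wording only imitates the example above.

```python
def collect_errors(requests, page_counts):
    """Check every (filename, first_page, last_page) request and return all problems."""
    errors = []
    for name, first, last in requests:
        total = page_counts.get(name)
        if total is None:
            errors.append(f"file not found: '{name}'")
        elif first < 1 or last > total or first > last:
            errors.append(
                f"range '{first}-{last}' is invalid in '{name}' ({total} pages total)"
            )
    return errors


requests = [("report.pdf", 2, 50), ("missing.pdf", 1, 1), ("ok.pdf", 1, 2)]
page_counts = {"report.pdf": 12, "ok.pdf": 5}  # files that exist, with page counts
for message in collect_errors(requests, page_counts):
    print(message)
```

Returning a list rather than raising on the first failure is what lets a caller show all problems in a single run.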

Performance

  • Parallel input conversion — up to 4 source files converted simultaneously
  • Parallel output writing — multiple output formats written simultaneously
  • Resource estimation — estimated peak memory (worst-case concurrent footprint) and disk usage shown before work starts; warns if disk space is tight
  • Live progress table — shows all tasks upfront with animated spinner on active tasks, checkmark on completed, file sizes, memory usage, and elapsed time per task
  • Thread-safe rendering — pypdfium2 rendering serialised to prevent crashes on concurrent image/HTML output
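
The combination of bounded parallelism and serialised rendering listed above is a common pattern. The sketch below shows one way it could look with Python's standard library; it is not taken from PageFuse's source, and `convert`, `render`, and `process` are placeholder functions that only manipulate strings.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4                    # matches the stated limit for input conversion
_render_lock = threading.Lock()    # serialises the step that is not thread-safe


def convert(name: str) -> str:
    """Placeholder for a per-file conversion that may run in parallel."""
    return name.rsplit(".", 1)[0] + ".pdf"


def render(name: str) -> str:
    """Placeholder for rendering; only one thread may run it at a time."""
    with _render_lock:
        return f"rendered {name}"


def process(sources):
    """Convert sources concurrently, then render them under the lock."""
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        converted = list(pool.map(convert, sources))  # map preserves input order
        return list(pool.map(render, converted))


print(process(["a.docx", "b.pptx", "c.pdf"]))
```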

Development

git clone https://github.com/raptorgold14/pagefuse.git
cd pagefuse
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
pip install -e .

Run tests:

pytest

See examples/ for sample .fuse configs and examples/generate_pdfs.py to regenerate fixture files.

License

MIT
