Built for Miruiq — AI-powered data extraction from PDFs and documents.

fastpdf2png

Ultra-fast PDF to PNG converter. Pre-forked worker pool, SIMD-optimized encoding, automatic grayscale detection, zero-copy RGBA rendering. MIT licensed.

Install

pip install fastpdf2png

Or build from source:

git clone https://github.com/nataell95/fastpdf2png.git && cd fastpdf2png
bash scripts/build.sh

Usage

CLI

# Single PDF
./build/fastpdf2png input.pdf page_%03d.png 150 8 -c 2

# Batch / streaming pool (many PDFs, max throughput)
for f in docs/*.pdf; do printf '%s\t%s\n' "$f" "output/$(basename "${f%.pdf}")_%03d.png"; done | \
  ./build/fastpdf2png --pool 150 8 -c 2

Python

import fastpdf2png

images = fastpdf2png.to_images("doc.pdf")        # list of PIL images
fastpdf2png.to_files("doc.pdf", "output/")        # save PNGs to disk
data   = fastpdf2png.to_bytes("doc.pdf")          # raw PNG bytes
n      = fastpdf2png.page_count("doc.pdf")        # page count

# High-throughput batch processing
with fastpdf2png.Engine(workers=8) as pdf:
    pdf.to_files_many(pdf_list, "output/", dpi=150)

Node.js

const pdf = require("fastpdf2png");

pdf.toFiles("doc.pdf", "output/", { dpi: 150 });
const buffers = pdf.toBuffers("doc.pdf");
const count = pdf.pageCount("doc.pdf");

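// Batch processing (await needs an async context in a CommonJS module)
(async () => {
  const engine = new pdf.Engine();
  try {
    await engine.toFiles("doc.pdf", "output/");
  } finally {
    engine.close(); // always release the workers, even if rendering fails
  }
})();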

Performance

Single PDF worker scaling (150 DPI, compression level 2, Apple M-series):

Workers   Pages/sec
1         323
2         582
4         985
8         1,536
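
To make the scaling figures above easier to read, the short sketch below derives speedup and parallel efficiency from them. The numbers are copied from the table; the variable names are only illustrative.

```python
# Pages/sec per worker count, copied from the table above.
pages_per_sec = {1: 323, 2: 582, 4: 985, 8: 1536}

baseline = pages_per_sec[1]

# Map worker count -> (speedup over 1 worker, efficiency per worker).
scaling = {
    workers: (rate / baseline, rate / baseline / workers)
    for workers, rate in pages_per_sec.items()
}

for workers, (speedup, efficiency) in scaling.items():
    print(f"{workers} workers: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

On these figures, 8 workers give roughly a 4.76x speedup, or about 59% efficiency per worker.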

Single-process throughput in pages/sec vs other tools (71-page mixed PDF, Apple M3 Max):

DPI   fastpdf2png   MuPDF   PyMuPDF
72    531           119     101
150   323           37      30
300   145           12      9

Smaller output files for grayscale pages thanks to automatic grayscale detection.

Batch processing (200 PDFs, 324 pages, --pool mode):

Workers   150 DPI
4         174 pg/s
8         318 pg/s

How it works

Rendering

Google's PDFium (the engine inside Chromium) renders each page into a raw bitmap with the FPDF_REVERSE_BYTE_ORDER flag set. The flag makes PDFium emit pixels in RGBA order directly, so no BGRA-to-RGBA swizzle is needed and fpng can encode them with zero conversion overhead.
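
For context, here is a minimal Python sketch of the BGRA-to-RGBA swizzle that this rendering path avoids. It is not code from fastpdf2png; the function name `bgra_to_rgba` is hypothetical, and it assumes a tightly packed buffer with 4 bytes per pixel.

```python
def bgra_to_rgba(bgra: bytes) -> bytes:
    """Swap the B and R channels of a tightly packed BGRA buffer."""
    if len(bgra) % 4:
        raise ValueError("buffer length must be a multiple of 4")
    out = bytearray(bgra)
    out[0::4] = bgra[2::4]  # R comes from byte 2 of each BGRA pixel
    out[2::4] = bgra[0::4]  # B comes from byte 0 of each BGRA pixel
    return bytes(out)


# One opaque pure-blue pixel: B=255, G=0, R=0, A=255 in BGRA order.
pixel_bgra = bytes([255, 0, 0, 255])
pixel_rgba = bgra_to_rgba(pixel_bgra)
```

Every pixel has to be touched once for this swap, which is the extra pass that rendering straight to RGBA removes.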

Grayscale detection

Before encoding, a SIMD-accelerated pass scans every pixel to check whether R == G == B. Most document pages (text, tables, charts) are grayscale, and detecting this lets us encode them as 8-bit grayscale PNG instead of 24-bit RGB. That is one byte per pixel instead of three, so the uncompressed pixel data shrinks by about 67% with zero quality loss. On ARM this uses NEON vld4/vceq intrinsics; on x86 it uses SSE/AVX2.
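
The check itself is simple to state. Below is a scalar Python sketch of the same idea using strided slices instead of SIMD; `is_grayscale` and `to_gray8` are hypothetical names, and the sketch assumes packed RGBA input and ignores the alpha channel.

```python
def is_grayscale(rgba: bytes) -> bool:
    """Return True when every pixel has R == G == B (alpha is ignored)."""
    r, g, b = rgba[0::4], rgba[1::4], rgba[2::4]
    return r == g and g == b


def to_gray8(rgba: bytes) -> bytes:
    """Keep one channel per pixel; only valid if is_grayscale() is True."""
    return rgba[0::4]


# Two gray pixels (values 10 and 200), fully opaque.
page = bytes([10, 10, 10, 255, 200, 200, 200, 255])

if is_grayscale(page):
    gray = to_gray8(page)
    rgb_size = len(page) // 4 * 3        # bytes needed as 24-bit RGB
    saving = 1 - len(gray) / rgb_size    # fraction of raw data saved (2/3)
```

The saving applies to the raw pixel data before compression; the final file-size difference depends on the page content.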

PNG encoding

Instead of the standard zlib/libpng pipeline, we use libdeflate for compression and fpng for fast encoding. The compressed data goes directly into a pre-allocated output buffer, and the PNG header, IDAT chunk, and IEND trailer are assembled around it with zero intermediate copies. CRC32 checksums use hardware-accelerated instructions (the CRC32 extension on ARM, PCLMUL on x86). Each page is written with a single write() syscall.
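
To show what "assembled around it" means in terms of the PNG container, here is a minimal sketch that builds an 8-bit grayscale PNG by hand. It uses Python's standard zlib module as a stand-in for libdeflate and fpng, and it does copy data, so it illustrates only the file layout, not the zero-copy path. The names `chunk` and `encode_gray8_png` are hypothetical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, type, payload, CRC32 over type + payload."""
    crc = zlib.crc32(kind + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + kind + payload + struct.pack(">I", crc)


def encode_gray8_png(width: int, height: int, pixels: bytes, level: int = 6) -> bytes:
    """Encode tightly packed 8-bit grayscale pixels as a PNG file."""
    if len(pixels) != width * height:
        raise ValueError("pixels must contain width * height bytes")
    # Each scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(
        b"\x00" + pixels[y * width:(y + 1) * width] for y in range(height)
    )
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        PNG_SIGNATURE
        + chunk(b"IHDR", ihdr)
        + chunk(b"IDAT", zlib.compress(raw, level))
        + chunk(b"IEND", b"")
    )


# A 2x2 checkerboard: black, white / white, black.
png = encode_gray8_png(2, 2, bytes([0, 255, 255, 0]))
```

Writing `png` to a file with a single `write()` call yields an image that standard viewers can open.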

Pool mode

The --pool command pre-forks N worker processes at startup. Workers stay alive and wait for jobs on pipes (zero CPU waste when idle). The parent reads PDF paths from stdin and dispatches them immediately to workers via pipe IPC. Large multi-page PDFs are automatically split into page ranges across workers for load balancing. Each worker loads PDFs into memory with read() and parses them with FPDF_LoadMemDocument64, eliminating syscalls during PDF parsing.

On Windows, pool mode uses CreateProcess with anonymous pipes: the same architecture, built on Win32 APIs.
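
The exact policy used to split large PDFs across workers is not described here. As one plausible illustration, the sketch below divides a page count into contiguous, near-equal ranges; the function name `split_page_ranges` and the even-split rule are assumptions, not the tool's documented behavior.

```python
def split_page_ranges(page_count: int, workers: int) -> list[tuple[int, int]]:
    """Split pages into contiguous half-open (start, end) ranges, one per worker.

    Range sizes differ by at most one page. Fewer ranges than workers are
    returned when there are fewer pages than workers.
    """
    if page_count < 0 or workers < 1:
        raise ValueError("page_count must be >= 0 and workers >= 1")
    parts = min(page_count, workers)
    if parts == 0:
        return []
    base, extra = divmod(page_count, parts)
    ranges = []
    start = 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # first `extra` ranges get one more page
        ranges.append((start, start + size))
        start += size
    return ranges


# The 71-page benchmark PDF spread over 8 workers.
ranges = split_page_ranges(71, 8)
```

With 71 pages and 8 workers, seven ranges hold 9 pages and the last holds 8.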

Memory pools

Each worker maintains process-local memory pools for pixel buffers and compression scratch space. After the first page warms up the pools, subsequent pages require zero malloc/free calls in the hot path.
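
The warm-up behavior can be illustrated with a small sketch. This is a simplified, single-buffer pool in Python, not the tool's allocator; the class name `BufferPool` and its `allocations` counter are hypothetical.

```python
class BufferPool:
    """Reuse one growable buffer so steady-state requests allocate nothing."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self.allocations = 0  # how many times the backing buffer was (re)allocated

    def acquire(self, size: int) -> memoryview:
        """Return a view of `size` bytes, growing the backing buffer only if needed."""
        if size > len(self._buf):
            self._buf = bytearray(size)
            self.allocations += 1
        return memoryview(self._buf)[:size]


pool = BufferPool()
first = pool.acquire(1024)   # first page: allocates the backing buffer
second = pool.acquire(512)   # smaller page: reuses the same memory
```

Once the largest page size has been seen, later `acquire` calls only hand out views of the existing buffer.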

CLI reference

fastpdf2png <input.pdf> <output_%03d.png> [dpi] [workers] [-c level]
fastpdf2png --pool [dpi] [workers] [-c level]    < job_list
fastpdf2png --info <input.pdf>
fastpdf2png --daemon

Flag      Default  Description
dpi       150      Output resolution
workers   4        Parallel processes
-c -1              Raw PPM/PGM output (no compression, max speed)
-c 0               Fast PNG (fpng)
-c 1               Medium PNG (fpng, slower)
-c 2      2        Best PNG (libdeflate, smallest files)
--pool             Streaming worker pool (reads jobs from stdin)
--info             Print page count
--daemon           Persistent mode (stdin commands)

Pool mode reads one job per line from stdin. Each line holds the PDF path and the output pattern separated by a single tab character: pdf_path<TAB>output_pattern.
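
As a sketch of how a script might produce that job list, the Python below builds the tab-separated lines and pipes them to the CLI. The helper names `build_job_list` and `run_pool` are hypothetical, and the binary path and arguments simply mirror the CLI example earlier in this README.

```python
import subprocess
from pathlib import Path


def build_job_list(pdf_paths, out_dir) -> str:
    """Return pool-mode stdin text: one 'pdf_path<TAB>output_pattern' per line."""
    lines = []
    for pdf in pdf_paths:
        # Name outputs after the PDF's base name, e.g. output/a_%03d.png.
        pattern = (Path(out_dir) / f"{Path(pdf).stem}_%03d.png").as_posix()
        lines.append(f"{pdf}\t{pattern}\n")
    return "".join(lines)


def run_pool(jobs: str, binary: str = "./build/fastpdf2png") -> None:
    """Feed the job list to the pool on stdin (150 DPI, 8 workers, -c 2)."""
    subprocess.run([binary, "--pool", "150", "8", "-c", "2"],
                   input=jobs, text=True, check=True)


jobs = build_job_list(["docs/a.pdf", "docs/b.pdf"], "output")
```

Calling `run_pool(jobs)` requires the built binary to exist at the given path.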

Platforms

OS        Arch     SIMD           Pool mode
macOS     arm64    NEON           fork + pipes
macOS     x86_64   AVX2, SSE4.1   fork + pipes
Linux     x86_64   AVX2, SSE4.1   fork + pipes
Linux     arm64    NEON           fork + pipes
Windows   x86_64   AVX2, SSE4.1   CreateProcess + pipes

License

MIT. See LICENSE and THIRD_PARTY_LICENSES.md.
