Python bindings for MicroPDF - High-performance PDF manipulation library

These details have not been verified by PyPI

Project links

Project description

MicroPDF Python Bindings

High-performance PDF manipulation library for Python with native Rust FFI bindings.

Features

🚀 Fast - Powered by Rust and MicroPDF
🐍 Pythonic - Clean, idiomatic Python API
🔧 Easy to Use - Simple API for common tasks
🎯 Type-Safe - Full type hints with mypy support
📦 Zero Dependencies - Only requires cffi
🔒 Memory Safe - Automatic resource management

Installation

From Source

# Build the Rust library first
cd ../micropdf-rs
cargo build --release

# Install Python package
cd ../micropdf-py
pip install -e .

Requirements

Python 3.8+
cffi >= 1.16.0
Compiled micropdf-rs library

Quick Start

Easy API (Recommended for Beginners)

from micropdf import EasyPDF

# Extract text from all pages
text = EasyPDF.extract_text('document.pdf')
print(text)

# Extract text from specific page
text = EasyPDF.extract_text('document.pdf', page=0)

# Render page to PNG
EasyPDF.render_to_png('document.pdf', 'output.png', page=0, dpi=300)

# Get document info
info = EasyPDF.get_info('document.pdf')
print(f"Pages: {info.page_count}")
print(f"Title: {info.title}")
print(f"Author: {info.author}")

Fluent API with Context Manager

from micropdf import EasyPDF

with EasyPDF.open('document.pdf') as pdf:
    # Get info
    print(f"Pages: {pdf.page_count()}")
    print(f"Metadata: {pdf.get_metadata()}")

    # Extract text from all pages
    all_text = pdf.extract_all_text()

    # Extract text from specific page
    page_text = pdf.extract_page_text(0)

    # Search across all pages
    results = pdf.search_all('keyword')
    for result in results:
        print(f"Found on page {result['page_num']}: {result['bbox']}")

    # Render pages
    pdf.render_page(0, 'page0.png', dpi=300)

    # Render all pages
    paths = pdf.render_all_pages('output_dir', dpi=150)
    print(f"Generated {len(paths)} images")

Low-Level API (Advanced)

from micropdf import Context, Document, Pixmap, Colorspace, Matrix

# Create context
with Context() as ctx:
    # Open document
    with Document.open(ctx, 'document.pdf') as doc:
        print(f"Pages: {doc.page_count()}")
        print(f"Title: {doc.get_metadata('Title')}")

        # Load page
        with doc.load_page(0) as page:
            # Get page bounds
            bounds = page.bounds()
            print(f"Size: {bounds.width()} x {bounds.height()}")

            # Extract text
            text = page.extract_text()
            print(text[:100])

            # Render to pixmap
            matrix = Matrix.scale(2.0, 2.0)  # 2x scale = 144 DPI
            colorspace = Colorspace.device_rgb(ctx)

            with Pixmap.from_page(ctx, page, matrix, colorspace) as pix:
                print(f"Image: {pix.width()}x{pix.height()}")
                pix.save_png('output.png')

API Reference

Easy API

Static Methods

# One-liner operations
text = EasyPDF.extract_text('file.pdf')
text = EasyPDF.extract_text('file.pdf', page=0)
EasyPDF.render_to_png('in.pdf', 'out.png', page=0, dpi=300)
count = EasyPDF.get_page_count('file.pdf')
info = EasyPDF.get_info('file.pdf')

Instance Methods

with EasyPDF.open('file.pdf') as pdf:
    # Document info
    pages = pdf.page_count()
    encrypted = pdf.is_encrypted()
    metadata = pdf.get_metadata()
    info = pdf.get_info()

    # Text extraction
    all_text = pdf.extract_all_text()
    page_text = pdf.extract_page_text(0)

    # Search
    results = pdf.search_all('keyword')

    # Rendering
    pdf.render_page(0, 'page.png', dpi=300)
    paths = pdf.render_all_pages('output_dir', dpi=150)

    # Page info
    bounds = pdf.get_page_bounds(0)

Core Classes

Context

ctx = Context()  # Default 256 MB cache
ctx = Context(max_store=512 * 1024 * 1024)  # 512 MB

with Context() as ctx:
    # ... operations ...
    pass  # Auto-cleanup

Document

# Open from file
doc = Document.open(ctx, 'file.pdf')

# Open from bytes
with open('file.pdf', 'rb') as f:
    data = f.read()
doc = Document.from_bytes(ctx, data)

# Operations
page_count = doc.page_count()
is_encrypted = doc.needs_password()
success = doc.authenticate('password')
title = doc.get_metadata('Title')
page = doc.load_page(0)
doc.save('output.pdf')

Page

page = doc.load_page(0)

# Get bounds
bounds = page.bounds()  # Returns Rect

# Extract text
text = page.extract_text()

# Search
quads = page.search_text('keyword', max_hits=512)

Pixmap

# Create pixmap
colorspace = Colorspace.device_rgb(ctx)
pix = Pixmap.create(ctx, colorspace, 100, 100, alpha=False)

# Render from page
matrix = Matrix.scale(2.0, 2.0)
pix = Pixmap.from_page(ctx, page, matrix, colorspace)

# Operations
width = pix.width()
height = pix.height()
components = pix.components()
raw_data = pix.samples()
png_data = pix.to_png()
pix.save_png('output.png')

Geometry

from micropdf.geometry import Point, Rect, Matrix, Quad

# Point
p = Point(10, 20)
distance = p.distance(other_point)
p2 = p.transform(matrix)

# Rect
r = Rect(0, 0, 612, 792)  # US Letter size
width = r.width()
height = r.height()
area = r.area()
contains = r.contains(point)
intersection = r1.intersect(r2)
union = r1.union(r2)

# Matrix
m = Matrix.identity()
m = Matrix.scale(2.0, 2.0)
m = Matrix.translate(10, 20)
m = Matrix.rotate(90)
m3 = m1.concat(m2)
m3 = m1 @ m2  # Matrix multiplication

# Quad (for rotated text bounding boxes)
q = Quad.from_rect(rect)
r = q.to_rect()

Colorspace

gray = Colorspace.device_gray(ctx)
rgb = Colorspace.device_rgb(ctx)
bgr = Colorspace.device_bgr(ctx)
cmyk = Colorspace.device_cmyk(ctx)

n = colorspace.components()  # 1, 3, or 4
name = colorspace.name()  # "DeviceRGB", etc.

Examples

Extract Text from All Pages

from micropdf import EasyPDF

text = EasyPDF.extract_text('document.pdf')
print(text)

Render All Pages to PNG

from micropdf import EasyPDF

with EasyPDF.open('document.pdf') as pdf:
    paths = pdf.render_all_pages(
        output_dir='pages',
        prefix='page',
        dpi=300
    )
    print(f"Generated {len(paths)} images")

Search and Highlight

from micropdf import EasyPDF

with EasyPDF.open('document.pdf') as pdf:
    results = pdf.search_all('Python')

    for result in results:
        page_num = result['page_num']
        bbox = result['bbox']
        print(f"Found 'Python' on page {page_num} at {bbox}")

Password-Protected PDFs

from micropdf import EasyPDF

with EasyPDF.open_with_password('secret.pdf', 'password') as pdf:
    text = pdf.extract_all_text()
    print(text)

Custom DPI Rendering

from micropdf import Context, Document, Matrix, Pixmap, Colorspace

with Context() as ctx:
    with Document.open(ctx, 'document.pdf') as doc:
        with doc.load_page(0) as page:
            # 300 DPI = 300/72 scale factor
            dpi = 300
            scale = dpi / 72.0
            matrix = Matrix.scale(scale, scale)
            colorspace = Colorspace.device_rgb(ctx)

            with Pixmap.from_page(ctx, page, matrix, colorspace) as pix:
                pix.save_png('high_res.png')

Development

Setup Development Environment

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=micropdf --cov-report=html

# Type checking
mypy src/micropdf

# Linting
ruff check src/micropdf

# Formatting
black src/micropdf tests

Building Documentation

cd docs
make html
open _build/html/index.html

Architecture

FFI Layer (ffi.py): Low-level cffi bindings to Rust library
Core Classes: Pythonic wrappers around FFI handles
Easy API (easy.py): High-level, simplified interface
Automatic Cleanup: Context managers for resource management

Performance

MicroPDF Python bindings provide near-native performance by:

Using cffi for efficient C interop
Minimizing Python/Rust boundary crossings
Leveraging Rust's zero-cost abstractions
Direct memory access for pixel data

Comparison with Other Libraries

Feature	MicroPDF	PyMicroPDF	pdfplumber	PyPDF2
Speed	⚡⚡⚡	⚡⚡⚡	⚡	⚡
Memory	✅ Low	✅ Low	⚠️ High	⚠️ High
Type Hints	✅ Full	⚠️ Partial	✅ Full	⚠️ Partial
Easy API	✅ Yes	❌ No	✅ Yes	❌ No
Rendering	✅ Yes	✅ Yes	❌ No	❌ No

License

Apache 2.0

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.15.6

Mar 19, 2026

0.14.1

Mar 15, 2026

0.14.0

Mar 14, 2026

0.11.3

Mar 4, 2026

This version

0.10.0

Feb 27, 2026

0.9.2

Feb 24, 2026

0.9.1

Feb 17, 2026

0.9.0

Jan 22, 2026

0.8.0

Jan 17, 2026

0.7.0

Jan 15, 2026

0.6.0

Jan 14, 2026

0.5.0

Jan 14, 2026

0.4.0

Jan 14, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

micropdf-0.10.0.tar.gz (66.4 kB view details)

Uploaded Feb 27, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

micropdf-0.10.0-py3-none-any.whl (70.5 kB view details)

Uploaded Feb 27, 2026 Python 3

File details

Details for the file micropdf-0.10.0.tar.gz.

File metadata

Download URL: micropdf-0.10.0.tar.gz
Upload date: Feb 27, 2026
Size: 66.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for micropdf-0.10.0.tar.gz
Algorithm	Hash digest
SHA256	`9a3eed89f3695dcb58c972a1b541f59fabd3048246493503ad76af8181663352`
MD5	`049e67ae3cfbc91a2a315e381f73472a`
BLAKE2b-256	`d1012cb88e81ba80d0ebd68ee0cd3949cfdcd80371ddedd3405b9247fb7baac8`

See more details on using hashes here.

File details

Details for the file micropdf-0.10.0-py3-none-any.whl.

File metadata

Download URL: micropdf-0.10.0-py3-none-any.whl
Upload date: Feb 27, 2026
Size: 70.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for micropdf-0.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a2fa63b0724309413afda2dad7347ef5dc43120c3c63507820312e7e4a80462`
MD5	`6597cc0804fbcc4d2272236acab77cbe`
BLAKE2b-256	`f65eda059cd7a15cba01c9a5d4169274f996d2649e14f4f41048dabf5e1fec9b`

See more details on using hashes here.

micropdf 0.10.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

MicroPDF Python Bindings

Features

Installation

From Source

Requirements

Quick Start

Easy API (Recommended for Beginners)

Fluent API with Context Manager

Low-Level API (Advanced)

API Reference

Easy API

Static Methods

Instance Methods

Core Classes

Context

Document

Page

Pixmap

Geometry

Colorspace

Examples

Extract Text from All Pages

Render All Pages to PNG

Search and Highlight

Password-Protected PDFs

Custom DPI Rendering

Development

Setup Development Environment

Building Documentation

Architecture

Performance

Comparison with Other Libraries

License

Links

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes