A more intuitive interface for working with PDFs

Natural PDF

A friendly library for working with PDFs, built on top of pdfplumber.

Natural PDF lets you find and extract content from PDFs using simple code that makes sense.

Installation

pip install natural-pdf

Need OCR, layout models, or other add-ons? Install what you need:

pip install easyocr                 # EasyOCR engine
pip install "natural-pdf[paddle]"   # PaddleOCR stack
pip install "surya-ocr<0.15"        # Surya OCR engine
pip install doclayout_yolo          # YOLO layout detection
pip install "natural-pdf[export]"   # PDF export, deskew
pip install "natural-pdf[all]"      # Everything

More details in the installation guide.
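
Because the add-ons are optional, it can help to check which ones are importable before picking an engine. The sketch below uses only the standard library. The import names in OPTIONAL_MODULES are assumptions about what each package installs (for example, that surya-ocr is imported as surya), so adjust them if your environment differs.

```python
from importlib.util import find_spec

# Assumed import names for the optional add-ons listed above.
OPTIONAL_MODULES = {
    "EasyOCR": "easyocr",
    "PaddleOCR": "paddleocr",
    "Surya": "surya",
    "YOLO layout": "doclayout_yolo",
}


def available_addons(modules=OPTIONAL_MODULES):
    """Map each add-on label to True if its module can be found, else False."""
    status = {}
    for label, module_name in modules.items():
        try:
            status[label] = find_spec(module_name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-installed packages.
            status[label] = False
    return status


if __name__ == "__main__":
    for label, installed in available_addons().items():
        print(f"{label}: {'installed' if installed else 'missing'}")
```

This only reports whether a module can be located. It does not confirm that the installed version is one Natural PDF supports.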

Quick Start

from natural_pdf import PDF

# Open a PDF
pdf = PDF('https://github.com/jsoma/natural-pdf/raw/refs/heads/main/pdfs/01-practice.pdf')
page = pdf.pages[0]

# Extract all of the text on the page
page.extract_text()

# Find elements using CSS-like selectors
heading = page.find('text:contains("Summary"):bold')

# Extract content below the heading
content = heading.below().extract_text()

# Examine all the bold text on the page
page.find_all('text:bold').show()

# Exclude parts of the page from selectors/extractors
header = page.find('text:contains("CONFIDENTIAL")').above()
footer = page.find_all('line')[-1].below()
page.add_exclusion(header)
page.add_exclusion(footer)

# Extract clean text from the page ignoring exclusions
clean_text = page.extract_text()

As a bonus, page.viewer() provides an interactive way to explore the PDF.
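
The Quick Start calls heading.below() directly, which would presumably fail if the selector matched nothing. A small defensive wrapper can make that case explicit. This is a minimal sketch that assumes page.find(...) returns None when there is no match. text_below is a hypothetical helper, not part of Natural PDF, and it relies only on the find, below, and extract_text calls shown above.

```python
def text_below(page, selector, default=""):
    """Return the text below the first element matching `selector`.

    Hypothetical helper: assumes page.find() returns None when nothing
    matches, and that a match supports .below().extract_text().
    """
    element = page.find(selector)
    if element is None:
        # No match: return the fallback instead of raising AttributeError.
        return default
    return element.below().extract_text()
```

With the page from the Quick Start, a call might look like text_below(page, 'text:contains("Summary"):bold').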

Key Features

Natural PDF offers a range of features for working with PDFs:

CSS-like Selectors: Find elements using intuitive query strings (page.find('text:bold')).
Spatial Navigation: Select content relative to other elements (heading.below(), element.select_until(...)).
Text & Table Extraction: Get clean text or structured table data, automatically handling exclusions.
OCR Integration: Extract text from scanned documents using engines like EasyOCR, PaddleOCR, or Surya.
Layout Analysis: Detect document structures (titles, paragraphs, tables) using various engines (e.g., YOLO, Paddle, LLM via API).
Document QA: Ask natural language questions about your document's content.
Semantic Search: Index PDFs and find relevant pages or documents based on semantic meaning using Haystack.
Visual Debugging: Highlight elements and use an interactive viewer or save images to understand your selections.

Learn More

Dive deeper into the features and explore advanced usage in the Complete Documentation.

Extending Natural PDF

Natural PDF now exposes its pluggable engines through small helper functions so you rarely have to touch the core registry directly. Two handy entry points:

# Entry point 1: register a custom table extraction function
from natural_pdf.tables import register_table_function

def table_delim(region, *, context=None, **kwargs):
    # return a TableResult or list-of-lists
    ...

register_table_function("table_delim", table_delim)

# Entry point 2: register a custom selector engine
from natural_pdf.selectors import register_selector_engine

class DebugSelectorEngine:
    def query(self, *, context, selector, options):
        ...

# The factory accepts and ignores any keyword arguments
register_selector_engine("debug", lambda **_: DebugSelectorEngine())
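
To make the first entry point more concrete, here is one possible body for table_delim. It is a sketch under assumptions: it supposes the region passed in supports extract_text() (as regions do in the Quick Start), and the delimiter keyword is a hypothetical option introduced for illustration. It returns a list of lists, one of the two return shapes mentioned in the comment above.

```python
def table_delim(region, *, context=None, delimiter="|", **kwargs):
    """Split each non-empty text line of `region` into cells.

    Assumptions: `region.extract_text()` returns a string, and `delimiter`
    is a hypothetical keyword, not an option defined by Natural PDF.
    """
    text = region.extract_text() or ""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the delimiter and trim whitespace around each cell.
        rows.append([cell.strip() for cell in line.split(delimiter)])
    return rows
```

A function like this would then be registered the same way as above, with register_table_function("table_delim", table_delim).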

Best friends

Natural PDF sits on top of a lot of fantastic tools and models, some of which are:

Release history

0.6.1

Apr 9, 2026

This version

0.6.0

Apr 6, 2026

0.5.4

Mar 18, 2026

0.5.3

Mar 16, 2026

0.5.2

Mar 6, 2026

0.5.1

Mar 4, 2026

0.5.0

Mar 1, 2026

0.4.1

Feb 20, 2026

0.4.0

Feb 18, 2026

0.3.3

Feb 18, 2026

0.3.2

Dec 11, 2025

0.3.1

Nov 22, 2025

0.3.0

Nov 22, 2025

0.2.22

Sep 10, 2025

0.2.21

Sep 10, 2025

0.2.20

Sep 8, 2025

0.2.19

Sep 2, 2025

0.2.18

Sep 1, 2025

0.2.17

Sep 1, 2025

0.2.16

Aug 27, 2025

0.2.15

Aug 25, 2025

0.2.13

Aug 24, 2025

0.2.12

Aug 13, 2025

0.2.11

Aug 5, 2025

0.2.10

Aug 4, 2025

0.2.9

Aug 4, 2025

0.2.8

Aug 4, 2025

0.2.6

Aug 4, 2025

0.2.5

Aug 2, 2025

0.2.4

Aug 1, 2025

0.2.3

Jul 22, 2025

0.2.2

Jul 21, 2025

0.2.1.dev0 pre-release

Jul 13, 2025

0.2.0

Jul 13, 2025

0.1.40

Jun 30, 2025

0.1.38

Jun 28, 2025

0.1.37

Jun 28, 2025

0.1.36

Jun 27, 2025

0.1.35

Jun 27, 2025

0.1.34

Jun 27, 2025

0.1.33

Jun 26, 2025

0.1.32

Jun 26, 2025

0.1.31

Jun 24, 2025

0.1.30

Jun 24, 2025

0.1.28

Jun 21, 2025

0.1.27

Jun 21, 2025

0.1.26.dev0 pre-release

Jun 21, 2025

0.1.24

Jun 19, 2025

0.1.23

Jun 18, 2025

0.1.22

Jun 16, 2025

0.1.21

Jun 16, 2025

0.1.20

Jun 16, 2025

0.1.19

Jun 16, 2025

0.1.18

Jun 16, 2025

0.1.17

Jun 15, 2025

0.1.16

Jun 15, 2025

0.1.15

Jun 10, 2025

0.1.14

Jun 6, 2025

0.1.13

Jun 5, 2025

0.1.12

May 15, 2025

0.1.11

May 6, 2025

0.1.10

May 2, 2025

0.1.9

Apr 30, 2025

0.1.8

Apr 27, 2025

0.1.7

Apr 23, 2025

0.1.6

Apr 21, 2025

0.1.5

Apr 16, 2025

0.1.4

Apr 13, 2025

0.1.3

Apr 12, 2025

0.1.2

Apr 6, 2025

0.1.1

Apr 3, 2025

0.1.0

Apr 2, 2025

Download files

Source Distribution

natural_pdf-0.6.0.tar.gz (2.7 MB)

Uploaded Apr 6, 2026 Source

Built Distribution

natural_pdf-0.6.0-py3-none-any.whl (1.1 MB)

Uploaded Apr 6, 2026 Python 3

File details

Details for the file natural_pdf-0.6.0.tar.gz.

File metadata

Download URL: natural_pdf-0.6.0.tar.gz
Upload date: Apr 6, 2026
Size: 2.7 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.11

File hashes

Hashes for natural_pdf-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`7cf0e8e150c34cc7ac2fdf2c66f1a01ac1d0b8f1681c4f9d491be826299e94a8`
MD5	`686dec5676f0115d2cbabfb4b9e0a543`
BLAKE2b-256	`52ea6efccd4b4acd2a25e252b6e25b11321505a9ca76147069abe7844f083e99`

File details

Details for the file natural_pdf-0.6.0-py3-none-any.whl.

File metadata

Download URL: natural_pdf-0.6.0-py3-none-any.whl
Upload date: Apr 6, 2026
Size: 1.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.11

File hashes

Hashes for natural_pdf-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e440cc6827d075866d0a045892124c6d19dcfee5ab07ce1863527992e598871f`
MD5	`5f9b378078f8a6c66bd33979ddfad73b`
BLAKE2b-256	`be9b2f2945ea6c820b04380593f3282574be4b4b036f7915b528344d540925f9`
