High-performance PDF text parser for Swarmauri using PyMuPDF with aggregated whole-document output.

Swarmauri Parser Fitz PDF

swarmauri_parser_fitzpdf is the Swarmauri PDF parser for high-performance text extraction using PyMuPDF. It opens a PDF, extracts text from every page, and returns a single Swarmauri Document with the aggregated content and source metadata.

Why Use Swarmauri Parser Fitz PDF

Use PyMuPDF's fast document engine for PDF extraction inside Swarmauri ingestion and indexing pipelines.
Produce one normalized Document for whole-file workflows such as summarization, classification, or chunking after parse.
Keep PDF parsing logic aligned with the Swarmauri parser interface used by other loaders and processors.
Leave room to adopt PyMuPDF-specific extraction modes or an upstream OCR step later.
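
Since the parser yields one aggregated text payload, chunking happens after the parse. The following is a minimal sketch of a character-window chunker using only the standard library. The function name `chunk_text` and its `size` and `overlap` parameters are illustrative assumptions, not part of this package.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters. Hypothetical helper,
    not provided by swarmauri_parser_fitzpdf.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")

    step = size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

With a parsed document in hand, a call such as `chunk_text(docs[0].content)` would produce the windows. Character-based windows are the simplest choice; token-aware or sentence-aware splitting may suit retrieval better.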

FAQ

What does this parser return?
A list containing one Swarmauri Document whose content holds the combined extracted text for the PDF.

Does it return one document per page?
No. This parser aggregates all page text into a single document.

Can it parse scanned PDFs with no text layer?
Not by itself. PyMuPDF extracts text objects already present in the document. Scan-only PDFs should be OCR'd first.
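
One way to act on the answer above is to inspect the parse result and route near-empty output to an OCR step. This is a rough heuristic sketch: the helper name `needs_ocr` and the `min_chars` threshold are assumptions chosen for illustration, and the stand-in objects only mimic the `content` attribute of a Swarmauri Document.

```python
from types import SimpleNamespace


def needs_ocr(documents, min_chars: int = 20) -> bool:
    """Return True when the parsed documents hold almost no text.

    Hypothetical helper. `min_chars` is an arbitrary threshold that
    should be tuned on representative files.
    """
    total = sum(len((doc.content or "").strip()) for doc in documents)
    return total < min_chars


# Stand-ins for parser output, used here so the sketch runs without a PDF.
scanned = [SimpleNamespace(content="  \n\n ")]
digital = [SimpleNamespace(content="Quarterly revenue grew by 12 percent year over year.")]

print(needs_ocr(scanned))  # True
print(needs_ocr(digital))  # False
```

An empty result list also counts as needing attention here, because the total length is zero.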

What input type does it expect?
A file path string pointing to a local PDF.

Features

Aggregated PDF text extraction through PyMuPDF.
Preserves the original source path in document metadata.
Exposes a lightweight Swarmauri parser interface for document pipelines.
Appropriate for whole-document ingestion, chunking, and retrieval setup.
Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.

Installation

uv add swarmauri_parser_fitzpdf

pip install swarmauri_parser_fitzpdf

Usage

from swarmauri_parser_fitzpdf import FitzPdfParser

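parser = FitzPdfParser()
# parse() takes a local file path and returns a list of Documents.
documents = parser.parse("reports/quarterly.pdf")

for document in documents:
    # The original path is kept under the "source" metadata key.
    print(document.metadata["source"])
    # Preview the first 500 characters of the aggregated text.
    print(document.content[:500])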

Examples

Parse a PDF into a single document

from swarmauri_parser_fitzpdf import FitzPdfParser

parser = FitzPdfParser()
docs = parser.parse("whitepapers/roadmap.pdf")

if docs:
    print(len(docs[0].content))

Handle invalid input safely

from swarmauri_parser_fitzpdf import FitzPdfParser

parser = FitzPdfParser()

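try:
    docs = parser.parse("missing.pdf")
    # An empty list means no document could be produced from the file.
    if not docs:
        print("Parsing failed or returned no text.")
except ValueError as exc:
    # Covers input that the parser rejects outright.
    print(exc)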

Related Packages

Swarmauri Foundations

Best Practices

Use this parser when you want a whole-document text payload rather than page-by-page output.
Use OCR earlier in the flow for scan-only documents that have no extractable text layer.
Cache parse output for large PDFs if the same files are processed repeatedly.
If reading order matters, verify the extracted output on representative documents because PDF text order depends on document structure.

License

This project is licensed under the Apache-2.0 License.

Project details

Release history

0.11.0.dev1 (pre-release, this version): Jun 30, 2026
0.8.4.dev3 (pre-release): May 20, 2026
0.8.4.dev2 (pre-release): May 20, 2026
0.8.3: Mar 24, 2026
0.8.3.dev24 (pre-release): Mar 23, 2026
0.8.3.dev22 (pre-release): Mar 20, 2026
0.8.3.dev21 (pre-release): Mar 20, 2026
0.8.3.dev20 (pre-release): Mar 20, 2026
0.8.3.dev19 (pre-release): Mar 20, 2026
0.8.3.dev18 (pre-release): Mar 20, 2026
0.8.3.dev17 (pre-release): Mar 20, 2026
0.8.3.dev10 (pre-release): Feb 23, 2026
0.8.3.dev5 (pre-release): Feb 18, 2026
0.8.3.dev4 (pre-release): Feb 17, 2026
0.8.3.dev3 (pre-release): Feb 17, 2026
0.8.2: Feb 17, 2026
0.8.2.dev7 (pre-release): Feb 17, 2026
0.8.2.dev6 (pre-release): Feb 12, 2026
0.8.0: Jan 28, 2026
0.8.0.dev22 (pre-release): Jan 27, 2026
0.8.0.dev4 (pre-release): Sep 11, 2025
0.8.0.dev3 (pre-release): Sep 10, 2025
0.8.0.dev2 (pre-release): Sep 10, 2025
0.7.5: May 23, 2025
0.7.5.dev1 (pre-release): May 23, 2025
0.7.4: May 23, 2025
0.7.4.dev20 (pre-release): May 23, 2025
0.7.3: Mar 31, 2025
0.7.3.dev2 (pre-release): Mar 31, 2025
0.7.2: Mar 6, 2025
0.7.2.dev3 (pre-release): Mar 6, 2025
0.7.2.dev2 (pre-release): Mar 6, 2025
0.7.2.dev1 (pre-release): Mar 6, 2025
0.7.1: Mar 6, 2025
0.7.1.dev1 (pre-release): Mar 5, 2025
0.7.0: Mar 4, 2025
0.7.0.dev12 (pre-release): Mar 4, 2025
0.7.0.dev11 (pre-release): Mar 4, 2025
0.7.0.dev10 (pre-release): Mar 4, 2025
0.7.0.dev9 (pre-release): Mar 4, 2025
0.7.0.dev8 (pre-release): Mar 4, 2025
0.7.0.dev7 (pre-release): Mar 4, 2025
0.7.0.dev6 (pre-release): Mar 4, 2025
0.7.0.dev5 (pre-release): Mar 4, 2025
0.7.0.dev4 (pre-release): Mar 4, 2025
0.7.0.dev3 (pre-release): Mar 4, 2025
0.7.0.dev2 (pre-release): Mar 3, 2025
0.6.1: Feb 19, 2025
0.6.1.dev16 (pre-release): Feb 19, 2025
0.6.0.dev154 (pre-release): Jan 29, 2025

Download files

Source Distribution

swarmauri_parser_fitzpdf-0.11.0.dev1.tar.gz (8.2 kB)

Uploaded Jun 30, 2026 Source

Built Distribution

swarmauri_parser_fitzpdf-0.11.0.dev1-py3-none-any.whl (9.1 kB)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file swarmauri_parser_fitzpdf-0.11.0.dev1.tar.gz.

File metadata

Download URL: swarmauri_parser_fitzpdf-0.11.0.dev1.tar.gz
Upload date: Jun 30, 2026
Size: 8.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.26 (uv publish, run in CI on Ubuntu 24.04)

File hashes

Hashes for swarmauri_parser_fitzpdf-0.11.0.dev1.tar.gz
Algorithm	Hash digest
SHA256	`70acf139f20bc5312846d241267667836025c7d1fdb25a9f1e0e78b21c0eb608`
MD5	`8a8bd5ae79ec0e53cb1a6e6fe5c72663`
BLAKE2b-256	`86467d5237bfad7ebcb1eb0a0fa266b0eac50da29852609f355cae2776cd8d53`

File details

Details for the file swarmauri_parser_fitzpdf-0.11.0.dev1-py3-none-any.whl.

File metadata

Download URL: swarmauri_parser_fitzpdf-0.11.0.dev1-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.26 (uv publish, run in CI on Ubuntu 24.04)

File hashes

Hashes for swarmauri_parser_fitzpdf-0.11.0.dev1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c79423454b8bfc5ee0ea9c2c7397505f67b72c8b4be2519279c4f190614d6933`
MD5	`d4d3a918c2b282b435c1fc65e5e6c124`
BLAKE2b-256	`dd0588786d654b76facd241edfa146de90a2f271ded1c0ecbe0d4aba78059c2c`
