Page-by-page PDF text parser for Swarmauri using slate3k over local file-path inputs.
Swarmauri Parser Slate
swarmauri_parser_slate is the Swarmauri PDF parser for page-by-page text
extraction using slate3k, a lightweight
wrapper around PDFMiner. It reads a local PDF path, extracts text for each
page, and returns Swarmauri Document objects with source and page metadata.
Why Use Swarmauri Parser Slate
- Parse text-based PDFs into page-scoped Document objects for chunking, retrieval, and downstream agent workflows.
- Keep document ingestion aligned with the Swarmauri parser interface.
- Use a small PDF extraction dependency when slate3k is sufficient for the target document set.
- Preserve page numbers so later indexing, annotation, or citation workflows can map text back to the source file.
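As a rough illustration of the last point, the sketch below turns page metadata into a short citation label. It assumes only the page_number and source metadata keys described in this README; PageDoc and format_citation are hypothetical names used for illustration and are not part of the package, and whether page numbers start at 0 or 1 should be checked against real parser output.

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class PageDoc:
    """Hypothetical stand-in for a parsed page (not the Swarmauri Document class)."""
    content: str
    metadata: dict = field(default_factory=dict)


def format_citation(doc) -> str:
    """Build a 'file name, p. N' label from a page's source and page_number metadata."""
    source = str(doc.metadata.get("source", "unknown source"))
    page = doc.metadata.get("page_number")
    # Use only the file name so labels stay short; fall back to the raw value.
    name = PurePath(source).name or source
    return f"{name}, p. {page}" if page is not None else name


pages = [
    PageDoc("Welcome to the team.", {"source": "manuals/employee-handbook.pdf", "page_number": 1}),
    PageDoc("Leave policy.", {"source": "manuals/employee-handbook.pdf", "page_number": 2}),
]
citations = [format_citation(page) for page in pages]
print(citations)
```

Because the helper reads only the content and metadata attributes, it should also work with objects returned by the parser, provided they expose those attributes as the usage examples in this README suggest.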
FAQ
What input does this parser accept?
A local PDF file path as a string.
Does it support raw PDF bytes?
No. The current implementation is path-only and raises TypeError for other input types.
What does it return?
A list of Swarmauri Document objects, usually one per extracted page.
Does it perform OCR on scanned PDFs?
No. It is intended for PDFs that already contain extractable text.
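Since the parser accepts only a string path, callers that work with pathlib objects may want to normalize inputs before parsing. The helper below is a minimal sketch under that assumption; to_path_string is a hypothetical name, and this README does not say whether the parser itself accepts pathlib.Path values.

```python
import os
from pathlib import Path


def to_path_string(value) -> str:
    """Return a str path, or raise TypeError for bytes and other non-path inputs."""
    if isinstance(value, str):
        return value
    if isinstance(value, os.PathLike):
        path = os.fspath(value)  # may return str or bytes
        if isinstance(path, str):
            return path
    raise TypeError(
        f"expected a file path as str or os.PathLike[str], got {type(value).__name__}"
    )


print(to_path_string(Path("pdfs") / "handbook.pdf"))
```

The result can then be passed to parser.parse(...). Raw PDF bytes are rejected here with the same exception type the parser is documented to raise.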
Features
- Page-by-page PDF text extraction through slate3k.
- Returns Document objects with page_number and source metadata.
- Provides a clear TypeError for unsupported input types.
- Fits Swarmauri ingestion, parsing, and retrieval pipelines.
- Supports Python 3.10, 3.11, 3.12, 3.13, and 3.14.
Installation
uv add swarmauri_parser_slate
pip install swarmauri_parser_slate
Usage
from swarmauri_parser_slate import SlateParser
parser = SlateParser()
documents = parser.parse("pdfs/handbook.pdf")
for document in documents:
    print(document.metadata["page_number"], document.content[:120])
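For retrieval pipelines, page documents are often split into smaller chunks that keep their page provenance. The following is one possible sketch, not part of the package: chunk_pages and its max_chars parameter are hypothetical, and SimpleNamespace objects stand in for parser output so the snippet runs without a PDF.

```python
from types import SimpleNamespace


def chunk_pages(pages, max_chars=500):
    """Split each page's text into fixed-size pieces that carry the page metadata."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks = []
    for page in pages:
        text = page.content
        # Pages with empty text produce no chunks.
        for index, start in enumerate(range(0, len(text), max_chars)):
            chunks.append(
                {
                    "text": text[start:start + max_chars],
                    "page_number": page.metadata.get("page_number"),
                    "source": page.metadata.get("source"),
                    "chunk_index": index,
                }
            )
    return chunks


# Stand-in pages; real input would be the list returned by parser.parse(...).
pages = [
    SimpleNamespace(content="a" * 1200, metadata={"page_number": 1, "source": "pdfs/handbook.pdf"}),
    SimpleNamespace(content="b" * 300, metadata={"page_number": 2, "source": "pdfs/handbook.pdf"}),
]
chunks = chunk_pages(pages, max_chars=500)
print(len(chunks), [chunk["page_number"] for chunk in chunks])
```

Fixed-width character slicing is the simplest choice; a sentence-aware or token-aware splitter could replace the inner loop without changing how the metadata is copied.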
Examples
Parse a handbook PDF
from swarmauri_parser_slate import SlateParser
parser = SlateParser()
pages = parser.parse("manuals/employee-handbook.pdf")
for page in pages:
    print(page.metadata["page_number"], len(page.content))
Handle missing files and invalid inputs
from swarmauri_parser_slate import SlateParser
parser = SlateParser()
print(parser.parse("missing.pdf"))
try:
    parser.parse(b"%PDF-1.7 ...")
except TypeError as exc:
    print(exc)
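The example above prints the result of parsing a missing file, which suggests the parser does not raise in that case, although this README does not state what is returned. Callers who prefer an explicit failure can check the path first. The wrapper below is a sketch under that assumption; parse_existing_pdf and EchoParser are hypothetical names, and EchoParser only stands in for a real parser instance.

```python
from pathlib import Path


def parse_existing_pdf(parser, path):
    """Call parser.parse(path) only when path is a str naming an existing file."""
    if not isinstance(path, str):
        raise TypeError(f"expected a str file path, got {type(path).__name__}")
    if not Path(path).is_file():
        raise FileNotFoundError(f"PDF not found: {path}")
    return parser.parse(path)


class EchoParser:
    """Hypothetical stand-in with the same parse(path) call shape."""

    def parse(self, path):
        return [path]


try:
    parse_existing_pdf(EchoParser(), "no-such-directory/missing.pdf")
except FileNotFoundError as exc:
    print(exc)
```

With the real package, a SlateParser instance would be passed in place of EchoParser.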
Related Packages
Swarmauri Foundations
More Documentation
Best Practices
- Use this parser for PDFs that already contain selectable text.
- Route scan-only or image-based PDFs through OCR before parsing.
- Keep page-granular output when later stages need per-page provenance.
- Validate representative PDFs first because extraction quality depends on the original PDF structure.
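One way to act on these practices is to flag pages whose extracted text is empty or nearly empty, since that often indicates a scanned or image-only page. The sketch below is illustrative only: pages_needing_ocr and the min_chars threshold of 20 are assumptions to tune against representative PDFs, and SimpleNamespace objects stand in for parser output.

```python
from types import SimpleNamespace


def pages_needing_ocr(pages, min_chars=20):
    """Return page numbers whose whitespace-stripped text is shorter than min_chars."""
    flagged = []
    for page in pages:
        if len(page.content.strip()) < min_chars:
            flagged.append(page.metadata.get("page_number"))
    return flagged


# Stand-in pages; real input would be the list returned by parser.parse(...).
sample = [
    SimpleNamespace(
        content="Chapter 1. Onboarding overview and first-week checklist.",
        metadata={"page_number": 1},
    ),
    SimpleNamespace(content="   \n", metadata={"page_number": 2}),
    SimpleNamespace(content="", metadata={"page_number": 3}),
]
flagged = pages_needing_ocr(sample)
print(flagged)
```

Flagged pages can then be routed through an OCR step while the remaining pages continue through the normal pipeline.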
License
This project is licensed under the Apache-2.0 License.