
A parser for extracting text from PDFs using Slate.

Swarmauri Parser Slate

PDF text parser for Swarmauri using Slate3k (a lightweight PDFMiner wrapper). Extracts text from each PDF page and returns Document instances with page metadata.

Features

  • Opens PDFs with Slate3k and returns a Document per page (content = text, metadata includes page_number and source).
  • Accepts file paths (string). Raises a TypeError when given anything else to prevent silent failures.
  • Returns an empty list if Slate encounters parsing errors, logging the exception to stdout.

Prerequisites

  • Python 3.10 or newer.
  • Slate3k depends on pdfminer.six; make sure operating-system libraries required by PDFMiner (e.g., libxml2, libxslt on Linux) are installed.
  • Read access to the PDF path you pass in.

Installation

# pip
pip install swarmauri_parser_slate

# poetry
poetry add swarmauri_parser_slate

# uv (pyproject-based projects)
uv add swarmauri_parser_slate

Quickstart

from swarmauri_parser_slate import SlateParser

parser = SlateParser()
documents = parser.parse("pdfs/handbook.pdf")

for doc in documents:
    print(doc.metadata["page_number"], doc.content[:120])
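Because each returned Document carries its text in content and its page number in metadata, the results can be screened for pages that yielded little or no text, which is often a sign of a scanned page. The following is a minimal sketch, not part of the package: pages_needing_ocr and the min_chars threshold are hypothetical names, and SimpleNamespace objects stand in for real Document instances so the snippet runs without a PDF.

```python
from types import SimpleNamespace


def pages_needing_ocr(documents, min_chars=20):
    """Return page numbers whose extracted text is shorter than min_chars.

    Hypothetical helper: it only assumes each item has a `content` string
    and a `metadata` dict containing "page_number", as described above.
    """
    flagged = []
    for doc in documents:
        text = (doc.content or "").strip()
        if len(text) < min_chars:
            flagged.append(doc.metadata["page_number"])
    return flagged


# Stand-in objects that mimic the documented shape of the parser output.
sample = [
    SimpleNamespace(
        content="Chapter 1. Onboarding starts on your first day.",
        metadata={"page_number": 1, "source": "pdfs/handbook.pdf"},
    ),
    SimpleNamespace(
        content="  \n",
        metadata={"page_number": 2, "source": "pdfs/handbook.pdf"},
    ),
]

print(pages_needing_ocr(sample))
```

With real output, pass the list returned by parser.parse(...) in place of sample. The threshold is a judgment call; a title page may legitimately contain very little text.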

Handling Errors

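from swarmauri_parser_slate import SlateParser

parser = SlateParser()
try:
    # Parsing errors yield an empty list rather than an exception.
    docs = parser.parse("missing.pdf")
    if not docs:
        print("No pages parsed or Slate returned no text.")
except TypeError as exc:
    # Raised when the argument is not a string path.
    print(f"Bad input: {exc}")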
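Since an empty list can mean either a parsing failure or a PDF with no extractable text, it may help to rule out simple path problems before calling parse. The sketch below uses only the standard library; check_pdf_path is a hypothetical helper, not part of swarmauri_parser_slate, and the .pdf suffix check is an assumption about how your files are named.

```python
from pathlib import Path


def check_pdf_path(path):
    """Return a short reason if the path looks unusable, otherwise None.

    Hypothetical pre-flight check; it does not open or validate the PDF.
    """
    if not isinstance(path, str):
        return f"expected a str path, got {type(path).__name__}"
    candidate = Path(path)
    if not candidate.is_file():
        return f"no such file: {path}"
    if candidate.suffix.lower() != ".pdf":
        return f"not a .pdf file: {path}"
    return None


problem = check_pdf_path("missing.pdf")
if problem:
    print(f"Skipping parse: {problem}")
```

When check_pdf_path returns None and parse still returns an empty list, the cause is more likely the PDF content itself (for example, a scanned document) than the path.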

Tips

  • Slate3k works best on text-based PDFs. For scanned/bitmap PDFs, run OCR first (e.g., swarmauri_ocr_pytesseract).
  • Large PDFs can consume memory; consider chunking results or streaming pages to downstream processors.
  • Combine with token counting or summarization measurements in Swarmauri to further process the extracted content.
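The chunking tip above can be sketched as a small batching generator that hands pages to a downstream step a few at a time. This is an illustrative helper (batched is a hypothetical name, not a Swarmauri API). It limits how much downstream work is in flight at once; it does not reduce the memory used by parsing itself, because parse returns the full list of pages.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of up to `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Placeholder strings stand in for the Document list returned by the parser.
pages = [f"page {n}" for n in range(1, 6)]
for batch in batched(pages, 2):
    print(len(batch), batch)
```

Replace pages with the parser output and the print call with whatever downstream processor consumes each batch.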

Want to help?

If you want to contribute to swarmauri-sdk, read our contributing guidelines to get started.
