Document utilities for SRX: extract text from PDF, DOCX, PPTX, XLSX

srx-lib-docs

Small helpers to extract plain text from common office document formats used by SRX services.

What it includes:

  • extract_text(path_or_bytes, mime_type=None): extracts plain text from a file path or raw bytes; supports PDF, DOCX, PPTX, and XLSX

Install

PyPI (public):

  • pip install srx-lib-docs

uv (pyproject):

[project]
dependencies = ["srx-lib-docs>=0.1.0"]

Usage

from srx_lib_docs import extract_text

# Extract plain text from a document on disk.
text = extract_text("/path/to/file.pdf")
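Because extract_text also accepts bytes and an optional mime_type, a caller holding an in-memory upload may want to pass the type explicitly. The following is a minimal sketch under assumptions: guess_mime_type and extract_from_bytes are hypothetical helpers, not part of srx-lib-docs, and the README does not state which mime_type strings the library expects, so the standard IANA types for the four listed formats are assumed here.

```python
from pathlib import Path
from typing import Optional

# Standard MIME types for the four formats the README lists.
# Assumption: extract_text accepts these strings as mime_type.
SUPPORTED_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def guess_mime_type(filename: str) -> Optional[str]:
    """Hypothetical helper: map a file name to a supported MIME type, or None."""
    return SUPPORTED_MIME_TYPES.get(Path(filename).suffix.lower())


def extract_from_bytes(data: bytes, filename: str) -> str:
    """Hypothetical wrapper: extract text from in-memory document bytes."""
    mime_type = guess_mime_type(filename)
    if mime_type is None:
        raise ValueError(f"Unsupported document type: {filename!r}")
    # Imported here so the helper above stays usable without the package installed.
    from srx_lib_docs import extract_text

    return extract_text(data, mime_type=mime_type)
```

Rejecting unknown extensions before calling the library keeps the error message in the caller's hands; how extract_text itself reacts to an unsupported type is not documented above.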

Notes

  • For XLSX, only the first 20 rows of each sheet are read, to keep extraction lightweight; adjust the limit in code if needed.
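To illustrate what a per-sheet row cap looks like, here is a small sketch. It is not the library's implementation: rows_to_text and its max_rows parameter are hypothetical names, and the tab-separated output format is an assumption. It works on any iterable of rows, so it needs no spreadsheet library to run.

```python
from itertools import islice
from typing import Any, Iterable, Sequence


def rows_to_text(rows: Iterable[Sequence[Any]], max_rows: int = 20) -> str:
    """Hypothetical helper: render at most max_rows rows as tab-separated lines."""
    lines = []
    # islice stops after max_rows items, so a lazy row source is never fully read.
    for row in islice(rows, max_rows):
        # Empty cells are commonly represented as None; render them as empty strings.
        cells = ["" if cell is None else str(cell) for cell in row]
        lines.append("\t".join(cells))
    return "\n".join(lines)


# Usage: a 50-row sheet is truncated to the first 20 rows by default.
sheet = [(i, None, "x") for i in range(50)]
preview = rows_to_text(sheet)
```

Exposing the cap as a parameter, as in this sketch, is one way to make the limit adjustable without editing the function body.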

License

Proprietary © SRX
