Document utilities for SRX: extract text from PDF, DOCX, PPTX, XLSX

srx-lib-docs

Small helpers to extract plain text from common office document formats used by SRX services.

What it includes:

  • extract_text(path_or_bytes, mime_type=None): extracts plain text from PDF, DOCX, PPTX, and XLSX files, given either a file path or raw bytes

Install

PyPI (public):

  • pip install srx-lib-docs

uv (pyproject.toml):

[project]
dependencies = ["srx-lib-docs>=0.1.0"]

Usage

from srx_lib_docs import extract_text
text = extract_text("/path/to/file.pdf")
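
The signature also accepts raw bytes and an optional mime_type. The sketch below shows one way that could be used when the content arrives without a path, for example from an upload. It assumes that mime_type takes the standard IANA MIME strings for the four formats; the README does not confirm this. The names MIME_BY_SUFFIX, guess_mime, and extract_from_bytes are hypothetical helpers, not part of srx-lib-docs.

```python
from pathlib import Path
from typing import Optional

# Standard MIME types for the four formats the README lists.
MIME_BY_SUFFIX = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def guess_mime(filename: str) -> Optional[str]:
    """Return the MIME type for a supported extension, or None otherwise."""
    return MIME_BY_SUFFIX.get(Path(filename).suffix.lower())


def extract_from_bytes(data: bytes, filename: str) -> str:
    """Hypothetical wrapper: pass raw bytes plus an explicit MIME type."""
    mime_type = guess_mime(filename)
    if mime_type is None:
        raise ValueError(f"unsupported file type: {filename}")
    # Imported lazily so guess_mime works without the package installed.
    from srx_lib_docs import extract_text

    # Assumption: extract_text accepts standard MIME strings for mime_type.
    return extract_text(data, mime_type=mime_type)
```

Rejecting unknown extensions before calling the library keeps the failure mode explicit, since the README does not say how extract_text reacts to an unsupported type.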

Notes

  • For XLSX, only the first 20 rows of each sheet are read to keep extraction lightweight; adjust the limit in the code if needed.
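
To illustrate what a per-sheet row cap looks like, here is a minimal, library-independent sketch. It is not the package's implementation, which the README does not show; sheets_to_text and its max_rows parameter are hypothetical names, and the tab-separated output format is an assumption.

```python
from itertools import islice
from typing import Any, Iterable, Mapping


def sheets_to_text(
    sheets: Mapping[str, Iterable[Iterable[Any]]],
    max_rows: int = 20,
) -> str:
    """Flatten sheets to text, reading at most max_rows rows per sheet."""
    lines = []
    for name, rows in sheets.items():
        lines.append(f"# {name}")
        # islice stops after max_rows rows without consuming the rest,
        # so large sheets are never read in full.
        for row in islice(rows, max_rows):
            cells = ["" if cell is None else str(cell) for cell in row]
            lines.append("\t".join(cells))
    return "\n".join(lines)


# Example input: one sheet with 50 rows; only the first 20 are kept.
demo = {"Sheet1": ([i, f"item {i}"] for i in range(50))}
preview = sheets_to_text(demo)
```

Because the rows are consumed lazily, the same function works with any row iterator, so raising or lowering the cap is a one-argument change.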

License

Proprietary © SRX
