Document utilities for SRX: extract text from PDF, DOCX, PPTX, XLSX
srx-lib-docs
Small helpers to extract plain text from common office document formats used by SRX services.
What it includes:
- extract_text(path_or_bytes, mime_type=None): extracts plain text from PDF, DOCX, PPTX, and XLSX
- DocumentMarkdownConverter: downloads PDF/DOCX/PPTX/XLSX files and converts them to Markdown
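The `mime_type` argument of `extract_text` is optional. If you pass raw bytes but still know the original file name, a small lookup can supply the hint. This is a sketch that assumes `extract_text` accepts the standard IANA MIME strings for these formats; `MIME_BY_SUFFIX` and `mime_type_for` are hypothetical names, not part of srx-lib-docs.

```python
from pathlib import Path
from typing import Optional

# Standard IANA MIME types for the four supported formats.
# Assumption: extract_text recognises exactly these strings.
MIME_BY_SUFFIX = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def mime_type_for(filename: str) -> Optional[str]:
    """Hypothetical helper: map a file name to a MIME type, or None if unsupported."""
    # Path.suffix keeps only the last extension; lower() makes ".PDF" match ".pdf".
    return MIME_BY_SUFFIX.get(Path(filename).suffix.lower())
```

For example: `extract_text(data, mime_type=mime_type_for("report.docx"))`.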
Install
PyPI (public):
pip install srx-lib-docs
uv (pyproject):
[project]
dependencies = ["srx-lib-docs>=0.1.0"]
Usage
from srx_lib_docs import extract_text
text = extract_text("/path/to/file.pdf")
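Because `extract_text` also accepts bytes, callers that receive uploads without a file name may want to work out the format themselves. The sketch below uses a technique of our own choosing (PDF magic bytes, then the top-level folder inside the OOXML ZIP container); it is not necessarily how srx-lib-docs detects formats internally, and `sniff_mime_type` is a hypothetical name.

```python
import io
import zipfile
from typing import Optional

# DOCX, PPTX, and XLSX are ZIP packages; their main part lives under these folders.
_OOXML_PREFIXES = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess the MIME type of PDF/DOCX/PPTX/XLSX bytes."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"PK\x03\x04"):  # ZIP local file header signature
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = zf.namelist()
        except zipfile.BadZipFile:
            return None  # looks like a ZIP but is truncated or corrupt
        for prefix, mime in _OOXML_PREFIXES.items():
            if any(name.startswith(prefix) for name in names):
                return mime
    return None
```

Usage: `text = extract_text(data, mime_type=sniff_mime_type(data))`. When the guess is None, this simply falls back to the default `mime_type=None`.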
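Markdown conversion with download (process_document is a coroutine, so call it from async code):
import asyncio
from srx_lib_docs.markdown import DocumentMarkdownConverter
async def main(url: str) -> None:
    conv = DocumentMarkdownConverter()
    result = await conv.process_document(url, mimetype="application/pdf")
    print(result["markdown_content"])  # result also has file_type, file_size, success
asyncio.run(main("https://example.com/file.pdf"))  # placeholder URL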
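The conversion result is a plain dict, so callers will usually check `success` before using `markdown_content`, and may want to convert several documents at once. The helpers below are hypothetical sketches that rely only on the `process_document(url, mimetype=...)` call and the result keys shown above; srx-lib-docs itself may signal failures differently (for example by raising instead of returning `success=False`).

```python
import asyncio
from typing import Any, Iterable, List, Mapping


def markdown_or_raise(result: Mapping[str, Any]) -> str:
    """Hypothetical helper: return markdown_content, or raise if success is falsy."""
    if not result.get("success"):
        file_type = result.get("file_type", "unknown")
        raise RuntimeError(f"Markdown conversion failed ({file_type})")
    return result.get("markdown_content") or ""


async def convert_many(conv: Any, urls: Iterable[str], mimetype: str) -> List[str]:
    """Hypothetical helper: convert several URLs concurrently.

    asyncio.gather returns results in the same order as `urls`.
    All URLs are assumed to share one mimetype.
    """
    results = await asyncio.gather(
        *(conv.process_document(url, mimetype=mimetype) for url in urls)
    )
    return [markdown_or_raise(r) for r in results]
```

Inside an async function: `pages = await convert_many(DocumentMarkdownConverter(), urls, "application/pdf")`.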
Notes
- For XLSX, the first 20 rows of each sheet are read to keep it lightweight; adjust in code if needed.
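The row limit can be expressed as a cap on each sheet's row iterator. This is a sketch of that policy only, not the library's code; `sheets_to_text` and `MAX_ROWS_PER_SHEET` are hypothetical names, and the "Sheet: <title>" output layout is our own choice. With openpyxl, for instance, the row iterables could come from `ws.iter_rows(values_only=True)` on a workbook opened with `load_workbook(path, read_only=True)`; whether srx-lib-docs uses openpyxl is an assumption.

```python
from itertools import islice
from typing import Any, Iterable, Mapping, Sequence

MAX_ROWS_PER_SHEET = 20  # same limit as the note above


def sheets_to_text(
    sheets: Mapping[str, Iterable[Sequence[Any]]],
    max_rows: int = MAX_ROWS_PER_SHEET,
) -> str:
    """Render at most `max_rows` rows of each sheet as tab-separated lines."""
    parts = []
    for title, rows in sheets.items():
        lines = [f"Sheet: {title}"]
        # islice stops pulling rows after max_rows, so a lazy row iterator
        # (e.g. from a read-only workbook) is never consumed past the limit.
        for row in islice(rows, max_rows):
            lines.append("\t".join("" if cell is None else str(cell) for cell in row))
        parts.append("\n".join(lines))
    return "\n\n".join(parts)
```

Raising or lowering the limit is then a matter of passing a different `max_rows`.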
License
Proprietary © SRX
File details
Details for the file srx_lib_docs-0.1.6.tar.gz.
File metadata
- Download URL: srx_lib_docs-0.1.6.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c39977d225a2fd5c38fc6321d03c864b673024b209e9aad00b6ca3f2ac89cc0 |
| MD5 | e6cb364c3ac91afb972e3c69571a1ad4 |
| BLAKE2b-256 | eddfc47415bcc5e0aa2260b184ce6d571bd1548b02630decc5c97fee88e94307 |
File details
Details for the file srx_lib_docs-0.1.6-py3-none-any.whl.
File metadata
- Download URL: srx_lib_docs-0.1.6-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e5efd519b4443f0750a144c3a09412812150b8a09b667dcd4ca68065672e35d |
| MD5 | b32cba2eabcac3a8d7a5e65aeaf7f08e |
| BLAKE2b-256 | 68be52689799df34767581577c1f5947cc7f878d7e2f50be057a0a468e938216 |
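To check a downloaded distribution against the published digests, the standard library's hashlib is enough. A minimal sketch; `file_digests` and `verify_sha256` are illustrative names, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: str, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests, reading the file in chunks."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),  # 256-bit BLAKE2b
    }
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 matches the published digest."""
    return file_digests(path)["sha256"] == expected_hex.strip().lower()
```

For an intact download of the source distribution in the current directory, `verify_sha256("srx_lib_docs-0.1.6.tar.gz", "1c39977d225a2fd5c38fc6321d03c864b673024b209e9aad00b6ca3f2ac89cc0")` should return True. pip can enforce the same check at install time via `--hash=sha256:...` entries in a requirements file.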