Document utilities for SRX: extract text from PDF, DOCX, PPTX, XLSX, and audio files (MP3, M4A, WAV)
srx-lib-docs
Small helpers to extract plain text from common office document formats used by SRX services.
What it includes:
- extract_text(path_or_bytes, mime_type=None) supports PDF, DOCX, PPTX, XLSX
- DocumentMarkdownConverter to download and convert PDF/DOCX/PPTX/XLSX to Markdown
Install
PyPI (public):
pip install srx-lib-docs
uv (pyproject):
[project]
dependencies = ["srx-lib-docs>=0.1.0"]
Usage
from srx_lib_docs import extract_text
text = extract_text("/path/to/file.pdf")
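When the input is raw bytes rather than a path, there is no file extension to go by, so the caller presumably needs to supply mime_type. The sketch below shows one way to derive it from an original file name. The helper name guess_office_mime and the lookup table are illustrative assumptions, not part of srx-lib-docs; the MIME strings themselves are the standard ones for these formats.

```python
from pathlib import Path
from typing import Optional

# Standard MIME types for the four document formats listed above.
OFFICE_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def guess_office_mime(filename: str) -> Optional[str]:
    """Hypothetical helper: map a file name to a MIME type, or None if unknown."""
    # Compare on the lowercased suffix so "REPORT.PDF" and "report.pdf" match.
    return OFFICE_MIME_TYPES.get(Path(filename).suffix.lower())
```

The result could then be passed along as extract_text(data, mime_type=guess_office_mime(name)), assuming the library accepts these standard MIME strings.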
Markdown conversion with download:
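import asyncio
from srx_lib_docs.markdown import DocumentMarkdownConverter

async def main(url):
    conv = DocumentMarkdownConverter()
    result = await conv.process_document(url, mimetype="application/pdf")
    print(result["markdown_content"])  # plus file_type, file_size, success

# process_document is awaited, so run it inside an event loop; url is the document to download
asyncio.run(main(url))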
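Since the result carries a success key alongside markdown_content, it may be worth checking it before using the text. The following is a minimal sketch that works on any dict with those keys. The helper name markdown_or_raise is hypothetical, and treating success as a truthy/falsy flag is an assumption about the library's behavior.

```python
from typing import Any, Mapping


def markdown_or_raise(result: Mapping[str, Any]) -> str:
    """Hypothetical helper: return the Markdown text, or raise if conversion failed."""
    # Assumption: "success" is a boolean-like flag set by the converter.
    if not result.get("success"):
        file_type = result.get("file_type")
        raise RuntimeError(f"conversion failed for file type {file_type!r}")
    return result["markdown_content"]
```

Raising early keeps a failed download or conversion from silently producing empty Markdown further down a pipeline.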
Notes
- For XLSX, the first 20 rows of each sheet are read to keep it lightweight; adjust in code if needed.
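The row cap described above can be pictured with plain Python data. This is only a sketch of the idea, not the library's actual implementation: head_rows and max_rows are hypothetical names, and the sheets are modeled as a dict of row iterables rather than a real workbook.

```python
from itertools import islice
from typing import Dict, Iterable, List, Sequence


def head_rows(
    sheets: Dict[str, Iterable[Sequence[object]]],
    max_rows: int = 20,
) -> Dict[str, List[Sequence[object]]]:
    """Hypothetical helper: keep at most max_rows rows from each sheet."""
    # islice stops after max_rows items, so long sheets are never read in full.
    return {name: list(islice(rows, max_rows)) for name, rows in sheets.items()}
```

Because islice consumes the iterable lazily, the cost stays bounded by max_rows per sheet regardless of how large the workbook is.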
License
Proprietary © SRX
File details
Details for the file srx_lib_docs-0.1.7.tar.gz.
File metadata
- Download URL: srx_lib_docs-0.1.7.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5b33a6af600fb98f932eb1e5cb0b2a38f096378705db26b0bc8bf547d2cdb4d |
| MD5 | ee2d730dc780c018bc2d1a75abc96d0d |
| BLAKE2b-256 | d0c5a74b29e3c046cd1f44df2794ec3f42d263ac4d01df9e8ce07d740b393844 |
File details
Details for the file srx_lib_docs-0.1.7-py3-none-any.whl.
File metadata
- Download URL: srx_lib_docs-0.1.7-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 832583a9e263de48b4c47a13d2979d4bc4280705d9526fefe2b9dc59f5ceb8fe |
| MD5 | 14bb95354d8327064bcc96be5d71433f |
| BLAKE2b-256 | 6c394c3b818e5fd854d023c86485abf0b628b1c0f4f9550b7826f1dc2cbadea1 |