Layout-aware document parser for structured LLM-ready JSON
DocuWeave
Layout-aware document parser that converts PDFs into structured, hierarchical, LLM-ready JSON.
Features
- Deterministic layout-based parsing
- Hierarchical section detection
- Token-aware smart chunking
- Embedding-ready JSON export
- Optimized for RAG pipelines
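To make the "token-aware chunking" feature concrete, here is a minimal sketch of the general idea: pack whole paragraphs into chunks without exceeding a token budget. This is not DocuWeave's implementation. The names `count_tokens` and `chunk_paragraphs` are hypothetical, and counting whitespace-separated words is only a stand-in for a real tokenizer.

```python
from typing import Iterable, List


def count_tokens(text: str) -> int:
    """Stand-in token counter: whitespace-separated words (an assumption)."""
    return len(text.split())


def chunk_paragraphs(paragraphs: Iterable[str], max_tokens: int = 500) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens tokens.

    A paragraph longer than max_tokens is split on word boundaries.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for para in paragraphs:
        words = para.split()
        if not words:
            continue  # skip empty paragraphs
        # Break oversized paragraphs into pieces that fit the budget.
        pieces = [
            " ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)
        ]
        for piece in pieces:
            n = count_tokens(piece)
            # Start a new chunk when adding this piece would exceed the budget.
            if current and current_tokens + n > max_tokens:
                chunks.append("\n\n".join(current))
                current, current_tokens = [], 0
            current.append(piece)
            current_tokens += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping paragraphs whole where possible tends to preserve context inside each chunk, which is usually what a retrieval pipeline wants. A real implementation would swap `count_tokens` for the tokenizer of the target model.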
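Installation
pip install docuweave
Usage
from docuweave import parse
# Parse the PDF into a document object
doc = parse("sample.pdf")
# Split the document into chunks of at most 500 tokens
doc.to_chunks(max_tokens=500)
# Write the structured result to disk
doc.save_json("output.json")
Output
{
  "metadata": {...},
  "sections": [...],
  "chunks": [...]
}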
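As a rough illustration of consuming the exported file, the sketch below loads the JSON and reports the size of each top-level field. It assumes only what the outline above shows: a `metadata` object plus `sections` and `chunks` arrays. The helper name `summarize_output` is hypothetical, and nothing is assumed about the contents of individual sections or chunks.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_output(path: str) -> Dict[str, Any]:
    """Load an exported JSON file and summarize its top-level fields."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Fail early if the file does not have the expected top-level keys.
    missing = [key for key in ("metadata", "sections", "chunks") if key not in data]
    if missing:
        raise KeyError(f"missing top-level keys: {missing}")
    return {
        "metadata_keys": sorted(data["metadata"]),
        "section_count": len(data["sections"]),
        "chunk_count": len(data["chunks"]),
    }
```

Calling `summarize_output("output.json")` after the export step gives a quick sanity check before the chunks are passed to an embedding or indexing stage.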
We’ll improve this later.
---
Download files
Source Distribution
docuweave-0.1.0.tar.gz (7.9 kB)
Built Distribution
docuweave-0.1.0-py3-none-any.whl (8.2 kB)
File details
Details for the file docuweave-0.1.0.tar.gz.
File metadata
- Download URL: docuweave-0.1.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ebcadc1d8bc9e45021257301ecd56e786f58705b7f2f0e3a735374839755d7b |
| MD5 | e0acbc9e41c46d4a44d8533ab52ae156 |
| BLAKE2b-256 | eaed1d8c8f0c0dd420c5005070557c35c552f816fec3029bdb72103ed50c83f9 |
File details
Details for the file docuweave-0.1.0-py3-none-any.whl.
File metadata
- Download URL: docuweave-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a68efee1b0dceaa0b0b17f70a42845e96d15a12e4f20c4e364e44052c188dea |
| MD5 | 14e6c7a2f7e1e4bf791958a537b3a2f3 |
| BLAKE2b-256 | f7dd8eb1bbaffc9b874228554e4f5ecc8040480bbe3d0ec25223abc4ddb4b3b3 |
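The published digests can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file path is assumed to point at a local copy of one of the distributions listed above.

```python
import hashlib


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read until an empty bytes object signals end of file.
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("docuweave-0.1.0.tar.gz", "4ebcadc1d8bc9e45021257301ecd56e786f58705b7f2f0e3a735374839755d7b")` should return `True` for an unmodified copy of the source distribution.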