Layout-aware document parser for structured LLM-ready JSON

Project description

DocuWeave

Layout-aware document parser that converts PDFs into structured, hierarchical, LLM-ready JSON.

Features

  • Deterministic layout-based parsing
  • Hierarchical section detection
  • Token-aware smart chunking
  • Embedding-ready JSON export
  • Optimized for RAG pipelines
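
To make "token-aware chunking" concrete, here is a minimal sketch of one common approach: greedily packing paragraphs into chunks that stay under a token budget. It is not DocuWeave's actual algorithm, which is not described here. The function name chunk_paragraphs is hypothetical, and whitespace-separated words stand in for real tokenizer tokens.

```python
from typing import Iterable, List


def chunk_paragraphs(paragraphs: Iterable[str], max_tokens: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens tokens.

    Assumption: a "token" is a whitespace-separated word. A real pipeline
    would count tokens with the tokenizer of the target model instead.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    chunks: List[str] = []
    current: List[str] = []  # words of the chunk currently being built

    for paragraph in paragraphs:
        words = paragraph.split()
        # A paragraph longer than the budget is cut into budget-sized pieces.
        pieces = [words[i:i + max_tokens] for i in range(0, len(words), max_tokens)]
        for piece in pieces:
            # Start a new chunk when adding this piece would exceed the budget.
            if current and len(current) + len(piece) > max_tokens:
                chunks.append(" ".join(current))
                current = []
            current.extend(piece)

    # Flush whatever is left in the last, partially filled chunk.
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Packing whole paragraphs first, and splitting only the ones that exceed the budget on their own, keeps related text together, which tends to matter for retrieval quality.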

Installation

pip install docuweave

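Usage

from docuweave import parse

# Parse a PDF into a structured document object
doc = parse("sample.pdf")

# Split the document into chunks of at most 500 tokens
doc.to_chunks(max_tokens=500)

# Write the structured result to a JSON file
doc.save_json("output.json")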

Output

The saved JSON file has the following top-level structure:

{
  "metadata": {...},
  "sections": [...],
  "chunks": [...]
}
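
A downstream script can sanity-check an exported file before indexing it. The sketch below uses only the standard library and relies only on the three top-level keys shown above. The helper names load_export and summarize are hypothetical, and the contents of each section or chunk entry are left untouched because their fields are not documented here.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Top-level keys taken from the output structure shown in this README.
EXPECTED_KEYS = ("metadata", "sections", "chunks")


def load_export(path: str) -> Dict[str, Any]:
    """Load an exported JSON file and check its top-level keys."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise KeyError(f"missing top-level keys: {missing}")
    return data


def summarize(data: Dict[str, Any]) -> Dict[str, int]:
    """Count sections and chunks, assuming both are JSON arrays."""
    return {"sections": len(data["sections"]), "chunks": len(data["chunks"])}
```

Calling summarize(load_export("output.json")) after the usage example would report how many sections and chunks were written.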



We’ll improve this later.

---

Download files

Source Distribution

docuweave-0.1.0.tar.gz (7.9 kB)

Uploaded Source

Built Distribution

docuweave-0.1.0-py3-none-any.whl (8.2 kB)

Uploaded Python 3

File details

Details for the file docuweave-0.1.0.tar.gz.

File metadata

  • Download URL: docuweave-0.1.0.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for docuweave-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4ebcadc1d8bc9e45021257301ecd56e786f58705b7f2f0e3a735374839755d7b
MD5 e0acbc9e41c46d4a44d8533ab52ae156
BLAKE2b-256 eaed1d8c8f0c0dd420c5005070557c35c552f816fec3029bdb72103ed50c83f9
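
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of and matches are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size blocks so large files need not fit in memory.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("docuweave-0.1.0.tar.gz", "4ebcadc1d8bc9e45021257301ecd56e786f58705b7f2f0e3a735374839755d7b") should be true for an intact copy of the source distribution. The same check applies to the wheel with its own digest.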

File details

Details for the file docuweave-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: docuweave-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for docuweave-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5a68efee1b0dceaa0b0b17f70a42845e96d15a12e4f20c4e364e44052c188dea
MD5 14e6c7a2f7e1e4bf791958a537b3a2f3
BLAKE2b-256 f7dd8eb1bbaffc9b874228554e4f5ecc8040480bbe3d0ec25223abc4ddb4b3b3
