LLM-only, agentic layout & text extraction to Markdown/Text/Layout JSON

LayoutScribe

LLM-powered layout & text extraction for PDFs, slides, and Word docs

LLM-only, agentic parser that converts PDF / PPTX / DOCX into clean Markdown, plain text, and layout JSON (with normalized bounding boxes).
Built with LangGraph (agent orchestration), LiteLLM (provider-agnostic multimodal calls), and MLflow (tracing).

No OCR engines and no heuristic parsers: documents are rendered to page images, and a multimodal LLM does all structure and text understanding.

Features (0.1)

Inputs: PDF, PPTX, DOCX (rendered pages/slides as images)
Outputs:
- Markdown (headings, lists, tables, captions)
- Plain text
- Layout JSON (blocks with type, bbox[0..1], text, conf)
Agentic pipeline: planner → page_vision (async) → reviewer (validate/re-ask) → composer
Robustness:
- Re-ask on schema/geometry violations (IoU/coverage checks)
- Fallback injection when LLM returns empty content so Markdown is never blank
Provider-agnostic via LiteLLM (OpenAI, Azure OpenAI, Claude, Gemini)
MLflow tracing for params, metrics, artifacts
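
The geometry checks listed under Robustness are not specified further in this README. As a rough illustration only, the sketch below computes intersection-over-union (IoU) for two normalized boxes and flags heavily overlapping blocks. The `(x0, y0, x1, y1)` box layout, the `iou` and `find_overlaps` helpers, and the 0.5 threshold are assumptions made for this example, not LayoutScribe's actual reviewer logic.

```python
from itertools import combinations

Box = tuple[float, float, float, float]  # assumed (x0, y0, x1, y1), each in [0, 1]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_overlaps(boxes: list[Box], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return index pairs whose IoU exceeds the (hypothetical) threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(boxes), 2)
        if iou(a, b) > threshold
    ]
```

A reviewer step could treat any pair returned by `find_overlaps` as a reason to re-ask the model for that page.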

Status

0.1 (alpha) released — see CHANGELOG.md and docs/ROADMAP.md.

Quick Links

docs/ARCHITECTURE.md – modules & flow
docs/PROMPTS_AND_SCHEMA.md – prompt rules and schema notes
docs/schema/layout_page.schema.json – formal JSON Schema (Draft 2020-12)
docs/CONFIGURATION.md – env vars, provider-specific setup, .env example
docs/API_SPEC.md / docs/CLI_SPEC.md – contracts & examples
docs/BENCHMARKS.md – datasets & metrics
docs/TESTING_STRATEGY.md – testing plan & commands
docs/PROVIDERS.md – model matrix & concurrency guidance
docs/SECURITY.md – keys, artifacts, and vulnerability reporting
docs/ROADMAP.md – milestones
CONTRIBUTING.md – how to help
CHANGELOG.md – notable changes

Installation

Requires Python 3.10+.

pip install layoutscribe

Optional extras:

# Office file support (PPTX/DOCX rendering via python-pptx / python-docx)
pip install "layoutscribe[office]"

# Development tools (ruff, black, pytest)
pip install "layoutscribe[dev]"

Runtime notes:

PDF rendering: PyMuPDF (included)
PPTX/DOCX support: python-pptx, python-docx (install with [office])

Getting Started

Set provider keys as environment variables (see docs/CONFIGURATION.md). Example .env:

OPENAI_API_KEY=sk-...
LAYOUTSCRIBE_DPI=180
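
As an illustration of how such a setting could be read in your own scripts, the sketch below pulls `LAYOUTSCRIBE_DPI` from the process environment with a fallback. The `read_dpi` helper and the default of 180 are assumptions for this example; docs/CONFIGURATION.md defines the actual defaults and precedence.

```python
import os


def read_dpi(default: int = 180) -> int:
    """Read LAYOUTSCRIBE_DPI from the environment; fall back to an assumed default."""
    raw = os.environ.get("LAYOUTSCRIBE_DPI", "").strip()
    if not raw:
        return default
    dpi = int(raw)  # raises ValueError on non-numeric input
    if dpi <= 0:
        raise ValueError(f"LAYOUTSCRIBE_DPI must be positive, got {dpi}")
    return dpi
```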

Quickstart

CLI

layoutscribe parse ./samples/report.pdf \
  --llm openai/gpt-4o \
  --outputs markdown text layout_json \
  --output-dir ./artifacts/report \
  --dpi 180 --parallel-pages 6 --budget-usd 0.50

Python API

import asyncio
from layoutscribe.api import parse as ls_parse


async def main() -> None:
  doc = await ls_parse(
    path="samples/report.pdf",
    outputs=["markdown", "text", "layout_json"],
    llm="openai/gpt-4o",
    dpi=180,
    parallel_pages=6,
    budget_usd=0.50,
    save_intermediate=True,
  )
  print(doc.metadata)
  print(doc.markdown[:1000])


if __name__ == "__main__":
  asyncio.run(main())
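
The `parallel_pages` argument suggests a cap on how many pages are processed at once. How LayoutScribe enforces it is not shown here; one common pattern, sketched below with a stand-in `process_page` coroutine (hypothetical, it only sleeps), is an `asyncio.Semaphore` around each page task.

```python
import asyncio


async def process_page(page_no: int) -> str:
    """Stand-in for a per-page LLM call (hypothetical; just sleeps briefly)."""
    await asyncio.sleep(0.01)
    return f"page-{page_no:04d}"


async def process_all(page_count: int, parallel_pages: int = 6) -> list[str]:
    """Run page tasks concurrently, with at most `parallel_pages` in flight."""
    sem = asyncio.Semaphore(parallel_pages)

    async def bounded(page_no: int) -> str:
        async with sem:
            return await process_page(page_no)

    # gather preserves input order, so results line up with page numbers
    return await asyncio.gather(*(bounded(n) for n in range(1, page_count + 1)))
```

A lower cap trades throughput for fewer simultaneous provider requests, which can help when rate limits are tight.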

Outputs & Artifacts

./artifacts/report/
  document.md
  document.txt
  layout.json
  overlays/
    page-0001.png
    page-0002.png
  intermediate/
    page-0001.json

Configuration

See docs/CONFIGURATION.md for provider-specific env vars, defaults, and precedence. MLflow tracing is opt-in via --trace-mlflow.

LiteLLM provider setup

LiteLLM reads provider keys from environment variables. Set only those you need:

# OpenAI
OPENAI_API_KEY=sk-...

# Azure OpenAI
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-15-preview

# Anthropic
ANTHROPIC_API_KEY=...

# Google (Gemini)
GOOGLE_API_KEY=...

Use --llm to pick a model via LiteLLM:

--llm openai/gpt-4o
--llm azure/<deployment_name>
--llm anthropic/claude-3.5-sonnet
--llm google/gemini-1.5-pro

Notes:

For Azure, ensure the deployment name references a vision-capable model and that your endpoint/API version are set.
Keep temperature low (0–0.2) for consistent JSON.
Respect provider rate limits; we use retries with exponential backoff.
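
The retry policy itself is not documented in this README. For readers adding their own retry layer around provider calls, a generic exponential backoff with jitter might look like the sketch below. `retry_with_backoff` and its parameters are illustrative names, not part of LayoutScribe or LiteLLM.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2**attempt)
            # Full jitter spreads retries out so clients do not retry in lockstep
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

In practice you would narrow `except Exception` to the rate-limit or timeout errors your provider raises, so that genuine bugs fail fast instead of being retried.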

Limitations (0.1)

No OCR engines; relies entirely on a multimodal LLM
Basic tables only (CSV-like); no complex rowspan/colspan recovery
No handwriting support; language translation is out of scope
Confidence scores (if present) are heuristic and not calibrated

Community & Support

Open issues and discussions on GitHub
For security concerns, follow SECURITY.md (use private advisories)

License

Apache-2.0 (see LICENSE).

Project details

This version: 0.1.0a3 (pre-release), released Nov 2, 2025
