LLM-only, agentic layout & text extraction to Markdown/Text/Layout JSON

LayoutScribe

LLM-powered layout & text extraction for PDFs, slides, and Word docs

LLM-only, agentic parser that converts PDF / PPTX / DOCX into clean Markdown, plain text, and layout JSON (with normalized bounding boxes).
Built with LangGraph (agent orchestration), LiteLLM (provider-agnostic multimodal calls), and MLflow (tracing).

No OCR engines, no heuristic parsers. Pages and slides are rendered to images, and that is the only preprocessing step; a multimodal LLM does all structure and text understanding.

Features (0.1)

  • Inputs: PDF, PPTX, DOCX (rendered pages/slides as images)
  • Outputs:
    • Markdown (headings, lists, tables, captions)
    • Plain text
    • Layout JSON (blocks with type, bbox normalized to [0, 1], text, conf)
  • Agentic pipeline: planner → page_vision (async) → reviewer (validate/re-ask) → composer
  • Robustness:
    • Re-ask on schema/geometry violations (IoU/coverage checks)
    • Fallback injection when the LLM returns empty content, so the Markdown output is never blank
  • Provider-agnostic via LiteLLM (OpenAI, Azure OpenAI, Claude, Gemini)
  • MLflow tracing for params, metrics, artifacts
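
The geometry checks named under Robustness can be pictured with a small sketch. It assumes boxes are `(x0, y0, x1, y1)` tuples normalized to [0, 1]; that ordering and the helper names `is_valid_bbox` and `iou` are illustrative assumptions, not LayoutScribe's actual schema or API.

```python
from typing import Tuple

# Assumed box layout: (x0, y0, x1, y1), normalized to [0, 1]. Hypothetical;
# the real layout JSON may order coordinates differently.
BBox = Tuple[float, float, float, float]


def is_valid_bbox(box: BBox) -> bool:
    """Return True if the box lies inside the unit square and has positive area."""
    x0, y0, x1, y1 = box
    return 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0


def iou(a: BBox, b: BBox) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A reviewer step could, for example, reject any block that fails `is_valid_bbox` and flag pairs of blocks whose `iou` exceeds a chosen threshold as likely duplicates; the threshold would be a tuning choice.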

Status

0.1 (alpha) released — see CHANGELOG.md and docs/ROADMAP.md.

Installation

Requires Python 3.10+.

pip install layoutscribe

Optional extras:

# Office file support (PPTX/DOCX rendering via python-pptx / python-docx)
pip install "layoutscribe[office]"

# Development tools (ruff, black, pytest)
pip install "layoutscribe[dev]"

Runtime notes:

  • PDF rendering: PyMuPDF (included)
  • PPTX/DOCX support: python-pptx, python-docx (install with [office])

Getting Started

Set provider keys as environment variables (see docs/CONFIGURATION.md). Example .env:

OPENAI_API_KEY=sk-...
LAYOUTSCRIBE_DPI=180

Quickstart

CLI

layoutscribe parse ./samples/report.pdf \
  --llm openai/gpt-4o \
  --outputs markdown text layout_json \
  --output-dir ./artifacts/report \
  --dpi 180 --parallel-pages 6 --budget-usd 0.50

Python API

import asyncio

from layoutscribe.api import parse as ls_parse


async def main() -> None:
    # Parse one PDF and request all three output formats.
    doc = await ls_parse(
        path="samples/report.pdf",
        outputs=["markdown", "text", "layout_json"],
        llm="openai/gpt-4o",
        dpi=180,
        parallel_pages=6,
        budget_usd=0.50,
        save_intermediate=True,
    )
    print(doc.metadata)
    # Preview the first 1000 characters of the generated Markdown.
    print(doc.markdown[:1000])


if __name__ == "__main__":
    asyncio.run(main())

Outputs & Artifacts

./artifacts/report/
  document.md
  document.txt
  layout.json
  overlays/
    page-0001.png
    page-0002.png
  intermediate/
    page-0001.json

Configuration

See docs/CONFIGURATION.md for provider-specific env vars, defaults, and precedence. MLflow tracing is opt-in via --trace-mlflow.

LiteLLM provider setup

LiteLLM reads provider keys from environment variables. Set only those you need:

# OpenAI
OPENAI_API_KEY=sk-...

# Azure OpenAI
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-02-15-preview

# Anthropic
ANTHROPIC_API_KEY=...

# Google (Gemini)
GOOGLE_API_KEY=...

Use --llm to pick a model via LiteLLM:

--llm openai/gpt-4o
--llm azure/<deployment_name>
--llm anthropic/claude-3.5-sonnet
--llm google/gemini-1.5-pro
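
Before a long run, it can help to confirm that the variables for the chosen provider are actually set. The following is a minimal preflight sketch that uses only the variable names listed in this README; the function name `missing_env` and the prefix-to-variable mapping are illustrative and not part of LayoutScribe.

```python
import os
from typing import Dict, List, Mapping

# Provider prefix -> environment variables listed in this README.
REQUIRED_ENV: Dict[str, List[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "azure": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "AZURE_OPENAI_API_VERSION",
    ],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_API_KEY"],
}


def missing_env(llm: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the variables that are unset or empty for the provider in `llm`."""
    provider = llm.split("/", 1)[0]
    if provider not in REQUIRED_ENV:
        raise ValueError(f"unknown provider prefix: {provider!r}")
    return [name for name in REQUIRED_ENV[provider] if not env.get(name)]
```

Calling `missing_env("azure/<deployment_name>")` returns the names still to be exported, so a script can fail fast with a readable message instead of partway through a document.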

Notes:

  • For Azure, ensure the deployment name references a vision-capable model and that your endpoint/API version are set.
  • Keep temperature low (0–0.2) for consistent JSON.
  • Respect provider rate limits; we use retries with exponential backoff.
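
For readers unfamiliar with the retry pattern mentioned in the last note, here is a generic sketch of exponential backoff with jitter. It is not LayoutScribe's internal retry code; the function name, defaults, and the injectable `sleep` parameter are assumptions chosen to keep the example testable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` until it succeeds, doubling the delay cap after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Real code should catch only transient errors (rate limits, timeouts).
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries out so clients don't retry in lockstep.
            sleep(random.uniform(0, delay))
    raise RuntimeError("max_attempts must be at least 1")
```

With the defaults, the wait before each retry is drawn uniformly from zero up to 1, 2, 4, and 8 seconds in turn, capped at `max_delay`.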

Limitations (0.1)

  • No OCR engines; relies entirely on a multimodal LLM
  • Basic tables only (CSV-like); no complex rowspan/colspan recovery
  • No handwriting support; language translation is out of scope
  • Confidence scores (if present) are heuristic and not calibrated

Community & Support

  • Open issues and discussions on GitHub
  • For security concerns, follow SECURITY.md (use private advisories)

License

Apache-2.0 (see LICENSE).
