Agentic and single-pass Key Information Extraction (KIE) from documents using LLMs

Agentic KIE: LLM-Based Key Information Extraction from Documents

License: MIT

A Python package for extracting structured information from PDF documents using large language models.

agentic-kie handles the full extraction pipeline: it loads PDFs (including scanned documents, via a pluggable OCR backend) and exposes both the raw text and rendered page images, so that LLMs can reason over document content using text, vision, or both. Two extraction strategies are planned: a fast single-pass approach, available now, and a more capable agentic loop, which is coming soon. Both are designed for use in production pipelines and research workflows alike.

Installation

Requires Python 3.13 or later.

pip install agentic-kie

Or with uv:

uv add agentic-kie

Quick start

Loading a PDF

PDFLoader is the main entry point. It handles file I/O, detects whether the document has a native text layer, and returns an immutable PDFDocument ready for downstream use.

from pathlib import Path
from agentic_kie import PDFLoader

loader = PDFLoader()
doc = loader.load(Path("invoice.pdf"))

# Access the full document text
print(doc.full_text)

# Navigate by page (zero-indexed, half-open ranges)
print(doc.read_text(0, 3))   # pages 0, 1, 2
print(doc.read_text(4))      # page 4 only

# Render pages to base64-encoded PNG strings (for vision models)
images = doc.all_images          # all pages
first_page = doc.load_images(0)  # single page

PDFDocument exposes:

Attribute / Method            | Description
page_count                    | Total number of pages
is_ocr                        | True if text was extracted via OCR
full_text                     | All pages concatenated with double newlines
read_text(start, end=None)    | Text slice over a page range
all_images                    | All pages as base64 PNGs (cached)
load_images(start, end=None)  | Image slice over a page range

Scanned documents and OCR

For scanned PDFs, PDFLoader automatically detects the absence of a text layer and routes to an OCR provider. Any object implementing extract_text(image: bytes) -> str qualifies — no subclassing required.

from pathlib import Path
from agentic_kie import PDFLoader

class TextractProvider:
    def extract_text(self, image: bytes) -> str:
        # call AWS Textract (or any OCR service)
        ...

loader = PDFLoader(ocr_provider=TextractProvider())
doc = loader.load(Path("scanned_form.pdf"))

print(doc.is_ocr)    # True
print(doc.full_text)
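
The "no subclassing required" behaviour is what Python calls structural typing. The sketch below shows the idea with typing.Protocol; OCRProviderSketch, FixedTextProvider, and NotAProvider are hypothetical names, and whether the package's own OCRProvider is declared this way (or is runtime-checkable) is an assumption.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class OCRProviderSketch(Protocol):
    """Hypothetical mirror of the extract_text(image: bytes) -> str contract."""

    def extract_text(self, image: bytes) -> str: ...


class FixedTextProvider:
    """Toy provider: ignores the image and returns canned text."""

    def extract_text(self, image: bytes) -> str:
        return "INVOICE 0001"


class NotAProvider:
    """Has a differently named method, so it does not match the protocol."""

    def run(self, image: bytes) -> str:
        return ""


# isinstance() on a runtime-checkable protocol only checks that the method
# exists; it does not validate the signature or the return type.
print(isinstance(FixedTextProvider(), OCRProviderSketch))  # True
print(isinstance(NotAProvider(), OCRProviderSketch))       # False
```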

The dpi and text_threshold parameters let you control rendering resolution and the sensitivity of the native-text detection heuristic:

loader = PDFLoader(
    ocr_provider=TextractProvider(),
    dpi=300,            # higher DPI improves OCR accuracy on dense documents
    text_threshold=50,  # minimum avg characters/page to skip OCR
)
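
To make the text_threshold comment concrete, here is one way such a heuristic could work: average the characters per page and fall back to OCR when the average is below the threshold. The function needs_ocr is hypothetical, and stripping whitespace and rejecting empty input are assumptions; the loader's actual detection logic may differ.

```python
def needs_ocr(page_texts: list[str], text_threshold: int = 50) -> bool:
    """Return True when the native text layer looks too sparse to trust."""
    if not page_texts:
        raise ValueError("document has no pages")
    # Assumption: whitespace-only content does not count as text.
    avg_chars = sum(len(text.strip()) for text in page_texts) / len(page_texts)
    # The threshold is the minimum average needed to skip OCR.
    return avg_chars < text_threshold


print(needs_ocr(["", "  ", "x"]))            # sparse text layer
print(needs_ocr(["a" * 200, "b" * 10]))      # average of 105 chars/page
```

Under this reading, raising text_threshold makes the loader more willing to use OCR on documents whose text layer holds only a few stray characters, such as a page number stamped on a scan.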

Error handling

All document-level failures raise exceptions derived from a common DocumentLoadError base class, so you can catch them together or individually:

from agentic_kie import (
    DocumentLoadError,
    CorruptDocumentError,
    PasswordProtectedError,
    EmptyDocumentError,
    OCRNotConfiguredError,
)

try:
    # loader is a PDFLoader and path a pathlib.Path, as in the examples above
    doc = loader.load(path)
except PasswordProtectedError:
    print("Document is encrypted")
except OCRNotConfiguredError:
    print("Scanned document detected — provide an OCR provider")
except DocumentLoadError as e:
    print(f"Load failed: {e}")
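
The same pattern extends to batch jobs, where one bad file should not stop the run. The sketch below uses local stand-in exception classes and a fake loader so that it runs without the package; load_all and fake_load are hypothetical helpers, not part of agentic_kie.

```python
from pathlib import Path
from typing import Callable


class DocumentLoadError(Exception):
    """Local stand-in for agentic_kie.DocumentLoadError."""


class PasswordProtectedError(DocumentLoadError):
    """Local stand-in for agentic_kie.PasswordProtectedError."""


def load_all(
    paths: list[Path], load: Callable[[Path], str]
) -> tuple[dict[Path, str], dict[Path, str]]:
    """Load every path, collecting failures instead of aborting."""
    loaded: dict[Path, str] = {}
    failed: dict[Path, str] = {}
    for path in paths:
        try:
            loaded[path] = load(path)
        except PasswordProtectedError:
            # The specific subclass must be handled before the base class.
            failed[path] = "encrypted"
        except DocumentLoadError as exc:
            failed[path] = f"load failed: {exc}"
    return loaded, failed


def fake_load(path: Path) -> str:
    """Fake loader that fails on two made-up file names."""
    if path.name == "locked.pdf":
        raise PasswordProtectedError("password required")
    if path.name == "broken.pdf":
        raise DocumentLoadError("unreadable xref table")
    return f"<document {path.name}>"


paths = [Path("ok.pdf"), Path("locked.pdf"), Path("broken.pdf")]
loaded, failed = load_all(paths, fake_load)
print(loaded)
print(failed)
```

With the real package, loader.load would take the place of fake_load and the exception classes would be imported from agentic_kie.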

Extraction strategies

All extractors satisfy the Extractor protocol — a single extract(document) -> T method that takes a PDFDocument and returns a validated instance of your Pydantic schema. This lets you swap strategies without changing calling code.
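
As a rough illustration of why a shared protocol keeps calling code unchanged, the sketch below defines a hypothetical ExtractorSketch protocol and two toy extractors. It uses a dataclass and plain strings in place of a Pydantic schema and a PDFDocument so that it runs without dependencies; none of these names come from the package.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar

T_co = TypeVar("T_co", covariant=True)


class ExtractorSketch(Protocol[T_co]):
    """Hypothetical mirror of the extract(document) -> T contract."""

    def extract(self, document: object) -> T_co: ...


@dataclass(frozen=True)
class Invoice:
    vendor: str
    total: float


class FirstLineExtractor:
    """Toy strategy: treats the first line of the text as the vendor."""

    def extract(self, document: object) -> Invoice:
        first_line = str(document).splitlines()[0]
        return Invoice(vendor=first_line, total=0.0)


class ConstantExtractor:
    """Toy strategy: always returns the same record."""

    def extract(self, document: object) -> Invoice:
        return Invoice(vendor="ACME", total=10.0)


def run(extractor: ExtractorSketch[Invoice], document: object) -> Invoice:
    # Calling code depends only on the protocol, not on a concrete class.
    return extractor.extract(document)


text = "Globex Corp\nTotal due: 42.00"
print(run(FirstLineExtractor(), text))
print(run(ConstantExtractor(), text))
```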

Single-pass extraction

SinglePassExtractor issues one structured LLM call and parses the response directly against a Pydantic schema. Fast, predictable, and suitable for well-structured documents.

from pathlib import Path
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from agentic_kie import PDFLoader, SinglePassExtractor

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    due_date: str | None

loader = PDFLoader()
doc = loader.load(Path("invoice.pdf"))

extractor = SinglePassExtractor(
    model=ChatOpenAI(model="gpt-4o"),
    schema=Invoice,
)

result = extractor.extract(doc)
print(result.vendor, result.total)

Constructor parameters

Parameter     | Type                              | Default  | Description
model         | BaseChatModel                     | required | Any LangChain chat model (ChatOpenAI, ChatAnthropic, ChatBedrock, etc.)
schema        | type[T]                           | required | Pydantic model class defining the fields to extract
modality      | "text" | "image" | "multimodal"   | "text"   | Which document representations to send to the model
system_prompt | str | None                        | None     | Custom system prompt (uses a sensible default when omitted)
max_retries   | int                               | 3        | Maximum retry attempts on transient failures (rate limits, timeouts). Uses exponential backoff with jitter.
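
For readers unfamiliar with the retry policy named in the max_retries row, the following is a minimal sketch of exponential backoff with full jitter. It is not the extractor's implementation: call_with_retries, TransientError, base_delay, and max_delay are hypothetical, and it assumes max_retries counts retries after the first attempt.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a rate-limit or timeout error."""


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call fn, retrying on TransientError with capped, jittered delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # The cap doubles on each attempt, up to max_delay.
            cap = min(max_delay, base_delay * 2**attempt)
            # Full jitter: sleep a random fraction of the cap.
            sleep(rng() * cap)
    raise AssertionError("unreachable")


calls = {"count": 0}


def flaky() -> str:
    """Fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("rate limited")
    return "ok"


delays: list[float] = []
# Record delays instead of sleeping, and pin the jitter to its maximum.
result = call_with_retries(flaky, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)
```

Jitter spreads retries from concurrent workers over time, so that clients hitting the same rate limit do not all retry at the same instant.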

Modalities

  • "text" — sends only the extracted text. Fastest and cheapest; works well when the document has a reliable text layer.
  • "image" — sends rendered page images. Useful for visually rich documents where layout matters.
  • "multimodal" — sends text followed by page images, giving the model both signals.
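
One plausible way to picture the three modalities is as different lists of content parts in a single chat message. The sketch below builds OpenAI-style content blocks, a format LangChain chat models commonly accept; build_content is hypothetical, and the payload the extractor actually sends is an assumption.

```python
def build_content(
    text: str, images_b64: list[str], modality: str = "text"
) -> list[dict]:
    """Assemble message content parts for the chosen modality."""
    if modality not in {"text", "image", "multimodal"}:
        raise ValueError(f"unknown modality: {modality!r}")
    parts: list[dict] = []
    if modality in {"text", "multimodal"}:
        parts.append({"type": "text", "text": text})
    if modality in {"image", "multimodal"}:
        # Images follow the text, one part per rendered page.
        for image in images_b64:
            url = f"data:image/png;base64,{image}"
            parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts


page_images = ["aGVsbG8=", "d29ybGQ="]  # placeholder base64 strings, not real PNGs
for modality in ("text", "image", "multimodal"):
    parts = build_content("Invoice #1", page_images, modality)
    print(modality, [part["type"] for part in parts])
```

Since image parts are added per page, the "image" and "multimodal" payloads grow with page count, which is consistent with "text" being described as the fastest and cheapest option.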

Agentic extraction

A LangChain-powered agent loop that reasons iteratively, calls tools, and refines its output over multiple steps, making it better suited to complex or ambiguous documents. This strategy has not been released yet (coming soon).


Contributing

See CONTRIBUTING.md for development setup, available make targets, and the CI/CD pipeline.
