Agentic and single-pass Key Information Extraction (KIE) from documents using LLMs

These details have not been verified by PyPI

Project description

Agentic KIE

Structured key information extraction from PDF documents, powered by LLMs.

A document enters the system as a file path. It leaves as a validated Pydantic instance. Everything in between (text-layer detection, OCR routing, image rendering, LLM orchestration, output parsing, retry logic) is the library's responsibility.

The problem
The idea
Installation
Core abstractions
Extraction strategies
- Single-pass extraction
- Agentic extraction
Modalities
Error handling
Examples
Contributing

The problem

Extracting structured data from PDFs is deceptively hard. The file format is a rendering instruction set, not a data container. Text layers may be malformed, or absent entirely in scanned documents. Layout carries semantic meaning that raw text extraction destroys. And once you have the content, you still need an orchestration layer that lets an LLM reason over it, produce typed output, and handle the inevitable failures.

The idea

Two extraction strategies are available:

Single-pass: One structured LLM call over the full document text. Fastest and cheapest option. Matches or outperforms agentic in most configurations (especially with smaller models).
Agentic: A ReAct agent loop with multimodal document tools. More resilient to document length, but only justifies its cost with standard-tier models on long or complex documents.

Both strategies satisfy the same protocol and return the same type. Swap one for the other without changing downstream code.

from pathlib import Path
from pydantic import BaseModel
from langchain_google_genai import ChatGoogleGenerativeAI

from agentic_kie import PDFLoader, SinglePassExtractor, AgenticExtractor

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    due_date: str | None

model = ChatGoogleGenerativeAI(model="gemini-3.1-flash-lite-preview")
document = PDFLoader().load(Path("invoice.pdf"))

# Single LLM call
single = SinglePassExtractor(model=model, schema=Invoice)
result = single.extract(document)

# Or let an agent reason over the document
agent = AgenticExtractor(model=model, schema=Invoice)
result = agent.extract(document)

agentic-kie packages that entire workflow into a typed, tested library with a clear separation of concerns: document ingestion, content representation, and structured extraction.

Installation

[!IMPORTANT] Requires Python 3.13 or later.

uv add agentic-kie

Install with a model provider:

uv add "agentic-kie[anthropic]"   # Claude
uv add "agentic-kie[google]"      # Gemini
uv add "agentic-kie[openai]"      # GPT
uv add "agentic-kie[bedrock]"     # AWS Bedrock
uv add "agentic-kie[all]"         # All of the above

[!TIP] Any LangChain chat model works. The extras above are provided for convenience.

Core abstractions

The library is organized around four concepts: a loader that absorbs PDF complexity, an immutable document that exposes content, a protocol for pluggable OCR, and extractors that produce structured output.

PDFLoader

The ingestion boundary. Takes raw PDF input (a file path or in-memory bytes), detects whether the document has a native text layer (using a characters-per-page heuristic), routes to OCR when needed, and returns a validated PDFDocument.

from pathlib import Path
from agentic_kie import PDFLoader

loader = PDFLoader()
document = loader.load(Path("contract.pdf"))

When the PDF comes from a stream (S3, HTTP, a queue) and you want to skip the filesystem, use load_bytes. The name argument shows up in log lines and error messages — pass something meaningful like the S3 key:

data = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
document = loader.load_bytes(data, name=key)

For scanned documents, pass an OCR provider:

loader = PDFLoader(ocr_provider=MyOCRBackend())
document = loader.load(Path("scanned_contract.pdf"))
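
The characters-per-page heuristic mentioned above can be illustrated with a minimal sketch. This is not the library's implementation: the function names and the threshold of 50 characters are assumptions chosen for the example, and the real cutoff may differ.

```python
from collections.abc import Sequence

# Hypothetical threshold; the cutoff agentic-kie actually uses is not stated here.
MIN_CHARS_PER_PAGE = 50


def has_native_text_layer(
    page_texts: Sequence[str],
    min_chars_per_page: int = MIN_CHARS_PER_PAGE,
) -> bool:
    """Return True when the average non-whitespace character count per page meets the threshold."""
    if not page_texts:
        return False
    # Strip all whitespace so blank or near-blank pages do not count as text.
    total_chars = sum(len("".join(text.split())) for text in page_texts)
    return total_chars / len(page_texts) >= min_chars_per_page


def choose_route(page_texts: Sequence[str]) -> str:
    """Pick a route label: 'native' text extraction or 'ocr'."""
    return "native" if has_native_text_layer(page_texts) else "ocr"


digital = ["x" * 200, "y" * 300]   # plenty of embedded text
scanned = ["", "  \n"]             # pages that carry no usable text layer
routes = (choose_route(digital), choose_route(scanned))
```

Averaging over pages keeps a single blank separator page from pushing an otherwise digital document onto the OCR path.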

PDFDocument

An immutable representation of the loaded document. Exposes text and rendered page images (the two modalities that LLMs can reason over). Images are rendered lazily and cached on first access.

| Attribute / Method | Description |
| --- | --- |
| `page_count` | Total number of pages |
| `is_ocr` | `True` if text was extracted via OCR |
| `full_text` | All pages joined with double newlines |
| `read_text(start, end=None)` | Text slice over a page range (zero-indexed, half-open) |
| `all_images` | All pages as base64-encoded PNGs (cached) |
| `load_images(start, end=None)` | Image slice over a page range |

OCRProvider

A structural protocol. Any object with an extract_text(image: bytes) -> str method qualifies.

from agentic_kie import OCRProvider, PDFLoader

class TextractProvider:
    """Wraps AWS Textract as an OCR backend."""

    def extract_text(self, image: bytes) -> str:
        # call Textract, return plain text
        ...

# TextractProvider satisfies OCRProvider by structure alone
loader = PDFLoader(ocr_provider=TextractProvider())
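
For readers unfamiliar with structural protocols, the sketch below shows the general mechanism using `typing.Protocol`. The protocol is re-declared locally under a different name for illustration; whether the library's own `OCRProvider` is marked `runtime_checkable` is an assumption not confirmed here.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class OCRProviderSketch(Protocol):
    """Local re-declaration for illustration; not imported from agentic_kie."""

    def extract_text(self, image: bytes) -> str: ...


class EchoOCR:
    """Toy backend: decodes the bytes instead of running real OCR."""

    def extract_text(self, image: bytes) -> str:
        return image.decode("utf-8", errors="replace")


class NotAnOCR:
    """Has a similar method under a different name, so it does not qualify."""

    def recognise(self, image: bytes) -> str:
        return ""


# No inheritance involved: the check only looks for an `extract_text` method.
echo_ok = isinstance(EchoOCR(), OCRProviderSketch)
other_ok = isinstance(NotAnOCR(), OCRProviderSketch)
```

Note that a runtime `isinstance` check against a protocol only verifies that the method exists; argument and return types are checked by a static type checker, not at runtime.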

Extractors

Both extraction strategies satisfy the Extractor protocol: a single extract(document) -> ExtractionResult[T] method that takes a PDFDocument and returns an ExtractionResult. This enables type-safe dispatch without coupling strategies through inheritance. Swap a SinglePassExtractor for an AgenticExtractor (or your own) without touching downstream code.

ExtractionResult

Every extract call returns an ExtractionResult[T], a frozen dataclass pairing the validated schema instance with the aggregated token usage for the call. Splitting these out lets callers (Lambdas, batch jobs, eval harnesses) log cost and throughput without re-instrumenting the LLM.

from agentic_kie import ExtractionResult

result: ExtractionResult[Invoice] = extractor.extract(document)
result.value     # validated Invoice instance
result.usage     # aggregated token usage

| Attribute | Description |
| --- | --- |
| `value` | Validated instance of the target Pydantic schema |
| `usage` | Aggregated `UsageMetadata` across every LLM call made during the extraction |

The usage field mirrors LangChain's UsageMetadata shape: input_tokens, output_tokens, total_tokens, plus optional input_token_details / output_token_details for cache and reasoning breakdowns. For the agentic strategy it sums across every step the agent took, so a single number reflects the full extraction cost.
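
As a rough illustration of what summing across every step means, the sketch below adds up per-call token counts. `UsageSketch` and `aggregate_usage` are hypothetical names; the example covers only the three top-level counters and leaves out the optional detail breakdowns.

```python
from typing import TypedDict


class UsageSketch(TypedDict):
    """Simplified stand-in for the three top-level usage counters."""

    input_tokens: int
    output_tokens: int
    total_tokens: int


def aggregate_usage(steps: list[UsageSketch]) -> UsageSketch:
    """Sum token counts across every model call made during one extraction."""
    total: UsageSketch = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for step in steps:
        total["input_tokens"] += step["input_tokens"]
        total["output_tokens"] += step["output_tokens"]
        total["total_tokens"] += step["total_tokens"]
    return total


# Three hypothetical agent steps: each one re-sends a growing context.
agent_steps: list[UsageSketch] = [
    {"input_tokens": 1200, "output_tokens": 80, "total_tokens": 1280},
    {"input_tokens": 2100, "output_tokens": 60, "total_tokens": 2160},
    {"input_tokens": 2900, "output_tokens": 150, "total_tokens": 3050},
]
extraction_usage = aggregate_usage(agent_steps)
```

A single-pass extraction would contribute one entry to such a list, so the same aggregation applies to both strategies.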

Extraction strategies

Single-pass extraction

SinglePassExtractor sends the full document content to the model in one call, with structured output bound to the target schema. The chain is built once at construction time and reused across documents.

from pathlib import Path
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from agentic_kie import PDFLoader, SinglePassExtractor

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    due_date: str | None

document = PDFLoader().load(Path("invoice.pdf"))

extractor = SinglePassExtractor(
    model=ChatOpenAI(model="gpt-5.4-mini"),
    schema=Invoice,
    modality="multimodal",
    max_retries=3,
)

result = extractor.extract(document)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `BaseChatModel` | required | Any LangChain chat model |
| `schema` | `type[T]` | required | Pydantic model defining the extraction target |
| `modality` | `"text" \| "image" \| "multimodal"` | `"text"` | Document representation sent to the model |
| `system_prompt` | `str \| None` | `None` | Custom system prompt (uses a sensible default when omitted) |
| `max_retries` | `int` | `3` | Retry attempts with exponential backoff and jitter |

Agentic extraction

AgenticExtractor builds a ReAct agent equipped with document tools (get_page_count, read_text, and load_images) scoped to the document being extracted. The agent decides which pages to inspect, in what order, and stops when it has enough information to produce the target schema.

from pathlib import Path
from pydantic import BaseModel
from langchain_anthropic import ChatAnthropic
from agentic_kie import PDFLoader, AgenticExtractor

class Contract(BaseModel):
    parties: list[str]
    effective_date: str
    governing_law: str | None
    termination_clause: str | None

document = PDFLoader().load(Path("contract.pdf"))

extractor = AgenticExtractor(
    model=ChatAnthropic(model="claude-haiku-4-5"),
    schema=Contract,
    modality="text",
    max_iterations=50,
)

result = extractor.extract(document)

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `BaseChatModel` | required | Any LangChain chat model |
| `schema` | `type[T]` | required | Pydantic model defining the extraction target |
| `modality` | `"text" \| "image" \| "multimodal"` | `"text"` | Controls which document tools the agent can use |
| `system_prompt` | `str` | (built-in) | Custom system prompt for the agent |
| `max_iterations` | `int` | `50` | Maximum agent steps before raising `ExtractionError` |
| `max_retries` | `int` | `3` | Retry attempts on transient model failures |

Modalities

Both extractors accept a modality parameter that controls how document content is presented to the model:

| Modality | What the model sees | When to use |
| --- | --- | --- |
| `"text"` | Extracted text only | Reliable text layer, cost-sensitive, fast |
| `"image"` | Rendered page images (base64 PNG) | Visually rich documents, layout matters |
| `"multimodal"` | Text followed by images | Maximum signal, when accuracy justifies cost |

[!NOTE] For the agentic extractor, modality controls which tools are exposed: "text" provides read_text, "image" provides load_images, and "multimodal" provides both. get_page_count is always available.
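
The mapping described in the note can be written down as a small lookup. The tool names come from the text above; the `tools_for` helper and the table name are hypothetical and only illustrate the relationship.

```python
from typing import Literal

Modality = Literal["text", "image", "multimodal"]

# get_page_count is always present; the other tools depend on the modality.
TOOLS_BY_MODALITY: dict[str, tuple[str, ...]] = {
    "text": ("get_page_count", "read_text"),
    "image": ("get_page_count", "load_images"),
    "multimodal": ("get_page_count", "read_text", "load_images"),
}


def tools_for(modality: Modality) -> tuple[str, ...]:
    """Return the tool names an agent would see for the given modality."""
    try:
        return TOOLS_BY_MODALITY[modality]
    except KeyError:
        raise ValueError(f"unknown modality: {modality!r}") from None


text_tools = tools_for("text")
multimodal_tools = tools_for("multimodal")
```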

Error handling

All document-level failures derive from DocumentLoadError, making them easy to catch together or individually. Extraction failures raise ExtractionError.

from agentic_kie import (
    DocumentLoadError,
    CorruptDocumentError,
    PasswordProtectedError,
    EmptyDocumentError,
    OCRNotConfiguredError,
    ExtractionError,
)

try:
    doc = loader.load(path)
    result = extractor.extract(doc)
except PasswordProtectedError:
    ...  # encrypted PDF
except OCRNotConfiguredError:
    ...  # scanned document, no OCR provider
except EmptyDocumentError:
    ...  # zero pages or no extractable text
except CorruptDocumentError:
    ...  # unparseable file
except DocumentLoadError:
    ...  # catch-all for loading failures
except ExtractionError:
    ...  # extraction failed, e.g. the agent exceeded its iteration limit
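
Because the specific errors derive from a shared base, the order of `except` clauses matters: subclasses have to come before `DocumentLoadError`, or the base clause catches them first. The sketch below demonstrates this with local stand-in classes (the `...Sketch` names are hypothetical, not the library's exceptions).

```python
class DocumentLoadErrorSketch(Exception):
    """Local stand-in for the shared base of loading failures."""


class PasswordProtectedErrorSketch(DocumentLoadErrorSketch):
    """Local stand-in for a specific loading failure."""


class CorruptDocumentErrorSketch(DocumentLoadErrorSketch):
    """Another specific loading failure, with no dedicated handler below."""


def classify(exc: Exception) -> str:
    """Route an exception to a label, most specific handler first."""
    try:
        raise exc
    except PasswordProtectedErrorSketch:
        return "encrypted"
    except DocumentLoadErrorSketch:
        # Catch-all for any loading failure without its own clause.
        return "load-failure"
    except Exception:
        return "other"


labels = (
    classify(PasswordProtectedErrorSketch()),
    classify(CorruptDocumentErrorSketch()),
    classify(ValueError()),
)
```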

Examples

The examples/ directory contains runnable scripts demonstrating both extraction strategies across different providers, using the Kleister NDA preparation package.

Before running any example, fetch the dataset:

uv run nda ./examples/data

This processes the Kleister NDA dataset into examples/data/, which the scripts expect. Then run a script from the project root:

uv run examples/agent/text-only.py

Contributing

See CONTRIBUTING.md for development setup, available make targets, and the CI/CD pipeline.

Project details

Release history

This version

0.6.0

May 17, 2026

0.5.1

Apr 12, 2026

0.5.0

Apr 11, 2026

0.4.2

Apr 11, 2026

0.4.1

Apr 2, 2026

0.4.0

Apr 2, 2026

0.3.1

Mar 29, 2026

0.3.0

Mar 29, 2026

0.2.0

Mar 28, 2026

0.1.0

Mar 15, 2026

Download files

Source Distribution

agentic_kie-0.6.0.tar.gz (850.5 kB)

Uploaded May 17, 2026 Source

Built Distribution

agentic_kie-0.6.0-py3-none-any.whl (20.3 kB)

Uploaded May 17, 2026 Python 3

File details

Details for the file agentic_kie-0.6.0.tar.gz.

File metadata

Download URL: agentic_kie-0.6.0.tar.gz
Upload date: May 17, 2026
Size: 850.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentic_kie-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`63b7559d49772a38f4c1263067c0a18992b8e903d80e4e06369535f9bc83f196`
MD5	`d7a6d4587b04f915b1c730ab21b1abbe`
BLAKE2b-256	`62b7ea3789b9e8db0f6f360f76f8f2f23c1e397ab90ecfd0416a93a45a9cf347`

Provenance

The following attestation bundles were made for agentic_kie-0.6.0.tar.gz:

Publisher: cd.yml on gafnts/agentic-kie

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentic_kie-0.6.0.tar.gz
- Subject digest: 63b7559d49772a38f4c1263067c0a18992b8e903d80e4e06369535f9bc83f196
- Sigstore transparency entry: 1555602872
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: gafnts/agentic-kie@96f1df0b234e8abcc8a33e9df06236e14d3ef5b7
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/gafnts
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@96f1df0b234e8abcc8a33e9df06236e14d3ef5b7
- Trigger Event: push

File details

Details for the file agentic_kie-0.6.0-py3-none-any.whl.

File metadata

Download URL: agentic_kie-0.6.0-py3-none-any.whl
Upload date: May 17, 2026
Size: 20.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentic_kie-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`90af160b459de2658a0d302c13f6d3f396cf9e4b46ef7cb7d96c91f69ed91d26`
MD5	`f994dbb31c8bd83a96402261237e5dd3`
BLAKE2b-256	`f733f1b45af374ab52ec0b434401f6e8e5673f39d32c86f5bbf094c863b30c88`

Provenance

The following attestation bundles were made for agentic_kie-0.6.0-py3-none-any.whl:

Publisher: cd.yml on gafnts/agentic-kie

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentic_kie-0.6.0-py3-none-any.whl
- Subject digest: 90af160b459de2658a0d302c13f6d3f396cf9e4b46ef7cb7d96c91f69ed91d26
- Sigstore transparency entry: 1555603054
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: gafnts/agentic-kie@96f1df0b234e8abcc8a33e9df06236e14d3ef5b7
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/gafnts
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@96f1df0b234e8abcc8a33e9df06236e14d3ef5b7
- Trigger Event: push
