Agentic and single-pass Key Information Extraction (KIE) from documents using LLMs

Agentic KIE: LLM-Based Key Information Extraction from Documents

A Python package for extracting structured information from PDF documents using large language models.

agentic-kie handles the full extraction pipeline: it loads PDFs (including scanned documents, via a pluggable OCR backend) and exposes both the raw text and rendered page images, so that LLMs can reason over document content using text, vision, or both. The package is built around two extraction strategies, a fast single-pass approach and a more capable agentic loop (the latter is marked as coming soon in this release), and is designed for use in production pipelines and research workflows alike.

Installation
Quick start
Extraction strategies
- Single-pass extraction
- Agentic extraction
Contributing

Installation

Requires Python 3.13 or later.

pip install agentic-kie

Or with uv:

uv add agentic-kie

Quick start

Loading a PDF

PDFLoader is the main entry point. It handles file I/O, detects whether the document has a native text layer, and returns an immutable PDFDocument ready for downstream use.

from pathlib import Path
from agentic_kie import PDFLoader

loader = PDFLoader()
doc = loader.load(Path("invoice.pdf"))

# Access the full document text
print(doc.full_text)

# Navigate by page (zero-indexed, half-open ranges)
print(doc.read_text(0, 3))   # pages 0, 1, 2
print(doc.read_text(4))      # page 4 only

# Render pages to base64-encoded PNG strings (for vision models)
images = doc.all_images          # all pages
first_page = doc.load_images(0)  # single page
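
Because the image accessors return base64-encoded PNG strings, code that needs raw bytes (for example, to write a page to disk) has to decode them first. The following is a minimal sketch, assuming the strings are plain base64 with no `data:` URI prefix. `decode_page_image` is a hypothetical helper, not part of agentic-kie, and `sample` is a stand-in for one rendered page:

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first eight bytes of every PNG file


def decode_page_image(encoded: str) -> bytes:
    """Decode a base64 string and check that it looks like a PNG."""
    raw = base64.b64decode(encoded, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data does not start with the PNG signature")
    return raw


# Stand-in for one entry of doc.all_images (only a PNG header plus filler,
# not a complete image).
sample = base64.b64encode(PNG_SIGNATURE + b"payload").decode("ascii")
png_bytes = decode_page_image(sample)
```

With a real document, the bytes returned for a page could be written to a `.png` file or passed to an image library.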

PDFDocument exposes:

Attribute / Method	Description
`page_count`	Total number of pages
`is_ocr`	`True` if text was extracted via OCR
`full_text`	All pages concatenated with double newlines
`read_text(start, end=None)`	Text slice over a page range
`all_images`	All pages as base64 PNGs (cached)
`load_images(start, end=None)`	Image slice over a page range

Scanned documents and OCR

For scanned PDFs, PDFLoader automatically detects the absence of a text layer and routes to an OCR provider. Any object implementing extract_text(image: bytes) -> str qualifies — no subclassing required.

from pathlib import Path
from agentic_kie import PDFLoader, OCRProvider

class TextractProvider:
    def extract_text(self, image: bytes) -> str:
        # call AWS Textract (or any OCR service)
        ...

loader = PDFLoader(ocr_provider=TextractProvider())
doc = loader.load(Path("scanned_form.pdf"))

print(doc.is_ocr)    # True
print(doc.full_text)
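
The "no subclassing required" behaviour is what Python calls structural typing. As a rough illustration, the sketch below defines its own `OCRProviderLike` protocol, a hypothetical mirror of the interface described above rather than the class exported by agentic-kie, and shows that a plain class with a matching `extract_text` method satisfies it:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class OCRProviderLike(Protocol):
    """Hypothetical mirror of the OCR interface described above."""

    def extract_text(self, image: bytes) -> str: ...


class UppercaseEchoOCR:
    """Toy provider: no base class, just a method with the right name."""

    def extract_text(self, image: bytes) -> str:
        # A real provider would call an OCR service here.
        return image.decode("utf-8", errors="replace").upper()


class NotAProvider:
    """Lacks extract_text, so it does not satisfy the protocol."""

    def read(self, image: bytes) -> str:
        return ""


provider = UppercaseEchoOCR()
text = provider.extract_text(b"scanned page")
```

Note that `isinstance` checks against a `runtime_checkable` protocol only verify that the method exists, not that its signature matches. A static type checker is needed for that.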

The dpi and text_threshold parameters let you control rendering resolution and the sensitivity of the native-text detection heuristic:

loader = PDFLoader(
    ocr_provider=TextractProvider(),
    dpi=300,            # higher DPI improves OCR accuracy on dense documents
    text_threshold=50,  # minimum avg characters/page to skip OCR
)
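
To make the `text_threshold` comment concrete, here is one way such a heuristic could work. This is a sketch under stated assumptions, not the package's actual detection logic: `needs_ocr` is a hypothetical function, and counting only non-whitespace characters is a choice made for the example.

```python
def needs_ocr(page_texts: list[str], text_threshold: int = 50) -> bool:
    """Return True when the average character count per page is below the threshold."""
    if not page_texts:
        raise ValueError("document has no pages")
    # Assumption: whitespace is ignored so that blank or near-blank pages count as empty.
    total = sum(len("".join(text.split())) for text in page_texts)
    return total / len(page_texts) < text_threshold


native_pages = [
    "Invoice 2041\nVendor: Acme Corp\nTotal due: 1,250.00 USD\nDue date: 2026-04-30"
] * 3
scanned_pages = ["", " \n", ""]

native_needs_ocr = needs_ocr(native_pages)    # enough text on each page
scanned_needs_ocr = needs_ocr(scanned_pages)  # essentially no text layer
```

Raising the threshold makes the check stricter, so sparse pages with a thin text layer are more likely to be sent to OCR.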

Error handling

All document-level failures raise exceptions derived from a common DocumentLoadError base class, so they can be caught together or individually:

from agentic_kie import (
    DocumentLoadError,
    CorruptDocumentError,
    PasswordProtectedError,
    EmptyDocumentError,
    OCRNotConfiguredError,
)

try:
    doc = loader.load(path)
except PasswordProtectedError:
    print("Document is encrypted")
except OCRNotConfiguredError:
    print("Scanned document detected — provide an OCR provider")
except DocumentLoadError as e:
    print(f"Load failed: {e}")
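
Since the specific errors share a base class, the order of the `except` clauses matters: the specific handlers must come before the `DocumentLoadError` catch-all. The sketch below uses local stand-in classes with the same names (defined here, not imported from the package) and assumes each specific error subclasses the base directly:

```python
class DocumentLoadError(Exception):
    """Local stand-in for the package's base error."""


class PasswordProtectedError(DocumentLoadError):
    """Local stand-in for the encrypted-document error."""


class CorruptDocumentError(DocumentLoadError):
    """Local stand-in for the corrupt-document error."""


def describe(exc: Exception) -> str:
    """Route an exception through the same handler order as the example above."""
    try:
        raise exc
    except PasswordProtectedError:
        return "encrypted"    # specific handler runs first
    except DocumentLoadError:
        return "load failed"  # catch-all for every other load error


results = [describe(PasswordProtectedError()), describe(CorruptDocumentError())]
```

If the two handlers were swapped, the `DocumentLoadError` clause would also catch `PasswordProtectedError` and the specific branch would never run.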

Extraction strategies

All extractors satisfy the Extractor protocol — a single extract(document) -> T method that takes a PDFDocument and returns a validated instance of your Pydantic schema. This lets you swap strategies without changing calling code.
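
The value of a shared protocol is that calling code depends only on the `extract` method. A minimal sketch of that idea follows, with hypothetical names throughout: `ExtractorLike`, `FixedExtractor`, and `run_extraction` are illustrations, and a frozen dataclass stands in for the Pydantic schema so the example needs only the standard library.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar

T = TypeVar("T")
T_co = TypeVar("T_co", covariant=True)


class ExtractorLike(Protocol[T_co]):
    """Hypothetical mirror of the protocol described above."""

    def extract(self, document: object) -> T_co: ...


@dataclass(frozen=True)
class Receipt:
    """Stand-in for a Pydantic schema."""

    vendor: str
    total: float


class FixedExtractor:
    """Toy strategy that ignores the document and returns a canned result."""

    def extract(self, document: object) -> Receipt:
        return Receipt(vendor="Acme", total=12.5)


def run_extraction(extractor: ExtractorLike[T], document: object) -> T:
    # Calling code depends only on the protocol, so strategies are interchangeable.
    return extractor.extract(document)


receipt = run_extraction(FixedExtractor(), document="placeholder document")
```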

Single-pass extraction

SinglePassExtractor issues one structured LLM call and parses the response directly against a Pydantic schema. It is fast, predictable, and well suited to well-structured documents.

from pathlib import Path
from pydantic import BaseModel
from langchain_openai import ChatOpenAI
from agentic_kie import PDFLoader, SinglePassExtractor

class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str
    due_date: str | None

loader = PDFLoader()
doc = loader.load(Path("invoice.pdf"))

extractor = SinglePassExtractor(
    model=ChatOpenAI(model="gpt-4o"),
    schema=Invoice,
)

result = extractor.extract(doc)
print(result.vendor, result.total)

Constructor parameters

Parameter	Type	Default	Description
`model`	`BaseChatModel`	required	Any LangChain chat model (ChatOpenAI, ChatAnthropic, ChatBedrock, etc.)
`schema`	`type[T]`	required	Pydantic model class defining the fields to extract
`modality`	`"text" | "image" | "multimodal"`	`"text"`	Which document representations to send to the model
`system_prompt`	`str | None`	`None`	Custom system prompt (uses a sensible default when omitted)
`max_retries`	`int`	`3`	Maximum retry attempts on transient failures (rate limits, timeouts). Uses exponential backoff with jitter.

Modalities

- "text": sends only the extracted text. Fastest and cheapest; works well when the document has a reliable text layer.
- "image": sends rendered page images. Useful for visually rich documents where layout matters.
- "multimodal": sends text followed by page images, giving the model both signals.

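One plausible way the three modalities could translate into message content is sketched below. It builds OpenAI-style content parts (plain dicts with `"type": "text"` or `"type": "image_url"`), a format that LangChain chat models generally accept as message content. Whether agentic-kie assembles its prompts this way is an assumption, and `build_content` is a hypothetical helper:

```python
from typing import Literal

Modality = Literal["text", "image", "multimodal"]


def build_content(modality: Modality, text: str, images_b64: list[str]) -> list[dict]:
    """Assemble message content parts for the chosen modality."""
    parts: list[dict] = []
    if modality in ("text", "multimodal"):
        parts.append({"type": "text", "text": text})
    if modality in ("image", "multimodal"):
        # Text comes first, then one part per page image, as a base64 data URI.
        for encoded in images_b64:
            parts.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{encoded}"},
                }
            )
    return parts


# The base64 strings here are placeholders, not real page renders.
content = build_content("multimodal", "Invoice 2041", ["aGVsbG8=", "d29ybGQ="])
```
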
Agentic extraction

This strategy is a LangChain-powered agent loop that can reason iteratively, call tools, and refine its output over multiple steps. It is better suited to complex or ambiguous documents. Coming soon.

Contributing

See CONTRIBUTING.md for development setup, available make targets, and the CI/CD pipeline.

Project details

Release history

- 0.5.1: Apr 12, 2026
- 0.5.0: Apr 11, 2026
- 0.4.2: Apr 11, 2026
- 0.4.1: Apr 2, 2026
- 0.4.0: Apr 2, 2026
- 0.3.1: Mar 29, 2026 (this version)
- 0.3.0: Mar 29, 2026
- 0.2.0: Mar 28, 2026
- 0.1.0: Mar 15, 2026

