Agent-based extraction framework for transforming parsed documents into structured planogram data

document-agents

A modular agent-based extraction framework for transforming parsed document content into structured planogram data. This library provides a production-ready architecture for building multi-step extraction pipelines with LLM-powered agents.

Overview

document-agents implements a modular agent framework where each agent performs a single extraction or transformation task. Agents communicate through a shared mutable context and are orchestrated through configurable processing chains. The library is designed for enterprise-scale document processing workloads.
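The shared-context idea described above can be sketched without the library. The names below (`SketchContext`, `CountLinesAgent`, `UppercaseAgent`, `run_chain`) are hypothetical stand-ins, not part of document-agents; the sketch only assumes that each agent receives the same mutable object and writes its result into it.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchContext:
    """Hypothetical stand-in for the shared context passed between agents."""
    document_id: str
    parse_results: list[str] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


class CountLinesAgent:
    """One task only: count the parsed lines."""

    async def run(self, context: SketchContext) -> None:
        context.metadata["line_count"] = len(context.parse_results)


class UppercaseAgent:
    """One task only: normalize the parsed lines to upper case."""

    async def run(self, context: SketchContext) -> None:
        context.metadata["normalized"] = [s.upper() for s in context.parse_results]


async def run_chain(agents, context: SketchContext) -> SketchContext:
    # Agents run in order; each one reads and mutates the same context object.
    for agent in agents:
        await agent.run(context)
    return context


context = SketchContext(document_id="doc-001", parse_results=["shelf 1", "shelf 2"])
asyncio.run(run_chain([CountLinesAgent(), UppercaseAgent()], context))
print(context.metadata)
```

Because every agent writes to the same object, the order of the chain determines which results are available to later agents.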

Architecture

The library follows SOLID principles and combines them with the following patterns and conventions:

Single Responsibility Principle - Each agent performs exactly one extraction or transformation task
Adapter Pattern - Normalizes provider responses to shared models
Factory Pattern - Creates agent instances by name
Dependency Injection - LLM clients and configurations injected into agents
Async-first - Non-blocking operations for high throughput
Protocol-based Interfaces - Type-safe contracts for agents and LLM clients
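A minimal sketch of how a protocol-based contract, constructor injection, and a name-based factory can fit together. All names here (`SupportsComplete`, `EchoClient`, `GreetingAgent`, `AGENT_REGISTRY`, `create_agent`) are illustrative assumptions rather than the library's actual interfaces, and the client is synchronous only to keep the example short.

```python
from typing import Callable, Protocol, runtime_checkable


@runtime_checkable
class SupportsComplete(Protocol):
    """Hypothetical contract for an LLM client."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Test double that satisfies the protocol without calling a provider."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class GreetingAgent:
    def __init__(self, llm: SupportsComplete) -> None:
        self.llm = llm  # the dependency is injected, not constructed here

    def run(self, text: str) -> str:
        return self.llm.complete(text)


# Factory registry: maps a name to a callable that builds the agent.
AGENT_REGISTRY: dict[str, Callable[[SupportsComplete], GreetingAgent]] = {
    "greeting": GreetingAgent,
}


def create_agent(name: str, llm: SupportsComplete) -> GreetingAgent:
    """Create an agent by name, injecting the given LLM client."""
    if name not in AGENT_REGISTRY:
        raise ValueError(f"unknown agent: {name!r}")
    return AGENT_REGISTRY[name](llm)


agent = create_agent("greeting", EchoClient())
print(agent.run("hi"))
```

Since `EchoClient` never inherits from `SupportsComplete`, the same structural typing lets tests swap in a fake client without touching agent code.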

Installation

pip install pepsico-document-agents

Optional Dependencies

# For Anthropic provider
pip install "pepsico-document-agents[anthropic]"

# For Google provider
pip install "pepsico-document-agents[google]"

# For development
pip install "pepsico-document-agents[dev]"

Quick Start

import asyncio
from document_agents import (
    AgentChain,
    AgentConfig,
    AgentContext,
    LLMClient,
    LLMConfig,
    ProcessingMode,
    ShelfReconstructionAgent,
    ProductExtractionAgent,
    SchemaMappingAgent,
)

async def main():
    # Configure LLM client
    llm_config = LLMConfig(
        provider="openai",
        model="gpt-4",
        api_key="your-api-key",
    )
    llm = LLMClient(llm_config)
    
    # Configure agents
    agent_config = AgentConfig(
        retries=3,
        timeout_seconds=120.0,
    )
    
    # Create agents
    agents = {
        "shelf_reconstruction": ShelfReconstructionAgent(llm, agent_config),
        "product_extraction": ProductExtractionAgent(llm, agent_config),
        "schema_mapping": SchemaMappingAgent(llm, agent_config),
    }
    
    # Create chain
    chain = AgentChain(agents)
    
    # Create context with parsed document data
    context = AgentContext(document_id="doc-001")
    context.parse_results = [...]  # Your parsed document results
    
    # Execute chain
    await chain.execute(context, ProcessingMode.STANDARD)
    
    # Get final result
    planogram = context.final_result
    print(f"Extracted {planogram.total_products} products across {planogram.total_shelves} shelves")

asyncio.run(main())

Configuration

LLM Configuration

from document_agents import LLMConfig

llm_config = LLMConfig(
    provider="openai",  # openai, anthropic, google, llamaapi
    model="gpt-4",
    api_key="your-api-key",
    temperature=0.0,
    max_tokens=4096,
    timeout_seconds=120.0,
    max_retries=3,
    base_url=None,  # Optional custom base URL
)

Agent Configuration

from document_agents import AgentConfig

agent_config = AgentConfig(
    retries=3,
    timeout_seconds=120.0,
    prompt_version="v1",
    enable_logging=True,
    enable_telemetry=True,
    log_prompts=False,
    log_responses=False,
)
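The README does not say how `retries` and `timeout_seconds` are applied internally. The sketch below shows one common interpretation using only asyncio: `retries` is treated as the total number of attempts and `timeout_seconds` bounds each attempt. The helper `call_with_retries` and the backoff delays are assumptions made for illustration.

```python
import asyncio


async def call_with_retries(make_call, retries: int = 3, timeout_seconds: float = 120.0):
    """Await make_call() up to `retries` times, each attempt bounded by a timeout."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            # Short exponential backoff before the next attempt.
            if attempt < retries:
                await asyncio.sleep(0.01 * 2 ** (attempt - 1))
    raise RuntimeError(f"all {retries} attempts failed") from last_error


calls = {"count": 0}


async def flaky_call() -> str:
    # Fails twice with a transient error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = asyncio.run(call_with_retries(flaky_call, retries=3, timeout_seconds=1.0))
print(result, calls["count"])
```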

Processing Modes

The library supports three processing modes for different complexity levels:

SIMPLE Mode

For simple documents with minimal structure:

Product Extraction
Schema Mapping

STANDARD Mode

For typical planogram documents:

Shelf Reconstruction
Product Extraction
Schema Mapping

COMPLEX Mode

For complex multi-page documents with conflicts:

Shelf Reconstruction
Product Extraction
Cross Reference
Conflict Resolution
Schema Mapping
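The three modes listed above can be pictured as a lookup from mode to an ordered list of agent names. This is a sketch, not the library's `ProcessingMode` or `AgentChain`: the `Mode` enum, `CHAINS` table, and `missing_agents` helper are hypothetical, and the keys `cross_reference` and `conflict_resolution` are assumed by analogy with the keys used in Quick Start.

```python
from enum import Enum


class Mode(Enum):
    SIMPLE = "simple"
    STANDARD = "standard"
    COMPLEX = "complex"


# Ordered agent names per mode, mirroring the lists in this section.
CHAINS: dict[Mode, list[str]] = {
    Mode.SIMPLE: ["product_extraction", "schema_mapping"],
    Mode.STANDARD: ["shelf_reconstruction", "product_extraction", "schema_mapping"],
    Mode.COMPLEX: [
        "shelf_reconstruction",
        "product_extraction",
        "cross_reference",
        "conflict_resolution",
        "schema_mapping",
    ],
}


def missing_agents(mode: Mode, registered: set[str]) -> list[str]:
    """Return the agent names a mode needs that are not registered, in chain order."""
    return [name for name in CHAINS[mode] if name not in registered]


registered = {"shelf_reconstruction", "product_extraction", "schema_mapping"}
print(missing_agents(Mode.STANDARD, registered))
print(missing_agents(Mode.COMPLEX, registered))
```

A check like this is useful before execution: the Quick Start agent set covers SIMPLE and STANDARD, while COMPLEX would need two more agents.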

Agents

Shelf Reconstruction Agent

Identifies shelves, determines numbering, detects bays/sections, and detects shelf continuations across pages.

from document_agents import ShelfReconstructionAgent

agent = ShelfReconstructionAgent(llm, agent_config)

Product Extraction Agent

Extracts product names, UPCs, facings, shelf assignments, and position information from tables, markdown, and inline references.

from document_agents import ProductExtractionAgent

agent = ProductExtractionAgent(llm, agent_config)

Cross Reference Agent

Identifies duplicate products, performs UPC lookups, links related tables, and reconciles data across pages.

from document_agents import CrossReferenceAgent

agent = CrossReferenceAgent(llm, agent_config)
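How the agent detects duplicates is not documented here. As a rough illustration of the idea, the rule-based sketch below groups product records by normalized UPC; the function `find_duplicate_upcs` and the record fields (`name`, `upc`, `page`) are assumptions, and the real agent presumably relies on the LLM rather than on a fixed rule.

```python
from collections import defaultdict


def find_duplicate_upcs(products: list[dict]) -> dict[str, list[dict]]:
    """Group product records by UPC and keep only UPCs seen more than once."""
    by_upc: dict[str, list[dict]] = defaultdict(list)
    for product in products:
        # Keep digits only so differently formatted UPCs compare equal.
        upc = "".join(ch for ch in str(product.get("upc", "")) if ch.isdigit())
        if upc:
            by_upc[upc].append(product)
    return {upc: items for upc, items in by_upc.items() if len(items) > 1}


products = [
    {"name": "Cola 12oz", "upc": "012345678905", "page": 1},
    {"name": "Cola 12 oz", "upc": "0 12345-67890 5", "page": 2},
    {"name": "Chips", "upc": "098765432109", "page": 1},
]
duplicates = find_duplicate_upcs(products)
print(sorted(duplicates))
```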

Conflict Resolution Agent

Detects conflicting values, resolves conflicts, and preserves audit trails.

from document_agents import ConflictResolutionAgent

agent = ConflictResolutionAgent(llm, agent_config)
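To make "resolves conflicts and preserves audit trails" concrete, here is a small rule-based sketch. It is not the agent's actual strategy, which is not described in this README: the majority-vote rule and the `AuditEntry`, `Resolution`, and `resolve` names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AuditEntry:
    field_name: str
    candidates: list[Any]
    chosen: Any
    rule: str


@dataclass
class Resolution:
    values: dict[str, Any] = field(default_factory=dict)
    audit_trail: list[AuditEntry] = field(default_factory=list)


def resolve(records: list[dict[str, Any]]) -> Resolution:
    """Merge records; on disagreement keep the most common value and log why."""
    result = Resolution()
    field_names = sorted({key for record in records for key in record})
    for name in field_names:
        candidates = [record[name] for record in records if name in record]
        distinct = list(dict.fromkeys(candidates))  # de-duplicate, keep order
        # Ties go to the value that appeared first.
        chosen = max(distinct, key=candidates.count)
        result.values[name] = chosen
        if len(distinct) > 1:
            result.audit_trail.append(AuditEntry(name, distinct, chosen, "majority_vote"))
    return result


records = [
    {"upc": "012345678905", "facings": 3},
    {"upc": "012345678905", "facings": 4},
    {"upc": "012345678905", "facings": 3},
]
resolution = resolve(records)
print(resolution.values, resolution.audit_trail)
```

Recording the rejected candidates alongside the chosen value is what makes the decision reviewable later.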

Schema Mapping Agent

Maps extracted data to a structured planogram schema with the proper hierarchy.

from document_agents import SchemaMappingAgent, ProcessingMode

agent = SchemaMappingAgent(llm, agent_config, ProcessingMode.STANDARD)

Agent Chain

The AgentChain orchestrates agent execution in configurable sequences:

from document_agents import AgentChain, ProcessingMode

chain = AgentChain(agents)

# Execute with a specific processing mode (call from inside an async function)
await chain.execute(context, ProcessingMode.STANDARD)

# Get chain definition
simple_chain = chain.get_chain(ProcessingMode.SIMPLE)

Custom Agents

Create custom agents by extending BaseAgent:

from document_agents import BaseAgent, AgentContext, AgentConfig, LLMClient

class CustomAgent(BaseAgent):
    def __init__(self, llm: LLMClient, config: AgentConfig):
        super().__init__(
            llm=llm,
            prompt_template="custom.txt",
            config=config,
            agent_name="custom_agent",
        )
    
    def _output_schema(self):
        return {
            "type": "object",
            "properties": {
                "result": {"type": "string"}
            }
        }
    
    def _update_context(self, context: AgentContext, parsed):
        context.metadata["custom_result"] = parsed
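The two hooks above suggest a template-method design: the base class drives the run, and subclasses supply the schema and the context update. The sketch below shows that shape under stated assumptions. `SketchBaseAgent`, its `run` method, the plain-dict context, and `fake_llm` are hypothetical; the real `BaseAgent` may render prompts, validate output, and log differently.

```python
import asyncio
import json
from abc import ABC, abstractmethod
from typing import Any


class SketchBaseAgent(ABC):
    """Hypothetical base class: call the LLM, parse, check, update the context."""

    def __init__(self, llm, agent_name: str) -> None:
        self.llm = llm
        self.agent_name = agent_name

    @abstractmethod
    def _output_schema(self) -> dict[str, Any]: ...

    @abstractmethod
    def _update_context(self, context: dict[str, Any], parsed: dict[str, Any]) -> None: ...

    async def run(self, context: dict[str, Any]) -> None:
        raw = await self.llm(f"Process document {context['document_id']}")
        parsed = json.loads(raw)
        # Simplified check: treats every listed property as required.
        # Real JSON Schema validation would honor the "required" keyword.
        properties = self._output_schema().get("properties", {})
        missing = [key for key in properties if key not in parsed]
        if missing:
            raise ValueError(f"{self.agent_name}: missing keys {missing}")
        self._update_context(context, parsed)


class CustomSketchAgent(SketchBaseAgent):
    def _output_schema(self):
        return {"type": "object", "properties": {"result": {"type": "string"}}}

    def _update_context(self, context, parsed):
        context["custom_result"] = parsed


async def fake_llm(prompt: str) -> str:
    # Stand-in for a provider call; returns a fixed JSON payload.
    return json.dumps({"result": "done"})


context = {"document_id": "doc-001"}
asyncio.run(CustomSketchAgent(fake_llm, "custom_agent").run(context))
print(context["custom_result"])
```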

Error Handling

The library provides a comprehensive exception hierarchy:

from document_agents import (
    DocumentAgentsError,
    AgentExecutionError,
    LLMError,
    PromptRenderingError,
    SchemaValidationError,
)

# Run this inside an async function; `await` is not valid at module level in a script.
try:
    await chain.execute(context)
except AgentExecutionError as e:
    print(f"Agent {e.agent_name} failed: {e.message}")
    print(f"Execution time: {e.metadata['execution_time_ms']}ms")
except LLMError as e:
    print(f"LLM error: {e.message}")
    print(f"Provider: {e.metadata['provider']}")
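The handler above relies on errors exposing `message`, `metadata`, and (for agent failures) `agent_name`. A hierarchy with that shape could look like the sketch below. The class names are deliberately prefixed with `Sketch` because they are illustrative and not the library's definitions; `describe` is a hypothetical helper that reads metadata defensively in case a key is absent.

```python
from typing import Any, Optional


class SketchError(Exception):
    """Hypothetical base error carrying a message and free-form metadata."""

    def __init__(self, message: str, metadata: Optional[dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.metadata = metadata or {}


class SketchAgentExecutionError(SketchError):
    def __init__(
        self, agent_name: str, message: str, metadata: Optional[dict[str, Any]] = None
    ) -> None:
        super().__init__(message, metadata)
        self.agent_name = agent_name


class SketchLLMError(SketchError):
    pass


def describe(error: SketchError) -> str:
    """Format any error in the hierarchy without assuming metadata keys exist."""
    if isinstance(error, SketchAgentExecutionError):
        elapsed = error.metadata.get("execution_time_ms", "unknown")
        return f"Agent {error.agent_name} failed after {elapsed}ms: {error.message}"
    return f"{type(error).__name__}: {error.message}"


print(describe(SketchAgentExecutionError("product_extraction", "timeout", {"execution_time_ms": 1200})))
print(describe(SketchLLMError("rate limited")))
```

Catching the shared base class last gives a single fallback for any error the more specific handlers do not cover.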

Telemetry

Agent execution is logged with detailed telemetry:

# Get total execution time
total_time = context.get_total_execution_time_ms()

# Get total token usage
input_tokens, output_tokens = context.get_total_tokens()

# Access individual agent logs
for log in context.agent_logs:
    print(f"{log.agent_name}: {log.execution_time_ms}ms")
    print(f"  Input tokens: {log.input_tokens}")
    print(f"  Output tokens: {log.output_tokens}")
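The per-agent fields shown above (`agent_name`, `execution_time_ms`, `input_tokens`, `output_tokens`) are enough to build a summary. The sketch below does so with a stand-in log record; `SketchAgentLog` and `summarize` are hypothetical, and the sample numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SketchAgentLog:
    """Stand-in for one entry of context.agent_logs."""
    agent_name: str
    execution_time_ms: float
    input_tokens: int
    output_tokens: int


def summarize(logs: list[SketchAgentLog]) -> dict[str, Any]:
    """Aggregate totals and name the slowest agent (None if there are no logs)."""
    slowest = max(logs, key=lambda log: log.execution_time_ms, default=None)
    return {
        "total_ms": sum(log.execution_time_ms for log in logs),
        "input_tokens": sum(log.input_tokens for log in logs),
        "output_tokens": sum(log.output_tokens for log in logs),
        "slowest_agent": slowest.agent_name if slowest else None,
    }


logs = [
    SketchAgentLog("shelf_reconstruction", 850.0, 1200, 300),
    SketchAgentLog("product_extraction", 1500.0, 2400, 900),
    SketchAgentLog("schema_mapping", 400.0, 1000, 250),
]
summary = summarize(logs)
print(summary)
```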

LLM Providers

Supported Providers

OpenAI - GPT-4, GPT-3.5 Turbo
Anthropic - Claude 3 Opus, Claude 3 Sonnet
Google - Gemini Pro
LlamaAPI - OpenAI-compatible Llama models

Provider Configuration

# OpenAI
llm_config = LLMConfig(
    provider="openai",
    model="gpt-4",
    api_key="sk-...",
)

# Anthropic
llm_config = LLMConfig(
    provider="anthropic",
    model="claude-3-opus-20240229",
    api_key="sk-ant-...",
)

# Google
llm_config = LLMConfig(
    provider="google",
    model="gemini-pro",
    api_key="AI...",
)

# LlamaAPI
llm_config = LLMConfig(
    provider="llamaapi",
    model="llama-2-70b-chat",
    api_key="your-api-key",
)
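The examples above hard-code API keys for brevity. One way to keep keys out of source files is to read them from the environment, as sketched below. The helper `llm_config_kwargs` and the variable names in `ENV_VARS` are assumptions (the README does not state which environment variables, if any, the library reads), and `LLAMA_API_KEY` in particular is a made-up name.

```python
import os
from typing import Mapping

# Assumed variable names; adjust to whatever your deployment uses.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "llamaapi": "LLAMA_API_KEY",
}


def llm_config_kwargs(
    provider: str, model: str, environ: Mapping[str, str] = os.environ
) -> dict[str, str]:
    """Build provider, model, and api_key keyword arguments from the environment."""
    if provider not in ENV_VARS:
        raise ValueError(f"unsupported provider: {provider!r}")
    variable = ENV_VARS[provider]
    api_key = environ.get(variable)
    if not api_key:
        raise RuntimeError(f"set {variable} before creating the LLM client")
    return {"provider": provider, "model": model, "api_key": api_key}


# Passing a plain dict instead of os.environ keeps the example deterministic.
kwargs = llm_config_kwargs(
    "anthropic", "claude-3-opus-20240229", environ={"ANTHROPIC_API_KEY": "test-key"}
)
print(kwargs["provider"], kwargs["model"])
```

The returned dictionary uses the same keyword names as the configurations above, so it could be unpacked as `LLMConfig(**kwargs)`.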

Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=document_agents

Code Style

# Format code
black document_agents

# Lint code
ruff check document_agents

# Type check
mypy document_agents

Design Principles

Single Responsibility - Each agent performs one task
Dependency Injection - All dependencies injected via constructors
Protocol-based - Type-safe interfaces using Protocol
Async-first - All operations are async for performance
Type-safe - Full type hints with Pydantic validation
Extensible - Easy to add new agents and providers
Testable - Mock-friendly design with clear interfaces

Dependencies

document-core - Shared interfaces and models
openai>=1.30 - OpenAI API client
tiktoken>=0.7 - Token counting
jinja2>=3.1 - Prompt templating
pydantic>=2.0 - Data validation

Optional

anthropic>=0.18 - Anthropic API client
google-generativeai>=0.3 - Google API client

License

MIT

Support

For issues, questions, or contributions, please visit the project repository.

Release history

0.1.1 - Jun 18, 2026 (this version)
0.1.0 - Jun 15, 2026

Download files

Source Distribution

pepsico_document_agents-0.1.1.tar.gz (23.1 kB), uploaded Jun 18, 2026, Source

Built Distribution

pepsico_document_agents-0.1.1-py3-none-any.whl (23.9 kB), uploaded Jun 18, 2026, Python 3

File details

Details for the file pepsico_document_agents-0.1.1.tar.gz.

File metadata

Download URL: pepsico_document_agents-0.1.1.tar.gz
Upload date: Jun 18, 2026
Size: 23.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.6

File hashes

Hashes for pepsico_document_agents-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`e5bde9473f562d6b9f6196689babb834670810106342e90ecbcfc970d13f3a8a`
MD5	`00b251c167ce5b81cab9d57d588868a7`
BLAKE2b-256	`da919d7a936664b1133cc365d886b2d04d454d50caf489820268c6c30cc84d00`


File details

Details for the file pepsico_document_agents-0.1.1-py3-none-any.whl.

File metadata

Download URL: pepsico_document_agents-0.1.1-py3-none-any.whl
Upload date: Jun 18, 2026
Size: 23.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.6

File hashes

Hashes for pepsico_document_agents-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e35f164498633438f06c8cf4f572ee4a5b12df2d917d8cc0116796cea1a95c35`
MD5	`64fc3850b5cc18e144ec972e598511d0`
BLAKE2b-256	`98f6e62aed348d948aaabca121d4c0a559df8f4a516e019e246c673e70b08950`

