document-agents

A modular agent-based extraction framework for transforming parsed document content into structured planogram data. This library provides a production-ready architecture for building multi-step extraction pipelines with LLM-powered agents.

Overview

document-agents implements a modular agent framework where each agent performs a single extraction or transformation task. Agents communicate through a shared mutable context and are orchestrated through configurable processing chains. The library is designed for enterprise-scale document processing workloads.
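
The shared-context idea can be illustrated without the library. The following is a minimal sketch, not the library's implementation: SketchContext, the two toy agents, and run_chain are hypothetical names used only to show agents running in sequence and mutating one context object.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SketchContext:
    """Hypothetical stand-in for a shared mutable context."""
    document_id: str
    metadata: dict = field(default_factory=dict)


class UppercaseTitleAgent:
    """Toy agent with a single task: normalise the title."""

    async def run(self, context: SketchContext) -> None:
        context.metadata["title"] = context.metadata["raw_title"].upper()


class WordCountAgent:
    """Toy agent that reads what the previous agent wrote."""

    async def run(self, context: SketchContext) -> None:
        context.metadata["word_count"] = len(context.metadata["title"].split())


async def run_chain(agents, context: SketchContext) -> SketchContext:
    # Agents run in order; each one mutates the same context object.
    for agent in agents:
        await agent.run(context)
    return context


ctx = SketchContext(document_id="doc-001", metadata={"raw_title": "snack aisle bay 3"})
asyncio.run(run_chain([UppercaseTitleAgent(), WordCountAgent()], ctx))
print(ctx.metadata["title"], ctx.metadata["word_count"])
```

Because the second agent depends on a key written by the first, the order of the chain matters; that is the trade-off of communicating through shared state rather than return values.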

Architecture

The library follows SOLID principles and combines them with a small set of patterns:

  • Single Responsibility Principle - Each agent performs exactly one extraction or transformation task
  • Adapter Pattern - Normalizes provider responses to shared models
  • Factory Pattern - Creates agent instances by name
  • Dependency Injection - LLM clients and configurations injected into agents
  • Async-first - Non-blocking operations for high throughput
  • Protocol-based Interfaces - Type-safe contracts for agents and LLM clients
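
How a protocol, dependency injection, and a name-based factory fit together can be sketched in a few lines. Everything below is an illustrative assumption (SupportsComplete, EchoClient, GreetingAgent, AGENT_REGISTRY, and create_agent are not part of the library), and the sketch is synchronous for brevity.

```python
from typing import Protocol


class SupportsComplete(Protocol):
    """Hypothetical contract for an LLM client."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Fake client for illustration; it satisfies the protocol structurally."""

    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class GreetingAgent:
    def __init__(self, llm: SupportsComplete) -> None:
        self.llm = llm  # dependency is injected, not constructed here

    def run(self, name: str) -> str:
        return self.llm.complete(f"hello {name}")


# Factory: map a name to a constructor that receives the injected client.
AGENT_REGISTRY = {"greeting": GreetingAgent}


def create_agent(name: str, llm: SupportsComplete) -> GreetingAgent:
    try:
        return AGENT_REGISTRY[name](llm)
    except KeyError:
        raise ValueError(f"unknown agent: {name!r}") from None


agent = create_agent("greeting", EchoClient())
print(agent.run("shelf"))
```

Since the agent only depends on the protocol, a fake client like EchoClient can replace a real provider in tests without patching.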

Installation

pip install document-agents

Optional Dependencies

# For Anthropic provider
pip install "document-agents[anthropic]"

# For Google provider
pip install "document-agents[google]"

# For development
pip install "document-agents[dev]"

Quick Start

import asyncio
from document_agents import (
    AgentChain,
    AgentConfig,
    AgentContext,
    LLMClient,
    LLMConfig,
    ProcessingMode,
    ShelfReconstructionAgent,
    ProductExtractionAgent,
    SchemaMappingAgent,
)

async def main():
    # Configure LLM client
    llm_config = LLMConfig(
        provider="openai",
        model="gpt-4",
        api_key="your-api-key",
    )
    llm = LLMClient(llm_config)
    
    # Configure agents
    agent_config = AgentConfig(
        retries=3,
        timeout_seconds=120.0,
    )
    
    # Create agents
    agents = {
        "shelf_reconstruction": ShelfReconstructionAgent(llm, agent_config),
        "product_extraction": ProductExtractionAgent(llm, agent_config),
        "schema_mapping": SchemaMappingAgent(llm, agent_config),
    }
    
    # Create chain
    chain = AgentChain(agents)
    
    # Create context with parsed document data
    context = AgentContext(document_id="doc-001")
    context.parse_results = [...]  # Your parsed document results
    
    # Execute chain
    await chain.execute(context, ProcessingMode.STANDARD)
    
    # Get final result
    planogram = context.final_result
    print(f"Extracted {planogram.total_products} products across {planogram.total_shelves} shelves")

asyncio.run(main())

Configuration

LLM Configuration

from document_agents import LLMConfig

llm_config = LLMConfig(
    provider="openai",  # openai, anthropic, google, llamaapi
    model="gpt-4",
    api_key="your-api-key",
    temperature=0.0,
    max_tokens=4096,
    timeout_seconds=120.0,
    max_retries=3,
    base_url=None,  # Optional custom base URL
)

Agent Configuration

from document_agents import AgentConfig

agent_config = AgentConfig(
    retries=3,
    timeout_seconds=120.0,
    prompt_version="v1",
    enable_logging=True,
    enable_telemetry=True,
    log_prompts=False,
    log_responses=False,
)
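
The interaction between retries and timeout_seconds can be pictured with plain asyncio. This is a sketch under stated assumptions, not the library's retry logic: call_with_retries and TransientError are hypothetical, and retries is assumed to count extra attempts after the first one.

```python
import asyncio


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying."""


async def call_with_retries(operation, retries: int, timeout_seconds: float):
    """Await operation() under a timeout, retrying on timeout or TransientError.

    Assumption: `retries` counts extra attempts after the first one.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error = None
    for _ in range(retries + 1):
        try:
            return await asyncio.wait_for(operation(), timeout=timeout_seconds)
        except (asyncio.TimeoutError, TransientError) as exc:
            last_error = exc  # remember the failure and try again
    raise last_error


attempts = {"count": 0}


async def flaky():
    # Fails twice, then succeeds on the third call.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("transient failure")
    return "ok"


result = asyncio.run(call_with_retries(flaky, retries=3, timeout_seconds=1.0))
print(result, attempts["count"])
```

Under these assumptions, the worst-case wall time for one call is roughly (retries + 1) * timeout_seconds, which is worth keeping in mind when both values are raised together.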

Processing Modes

The library supports three processing modes for different complexity levels:

SIMPLE Mode

For simple documents with minimal structure:

  • Product Extraction
  • Schema Mapping

STANDARD Mode

For typical planogram documents:

  • Shelf Reconstruction
  • Product Extraction
  • Schema Mapping

COMPLEX Mode

For complex multi-page documents with conflicts:

  • Shelf Reconstruction
  • Product Extraction
  • Cross Reference
  • Conflict Resolution
  • Schema Mapping
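
The three modes above amount to a lookup from mode to an ordered list of agent names. A minimal sketch of that lookup follows; Mode, CHAINS, and steps_for are hypothetical, and the keys "cross_reference" and "conflict_resolution" are assumed names that mirror the style of the Quick Start keys.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for the library's processing modes."""
    SIMPLE = "simple"
    STANDARD = "standard"
    COMPLEX = "complex"


# Ordered agent names per mode, taken from the lists above.
CHAINS = {
    Mode.SIMPLE: ["product_extraction", "schema_mapping"],
    Mode.STANDARD: ["shelf_reconstruction", "product_extraction", "schema_mapping"],
    Mode.COMPLEX: [
        "shelf_reconstruction",
        "product_extraction",
        "cross_reference",
        "conflict_resolution",
        "schema_mapping",
    ],
}


def steps_for(mode: Mode) -> list:
    # Return a copy so callers cannot mutate the shared definition.
    return list(CHAINS[mode])


for mode in Mode:
    print(mode.name, "->", " > ".join(steps_for(mode)))
```

Each mode ends with schema mapping, so the more complex modes only add steps before the final mapping stage.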

Agents

Shelf Reconstruction Agent

Identifies shelves, determines numbering, detects bays/sections, and detects shelf continuations across pages.

from document_agents import ShelfReconstructionAgent

agent = ShelfReconstructionAgent(llm, agent_config)

Product Extraction Agent

Extracts product names, UPCs, facings, shelf assignments, and position information from tables, markdown, and inline references.

from document_agents import ProductExtractionAgent

agent = ProductExtractionAgent(llm, agent_config)
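
Extracted UPCs can be sanity-checked after extraction. The sketch below applies the standard UPC-A check-digit rule to a 12-digit code; is_valid_upc_a is a hypothetical helper and is not claimed to be part of the library.

```python
def is_valid_upc_a(code: str) -> bool:
    """Return True if `code` is a 12-digit UPC-A with a correct check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    odd_sum = sum(digits[0:11:2])   # positions 1, 3, ..., 11
    even_sum = sum(digits[1:10:2])  # positions 2, 4, ..., 10
    check = (10 - (odd_sum * 3 + even_sum) % 10) % 10
    return check == digits[11]


print(is_valid_upc_a("036000291452"))
print(is_valid_upc_a("036000291453"))
```

A failed check is a cheap signal that a digit was misread during parsing or extraction, before the value reaches later agents.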

Cross Reference Agent

Identifies duplicate products and performs UPC lookups, table linkage, and page reconciliation.

from document_agents import CrossReferenceAgent

agent = CrossReferenceAgent(llm, agent_config)
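
One way to picture duplicate detection is grouping extracted records by UPC. This is an illustrative sketch only: the record fields ("upc", "name", "page") and find_duplicate_upcs are assumptions, and the real agent presumably relies on the LLM rather than a plain grouping.

```python
from collections import defaultdict


def find_duplicate_upcs(products):
    """Group product records by UPC and keep the groups with more than one entry."""
    by_upc = defaultdict(list)
    for product in products:
        by_upc[product["upc"]].append(product)
    return {upc: items for upc, items in by_upc.items() if len(items) > 1}


products = [
    {"upc": "036000291452", "name": "Tissue 3-pack", "page": 1},
    {"upc": "012345678905", "name": "Cola 12oz", "page": 1},
    {"upc": "036000291452", "name": "Tissue 3 pack", "page": 2},
]

duplicates = find_duplicate_upcs(products)
for upc, items in duplicates.items():
    print(upc, [item["page"] for item in items])
```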

Conflict Resolution Agent

Detects conflicting values, resolves conflicts, and preserves audit trails.

from document_agents import ConflictResolutionAgent

agent = ConflictResolutionAgent(llm, agent_config)
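
Resolving a conflict while preserving an audit trail can be sketched as follows. The majority-vote policy and the resolve_field helper are assumptions chosen for illustration; the library's actual resolution strategy is not documented here.

```python
from collections import Counter


def resolve_field(field_name, candidates):
    """Pick the most frequent value and record how the decision was made.

    `candidates` is a list of (source, value) pairs. Majority vote is a
    hypothetical policy used only for this sketch.
    """
    counts = Counter(value for _, value in candidates)
    chosen, _ = counts.most_common(1)[0]
    audit = {
        "field": field_name,
        "candidates": list(candidates),  # every observed value is kept
        "chosen": chosen,
        "conflict": len(counts) > 1,
    }
    return chosen, audit


facings, audit = resolve_field(
    "facings", [("page-1", 4), ("page-2", 4), ("page-3", 6)]
)
print(facings, audit["conflict"])
```

Keeping the rejected candidates in the audit record means a reviewer can later see which source disagreed and why a value was chosen.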

Schema Mapping Agent

Maps extracted data to the structured planogram schema, preserving its hierarchy.

from document_agents import SchemaMappingAgent, ProcessingMode

agent = SchemaMappingAgent(llm, agent_config, ProcessingMode.STANDARD)

Agent Chain

The AgentChain orchestrates agent execution in configurable sequences:

from document_agents import AgentChain, ProcessingMode

chain = AgentChain(agents)

# Execute with specific processing mode
await chain.execute(context, ProcessingMode.STANDARD)

# Get chain definition
simple_chain = chain.get_chain(ProcessingMode.SIMPLE)

Custom Agents

Create custom agents by extending BaseAgent:

from document_agents import BaseAgent, AgentContext, AgentConfig, LLMClient

class CustomAgent(BaseAgent):
    def __init__(self, llm: LLMClient, config: AgentConfig):
        super().__init__(
            llm=llm,
            prompt_template="custom.txt",
            config=config,
            agent_name="custom_agent",
        )

    def _output_schema(self):
        # JSON-Schema-style description of the output expected from the LLM
        return {
            "type": "object",
            "properties": {
                "result": {"type": "string"}
            }
        }

    def _update_context(self, context: AgentContext, parsed):
        # Store the parsed output on the shared context for later agents
        context.metadata["custom_result"] = parsed
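
To show what an output schema like the one above constrains, here is a deliberately simplified check of a raw JSON response against it. check_against_schema and TYPE_MAP are hypothetical and cover only top-level property types; how the library validates responses internally is not shown in this document.

```python
import json

# Simplified mapping from JSON Schema type names to Python types.
TYPE_MAP = {"string": str, "object": dict, "array": list, "number": (int, float)}

SCHEMA = {
    "type": "object",
    "properties": {
        "result": {"type": "string"}
    }
}


def check_against_schema(raw: str, schema: dict) -> dict:
    """Parse `raw` as JSON and check top-level property types only."""
    parsed = json.loads(raw)
    if not isinstance(parsed, TYPE_MAP[schema["type"]]):
        raise ValueError(f"expected top-level type {schema['type']}")
    if isinstance(parsed, dict):
        for name, spec in schema.get("properties", {}).items():
            if name in parsed and not isinstance(parsed[name], TYPE_MAP[spec["type"]]):
                raise ValueError(f"property {name!r} should be of type {spec['type']}")
    return parsed


print(check_against_schema('{"result": "ok"}', SCHEMA))
```

A response such as '{"result": 5}' raises ValueError in this sketch, which is the kind of failure a schema validation error is meant to surface.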

Error Handling

The library provides a comprehensive exception hierarchy:

from document_agents import (
    DocumentAgentsError,
    AgentExecutionError,
    LLMError,
    PromptRenderingError,
    SchemaValidationError,
)

try:
    await chain.execute(context)
except AgentExecutionError as e:
    print(f"Agent {e.agent_name} failed: {e.message}")
    print(f"Execution time: {e.metadata['execution_time_ms']}ms")
except LLMError as e:
    print(f"LLM error: {e.message}")
    print(f"Provider: {e.metadata['provider']}")
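
The value of a hierarchy is that callers can catch specific errors first and fall back to the base class. The sketch below uses hypothetical Sketch-prefixed classes that mimic the attributes used above (message, metadata, agent_name); it does not reproduce the library's real class definitions.

```python
class SketchError(Exception):
    """Hypothetical base error carrying a message and free-form metadata."""

    def __init__(self, message: str, **metadata):
        super().__init__(message)
        self.message = message
        self.metadata = metadata


class SketchAgentError(SketchError):
    """Hypothetical agent-level error that also records the agent name."""

    def __init__(self, agent_name: str, message: str, **metadata):
        super().__init__(message, **metadata)
        self.agent_name = agent_name


class SketchLLMError(SketchError):
    """Hypothetical provider-level error."""


def describe(exc: Exception) -> str:
    # Most specific handler first, base class last as a catch-all.
    try:
        raise exc
    except SketchAgentError as e:
        return f"agent {e.agent_name} failed: {e.message}"
    except SketchError as e:
        return f"library error: {e.message}"


print(describe(SketchAgentError("product_extraction", "bad JSON", execution_time_ms=812)))
print(describe(SketchLLMError("rate limited", provider="openai")))
```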

Telemetry

Agent execution is logged with detailed telemetry:

# Get total execution time
total_time = context.get_total_execution_time_ms()

# Get total token usage
input_tokens, output_tokens = context.get_total_tokens()

# Access individual agent logs
for log in context.agent_logs:
    print(f"{log.agent_name}: {log.execution_time_ms}ms")
    print(f"  Input tokens: {log.input_tokens}")
    print(f"  Output tokens: {log.output_tokens}")
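
The totals above are presumably sums over the per-agent logs. A minimal sketch of that aggregation follows; SketchLog, total_time_ms, and total_tokens are hypothetical, with field names borrowed from the loop above.

```python
from dataclasses import dataclass


@dataclass
class SketchLog:
    """Hypothetical per-agent log entry."""
    agent_name: str
    execution_time_ms: int
    input_tokens: int
    output_tokens: int


def total_time_ms(logs) -> int:
    return sum(log.execution_time_ms for log in logs)


def total_tokens(logs):
    # Returns (input_tokens, output_tokens) summed over all agents.
    return (
        sum(log.input_tokens for log in logs),
        sum(log.output_tokens for log in logs),
    )


logs = [
    SketchLog("shelf_reconstruction", 1200, 900, 150),
    SketchLog("product_extraction", 850, 1400, 600),
]
print(total_time_ms(logs), total_tokens(logs))
```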

LLM Providers

Supported Providers

  • OpenAI - GPT-4, GPT-3.5 Turbo
  • Anthropic - Claude 3 Opus, Claude 3 Sonnet
  • Google - Gemini Pro
  • LlamaAPI - OpenAI-compatible Llama models

Provider Configuration

# OpenAI
llm_config = LLMConfig(
    provider="openai",
    model="gpt-4",
    api_key="sk-...",
)

# Anthropic
llm_config = LLMConfig(
    provider="anthropic",
    model="claude-3-opus-20240229",
    api_key="sk-ant-...",
)

# Google
llm_config = LLMConfig(
    provider="google",
    model="gemini-pro",
    api_key="AI...",
)

# LlamaAPI
llm_config = LLMConfig(
    provider="llamaapi",
    model="llama-2-70b-chat",
    api_key="your-api-key",
)
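
Rather than hard-coding keys as in the examples above, the key can be read from the environment and passed as api_key. This is a sketch under stated assumptions: api_key_for and the environment variable names are conventions chosen for illustration, and nothing here implies that the library reads these variables itself.

```python
import os

# Assumed variable names; adjust to whatever your deployment uses.
ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "llamaapi": "LLAMA_API_KEY",
}


def api_key_for(provider: str, env=None) -> str:
    """Look up the API key for `provider` in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    try:
        variable = ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    key = env.get(variable)
    if not key:
        raise RuntimeError(f"environment variable {variable} is not set")
    return key


# Demonstration with an explicit mapping instead of the real environment.
print(api_key_for("anthropic", {"ANTHROPIC_API_KEY": "sk-ant-example"}))
```

The returned string would then be passed as the api_key argument shown in the provider examples.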

Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=document_agents

Code Style

# Format code
black document_agents

# Lint code
ruff check document_agents

# Type check
mypy document_agents

Design Principles

  1. Single Responsibility - Each agent performs one task
  2. Dependency Injection - All dependencies injected via constructors
  3. Protocol-based - Type-safe interfaces using Protocol
  4. Async-first - All operations are async for performance
  5. Type-safe - Full type hints with Pydantic validation
  6. Extensible - Easy to add new agents and providers
  7. Testable - Mock-friendly design with clear interfaces

Dependencies

  • plano-core - Shared interfaces and models
  • openai>=1.30 - OpenAI API client
  • tiktoken>=0.7 - Token counting
  • jinja2>=3.1 - Prompt templating
  • pydantic>=2.0 - Data validation

Optional

  • anthropic>=0.18 - Anthropic API client
  • google-generativeai>=0.3 - Google API client

License

MIT

Support

For issues, questions, or contributions, please visit the project repository.
