Agent-based extraction framework for transforming parsed documents into structured planogram data

document-agents

A modular agent-based extraction framework for transforming parsed document content into structured planogram data. This library provides a production-ready architecture for building multi-step extraction pipelines with LLM-powered agents.

Overview

document-agents implements a modular agent framework where each agent performs a single extraction or transformation task. Agents communicate through a shared mutable context and are orchestrated through configurable processing chains. The library is designed for enterprise-scale document processing workloads.

Architecture

The library is organized around the following principles and patterns:

  • Single Responsibility Principle - Each agent performs exactly one extraction or transformation task
  • Adapter Pattern - Normalizes provider responses to shared models
  • Factory Pattern - Creates agent instances by name
  • Dependency Injection - LLM clients and configurations injected into agents
  • Async-first - Non-blocking operations for high throughput
  • Protocol-based Interfaces - Type-safe contracts for agents and LLM clients
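
As a rough illustration of how the protocol, factory, and dependency-injection ideas above fit together, here is a minimal, self-contained sketch. It does not use the library: `SupportsComplete`, `EchoClient`, `SummaryAgent`, `AGENT_REGISTRY`, and `create_agent` are hypothetical names invented for this example.

```python
import asyncio
from typing import Callable, Dict, Protocol


class SupportsComplete(Protocol):
    """Hypothetical contract: any object with an async complete() method."""

    async def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in LLM client that satisfies the protocol structurally."""

    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class SummaryAgent:
    """Single-responsibility agent; its LLM client is injected."""

    def __init__(self, llm: SupportsComplete) -> None:
        self.llm = llm

    async def run(self, text: str) -> str:
        return await self.llm.complete(text)


# Factory registry: maps an agent name to a constructor that takes the client.
AGENT_REGISTRY: Dict[str, Callable[[SupportsComplete], SummaryAgent]] = {
    "summary": SummaryAgent,
}


def create_agent(name: str, llm: SupportsComplete) -> SummaryAgent:
    """Create an agent by name, raising ValueError for unknown names."""
    try:
        return AGENT_REGISTRY[name](llm)
    except KeyError:
        raise ValueError(f"Unknown agent: {name!r}") from None


if __name__ == "__main__":
    agent = create_agent("summary", EchoClient())
    print(asyncio.run(agent.run("hello")))
```

Because the agent depends only on the protocol, a test can pass in a stub client such as `EchoClient` without touching a real provider.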

Installation

pip install document-agents

Optional Dependencies

# For Anthropic provider
pip install document-agents[anthropic]

# For Google provider
pip install document-agents[google]

# For development
pip install document-agents[dev]

Quick Start

import asyncio
from document_agents import (
    AgentChain,
    AgentConfig,
    AgentContext,
    LLMClient,
    LLMConfig,
    ProcessingMode,
    ShelfReconstructionAgent,
    ProductExtractionAgent,
    SchemaMappingAgent,
)

async def main():
    # Configure LLM client
    llm_config = LLMConfig(
        provider="openai",
        model="gpt-4",
        api_key="your-api-key",
    )
    llm = LLMClient(llm_config)
    
    # Configure agents
    agent_config = AgentConfig(
        retries=3,
        timeout_seconds=120.0,
    )
    
    # Create agents
    agents = {
        "shelf_reconstruction": ShelfReconstructionAgent(llm, agent_config),
        "product_extraction": ProductExtractionAgent(llm, agent_config),
        "schema_mapping": SchemaMappingAgent(llm, agent_config),
    }
    
    # Create chain
    chain = AgentChain(agents)
    
    # Create context with parsed document data
    context = AgentContext(document_id="doc-001")
    context.parse_results = [...]  # Your parsed document results
    
    # Execute chain
    await chain.execute(context, ProcessingMode.STANDARD)
    
    # Get final result
    planogram = context.final_result
    print(f"Extracted {planogram.total_products} products across {planogram.total_shelves} shelves")

asyncio.run(main())

Configuration

LLM Configuration

from document_agents import LLMConfig

llm_config = LLMConfig(
    provider="openai",  # openai, anthropic, google, llamaapi
    model="gpt-4",
    api_key="your-api-key",
    temperature=0.0,
    max_tokens=4096,
    timeout_seconds=120.0,
    max_retries=3,
    base_url=None,  # Optional custom base URL
)

Agent Configuration

from document_agents import AgentConfig

agent_config = AgentConfig(
    retries=3,
    timeout_seconds=120.0,
    prompt_version="v1",
    enable_logging=True,
    enable_telemetry=True,
    log_prompts=False,
    log_responses=False,
)
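
The README does not describe how `retries` and `timeout_seconds` are applied internally. One plausible interpretation, sketched here with the standard library only, is a per-attempt timeout inside a bounded retry loop. `call_with_retries` is a hypothetical helper, and treating `retries` as the total number of attempts is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_retries(
    make_call: Callable[[], Awaitable[T]],
    retries: int = 3,
    timeout_seconds: float = 120.0,
) -> T:
    """Try make_call() up to `retries` times, bounding each attempt by a timeout.

    Assumption: `retries` counts total attempts, not extra attempts.
    """
    if retries < 1:
        raise ValueError("retries must be >= 1")
    last_error: Optional[Exception] = None
    for _ in range(retries):
        try:
            # A fresh awaitable is created for every attempt.
            return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
        except Exception as exc:  # includes asyncio.TimeoutError
            last_error = exc
    assert last_error is not None
    raise last_error
```

A production version would usually add backoff between attempts and retry only on errors known to be transient.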

Processing Modes

The library supports three processing modes for different complexity levels:

SIMPLE Mode

For simple documents with minimal structure:

  • Product Extraction
  • Schema Mapping

STANDARD Mode

For typical planogram documents:

  • Shelf Reconstruction
  • Product Extraction
  • Schema Mapping

COMPLEX Mode

For complex multi-page documents with conflicts:

  • Shelf Reconstruction
  • Product Extraction
  • Cross Reference
  • Conflict Resolution
  • Schema Mapping
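
The three agent sequences above can be pictured as a mapping from mode to an ordered list of agent names, executed against one shared context. The sketch below is a self-contained stand-in, not the library's `AgentChain`: `Mode`, `CHAINS`, `run_chain`, and `make_step` are hypothetical, and the `cross_reference` and `conflict_resolution` keys are assumed names.

```python
import asyncio
from enum import Enum
from typing import Awaitable, Callable, Dict, List


class Mode(Enum):
    """Hypothetical stand-in for ProcessingMode."""

    SIMPLE = "simple"
    STANDARD = "standard"
    COMPLEX = "complex"


# Agent order per mode, following the lists above.
CHAINS: Dict[Mode, List[str]] = {
    Mode.SIMPLE: ["product_extraction", "schema_mapping"],
    Mode.STANDARD: ["shelf_reconstruction", "product_extraction", "schema_mapping"],
    Mode.COMPLEX: [
        "shelf_reconstruction",
        "product_extraction",
        "cross_reference",
        "conflict_resolution",
        "schema_mapping",
    ],
}

Step = Callable[[dict], Awaitable[None]]


async def run_chain(steps: Dict[str, Step], context: dict, mode: Mode) -> None:
    """Run the steps for `mode` in order; each step mutates the shared context."""
    for name in CHAINS[mode]:
        await steps[name](context)


def make_step(name: str) -> Step:
    """Build a dummy step that records its own name in the context."""

    async def step(context: dict) -> None:
        context.setdefault("visited", []).append(name)

    return step


if __name__ == "__main__":
    steps = {name: make_step(name) for name in CHAINS[Mode.COMPLEX]}
    ctx: dict = {}
    asyncio.run(run_chain(steps, ctx, Mode.STANDARD))
    print(ctx["visited"])
```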

Agents

Shelf Reconstruction Agent

Identifies shelves, determines numbering, detects bays/sections, and detects shelf continuations across pages.

from document_agents import ShelfReconstructionAgent

agent = ShelfReconstructionAgent(llm, agent_config)

Product Extraction Agent

Extracts product names, UPCs, facings, shelf assignments, and position information from tables, markdown, and inline references.

from document_agents import ProductExtractionAgent

agent = ProductExtractionAgent(llm, agent_config)
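
Extracted UPCs can be sanity-checked after extraction. The README does not say whether the library validates them, so the following is an optional post-processing sketch: it applies the standard UPC-A check-digit rule to 12-digit codes. The function names are hypothetical.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Compute the UPC-A check digit for the first 11 digits of a code."""
    if len(first_eleven) != 11 or not (first_eleven.isascii() and first_eleven.isdigit()):
        raise ValueError("expected exactly 11 ASCII digits")
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # positions 1, 3, ..., 11 (weighted by 3)
    even_sum = sum(digits[1::2])  # positions 2, 4, ..., 10
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def is_valid_upc_a(code: str) -> bool:
    """Return True if `code` is 12 ASCII digits with a correct check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    return upc_a_check_digit(code[:11]) == int(code[11])


if __name__ == "__main__":
    print(is_valid_upc_a("036000291452"))  # True
```

A failed check is a useful signal that a digit was misread during parsing, and such rows can be flagged for review rather than dropped.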

Cross Reference Agent

Identifies duplicate products, performs UPC lookups, links related tables, and reconciles data across pages.

from document_agents import CrossReferenceAgent

agent = CrossReferenceAgent(llm, agent_config)

Conflict Resolution Agent

Detects conflicting values, resolves conflicts, and preserves audit trails.

from document_agents import ConflictResolutionAgent

agent = ConflictResolutionAgent(llm, agent_config)
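
The resolution strategy used by the agent is not documented here. To make the idea of resolving conflicts while preserving an audit trail concrete, this sketch uses a simple majority vote and keeps every observed value alongside the chosen one. `Resolution` and `resolve_field` are hypothetical, and majority voting is an assumed strategy, not the library's.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Resolution:
    name: str                         # field being resolved, e.g. "facings"
    chosen: str                       # value selected by the strategy
    candidates: List[Tuple[str, str]]  # (source, value) pairs kept as audit trail
    conflicted: bool                  # True if sources disagreed


def resolve_field(name: str, observations: List[Tuple[str, str]]) -> Resolution:
    """Pick the most frequent value; ties go to the value seen first."""
    if not observations:
        raise ValueError("at least one observation is required")
    counts = Counter(value for _, value in observations)
    chosen = counts.most_common(1)[0][0]
    return Resolution(
        name=name,
        chosen=chosen,
        candidates=list(observations),
        conflicted=len(counts) > 1,
    )


if __name__ == "__main__":
    result = resolve_field(
        "facings", [("page 1", "4"), ("page 2", "6"), ("page 3", "4")]
    )
    print(result.chosen, result.conflicted)
```

Keeping the full candidate list means a reviewer can later see which page disagreed and why a value was chosen.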

Schema Mapping Agent

Maps extracted data to a structured planogram schema with the proper hierarchy.

from document_agents import SchemaMappingAgent, ProcessingMode

agent = SchemaMappingAgent(llm, agent_config, ProcessingMode.STANDARD)

Agent Chain

The AgentChain orchestrates agent execution in configurable sequences:

from document_agents import AgentChain, ProcessingMode

# `agents` and `context` are built as in the Quick Start; run the awaits inside an async function
chain = AgentChain(agents)

# Execute with specific processing mode
await chain.execute(context, ProcessingMode.STANDARD)

# Get chain definition
simple_chain = chain.get_chain(ProcessingMode.SIMPLE)

Custom Agents

Create custom agents by extending BaseAgent:

from document_agents import BaseAgent, AgentContext, AgentConfig, LLMClient

class CustomAgent(BaseAgent):
    def __init__(self, llm: LLMClient, config: AgentConfig):
        super().__init__(
            llm=llm,
            prompt_template="custom.txt",
            config=config,
            agent_name="custom_agent",
        )
    
    def _output_schema(self):
        return {
            "type": "object",
            "properties": {
                "result": {"type": "string"}
            }
        }
    
    def _update_context(self, context: AgentContext, parsed):
        context.metadata["custom_result"] = parsed
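
The README does not show how `BaseAgent` invokes `_output_schema` and `_update_context`. A plausible flow is a template method: call the LLM, parse the JSON response, check it against the schema, then update the context. The sketch below uses a stand-in class (`SketchBaseAgent`, not the library's `BaseAgent`), a plain dict as the context, and a deliberately minimal schema check. All of these are assumptions.

```python
import asyncio
import json
from typing import Any, Dict


class SketchBaseAgent:
    """Stand-in for illustration only; not the library's BaseAgent."""

    async def _call_llm(self, prompt: str) -> str:
        raise NotImplementedError

    def _output_schema(self) -> Dict[str, Any]:
        raise NotImplementedError

    def _update_context(self, context: Dict[str, Any], parsed: Dict[str, Any]) -> None:
        raise NotImplementedError

    async def run(self, context: Dict[str, Any], prompt: str) -> None:
        """Template method: call, parse, check, then hand off to the subclass."""
        raw = await self._call_llm(prompt)
        parsed = json.loads(raw)
        if not isinstance(parsed, dict):
            raise ValueError("expected a JSON object")
        # Minimal check: every top-level key must be declared in the schema.
        allowed = self._output_schema().get("properties", {})
        unknown = sorted(set(parsed) - set(allowed))
        if unknown:
            raise ValueError(f"undeclared keys: {unknown}")
        self._update_context(context, parsed)


class CustomSketchAgent(SketchBaseAgent):
    async def _call_llm(self, prompt: str) -> str:
        return '{"result": "ok"}'  # canned response instead of a real LLM call

    def _output_schema(self) -> Dict[str, Any]:
        return {"type": "object", "properties": {"result": {"type": "string"}}}

    def _update_context(self, context: Dict[str, Any], parsed: Dict[str, Any]) -> None:
        context["custom_result"] = parsed


if __name__ == "__main__":
    ctx: Dict[str, Any] = {}
    asyncio.run(CustomSketchAgent().run(ctx, "extract something"))
    print(ctx)
```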

Error Handling

The library provides a comprehensive exception hierarchy:

from document_agents import (
    DocumentAgentsError,
    AgentExecutionError,
    LLMError,
    PromptRenderingError,
    SchemaValidationError,
)

# Run inside an async function, as in the Quick Start
try:
    await chain.execute(context)
except AgentExecutionError as e:
    print(f"Agent {e.agent_name} failed: {e.message}")
    print(f"Execution time: {e.metadata['execution_time_ms']}ms")
except LLMError as e:
    print(f"LLM error: {e.message}")
    print(f"Provider: {e.metadata['provider']}")
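
Two defensive habits help when handling these exceptions: catch the base class as a fallback, and read metadata keys with `.get()` in case a key is absent. The sketch below demonstrates both with stand-in classes (`SketchError`, `SketchAgentError`, `SketchLLMError`) that only mimic the attributes used above. They are not the library's classes, and their constructor signatures are assumptions.

```python
from typing import Any, Dict, Optional


class SketchError(Exception):
    """Stand-in base error carrying a message and a metadata dict."""

    def __init__(self, message: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.metadata = dict(metadata or {})


class SketchAgentError(SketchError):
    def __init__(
        self, agent_name: str, message: str, metadata: Optional[Dict[str, Any]] = None
    ) -> None:
        super().__init__(message, metadata)
        self.agent_name = agent_name


class SketchLLMError(SketchError):
    pass


def describe(exc: SketchError) -> str:
    """Format an error, tolerating missing metadata keys."""
    if isinstance(exc, SketchAgentError):
        elapsed = exc.metadata.get("execution_time_ms", "unknown")
        return f"agent {exc.agent_name} failed after {elapsed} ms: {exc.message}"
    if isinstance(exc, SketchLLMError):
        provider = exc.metadata.get("provider", "unknown")
        return f"LLM error from {provider}: {exc.message}"
    return f"error: {exc.message}"  # fallback for any other subclass


if __name__ == "__main__":
    try:
        raise SketchAgentError("product_extraction", "bad JSON", {"execution_time_ms": 42})
    except SketchError as exc:
        print(describe(exc))
```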

Telemetry

Agent execution is logged with detailed telemetry:

# Get total execution time
total_time = context.get_total_execution_time_ms()

# Get total token usage
input_tokens, output_tokens = context.get_total_tokens()

# Access individual agent logs
for log in context.agent_logs:
    print(f"{log.agent_name}: {log.execution_time_ms}ms")
    print(f"  Input tokens: {log.input_tokens}")
    print(f"  Output tokens: {log.output_tokens}")
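
Per-agent logs can also be aggregated outside the context object, for example to find the slowest step. This sketch uses a stand-in record type whose field names mirror the loop above. `LogEntry`, `summarize`, and `slowest` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LogEntry:
    """Stand-in for one agent log record."""

    agent_name: str
    execution_time_ms: float
    input_tokens: int
    output_tokens: int


def summarize(logs: List[LogEntry]) -> Tuple[float, int, int]:
    """Return (total time in ms, total input tokens, total output tokens)."""
    total_ms = sum(entry.execution_time_ms for entry in logs)
    total_in = sum(entry.input_tokens for entry in logs)
    total_out = sum(entry.output_tokens for entry in logs)
    return total_ms, total_in, total_out


def slowest(logs: List[LogEntry]) -> str:
    """Return the name of the agent with the longest execution time."""
    if not logs:
        raise ValueError("no logs to inspect")
    return max(logs, key=lambda entry: entry.execution_time_ms).agent_name


if __name__ == "__main__":
    logs = [
        LogEntry("shelf_reconstruction", 100.0, 10, 5),
        LogEntry("product_extraction", 250.0, 20, 7),
    ]
    print(summarize(logs), slowest(logs))
```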

LLM Providers

Supported Providers

  • OpenAI - GPT-4, GPT-3.5 Turbo
  • Anthropic - Claude 3 Opus, Claude 3 Sonnet
  • Google - Gemini Pro
  • LlamaAPI - OpenAI-compatible Llama models

Provider Configuration

# OpenAI
llm_config = LLMConfig(
    provider="openai",
    model="gpt-4",
    api_key="sk-...",
)

# Anthropic
llm_config = LLMConfig(
    provider="anthropic",
    model="claude-3-opus-20240229",
    api_key="sk-ant-...",
)

# Google
llm_config = LLMConfig(
    provider="google",
    model="gemini-pro",
    api_key="AI...",
)

# LlamaAPI
llm_config = LLMConfig(
    provider="llamaapi",
    model="llama-2-70b-chat",
    api_key="your-api-key",
)
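
Rather than hard-coding API keys as in the examples above, keys can be read from the environment. The helper below is a sketch that builds the keyword arguments for `LLMConfig`. The function name and the environment variable names in `ENV_VARS` are assumptions, not something the library defines.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical environment variable names; adjust to your deployment.
ENV_VARS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "llamaapi": "LLAMA_API_KEY",
}


def llm_config_kwargs(
    provider: str,
    model: str,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build provider/model/api_key kwargs, reading the key from `env`."""
    source = os.environ if env is None else env
    try:
        var_name = ENV_VARS[provider]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider!r}") from None
    api_key = source.get(var_name)
    if not api_key:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return {"provider": provider, "model": model, "api_key": api_key}
```

The result can then be unpacked into the config, for example `LLMConfig(**llm_config_kwargs("openai", "gpt-4"))`.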

Development

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=document_agents

Code Style

# Format code
black document_agents

# Lint code
ruff check document_agents

# Type check
mypy document_agents

Design Principles

  1. Single Responsibility - Each agent performs one task
  2. Dependency Injection - All dependencies injected via constructors
  3. Protocol-based - Type-safe interfaces using Protocol
  4. Async-first - All operations are async for performance
  5. Type-safe - Full type hints with Pydantic validation
  6. Extensible - Easy to add new agents and providers
  7. Testable - Mock-friendly design with clear interfaces

Dependencies

  • document-core - Shared interfaces and models
  • openai>=1.30 - OpenAI API client
  • tiktoken>=0.7 - Token counting
  • jinja2>=3.1 - Prompt templating
  • pydantic>=2.0 - Data validation

Optional

  • anthropic>=0.18 - Anthropic API client
  • google-generativeai>=0.3 - Google API client

License

MIT

Support

For issues, questions, or contributions, please visit the project repository.
