Agent-based extraction framework for transforming parsed documents into structured planogram data
document-agents
A modular agent-based extraction framework for transforming parsed document content into structured planogram data. This library provides a production-ready architecture for building multi-step extraction pipelines with LLM-powered agents.
Overview
document-agents implements a modular agent framework where each agent performs a single extraction or transformation task. Agents communicate through a shared mutable context and are orchestrated through configurable processing chains. The library is designed for enterprise-scale document processing workloads.
Architecture
The library is built around SOLID-style design principles and a small set of supporting patterns:
- Single Responsibility Principle - Each agent performs exactly one extraction or transformation task
- Adapter Pattern - Normalizes provider responses to shared models
- Factory Pattern - Creates agent instances by name
- Dependency Injection - LLM clients and configurations injected into agents
- Async-first - Non-blocking operations for high throughput
- Protocol-based Interfaces - Type-safe contracts for agents and LLM clients
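As a rough illustration of how several of these items can fit together, the sketch below types an LLM client with a Protocol, injects it into an agent through the constructor, and builds the agent by name from a registry. Every name in it (LLMClientProtocol, SketchContext, SummaryAgent, AGENT_REGISTRY, create_agent) is hypothetical and chosen for this example only; it is not the library's actual code.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Callable, Protocol


class LLMClientProtocol(Protocol):
    """Hypothetical contract: anything with an async complete() method."""

    async def complete(self, prompt: str) -> str: ...


@dataclass
class SketchContext:
    """Hypothetical stand-in for the shared mutable context."""

    document_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


class Agent(Protocol):
    """Hypothetical contract: one agent, one task."""

    async def run(self, context: SketchContext) -> None: ...


class EchoLLM:
    """Fake client; satisfies LLMClientProtocol structurally, no inheritance."""

    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class SummaryAgent:
    def __init__(self, llm: LLMClientProtocol) -> None:
        self._llm = llm  # dependency is injected, not constructed here

    async def run(self, context: SketchContext) -> None:
        context.metadata["summary"] = await self._llm.complete(context.document_id)


# Factory: maps a name to a constructor that accepts the injected client.
AGENT_REGISTRY: dict[str, Callable[[LLMClientProtocol], Agent]] = {
    "summary": SummaryAgent,
}


def create_agent(name: str, llm: LLMClientProtocol) -> Agent:
    try:
        return AGENT_REGISTRY[name](llm)
    except KeyError:
        raise ValueError(f"Unknown agent: {name!r}") from None


def demo() -> SketchContext:
    context = SketchContext(document_id="doc-001")
    agent = create_agent("summary", EchoLLM())
    asyncio.run(agent.run(context))
    return context
```

Because the agent depends only on the Protocol, a fake client such as EchoLLM can replace a real provider in tests without touching the agent.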
Installation
pip install document-agents
Optional Dependencies
# For Anthropic provider
pip install document-agents[anthropic]
# For Google provider
pip install document-agents[google]
# For development
pip install document-agents[dev]
Quick Start
import asyncio
from document_agents import (
    AgentChain,
    AgentConfig,
    AgentContext,
    LLMClient,
    LLMConfig,
    ProcessingMode,
    ShelfReconstructionAgent,
    ProductExtractionAgent,
    SchemaMappingAgent,
)
async def main():
    # Configure LLM client
    llm_config = LLMConfig(
        provider="openai",
        model="gpt-4",
        api_key="your-api-key",
    )
    llm = LLMClient(llm_config)
    # Configure agents
    agent_config = AgentConfig(
        retries=3,
        timeout_seconds=120.0,
    )
    # Create agents
    agents = {
        "shelf_reconstruction": ShelfReconstructionAgent(llm, agent_config),
        "product_extraction": ProductExtractionAgent(llm, agent_config),
        "schema_mapping": SchemaMappingAgent(llm, agent_config),
    }
    # Create chain
    chain = AgentChain(agents)
    # Create context with parsed document data
    context = AgentContext(document_id="doc-001")
    context.parse_results = [...]  # Your parsed document results
    # Execute chain
    await chain.execute(context, ProcessingMode.STANDARD)
    # Get final result
    planogram = context.final_result
    print(f"Extracted {planogram.total_products} products across {planogram.total_shelves} shelves")
asyncio.run(main())
Configuration
LLM Configuration
from document_agents import LLMConfig
llm_config = LLMConfig(
    provider="openai",  # openai, anthropic, google, llamaapi
    model="gpt-4",
    api_key="your-api-key",
    temperature=0.0,
    max_tokens=4096,
    timeout_seconds=120.0,
    max_retries=3,
    base_url=None,  # Optional custom base URL
)
Agent Configuration
from document_agents import AgentConfig
agent_config = AgentConfig(
    retries=3,
    timeout_seconds=120.0,
    prompt_version="v1",
    enable_logging=True,
    enable_telemetry=True,
    log_prompts=False,
    log_responses=False,
)
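To make the retries and timeout_seconds settings concrete, here is a minimal sketch of how a retry loop with a per-attempt timeout could be written with asyncio. It assumes that retries counts additional attempts after the first one, which the README does not state; call_with_retries, TransientError, and make_flaky are hypothetical names, and the library's own retry logic may differ.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type standing in for a recoverable failure."""


async def call_with_retries(
    operation: Callable[[], Awaitable[T]],
    retries: int,
    timeout_seconds: float,
) -> T:
    """Run operation once, then up to `retries` more times on failure."""
    last_error: Exception | None = None
    for _ in range(retries + 1):
        try:
            # Each attempt gets its own time budget.
            return await asyncio.wait_for(operation(), timeout=timeout_seconds)
        except (asyncio.TimeoutError, TransientError) as exc:
            last_error = exc
    raise RuntimeError(f"failed after {retries + 1} attempts") from last_error


def make_flaky(failures: int) -> Callable[[], Awaitable[str]]:
    """Build an operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def operation() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("temporary failure")
        return f"ok after {state['calls']} calls"

    return operation
```

With retries=3, an operation that fails twice still succeeds on the third call; one that keeps failing raises after four attempts in total.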
Processing Modes
The library supports three processing modes for different complexity levels:
SIMPLE Mode
For simple documents with minimal structure:
- Product Extraction
- Schema Mapping
STANDARD Mode
For typical planogram documents:
- Shelf Reconstruction
- Product Extraction
- Schema Mapping
COMPLEX Mode
For complex multi-page documents with conflicts:
- Shelf Reconstruction
- Product Extraction
- Cross Reference
- Conflict Resolution
- Schema Mapping
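The three modes above can be read as ordered lists of agent names. The sketch below encodes them that way and reports which agents a given mode would still need. The keys shelf_reconstruction, product_extraction, and schema_mapping come from the Quick Start; cross_reference and conflict_resolution are assumed by analogy, and Mode, CHAINS, and missing_agents are hypothetical names rather than library API.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical mirror of the three processing modes."""

    SIMPLE = "simple"
    STANDARD = "standard"
    COMPLEX = "complex"


# Ordered agent names per mode, following the lists in this section.
CHAINS: dict[Mode, list[str]] = {
    Mode.SIMPLE: ["product_extraction", "schema_mapping"],
    Mode.STANDARD: [
        "shelf_reconstruction",
        "product_extraction",
        "schema_mapping",
    ],
    Mode.COMPLEX: [
        "shelf_reconstruction",
        "product_extraction",
        "cross_reference",  # assumed key
        "conflict_resolution",  # assumed key
        "schema_mapping",
    ],
}


def missing_agents(mode: Mode, registered: set[str]) -> list[str]:
    """Return the agents a mode needs that are not in `registered`."""
    return [name for name in CHAINS[mode] if name not in registered]
```

Under these assumptions, the three agents registered in the Quick Start cover SIMPLE and STANDARD, while COMPLEX would additionally need the cross-reference and conflict-resolution agents.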
Agents
Shelf Reconstruction Agent
Identifies shelves, determines their numbering, and detects bays/sections and shelf continuations across pages.
from document_agents import ShelfReconstructionAgent
agent = ShelfReconstructionAgent(llm, agent_config)
Product Extraction Agent
Extracts product names, UPCs, facings, shelf assignments, and position information from tables, markdown, and inline references.
from document_agents import ProductExtractionAgent
agent = ProductExtractionAgent(llm, agent_config)
Cross Reference Agent
Identifies duplicate products, performs UPC lookups, links related tables, and reconciles data across pages.
from document_agents import CrossReferenceAgent
agent = CrossReferenceAgent(llm, agent_config)
Conflict Resolution Agent
Detects conflicting values, resolves conflicts, and preserves audit trails.
from document_agents import ConflictResolutionAgent
agent = ConflictResolutionAgent(llm, agent_config)
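To show what detecting a conflict while preserving an audit trail can look like, the sketch below uses a simple deterministic rule in place of the LLM-driven approach the agent uses: when the same UPC appears with different facings, the value from the latest page wins and the rejected candidates are recorded. The rule, the Observation type, and resolve_facings are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: one sighting of a product on a page."""

    upc: str
    facings: int
    page: int


def resolve_facings(
    observations: list[Observation],
) -> tuple[dict[str, int], list[dict[str, Any]]]:
    """Pick one facings value per UPC and log every conflict found."""
    by_upc: dict[str, list[Observation]] = defaultdict(list)
    for obs in observations:
        by_upc[obs.upc].append(obs)

    resolved: dict[str, int] = {}
    audit: list[dict[str, Any]] = []
    for upc, group in by_upc.items():
        candidates = sorted({obs.facings for obs in group})
        chosen = max(group, key=lambda obs: obs.page)  # latest page wins
        resolved[upc] = chosen.facings
        if len(candidates) > 1:
            # Keep the losing values so the decision can be reviewed later.
            audit.append(
                {
                    "upc": upc,
                    "candidates": candidates,
                    "chosen": chosen.facings,
                    "source_page": chosen.page,
                }
            )
    return resolved, audit
```

Only UPCs with more than one distinct value produce an audit entry, so the trail stays proportional to the number of real conflicts.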
Schema Mapping Agent
Maps extracted data to the structured planogram schema with the proper hierarchy.
from document_agents import SchemaMappingAgent, ProcessingMode
agent = SchemaMappingAgent(llm, agent_config, ProcessingMode.STANDARD)
Agent Chain
The AgentChain orchestrates agent execution in configurable sequences:
from document_agents import AgentChain, ProcessingMode
chain = AgentChain(agents)
# Execute with specific processing mode
await chain.execute(context, ProcessingMode.STANDARD)
# Get chain definition
simple_chain = chain.get_chain(ProcessingMode.SIMPLE)
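Conceptually, executing a chain means awaiting each agent in order while every agent reads from and writes to the same context. The sketch below shows that idea with plain async functions; SketchContext, run_chain, and the two step functions are hypothetical, and the real AgentChain may handle errors, logging, and ordering differently.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class SketchContext:
    """Hypothetical shared mutable context passed through every step."""

    document_id: str
    metadata: dict[str, Any] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)


Step = Callable[[SketchContext], Awaitable[None]]


async def reconstruct_shelves(ctx: SketchContext) -> None:
    ctx.metadata["shelves"] = [1, 2]


async def extract_products(ctx: SketchContext) -> None:
    # Depends on what the previous step wrote into the context.
    ctx.metadata["products"] = [
        {"shelf": shelf, "name": f"item-{shelf}"}
        for shelf in ctx.metadata["shelves"]
    ]


async def run_chain(
    steps: dict[str, Step], order: list[str], ctx: SketchContext
) -> SketchContext:
    """Await each named step in sequence, recording the order of execution."""
    for name in order:
        await steps[name](ctx)
        ctx.trace.append(name)
    return ctx


def demo() -> SketchContext:
    steps: dict[str, Step] = {
        "shelf_reconstruction": reconstruct_shelves,
        "product_extraction": extract_products,
    }
    order = ["shelf_reconstruction", "product_extraction"]
    return asyncio.run(run_chain(steps, order, SketchContext("doc-001")))
```

The steps run sequentially rather than concurrently because each one may depend on data written by the step before it.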
Custom Agents
Create custom agents by extending BaseAgent:
from document_agents import BaseAgent, AgentContext, AgentConfig, LLMClient
class CustomAgent(BaseAgent):
    def __init__(self, llm: LLMClient, config: AgentConfig):
        super().__init__(
            llm=llm,
            prompt_template="custom.txt",
            config=config,
            agent_name="custom_agent",
        )
    def _output_schema(self):
        return {
            "type": "object",
            "properties": {
                "result": {"type": "string"}
            }
        }
    def _update_context(self, context: AgentContext, parsed):
        context.metadata["custom_result"] = parsed
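The two overridden hooks suggest a template-method design: the base class owns the overall flow and calls _output_schema and _update_context at fixed points. The sketch below is one guess at that flow, written against a fake async LLM callable. SketchBaseAgent, its execute method, and the tiny type check are assumptions for illustration and are not the library's BaseAgent.

```python
import asyncio
import json
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable

FakeLLM = Callable[[str], Awaitable[str]]


class SketchBaseAgent(ABC):
    """Hypothetical base class that fixes the order of the steps."""

    def __init__(self, llm: FakeLLM, agent_name: str) -> None:
        self._llm = llm
        self.agent_name = agent_name

    @abstractmethod
    def _output_schema(self) -> dict[str, Any]: ...

    @abstractmethod
    def _update_context(
        self, context: dict[str, Any], parsed: dict[str, Any]
    ) -> None: ...

    async def execute(self, context: dict[str, Any]) -> None:
        raw = await self._llm(f"{self.agent_name}:{context['document_id']}")
        parsed = json.loads(raw)
        self._check_types(parsed)
        self._update_context(context, parsed)

    def _check_types(self, parsed: dict[str, Any]) -> None:
        # Deliberately tiny check: declared string properties must be strings.
        for name, spec in self._output_schema().get("properties", {}).items():
            if spec.get("type") == "string" and name in parsed:
                if not isinstance(parsed[name], str):
                    raise TypeError(f"{name} must be a string")


class SketchCustomAgent(SketchBaseAgent):
    def _output_schema(self) -> dict[str, Any]:
        return {"type": "object", "properties": {"result": {"type": "string"}}}

    def _update_context(
        self, context: dict[str, Any], parsed: dict[str, Any]
    ) -> None:
        context["custom_result"] = parsed


async def fake_llm(prompt: str) -> str:
    return json.dumps({"result": f"handled {prompt}"})


def demo() -> dict[str, Any]:
    context: dict[str, Any] = {"document_id": "doc-001"}
    asyncio.run(SketchCustomAgent(fake_llm, "custom_agent").execute(context))
    return context
```

A subclass only supplies the schema and the context update; the sequencing of the call, parsing, and checking stays in one place.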
Error Handling
The library provides a comprehensive exception hierarchy:
from document_agents import (
    DocumentAgentsError,
    AgentExecutionError,
    LLMError,
    PromptRenderingError,
    SchemaValidationError,
)
# Run this inside an async function, since chain.execute() is awaited
try:
    await chain.execute(context)
except AgentExecutionError as e:
    print(f"Agent {e.agent_name} failed: {e.message}")
    print(f"Execution time: {e.metadata['execution_time_ms']}ms")
except LLMError as e:
    print(f"LLM error: {e.message}")
    print(f"Provider: {e.metadata['provider']}")
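The handlers above rely on each exception carrying a message and a metadata dictionary. A minimal sketch of a hierarchy with that shape follows; the Sketch-prefixed classes and the describe helper are hypothetical stand-ins, and the real classes may define more fields. The helper reads metadata with .get so that a missing key does not raise a second error inside the handler.

```python
from typing import Any, Optional


class SketchError(Exception):
    """Hypothetical base error carrying a message and free-form metadata."""

    def __init__(
        self, message: str, metadata: Optional[dict[str, Any]] = None
    ) -> None:
        super().__init__(message)
        self.message = message
        self.metadata = metadata or {}


class SketchAgentExecutionError(SketchError):
    def __init__(
        self,
        agent_name: str,
        message: str,
        metadata: Optional[dict[str, Any]] = None,
    ) -> None:
        super().__init__(message, metadata)
        self.agent_name = agent_name


class SketchLLMError(SketchError):
    pass


def describe(error: SketchError) -> str:
    """Format an error for logging without assuming any metadata key exists."""
    if isinstance(error, SketchAgentExecutionError):
        elapsed = error.metadata.get("execution_time_ms", "unknown")
        return f"Agent {error.agent_name} failed after {elapsed}ms: {error.message}"
    if isinstance(error, SketchLLMError):
        provider = error.metadata.get("provider", "unknown")
        return f"LLM error from {provider}: {error.message}"
    return f"Error: {error.message}"
```

Catching the shared base class last gives a single fallback for any error type that does not need special handling.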
Telemetry
Agent execution is logged with detailed telemetry:
# Get total execution time
total_time = context.get_total_execution_time_ms()
# Get total token usage
input_tokens, output_tokens = context.get_total_tokens()
# Access individual agent logs
for log in context.agent_logs:
    print(f"{log.agent_name}: {log.execution_time_ms}ms")
    print(f"  Input tokens: {log.input_tokens}")
    print(f"  Output tokens: {log.output_tokens}")
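The totals above are presumably simple aggregations over the per-agent logs. The sketch below shows that arithmetic on a hypothetical AgentLogSketch record whose field names match the attributes used in the loop; the helper functions are illustrative and not the library's methods.

```python
from dataclasses import dataclass


@dataclass
class AgentLogSketch:
    """Hypothetical log record with the fields printed in the loop above."""

    agent_name: str
    execution_time_ms: float
    input_tokens: int
    output_tokens: int


def total_execution_time_ms(logs: list[AgentLogSketch]) -> float:
    return sum(log.execution_time_ms for log in logs)


def total_tokens(logs: list[AgentLogSketch]) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) summed over all agents."""
    return (
        sum(log.input_tokens for log in logs),
        sum(log.output_tokens for log in logs),
    )


def slowest_agent(logs: list[AgentLogSketch]) -> str:
    """Name of the agent with the longest execution time."""
    return max(logs, key=lambda log: log.execution_time_ms).agent_name


SAMPLE_LOGS = [
    AgentLogSketch("shelf_reconstruction", 1200.0, 900, 300),
    AgentLogSketch("product_extraction", 800.0, 1500, 700),
    AgentLogSketch("schema_mapping", 350.5, 600, 200),
]
```

Sorting or taking the maximum over the same records is a quick way to find which agent dominates latency or token cost.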
LLM Providers
Supported Providers
- OpenAI - GPT-4, GPT-3.5 Turbo
- Anthropic - Claude 3 Opus, Claude 3 Sonnet
- Google - Gemini Pro
- LlamaAPI - OpenAI-compatible Llama models
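Supporting several providers behind one client usually means normalizing their differently shaped responses into a single model, which is the adapter idea listed under Architecture. The sketch below does this for two simplified dictionaries shaped like OpenAI chat-completion and Anthropic message responses. NormalizedResponse, ADAPTERS, and normalize are hypothetical names, the payloads are reduced to the few fields used here, and the library's own adapters may look different.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class NormalizedResponse:
    """Hypothetical shared model that every adapter returns."""

    text: str
    input_tokens: int
    output_tokens: int


def from_openai_style(payload: dict[str, Any]) -> NormalizedResponse:
    usage = payload["usage"]
    return NormalizedResponse(
        text=payload["choices"][0]["message"]["content"],
        input_tokens=usage["prompt_tokens"],
        output_tokens=usage["completion_tokens"],
    )


def from_anthropic_style(payload: dict[str, Any]) -> NormalizedResponse:
    usage = payload["usage"]
    # Concatenate the text blocks; other block types are ignored here.
    text = "".join(
        block["text"]
        for block in payload["content"]
        if block.get("type") == "text"
    )
    return NormalizedResponse(
        text=text,
        input_tokens=usage["input_tokens"],
        output_tokens=usage["output_tokens"],
    )


ADAPTERS: dict[str, Callable[[dict[str, Any]], NormalizedResponse]] = {
    "openai": from_openai_style,
    "anthropic": from_anthropic_style,
}


def normalize(provider: str, payload: dict[str, Any]) -> NormalizedResponse:
    return ADAPTERS[provider](payload)
```

Code downstream of normalize only ever sees NormalizedResponse, so adding a provider means adding one adapter function and one registry entry.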
Provider Configuration
# OpenAI
llm_config = LLMConfig(
    provider="openai",
    model="gpt-4",
    api_key="sk-...",
)
# Anthropic
llm_config = LLMConfig(
    provider="anthropic",
    model="claude-3-opus-20240229",
    api_key="sk-ant-...",
)
# Google
llm_config = LLMConfig(
    provider="google",
    model="gemini-pro",
    api_key="AI...",
)
# LlamaAPI
llm_config = LLMConfig(
    provider="llamaapi",
    model="llama-2-70b-chat",
    api_key="your-api-key",
)
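In practice, API keys are usually read from the environment rather than written into source files. The sketch below builds the provider, model, and api_key settings from environment variables. The variable names in API_KEY_VARS are an assumed naming convention, and llm_settings_from_env is a hypothetical helper that is not part of the library.

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names, one per provider string.
API_KEY_VARS: dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "llamaapi": "LLAMA_API_KEY",
}


def llm_settings_from_env(
    provider: str,
    model: str,
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Collect provider, model, and api_key, failing early if the key is unset."""
    source = os.environ if env is None else env
    try:
        var_name = API_KEY_VARS[provider]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider!r}") from None
    api_key = source.get(var_name)
    if not api_key:
        raise RuntimeError(f"Set {var_name} before creating the client")
    return {"provider": provider, "model": model, "api_key": api_key}
```

The returned dictionary uses the same three field names as the examples above, so it could plausibly be unpacked into the configuration with LLMConfig(**settings).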
Development
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest
# Run with coverage
pytest --cov=document_agents
Code Style
# Format code
black document_agents
# Lint code
ruff check document_agents
# Type check
mypy document_agents
Design Principles
- Single Responsibility - Each agent performs one task
- Dependency Injection - All dependencies injected via constructors
- Protocol-based - Type-safe interfaces using Protocol
- Async-first - All operations are async for performance
- Type-safe - Full type hints with Pydantic validation
- Extensible - Easy to add new agents and providers
- Testable - Mock-friendly design with clear interfaces
Dependencies
- document-core - Shared interfaces and models
- openai>=1.30 - OpenAI API client
- tiktoken>=0.7 - Token counting
- jinja2>=3.1 - Prompt templating
- pydantic>=2.0 - Data validation
Optional
- anthropic>=0.18 - Anthropic API client
- google-generativeai>=0.3 - Google API client
License
MIT
Support
For issues, questions, or contributions, please visit the project repository.