AI reliability framework with confidence extraction and outlier mining.
Agent Task Planning
Production-ready task planning for AI agents. From demo to deployment.
A framework-agnostic Python library that gives any LLM explicit planning capabilities with built-in guardrails, observability, and multi-provider support.
Developed by members of the PDA Task Force
Part of the PDA Platform.
Free to use under MIT license. Attribution appreciated.
Why Planning Matters
Most AI agent pilots fail not because the technology isn't smart enough, but because nobody asked: "Can we actually verify what it's doing?"
| Without Planning | With Planning |
|---|---|
| Agent attempts everything at once | Agent creates explicit task breakdown |
| Failures are opaque | Progress is visible and auditable |
| Context gets lost in long tasks | State persists across steps |
| Difficult to interrupt or redirect | Easy to pause, review, adjust |
| "Black box" decision-making | Transparent reasoning trail |
This library implements the To-Do List Planning pattern: the agent writes an explicit task list and works through it step by step, which suits multi-step workflows requiring visibility and control.
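To make the pattern concrete, here is a minimal, library-independent sketch of a to-do list with per-task status. The names `TaskStatus`, `Task`, `TodoList`, and `run_all` are hypothetical and chosen for illustration; they are not claimed to match the library's internals, although the `status.value` and `content` fields mirror the attributes used in Basic Usage.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class TaskStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    content: str
    status: TaskStatus = TaskStatus.PENDING


@dataclass
class TodoList:
    tasks: List[Task] = field(default_factory=list)

    def next_pending(self) -> Optional[Task]:
        """Return the first task that has not been started, if any."""
        return next((t for t in self.tasks if t.status is TaskStatus.PENDING), None)

    def render(self) -> List[str]:
        """One line per task, so progress can be logged or reviewed."""
        return [f"[{t.status.value}] {t.content}" for t in self.tasks]


def run_all(todo: TodoList, execute: Callable[[str], None]) -> TodoList:
    """Work through the list one task at a time, recording each outcome."""
    while (task := todo.next_pending()) is not None:
        task.status = TaskStatus.IN_PROGRESS
        try:
            execute(task.content)  # hypothetical executor supplied by the caller
            task.status = TaskStatus.COMPLETED
        except Exception:
            # A failure is recorded on the task instead of aborting the run.
            task.status = TaskStatus.FAILED
    return todo
```

Because every task carries its own status, a run can be paused between tasks, inspected with `render()`, and resumed, which is the visibility the table above describes.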
Features
- Multi-provider support: Claude, GPT-4, Gemini, Ollama (local models)
- Production guardrails: Iteration limits, cost caps, timeouts, validation
- Full observability: Structured logging, token tracking, state history
- Framework-agnostic: Use standalone or integrate with LangChain, Temporal, etc.
- Type-safe: Full type hints, Pydantic models, runtime validation
- Confidence extraction: Self-consistency sampling for reliable structured data extraction from PM documents (see docs/confidence-extraction.md)
- Outlier mining: Discover diverse approaches and novel insights by treating outliers as signal rather than noise (see docs/outlier-mining.md)
Quick Start
Installation
pip install agent-task-planning
Or install from source:
git clone https://github.com/PDATaskForce/agent-task-planning.git
cd agent-task-planning
pip install -e ".[all]"
Basic Usage
import asyncio
from agent_planning import TodoListPlanner
from agent_planning.providers import AnthropicProvider
async def main():
    # Initialise with your preferred provider
    provider = AnthropicProvider(api_key="your-api-key")
    planner = TodoListPlanner(provider=provider)
    # Execute a complex task
    result = await planner.execute(
        "Research the top 3 competitors in the UK sports broadcasting market, "
        "analyse their AI capabilities, and summarise findings"
    )
    # Access the execution trace
    for task in result.tasks:
        print(f"[{task.status.value}] {task.content}")
# planner.execute is a coroutine, so run it inside an event loop
asyncio.run(main())
With Guardrails
from agent_planning import TodoListPlanner, GuardrailConfig
from agent_planning.providers import OpenAIProvider
planner = TodoListPlanner(
    provider=OpenAIProvider(api_key="your-api-key"),
    guardrails=GuardrailConfig(
        max_tasks=15,
        max_iterations=50,
        max_cost_usd=1.00,
        timeout_seconds=300,
        require_approval_for=["delete", "send", "publish"],
    ),
)
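For readers wondering what limits like these involve, the following is a rough standalone sketch of how iteration, cost, and timeout checks could be enforced around each planning step. It is an illustration under assumptions, not the library's implementation: `Limits`, `RunBudget`, `GuardrailViolation`, `charge`, and `needs_approval` are hypothetical names, and only the limit names and default values are borrowed from the configuration above.

```python
import time
from dataclasses import dataclass, field
from typing import Tuple


class GuardrailViolation(Exception):
    """Raised when a run exceeds one of its configured limits."""


@dataclass
class Limits:
    max_iterations: int = 50
    max_cost_usd: float = 1.00
    timeout_seconds: float = 300.0
    require_approval_for: Tuple[str, ...] = ("delete", "send", "publish")


@dataclass
class RunBudget:
    limits: Limits
    iterations: int = 0
    cost_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, cost_usd: float) -> None:
        """Record one iteration and its cost, then enforce every limit."""
        self.iterations += 1
        self.cost_usd += cost_usd
        if self.iterations > self.limits.max_iterations:
            raise GuardrailViolation("iteration limit exceeded")
        if self.cost_usd > self.limits.max_cost_usd:
            raise GuardrailViolation("cost cap exceeded")
        if time.monotonic() - self.started_at > self.limits.timeout_seconds:
            raise GuardrailViolation("timeout exceeded")

    def needs_approval(self, task_description: str) -> bool:
        """Flag tasks whose text mentions a sensitive action keyword."""
        text = task_description.lower()
        return any(word in text for word in self.limits.require_approval_for)
```

Calling `charge()` once per model call keeps all limits in one place. The keyword check is a plain substring match, so it would also flag words such as "resend"; a real approval gate would likely need something stricter.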
Using Local Models (Ollama)
from agent_planning import TodoListPlanner
from agent_planning.providers import OllamaProvider
provider = OllamaProvider(
    model="llama3.1:70b",
    base_url="http://localhost:11434",
)
planner = TodoListPlanner(provider=provider)
Command Line Demo
python scripts/demo.py "Research AI planning patterns and summarise"
python scripts/demo.py --provider ollama --model llama3.1:8b "List 3 benefits of exercise"
Confidence Extraction (New)
Extract reliable structured data from PM documents using self-consistency:
from agent_planning import ConfidenceExtractor, SchemaType
from agent_planning.providers import AnthropicProvider
provider = AnthropicProvider(api_key="your-key")
extractor = ConfidenceExtractor(provider)
# Run this inside an async function (for example via asyncio.run);
# project_document is the PM document content you supply.
result = await extractor.extract(
    query="What are the top 5 risks for this project?",
    context=project_document,
    schema=SchemaType.RISK,
)
print(f"Confidence: {result.confidence:.2%}")
print(f"Review needed: {result.review_level.value}")
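The idea behind self-consistency can be sketched in a few lines: ask the same question several times, take the majority answer, and treat the share of samples that agree as a confidence score. The sketch below is a generic illustration and makes no claim about how `ConfidenceExtractor` computes its score; the function names, the tier labels, and the 0.8 and 0.5 thresholds are all assumptions.

```python
from collections import Counter
from typing import List, Tuple


def self_consistency(samples: List[str]) -> Tuple[str, float]:
    """Return (majority answer, agreement fraction) for a list of sampled answers."""
    if not samples:
        raise ValueError("need at least one sample")
    # Normalise so trivial formatting differences do not split the vote.
    normalised = [s.strip().lower() for s in samples]
    answer, votes = Counter(normalised).most_common(1)[0]
    return answer, votes / len(normalised)


def review_level(confidence: float, auto_accept: float = 0.8, spot_check: float = 0.5) -> str:
    """Map agreement to a review tier; thresholds and labels are illustrative only."""
    if confidence >= auto_accept:
        return "none"
    if confidence >= spot_check:
        return "spot_check"
    return "full_review"
```

With five samples of which three agree, the agreement fraction is 0.6, which this sketch routes to a spot check instead of accepting the answer outright.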
Outlier Mining (New)
Discover diverse approaches by mining outliers as signal:
from agent_planning import OutlierMiner, MiningConfig, SchemaType
from agent_planning.providers import AnthropicProvider
provider = AnthropicProvider(api_key="your-key")
miner = OutlierMiner(provider, MiningConfig(samples=32))
# Run this inside an async function (for example via asyncio.run);
# project_document is the PM document content you supply.
result = await miner.mine(
    query="What non-obvious risks might affect this project?",
    context=project_document,
    schema=SchemaType.RISK,
)
print(f"Found {result.num_clusters} distinct approaches")
print(f"Diversity: {result.diversity_score:.2f}")
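As a rough intuition for "outliers as signal", the sketch below groups sampled answers by word overlap and reports the small groups separately, since they are the answers most samples did not produce. This is a deliberately simple stand-in and not the library's algorithm: greedy Jaccard clustering, the 0.5 threshold, the `min_size` cut-off, and the clusters-per-sample diversity measure are all assumptions made for illustration.

```python
from typing import List, Tuple


def jaccard(a: set, b: set) -> float:
    """Word-set overlap: 1.0 for identical sets, 0.0 for disjoint ones."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(samples: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters: List[List[str]] = []  # index 0 of each cluster is its seed
    for sample in samples:
        words = set(sample.lower().split())
        for members in clusters:
            if jaccard(words, set(members[0].lower().split())) >= threshold:
                members.append(sample)
                break
        else:
            clusters.append([sample])
    return clusters


def mine_outliers(
    samples: List[str], threshold: float = 0.5, min_size: int = 2
) -> Tuple[List[List[str]], List[List[str]], float]:
    """Return (all clusters, clusters smaller than min_size, diversity score)."""
    if not samples:
        raise ValueError("need at least one sample")
    clusters = cluster(samples, threshold)
    outliers = [c for c in clusters if len(c) < min_size]
    diversity = len(clusters) / len(samples)  # 1.0 means every sample is distinct
    return clusters, outliers, diversity
```

A production version would more plausibly compare embeddings than word sets, but the shape of the result is the same: a dominant cluster, a handful of small ones worth a second look, and a single number summarising spread.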
Architecture
graph TB
    subgraph "Your Application"
        A[User Request] --> B[TodoListPlanner]
    end
    subgraph "Planning Layer"
        B --> C[Task State Manager]
        B --> D[Guardrails]
        C --> E[(Task Store)]
    end
    subgraph "Provider Layer"
        B --> F{Provider}
        F --> G[Anthropic]
        F --> H[OpenAI]
        F --> I[Google]
        F --> J[Ollama]
    end
    subgraph "Observability"
        B --> K[Logger]
        B --> L[Metrics]
        K --> M[(Logs)]
        L --> N[(Metrics Store)]
    end
See docs/architecture.md for detailed diagrams and explanations.
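The provider layer in the diagram amounts to the planner depending on an interface, not on a specific vendor. A minimal sketch of that idea, assuming a hypothetical `complete` method (the real provider interface may differ), looks like this:

```python
import asyncio
from typing import Protocol


class Provider(Protocol):
    """Hypothetical provider interface: anything with an async complete()."""

    async def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for local testing; makes no network calls."""

    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class MiniPlanner:
    """Toy planner that only knows about the Provider interface."""

    def __init__(self, provider: Provider) -> None:
        self.provider = provider

    async def execute(self, request: str) -> str:
        return await self.provider.complete(request)


if __name__ == "__main__":
    print(asyncio.run(MiniPlanner(EchoProvider()).execute("hello")))
```

Swapping vendors then means passing a different object to the planner, and a stub such as `EchoProvider` lets planner logic be tested without API keys.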
Planning Patterns
This library focuses on To-Do List Planning, but it's important to understand when other patterns are more appropriate:
| Pattern | Best For | This Library |
|---|---|---|
| No Planning | Simple Q&A | Not needed |
| ReAct | Linear tool-using tasks | Partial support |
| Chain-of-Thought | Complex reasoning | Use prompts only |
| Tree-of-Thought | Exploratory/creative | Not implemented |
| To-Do List | Multi-step workflows | ✅ Full support |
| HTN | Complex dependencies | Roadmap |
See docs/when-to-use-what.md for a complete decision guide.
The Fundamental Trade-off
LLM-based planning is probabilistic:
| Dimension | Deterministic (Airflow, etc.) | Probabilistic (This library) |
|---|---|---|
| Reproducibility | Guaranteed | Not guaranteed |
| Testing | Exhaustive possible | Statistical only |
| Flexibility | Low | High |
| Novel situations | Cannot handle | Can adapt |
| Certification | Straightforward | Challenging |
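"Statistical only" testing can still be made rigorous. One common approach is to run the same task many times, count how often the output passes a check, and report a confidence interval for the pass rate instead of a single pass or fail. The sketch below uses the Wilson score interval; the function name and the choice of roughly 95% coverage (`z = 1.96`) are assumptions for illustration.

```python
import math
from typing import Tuple


def wilson_interval(passes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0 or not 0 <= passes <= trials:
        raise ValueError("require trials > 0 and 0 <= passes <= trials")
    p = passes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    low, high = wilson_interval(passes=47, trials=50)
    print(f"pass rate 94%, interval {low:.3f} to {high:.3f}")
```

A release gate can then require the lower bound, not the observed rate, to clear a threshold, which guards against reading too much into a small number of lucky runs.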
When to use this library:
- Research and exploration tasks
- Creative work requiring adaptation
- Internal productivity tools
- Prototyping agent workflows
When to use deterministic orchestration instead:
- Regulated industries requiring audit trails
- Financial transactions
- Safety-critical operations
- High-volume processing where consistency matters
For safety-critical applications, consider the hybrid approach combining deterministic orchestration with probabilistic subtasks.
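One way to picture the hybrid approach: the step order is fixed in ordinary code, and the only non-deterministic step has its output accepted by a deterministic rule, with a bounded number of retries. The sketch below is illustrative only; `run_checked`, `pipeline`, the step names, and the 200-character acceptance rule are hypothetical and unrelated to Temporal or to this library's API.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_checked(
    generate: Callable[[], T],
    validate: Callable[[T], bool],
    max_attempts: int = 3,
) -> Tuple[T, int]:
    """Call a non-deterministic step until its output passes a deterministic check."""
    for attempt in range(1, max_attempts + 1):
        candidate = generate()
        if validate(candidate):
            return candidate, attempt
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def pipeline(draft_summary: Callable[[], str]) -> dict:
    """Fixed, auditable step order; only the middle step is probabilistic."""
    record = {"steps": []}
    record["steps"].append("load_input")  # deterministic
    summary, attempts = run_checked(
        draft_summary,
        validate=lambda text: 0 < len(text) <= 200,  # deterministic acceptance rule
    )
    record["steps"].append(f"draft_summary(attempts={attempts})")
    record["steps"].append("store_result")  # deterministic
    record["summary"] = summary
    return record
```

The audit trail records which steps ran and how many attempts the probabilistic step needed, while a failure to produce valid output surfaces as an explicit error that the orchestrator can escalate.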
Documentation
Core Features
PM Data Extraction
- Confidence Extraction - Technical documentation for reliable structured data extraction
- Confidence for Practitioners - Non-technical guide for PM professionals
- Outlier Mining - Discover diverse approaches and novel insights
Prompt Templates
If you just want the prompts without the library:
Examples
Task Planning
| Example | Description |
|---|---|
| 01_basic_usage.py | Simple task execution |
| 02_multi_provider.py | Switching between Claude, GPT-4, Gemini |
| 03_with_guardrails.py | Production configuration |
| 04_temporal_hybrid.py | Deterministic orchestration pattern |
Confidence Extraction
| Example | Description |
|---|---|
| 05_basic_confidence.py | Simple confidence extraction |
| 06_pm_extraction.py | Multiple PM schema types |
| 07_batch_confidence.py | Batch processing with concurrency |
| 08_custom_schema.py | Custom schema definition |
Outlier Mining
| Example | Description |
|---|---|
| 09_basic_mining.py | Mining for diverse approaches |
| 10_risk_mining.py | Mining for non-obvious risks |
Supporting Research
This library implements patterns validated by recent research:
- "Adaptation of Agentic AI" (Stanford, Harvard, UC Berkeley, Caltech, Dec 2025) identifies unreliable tool use, weak long-horizon planning, and poor generalisation as core failure modes. Structured planning directly addresses these. arxiv.org/abs/2512.16301
Coming Soon: ARMM
The Agent Reliability Maturity Model (ARMM) is a comprehensive framework for assessing organisational readiness for agent deployment. ARMM provides specific requirements across four dimensions:
- Technical Controls
- Operational Processes
- Governance Framework
- Organisational Capability
Contributing
We welcome contributions! See CONTRIBUTING.md for guidelines.
This project is maintained by the PDA Task Force. Issues and pull requests are reviewed by community maintainers.
Acknowledgements
The confidence extraction and outlier mining capabilities were shaped by feature suggestions from Lawrence Rowland.
License
MIT License. Free to use, modify, and distribute with attribution.
See LICENSE for details.
Attribution
Developed by: Members of the PDA Task Force
Maintained by: PDA Task Force — Advancing best practices in project data analytics and AI deployment.
If this library helps you, consider giving it a ⭐ on GitHub.