
AI reliability framework with confidence extraction and outlier mining.


Agent Task Planning

Python 3.10+ | License: MIT

Production-ready task planning for AI agents. From demo to deployment.

A framework-agnostic Python library that gives any LLM explicit planning capabilities with built-in guardrails, observability, and multi-provider support.

Developed by members of the PDA Task Force

Part of the PDA Platform.

Free to use under MIT license. Attribution appreciated.

Why Planning Matters

Most AI agent pilots fail not because the technology isn't smart enough, but because nobody asked: "Can we actually verify what it's doing?"

| Without Planning | With Planning |
| --- | --- |
| Agent attempts everything at once | Agent creates explicit task breakdown |
| Failures are opaque | Progress is visible and auditable |
| Context gets lost in long tasks | State persists across steps |
| Difficult to interrupt or redirect | Easy to pause, review, adjust |
| "Black box" decision-making | Transparent reasoning trail |

This library implements the To-Do List Planning pattern, which is well suited to multi-step workflows that require visibility and control (see Supporting Research).

Features

  • Multi-provider support: Claude, GPT-4, Gemini, Ollama (local models)
  • Production guardrails: Iteration limits, cost caps, timeouts, validation
  • Full observability: Structured logging, token tracking, state history
  • Framework-agnostic: Use standalone or integrate with LangChain, Temporal, etc.
  • Type-safe: Full type hints, Pydantic models, runtime validation
  • Confidence extraction: Self-consistency sampling for reliable structured data extraction from project management (PM) documents (see docs/confidence-extraction.md)
  • Outlier mining: Discover diverse approaches and novel insights by treating outliers as signal rather than noise (see docs/outlier-mining.md)

Quick Start

Installation

pip install agent-task-planning

Or install from source:

git clone https://github.com/PDATaskForce/agent-task-planning.git
cd agent-task-planning
pip install -e ".[all]"

Basic Usage

import asyncio

from agent_planning import TodoListPlanner
from agent_planning.providers import AnthropicProvider


async def main() -> None:
    # Initialise with your preferred provider
    provider = AnthropicProvider(api_key="your-api-key")
    planner = TodoListPlanner(provider=provider)

    # Execute a complex task (execute() is awaited, so it runs inside a coroutine)
    result = await planner.execute(
        "Research the top 3 competitors in the UK sports broadcasting market, "
        "analyse their AI capabilities, and summarise findings"
    )

    # Access the execution trace
    for task in result.tasks:
        print(f"[{task.status.value}] {task.content}")


asyncio.run(main())

With Guardrails

from agent_planning import TodoListPlanner, GuardrailConfig
from agent_planning.providers import OpenAIProvider

planner = TodoListPlanner(
    provider=OpenAIProvider(api_key="your-api-key"),
    guardrails=GuardrailConfig(
        max_tasks=15,
        max_iterations=50,
        max_cost_usd=1.00,
        timeout_seconds=300,
        require_approval_for=["delete", "send", "publish"]
    )
)
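To make the guardrail settings above more concrete, here is a minimal standalone sketch of how iteration limits, cost caps, timeouts, and approval keywords could be enforced around a planning loop. It does not reproduce the library's internals: BudgetGuard, GuardrailViolation, record_step, and needs_approval are hypothetical names introduced only for illustration.

```python
import time
from dataclasses import dataclass


class GuardrailViolation(RuntimeError):
    """Raised when a hypothetical budget limit is exceeded."""


@dataclass
class BudgetGuard:
    max_iterations: int = 50
    max_cost_usd: float = 1.00
    timeout_seconds: float = 300.0

    def __post_init__(self) -> None:
        self.iterations = 0
        self.cost_usd = 0.0
        self.started = time.monotonic()

    def record_step(self, step_cost_usd: float) -> None:
        """Account for one planner step and stop the run if any limit is hit."""
        self.iterations += 1
        self.cost_usd += step_cost_usd
        if self.iterations > self.max_iterations:
            raise GuardrailViolation("iteration limit exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise GuardrailViolation("cost cap exceeded")
        if time.monotonic() - self.started > self.timeout_seconds:
            raise GuardrailViolation("timeout exceeded")


def needs_approval(action: str, keywords: list[str]) -> bool:
    """Flag an action for human review if it mentions a sensitive keyword."""
    lowered = action.lower()
    return any(keyword in lowered for keyword in keywords)


guard = BudgetGuard(max_iterations=3, max_cost_usd=0.50)
guard.record_step(0.20)
print(needs_approval("Send the summary email", ["delete", "send", "publish"]))
```

Checking the limits after every step, rather than once at the end, means a runaway loop is stopped at the first step that crosses a limit.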

Using Local Models (Ollama)

from agent_planning import TodoListPlanner
from agent_planning.providers import OllamaProvider

provider = OllamaProvider(
    model="llama3.1:70b",
    base_url="http://localhost:11434"
)
planner = TodoListPlanner(provider=provider)

Command Line Demo

python scripts/demo.py "Research AI planning patterns and summarise"
python scripts/demo.py --provider ollama --model llama3.1:8b "List 3 benefits of exercise"

Confidence Extraction (New)

Extract reliable structured data from PM documents using self-consistency:

import asyncio

from agent_planning import ConfidenceExtractor, SchemaType
from agent_planning.providers import AnthropicProvider

# Placeholder input: replace with the text of your own project document
project_document = "Project plan text goes here"


async def main() -> None:
    provider = AnthropicProvider(api_key="your-key")
    extractor = ConfidenceExtractor(provider)

    result = await extractor.extract(
        query="What are the top 5 risks for this project?",
        context=project_document,
        schema=SchemaType.RISK,
    )

    print(f"Confidence: {result.confidence:.2%}")
    print(f"Review needed: {result.review_level.value}")


asyncio.run(main())
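Self-consistency, in its general form, asks the same question several times and treats agreement between the samples as a confidence signal. The sketch below illustrates that idea with a plain majority vote. It is not the library's implementation: self_consistent_answer and review_level are hypothetical helpers, and the review thresholds are arbitrary values chosen for the example.

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample: Callable[[], str], n_samples: int = 5) -> tuple[str, float]:
    """Draw n_samples answers and return the most common one with its vote share."""
    answers = [sample().strip().lower() for _ in range(n_samples)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


def review_level(confidence: float) -> str:
    """Map agreement to a review recommendation (thresholds are illustrative)."""
    if confidence >= 0.8:
        return "none"
    if confidence >= 0.5:
        return "spot_check"
    return "full_review"


# Canned responses stand in for repeated model calls.
canned = iter(["High", "high", "Medium", "high ", "HIGH"])
answer, confidence = self_consistent_answer(lambda: next(canned), n_samples=5)
print(answer, f"{confidence:.0%}", review_level(confidence))
```

Normalising each answer before voting matters: without it, "High" and "high " would be counted as different answers and the confidence would be understated.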

Outlier Mining (New)

Discover diverse approaches by mining outliers as signal:

import asyncio

from agent_planning import OutlierMiner, MiningConfig, SchemaType
from agent_planning.providers import AnthropicProvider

# Placeholder input: replace with the text of your own project document
project_document = "Project plan text goes here"


async def main() -> None:
    provider = AnthropicProvider(api_key="your-key")
    miner = OutlierMiner(provider, MiningConfig(samples=32))

    result = await miner.mine(
        query="What non-obvious risks might affect this project?",
        context=project_document,
        schema=SchemaType.RISK,
    )

    print(f"Found {result.num_clusters} distinct approaches")
    print(f"Diversity: {result.diversity_score:.2f}")


asyncio.run(main())
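One simple way to picture "outliers as signal" is to group sampled responses by similarity and then look at the small groups rather than discarding them. The sketch below does this with greedy clustering on word-level Jaccard similarity. The clustering method, the 0.5 threshold, and the function names are assumptions made for illustration; the library may cluster responses quite differently, for example with embeddings.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_responses(responses: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily assign each response to the first cluster whose first member is similar enough."""
    clusters: list[list[str]] = []
    for text in responses:
        tokens = set(text.lower().split())
        for cluster in clusters:
            representative = set(cluster[0].lower().split())
            if jaccard(tokens, representative) >= threshold:
                cluster.append(text)
                break
        else:
            # No existing cluster matched, so this response starts a new one.
            clusters.append([text])
    return clusters


responses = [
    "supplier delay risk",
    "risk of supplier delay",
    "key staff turnover",
    "regulatory change in export rules",
]
clusters = cluster_responses(responses)
outliers = [c[0] for c in clusters if len(c) == 1]
print(len(clusters), outliers)
```

Here the singleton clusters are the candidates worth a human look: they are the answers the model gave only once.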

Architecture

graph TB
    subgraph "Your Application"
        A[User Request] --> B[TodoListPlanner]
    end

    subgraph "Planning Layer"
        B --> C[Task State Manager]
        B --> D[Guardrails]
        C --> E[(Task Store)]
    end

    subgraph "Provider Layer"
        B --> F{Provider}
        F --> G[Anthropic]
        F --> H[OpenAI]
        F --> I[Google]
        F --> J[Ollama]
    end

    subgraph "Observability"
        B --> K[Logger]
        B --> L[Metrics]
        K --> M[(Logs)]
        L --> N[(Metrics Store)]
    end

See docs/architecture.md for detailed diagrams and explanations.

Planning Patterns

This library focuses on To-Do List Planning, but it's important to understand when other patterns are more appropriate:

| Pattern | Best For | This Library |
| --- | --- | --- |
| No Planning | Simple Q&A | Not needed |
| ReAct | Linear tool-using tasks | Partial support |
| Chain-of-Thought | Complex reasoning | Use prompts only |
| Tree-of-Thought | Exploratory/creative | Not implemented |
| To-Do List | Multi-step workflows | ✅ Full support |
| HTN | Complex dependencies | Roadmap |

See docs/when-to-use-what.md for a complete decision guide.

The Fundamental Trade-off

LLM-based planning is probabilistic:

| Dimension | Deterministic (Airflow, etc.) | Probabilistic (This library) |
| --- | --- | --- |
| Reproducibility | Guaranteed | Not guaranteed |
| Testing | Exhaustive possible | Statistical only |
| Flexibility | Low | High |
| Novel situations | Cannot handle | Can adapt |
| Certification | Straightforward | Challenging |
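"Statistical only" testing means running the same scenario many times and reporting a pass rate with an uncertainty range, rather than asserting a single exact output. As a sketch of that idea, the helper below computes a Wilson score interval for an observed pass rate. The function name and the example figures (18 passes in 20 runs) are illustrative and do not come from the library.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 gives roughly 95% coverage)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Suppose an agent scenario passed its checks in 18 of 20 runs.
low, high = wilson_interval(18, 20)
print(f"pass rate 90%, plausible range {low:.1%} to {high:.1%}")
```

With only 20 runs the range is wide, which is the practical point: a release gate such as "lower bound above 85%" needs many more trials than a single green run.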

When to use this library:

  • Research and exploration tasks
  • Creative work requiring adaptation
  • Internal productivity tools
  • Prototyping agent workflows

When to use deterministic orchestration instead:

  • Regulated industries requiring audit trails
  • Financial transactions
  • Safety-critical operations
  • High-volume processing where consistency matters

For safety-critical applications, consider the hybrid approach combining deterministic orchestration with probabilistic subtasks.
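A minimal sketch of that hybrid shape follows: the pipeline and its allowed outputs are fixed in ordinary code, and the probabilistic step is only trusted when its result passes validation, with a deterministic fallback otherwise. All names here (run_with_validation, pipeline, the priority labels) are hypothetical and chosen for illustration; in practice an orchestrator such as Temporal would own the outer workflow.

```python
from typing import Callable


def run_with_validation(
    probabilistic_step: Callable[[str], str],
    validate: Callable[[str], bool],
    fallback: Callable[[str], str],
    payload: str,
    max_attempts: int = 3,
) -> str:
    """Retry a non-deterministic step until its output validates, else fall back."""
    for _ in range(max_attempts):
        candidate = probabilistic_step(payload)
        if validate(candidate):
            return candidate
    return fallback(payload)


def pipeline(request: str, classify: Callable[[str], str]) -> dict[str, str]:
    """Deterministic wrapper: only labels from a fixed set can leave this function."""
    allowed = {"low", "medium", "high"}
    priority = run_with_validation(
        probabilistic_step=classify,
        validate=lambda label: label in allowed,
        fallback=lambda _payload: "medium",
        payload=request,
    )
    return {"request": request, "priority": priority}


# A stub stands in for the LLM-backed classifier.
print(pipeline("Server down", lambda text: "high"))
```

The design choice is that the set of possible outputs is enumerable and testable even though the step producing them is not.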

Documentation

Core Features

PM Data Extraction

Prompt Templates

If you just want the prompts without the library:

Examples

Task Planning

| Example | Description |
| --- | --- |
| 01_basic_usage.py | Simple task execution |
| 02_multi_provider.py | Switching between Claude, GPT-4, Gemini |
| 03_with_guardrails.py | Production configuration |
| 04_temporal_hybrid.py | Deterministic orchestration pattern |

Confidence Extraction

| Example | Description |
| --- | --- |
| 05_basic_confidence.py | Simple confidence extraction |
| 06_pm_extraction.py | Multiple PM schema types |
| 07_batch_confidence.py | Batch processing with concurrency |
| 08_custom_schema.py | Custom schema definition |

Outlier Mining

| Example | Description |
| --- | --- |
| 09_basic_mining.py | Mining for diverse approaches |
| 10_risk_mining.py | Mining for non-obvious risks |

Supporting Research

This library implements patterns validated by recent research:

  • "Adaptation of Agentic AI" (Stanford, Harvard, UC Berkeley, Caltech, Dec 2025) identifies unreliable tool use, weak long-horizon planning, and poor generalisation as core failure modes. Structured planning directly addresses these. arxiv.org/abs/2512.16301

Coming Soon: ARMM

The Agent Reliability Maturity Model (ARMM) is a forthcoming framework for assessing organisational readiness for agent deployment. ARMM will set out specific requirements across four dimensions:

  • Technical Controls
  • Operational Processes
  • Governance Framework
  • Organisational Capability

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

This project is maintained by the PDA Task Force. Issues and pull requests are reviewed by community maintainers.

Acknowledgements

The confidence extraction and outlier mining capabilities were shaped by feature suggestions from Lawrence Rowland.

License

MIT License. Free to use, modify, and distribute with attribution.

See LICENSE for details.

Attribution

Developed by: Members of the PDA Task Force

Maintained by: PDA Task Force — Advancing best practices in project data analytics and AI deployment.


If this library helps you, consider giving it a ⭐ on GitHub.
