AI reliability framework with confidence extraction and outlier mining.

Agent Task Planning

Production-ready task planning for AI agents. From demo to deployment.

A framework-agnostic Python library that gives any LLM explicit planning capabilities with built-in guardrails, observability, and multi-provider support.

Developed by members of the PDA Task Force

Part of the PDA Platform.

Free to use under MIT license. Attribution appreciated.

Why Planning Matters

Most AI agent pilots fail not because the technology isn't smart enough, but because nobody asked: "Can we actually verify what it's doing?"

Without Planning	With Planning
Agent attempts everything at once	Agent creates explicit task breakdown
Failures are opaque	Progress is visible and auditable
Context gets lost in long tasks	State persists across steps
Difficult to interrupt or redirect	Easy to pause, review, adjust
"Black box" decision-making	Transparent reasoning trail

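This library implements the To-Do List Planning pattern: the agent creates an explicit task list, works through it step by step, and keeps the state of every task visible. The research cited under Supporting Research suggests this pattern suits multi-step workflows that require visibility and control.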
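To make the pattern concrete, here is a minimal standalone sketch of a to-do list loop. It is an illustration only, not the library's code: `Status`, `Task`, `TodoList`, and `run` are hypothetical names, and the executor is a stub where a real agent would call an LLM.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Status(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    content: str
    status: Status = Status.PENDING


@dataclass
class TodoList:
    tasks: List[Task] = field(default_factory=list)
    history: List[str] = field(default_factory=list)  # auditable trail of transitions

    def next_pending(self) -> Optional[Task]:
        """Return the first task that has not been started, or None."""
        return next((t for t in self.tasks if t.status is Status.PENDING), None)

    def mark(self, task: Task, status: Status) -> None:
        """Change a task's status and record the transition."""
        task.status = status
        self.history.append(f"{task.content} -> {status.value}")


def run(todo: TodoList, execute: Callable[[str], None]) -> None:
    """Work through the list one task at a time."""
    while (task := todo.next_pending()) is not None:
        todo.mark(task, Status.IN_PROGRESS)
        try:
            execute(task.content)
            todo.mark(task, Status.COMPLETED)
        except Exception:
            todo.mark(task, Status.FAILED)


todo = TodoList([Task("find competitors"), Task("analyse capabilities"), Task("summarise")])
run(todo, execute=lambda step: None)  # stub executor, always succeeds
for task in todo.tasks:
    print(f"[{task.status.value}] {task.content}")
```

Because every transition lands in `history`, the run can be paused, inspected, or replayed for review, which is the visibility the comparison above describes.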

Features

Multi-provider support: Claude, GPT-4, Gemini, Ollama (local models)
Production guardrails: Iteration limits, cost caps, timeouts, validation
Full observability: Structured logging, token tracking, state history
Framework-agnostic: Use standalone or integrate with LangChain, Temporal, etc.
Type-safe: Full type hints, Pydantic models, runtime validation
Confidence extraction: Self-consistency sampling for reliable structured data extraction from PM documents (see docs/confidence-extraction.md)
Outlier mining: Discover diverse approaches and novel insights by treating outliers as signal rather than noise (see docs/outlier-mining.md)

Quick Start

Installation

pip install agent-task-planning

Or install from source:

git clone https://github.com/PDATaskForce/agent-task-planning.git
cd agent-task-planning
pip install -e ".[all]"

Basic Usage

import asyncio

from agent_planning import TodoListPlanner
from agent_planning.providers import AnthropicProvider


async def main() -> None:
    # Initialise with your preferred provider
    provider = AnthropicProvider(api_key="your-api-key")
    planner = TodoListPlanner(provider=provider)

    # Execute a complex task (execute is awaited, so it runs inside a coroutine)
    result = await planner.execute(
        "Research the top 3 competitors in the UK sports broadcasting market, "
        "analyse their AI capabilities, and summarise findings"
    )

    # Access the execution trace
    for task in result.tasks:
        print(f"[{task.status.value}] {task.content}")


asyncio.run(main())

With Guardrails

from agent_planning import TodoListPlanner, GuardrailConfig
from agent_planning.providers import OpenAIProvider

planner = TodoListPlanner(
    provider=OpenAIProvider(api_key="your-api-key"),
    guardrails=GuardrailConfig(
        max_tasks=15,
        max_iterations=50,
        max_cost_usd=1.00,
        timeout_seconds=300,
        require_approval_for=["delete", "send", "publish"]
    )
)
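The README does not show how these limits are enforced internally. As a rough, standalone illustration of the idea (not the library's implementation), the sketch below tracks iterations, cost, and elapsed time, and stops as soon as any limit is exceeded. `Budget`, `GuardrailViolation`, and `needs_approval` are hypothetical names, and the per-step cost is made up.

```python
import time
from dataclasses import dataclass
from typing import List


class GuardrailViolation(RuntimeError):
    """Raised when a configured limit is exceeded (hypothetical name)."""


@dataclass
class Budget:
    max_iterations: int
    max_cost_usd: float
    timeout_seconds: float
    iterations: int = 0
    cost_usd: float = 0.0
    started_at: float = 0.0

    def start(self) -> None:
        self.started_at = time.monotonic()

    def charge(self, cost_usd: float) -> None:
        """Record one iteration and its cost, then check every limit."""
        self.iterations += 1
        self.cost_usd += cost_usd
        if self.iterations > self.max_iterations:
            raise GuardrailViolation("iteration limit exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise GuardrailViolation("cost cap exceeded")
        if time.monotonic() - self.started_at > self.timeout_seconds:
            raise GuardrailViolation("timeout exceeded")


def needs_approval(action: str, keywords: List[str]) -> bool:
    """True if the action text mentions a keyword that requires human sign-off."""
    lowered = action.lower()
    return any(word in lowered for word in keywords)


budget = Budget(max_iterations=50, max_cost_usd=1.00, timeout_seconds=300)
budget.start()
stopped_because = ""
try:
    while True:
        budget.charge(cost_usd=0.25)  # pretend each step costs 25 cents
except GuardrailViolation as exc:
    stopped_because = str(exc)

print(stopped_because, "after", budget.iterations, "iterations")
```

Checking limits after every step, rather than once at the end, is what keeps a runaway loop from spending far past its cap.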

Using Local Models (Ollama)

from agent_planning import TodoListPlanner
from agent_planning.providers import OllamaProvider

provider = OllamaProvider(
    model="llama3.1:70b",
    base_url="http://localhost:11434"
)
planner = TodoListPlanner(provider=provider)

Command Line Demo

python scripts/demo.py "Research AI planning patterns and summarise"
python scripts/demo.py --provider ollama --model llama3.1:8b "List 3 benefits of exercise"

Confidence Extraction (New)

Extract reliable structured data from PM documents using self-consistency:

import asyncio

from agent_planning import ConfidenceExtractor, SchemaType
from agent_planning.providers import AnthropicProvider

# Placeholder: replace with the text of your own project document
project_document = "Paste the text of your project document here"


async def main() -> None:
    provider = AnthropicProvider(api_key="your-key")
    extractor = ConfidenceExtractor(provider)

    result = await extractor.extract(
        query="What are the top 5 risks for this project?",
        context=project_document,
        schema=SchemaType.RISK,
    )

    print(f"Confidence: {result.confidence:.2%}")
    print(f"Review needed: {result.review_level.value}")


asyncio.run(main())
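For readers unfamiliar with self-consistency, the general idea is to ask the same question several times and treat the level of agreement as a confidence signal. The sketch below shows that idea in isolation with a stubbed sampler. It makes no claim about how `ConfidenceExtractor` computes its score; `self_consistent_answer` and the canned outputs are assumptions for illustration.

```python
import itertools
from collections import Counter
from typing import Callable, Tuple


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> Tuple[str, float]:
    """Draw n samples and return the most common answer with its agreement rate."""
    answers = [sample() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n


# Stub standing in for repeated LLM calls on the same prompt
fake_outputs = itertools.cycle(
    ["budget overrun", "budget overrun", "supplier delay", "budget overrun", "budget overrun"]
)
answer, confidence = self_consistent_answer(lambda: next(fake_outputs), n=5)
print(f"{answer} (confidence {confidence:.0%})")  # budget overrun (confidence 80%)
```

A low agreement rate is a natural trigger for human review, which is the role the review level plays in the example above.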

Outlier Mining (New)

Discover diverse approaches by mining outliers as signal:

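import asyncio

# SchemaType is imported here because the call below uses SchemaType.RISK
from agent_planning import OutlierMiner, MiningConfig, SchemaType
from agent_planning.providers import AnthropicProvider

# Placeholder: replace with the text of your own project document
project_document = "Paste the text of your project document here"


async def main() -> None:
    provider = AnthropicProvider(api_key="your-key")
    miner = OutlierMiner(provider, MiningConfig(samples=32))

    result = await miner.mine(
        query="What non-obvious risks might affect this project?",
        context=project_document,
        schema=SchemaType.RISK,
    )

    print(f"Found {result.num_clusters} distinct approaches")
    print(f"Diversity: {result.diversity_score:.2f}")


asyncio.run(main())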
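One way to picture "outliers as signal" is to group similar samples and then look at the groups that contain only one answer. The sketch below does this with a simple word-overlap similarity and greedy clustering. It is a toy illustration under stated assumptions: the similarity measure, the 0.5 threshold, and the diversity formula (clusters divided by samples) are all invented here and may differ from what `OutlierMiner` does.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two non-empty strings, from 0 to 1."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def cluster(samples: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters: List[List[str]] = []
    for s in samples:
        for c in clusters:
            if jaccard(s, c[0]) >= threshold:
                c.append(s)
                break
        else:
            clusters.append([s])  # no similar cluster found, start a new one
    return clusters


samples = [
    "supplier delay on steel delivery",
    "supplier delay on steel",
    "steel supplier delay",
    "key engineer leaves mid project",
    "planning permission challenged by residents",
]
clusters = cluster(samples)
outliers = [c[0] for c in clusters if len(c) == 1]  # singleton clusters
diversity = len(clusters) / len(samples)            # hypothetical diversity score

print(f"Found {len(clusters)} distinct approaches")
print(f"Outliers worth a second look: {outliers}")
```

Where self-consistency keeps the majority answer, this view keeps the minority ones, since a risk that only one sample raised may be exactly the non-obvious one.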

Architecture

graph TB
    subgraph "Your Application"
        A[User Request] --> B[TodoListPlanner]
    end

    subgraph "Planning Layer"
        B --> C[Task State Manager]
        B --> D[Guardrails]
        C --> E[(Task Store)]
    end

    subgraph "Provider Layer"
        B --> F{Provider}
        F --> G[Anthropic]
        F --> H[OpenAI]
        F --> I[Google]
        F --> J[Ollama]
    end

    subgraph "Observability"
        B --> K[Logger]
        B --> L[Metrics]
        K --> M[(Logs)]
        L --> N[(Metrics Store)]
    end

See docs/architecture.md for detailed diagrams and explanations.
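The provider layer in the diagram can be pictured as a small interface that the planner depends on, so that backends are interchangeable. The sketch below shows that shape with Python's `typing.Protocol`. All names (`Provider`, `EchoProvider`, `Planner`, `complete`, `step`) are hypothetical and chosen for illustration; the library's real classes and method names may differ.

```python
import asyncio
from typing import List, Protocol


class Provider(Protocol):
    """Anything with an async complete() method can act as a backend."""

    async def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in backend that returns a canned reply without any network call."""

    async def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"


class Planner:
    def __init__(self, provider: Provider) -> None:
        self.provider = provider
        self.log: List[str] = []  # minimal stand-in for the observability layer

    async def step(self, prompt: str) -> str:
        """Send one prompt to the backend and log both request and response."""
        self.log.append(f"request: {prompt}")
        reply = await self.provider.complete(prompt)
        self.log.append(f"response: {reply}")
        return reply


planner = Planner(EchoProvider())
reply = asyncio.run(planner.step("list three tasks"))
print(reply)
```

Since the planner only relies on the interface, swapping a hosted model for a local one is a one-line change at construction time.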

Planning Patterns

This library focuses on To-Do List Planning, but it's important to understand when other patterns are more appropriate:

Pattern	Best For	This Library
No Planning	Simple Q&A	Not needed
ReAct	Linear tool-using tasks	Partial support
Chain-of-Thought	Complex reasoning	Use prompts only
Tree-of-Thought	Exploratory/creative	Not implemented
To-Do List	Multi-step workflows	✅ Full support
HTN	Complex dependencies	Roadmap

See docs/when-to-use-what.md for a complete decision guide.

The Fundamental Trade-off

LLM-based planning is probabilistic, so it trades reproducibility for flexibility:

Dimension	Deterministic (Airflow, etc.)	Probabilistic (This library)
Reproducibility	Guaranteed	Not guaranteed
Testing	Exhaustive possible	Statistical only
Flexibility	Low	High
Novel situations	Cannot handle	Can adapt
Certification	Straightforward	Challenging
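"Statistical only" testing means running the same scenario many times and asserting on the success rate rather than on a single outcome. A minimal sketch, assuming a stubbed agent that succeeds about 90% of the time; `pass_rate`, `flaky_agent`, and the 0.8 threshold are hypothetical choices for illustration.

```python
import random
from typing import Callable


def pass_rate(run_once: Callable[[], bool], trials: int = 200) -> float:
    """Run the scenario repeatedly and return the fraction of successful runs."""
    passed = sum(1 for _ in range(trials) if run_once())
    return passed / trials


rng = random.Random(0)  # seeded so the sketch is repeatable


def flaky_agent() -> bool:
    """Stub for a non-deterministic agent run that succeeds roughly 90% of the time."""
    return rng.random() < 0.9


rate = pass_rate(flaky_agent, trials=200)
print(f"pass rate: {rate:.1%}")

# The test asserts on a threshold, not on any individual run
assert rate >= 0.8, "agent fell below the acceptable success rate"
```

The threshold and number of trials together decide how much confidence the test gives, so both belong in the test plan rather than being picked ad hoc.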

When to use this library:

Research and exploration tasks
Creative work requiring adaptation
Internal productivity tools
Prototyping agent workflows

When to use deterministic orchestration instead:

Regulated industries requiring audit trails
Financial transactions
Safety-critical operations
High-volume processing where consistency matters

For safety-critical applications, consider the hybrid approach: a deterministic orchestrator controls the overall flow, and probabilistic planning is confined to individual subtasks (see the 04_temporal_hybrid.py example).
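A rough sketch of the hybrid idea follows: the step order is fixed in ordinary code, and only one step is probabilistic, wrapped in validation, a bounded number of retries, and a deterministic fallback. Everything here is a standalone assumption for illustration (`probabilistic_summary` is a stub for an LLM call, and no orchestration framework is used).

```python
import random
from typing import Callable, Dict


def probabilistic_summary(text: str, rng: random.Random) -> str:
    """Stub for an LLM call: sometimes returns an unusable (empty) answer."""
    return "" if rng.random() < 0.5 else text[:20]


def validated(
    call: Callable[[], str],
    is_valid: Callable[[str], bool],
    retries: int,
    fallback: str,
) -> str:
    """Retry a probabilistic call a bounded number of times, then fall back."""
    for _ in range(retries):
        result = call()
        if is_valid(result):
            return result
    return fallback


def pipeline(document: str, rng: random.Random) -> Dict[str, str]:
    """Fixed step order; only the summary step is probabilistic."""
    cleaned = document.strip()  # step 1: deterministic
    summary = validated(        # step 2: probabilistic, but bounded and checked
        call=lambda: probabilistic_summary(cleaned, rng),
        is_valid=lambda s: len(s) > 0,
        retries=3,
        fallback="SUMMARY UNAVAILABLE",
    )
    return {"document": cleaned, "summary": summary}  # step 3: deterministic


result = pipeline("  Quarterly risk report for the rail upgrade  ", random.Random(1))
print(result["summary"])
```

The orchestrator always completes with a well-formed result, so the non-determinism is contained inside a step whose worst case is known in advance.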

Documentation

Core Features

PM Data Extraction

Confidence Extraction - Technical documentation for reliable structured data extraction
Confidence for Practitioners - Non-technical guide for PM professionals
Outlier Mining - Discover diverse approaches and novel insights

Prompt Templates

If you just want the prompts without the library:

Examples

Task Planning

Example	Description
01_basic_usage.py	Simple task execution
02_multi_provider.py	Switching between Claude, GPT-4, Gemini
03_with_guardrails.py	Production configuration
04_temporal_hybrid.py	Deterministic orchestration pattern

Confidence Extraction

Example	Description
05_basic_confidence.py	Simple confidence extraction
06_pm_extraction.py	Multiple PM schema types
07_batch_confidence.py	Batch processing with concurrency
08_custom_schema.py	Custom schema definition

Outlier Mining

Example	Description
09_basic_mining.py	Mining for diverse approaches
10_risk_mining.py	Mining for non-obvious risks

Supporting Research

This library implements patterns validated by recent research:

"Adaptation of Agentic AI" (Stanford, Harvard, UC Berkeley, Caltech, Dec 2025) identifies unreliable tool use, weak long-horizon planning, and poor generalisation as core failure modes. Structured planning directly addresses these. arxiv.org/abs/2512.16301

Coming Soon: ARMM

The Agent Reliability Maturity Model (ARMM) is a comprehensive framework for assessing organisational readiness for agent deployment. ARMM provides specific requirements across four dimensions:

Technical Controls
Operational Processes
Governance Framework
Organisational Capability

Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

This project is maintained by the PDA Task Force. Issues and pull requests are reviewed by community maintainers.

Acknowledgements

The confidence extraction and outlier mining capabilities were shaped by feature suggestions from Lawrence Rowland.

License

MIT License. Free to use, modify, and distribute with attribution.

See LICENSE for details.

Attribution

Developed by: Members of the PDA Task Force

Maintained by: PDA Task Force — Advancing best practices in project data analytics and AI deployment.

If this library helps you, consider giving it a ⭐ on GitHub.

Version 0.2.0, released Jan 3, 2026


