Agent Tinman
Forward-Deployed Research Agent for Continuous AI Reliability Discovery
Quick Start • How It Works • API • Documentation
Tinman is not a testing tool. It's an autonomous research agent that continuously explores your AI system's behavior to discover failure modes you haven't imagined yet.
While traditional approaches wait for failures to happen, Tinman proactively generates hypotheses about what could fail, designs experiments to test them, and proposes interventions—all with human oversight at critical decision points.
Why Tinman?
The problem: Every team deploying LLMs faces the same question: "What don't we know about how this system can fail?"
Most tools help you monitor what you've already anticipated. Tinman helps you discover what you haven't.
| Traditional Approach | Tinman |
|---|---|
| Reactive—triggered after incidents | Proactive—always exploring |
| Tests known failure patterns | Generates novel hypotheses |
| Output: pass/fail results | Output: understanding |
| Goal: verify correctness | Goal: expand knowledge |
| Stops when tests pass | Never stops—research is ongoing |
Core Capabilities
| Capability | Description |
|---|---|
| Hypothesis-Driven Research | Generates testable hypotheses about potential failure modes based on system architecture and observed behavior |
| Controlled Experimentation | Tests each hypothesis with configurable parameters, cost controls, and reproducibility tracking |
| Failure Classification | Classifies failures using a structured taxonomy with severity ratings (S0-S4) |
| Intervention Design | Proposes concrete fixes: prompt mutations, guardrails, tool policy changes, architectural recommendations |
| Simulation & Validation | Validates interventions through counterfactual replay before deployment |
| Human-in-the-Loop | Risk-tiered approval gates ensure humans control consequential decisions |
Quick Start
Installation
pip install AgentTinman
With specific model provider support:
pip install AgentTinman[openai] # OpenAI
pip install AgentTinman[anthropic] # Anthropic
pip install AgentTinman[all] # All providers
Initialize & Run
# Initialize configuration
tinman init
# Launch the interactive TUI
tinman tui
# Or run a research cycle directly
tinman research --focus "tool use failures"
# Generate a report
tinman report --format markdown
Configure Your Model
Edit .tinman/config.yaml:
mode: lab

models:
  default: openai
  providers:
    openai:
      api_key: ${OPENAI_API_KEY}
      model: gpt-4-turbo-preview
    groq:
      api_key: ${GROQ_API_KEY}
      model: llama3-70b-8192
Set API keys as environment variables:
export OPENAI_API_KEY="..."
export ANTHROPIC_API_KEY="..."
export GROQ_API_KEY="..."
Supported Providers
| Provider | Cost | Best For |
|---|---|---|
| Ollama | Free (local) | Privacy, offline, unlimited experimentation |
| Groq | Free tier | Speed, high volume |
| OpenRouter | Many free models | Variety—DeepSeek, Qwen, Llama, Mistral |
| Together | $25 free credits | Quality open models |
| OpenAI | Paid | GPT-4 |
| Anthropic | Paid | Claude |
The Research Cycle
┌─────────────────────────────────────────────────────────────────────┐
│ RESEARCH CYCLE │
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Hypothesis │───▶│ Experiment │───▶│ Failure │ │
│ │ Engine │ │ Executor │ │ Discovery │ │
│ └────────────┘ └────────────┘ └─────┬──────┘ │
│ ▲ │ │
│ │ ┌────────────┐ ┌─────▼──────┐ │
│ │ │ Simulation │◀───│Intervention│ │
│ │ │ Engine │ │ Engine │ │
│ │ └─────┬──────┘ └────────────┘ │
│ │ │ │
│ └─────── Learning ◀┘ │
│ (Memory Graph) │
└─────────────────────────────────────────────────────────────────────┘
Each cycle:
- Generate hypotheses about potential failures
- Design experiments to test each hypothesis
- Execute experiments with approval gates
- Discover and classify failures found
- Design interventions to address failures
- Simulate fixes via counterfactual replay
- Learn from results for future cycles
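The seven steps above can be sketched as plain Python. Every name here (`CycleResult`, the stage callbacks) is illustrative, not Tinman's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class CycleResult:
    hypotheses: list = field(default_factory=list)
    failures: list = field(default_factory=list)
    interventions: list = field(default_factory=list)

def run_cycle(generate, experiment, classify, intervene, simulate, memory):
    """One pass of the research cycle; each argument is a stage callback."""
    result = CycleResult()
    result.hypotheses = generate(memory)           # 1. generate hypotheses
    for hyp in result.hypotheses:
        outcome = experiment(hyp)                  # 2-3. design + execute
        for failure in classify(outcome):          # 4. discover and classify
            result.failures.append(failure)
            fix = intervene(failure)               # 5. design an intervention
            if simulate(fix, outcome):             # 6. counterfactual replay
                result.interventions.append(fix)
    memory.extend(result.failures)                 # 7. learn for future cycles
    return result
```

The point of the sketch is the feedback edge: discovered failures flow into `memory`, which the next call to `generate` reads, so each cycle starts from more knowledge than the last.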
Operating Modes
Tinman operates in three modes with different safety boundaries:
| Mode | Purpose | Approval Gates | Destructive Tests |
|---|---|---|---|
| LAB | Unrestricted research | Auto-approve most | Allowed |
| SHADOW | Observe production traffic | Review S3+ severity | Read-only |
| PRODUCTION | Active protection | Human approval required | Blocked |
Transition rules:
- LAB → SHADOW → PRODUCTION (progressive rollout)
- PRODUCTION → SHADOW (regression fallback)
- Cannot skip modes (LAB → PRODUCTION blocked)
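Under these rules a transition check reduces to a small lookup table. The `Mode` enum and `ALLOWED` map below are an illustrative sketch, not Tinman's `tinman.config.modes` implementation:

```python
from enum import Enum

class Mode(Enum):
    LAB = "lab"
    SHADOW = "shadow"
    PRODUCTION = "production"

# Only the transitions stated above are permitted: progressive rollout
# forward, regression fallback from PRODUCTION, no skipping LAB -> PRODUCTION.
ALLOWED = {
    Mode.LAB: {Mode.SHADOW},
    Mode.SHADOW: {Mode.PRODUCTION},
    Mode.PRODUCTION: {Mode.SHADOW},
}

def can_transition(current: Mode, target: Mode) -> bool:
    """True only if the mode change follows the transition rules."""
    return target in ALLOWED[current]
```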
Failure Taxonomy
Tinman classifies failures into six primary classes:
| Class | Description | Example |
|---|---|---|
| REASONING | Logical errors, goal drift, hallucination | Model contradicts itself mid-response |
| LONG_CONTEXT | Context window issues, attention dilution | Forgets instructions from early in conversation |
| TOOL_USE | Tool call failures, parameter errors, exfil | Calls API with invalid arguments |
| FEEDBACK_LOOP | Output amplification, error cascades | Retry loop amplifies initial mistake |
| DEPLOYMENT | Infrastructure, latency, resource issues | Timeout under load causes partial response |
| SECURITY | Credential access, data exfil, evasion, injection | Agent attempts to read SSH keys or bypasses filters |
Security Failure Types:
| Type | Description |
|---|---|
| credential_access | Attempted access to SSH keys, API tokens, wallets |
| data_exfiltration | Sending sensitive data to external locations |
| unauthorized_action | Taking actions without explicit user consent |
| privilege_escalation | Attempting sudo, sandbox bypass, elevation |
| injection_susceptible | Following injected instructions from untrusted content |
| evasion_bypass | Using encoding tricks to bypass security controls |
| memory_poisoning | Injecting malicious context into agent memory |
| platform_specific_attack | OS-specific attacks (mimikatz, LaunchAgents, systemd) |
Severity levels:
| Level | Impact | Action |
|---|---|---|
| S0 | Benign | Monitor |
| S1 | UX degradation | Review |
| S2 | Business risk | Investigate |
| S3 | Serious risk | Mitigate |
| S4 | Critical | Immediate action |
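Putting the taxonomy and the severity scale together, a discovered failure can be modeled as a small record. `Finding` and `Severity` are hypothetical names for illustration; the S3+ review threshold comes from the SHADOW row of the modes table above:

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    S0 = 0  # Benign: monitor
    S1 = 1  # UX degradation: review
    S2 = 2  # Business risk: investigate
    S3 = 3  # Serious risk: mitigate
    S4 = 4  # Critical: immediate action

# The six primary failure classes from the taxonomy table.
FAILURE_CLASSES = {"REASONING", "LONG_CONTEXT", "TOOL_USE",
                   "FEEDBACK_LOOP", "DEPLOYMENT", "SECURITY"}

@dataclass
class Finding:
    failure_class: str
    severity: Severity
    summary: str

    def needs_review(self) -> bool:
        # SHADOW mode escalates findings at S3 or above for human review.
        return self.severity >= Severity.S3
```

Using `IntEnum` makes the severity ordering explicit, so thresholds like "S3+" are plain comparisons.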
Human-in-the-Loop Approval
Risk-tiered approval balances autonomy with safety:
Action Request
│
▼
┌─────────────┐
│ Risk │
│ Evaluator │
└──────┬──────┘
│
┌───┴───┐
│ │
▼ ▼
SAFE REVIEW ───▶ Human Decision ───▶ Approved/Rejected
│ │
│ ▼
│ BLOCK ───▶ Rejected (always)
▼
Auto-Approved
Risk Tiers:
- SAFE: Low-risk actions proceed automatically
- REVIEW: Medium-risk actions require human approval
- BLOCK: High-risk actions are always rejected
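The three tiers reduce to a short gating function. `gate` and `ask_human` below are illustrative, not Tinman's `approval_gate` API:

```python
from enum import Enum

class Tier(Enum):
    SAFE = "safe"
    REVIEW = "review"
    BLOCK = "block"

def gate(tier: Tier, ask_human) -> bool:
    """Route an action request through the risk-tiered gate.

    ask_human is a callback returning True (approve) or False (reject);
    it is consulted only for REVIEW-tier actions.
    """
    if tier is Tier.SAFE:
        return True       # auto-approved, no human in the loop
    if tier is Tier.BLOCK:
        return False      # always rejected; the human is never asked
    return ask_human()    # REVIEW: the human decides
```

Note the asymmetry: BLOCK short-circuits before any human decision, which is what makes it a hard safety boundary rather than a default answer.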
Python API
Basic Usage
import asyncio

from tinman import create_tinman
from tinman.config.modes import Mode

async def main():
    tinman = await create_tinman(
        mode=Mode.LAB,
        db_url="sqlite:///tinman.db"
    )

    results = await tinman.research_cycle(
        focus="reasoning failures in multi-step tasks",
        max_hypotheses=5,
        max_experiments=3
    )

    print(f"Hypotheses tested: {len(results.hypotheses)}")
    print(f"Failures discovered: {len(results.failures)}")
    print(f"Interventions proposed: {len(results.interventions)}")

    report = await tinman.generate_report(format="markdown")
    print(report)

    await tinman.close()

asyncio.run(main())
Pipeline Integration
from tinman.integrations import PipelineAdapter
from tinman.config.modes import Mode

adapter = PipelineAdapter(mode=Mode.SHADOW)

async def monitored_llm_call(messages):
    ctx = adapter.create_context(messages=messages)
    ctx = await adapter.pre_request(ctx)

    response = await your_existing_llm_call(messages)

    ctx.response = response
    ctx = await adapter.post_request(ctx)
    return response
Real-Time Gateway Monitoring
Connect to AI gateway WebSockets for instant event analysis:
from tinman.integrations.gateway_plugin import GatewayMonitor, ConsoleAlerter, FileAlerter
monitor = GatewayMonitor(your_adapter)
monitor.add_alerter(ConsoleAlerter())
monitor.add_alerter(FileAlerter("~/tinman-findings.md"))
await monitor.start()
Platform adapters:
- OpenClaw — Security eval harness + gateway adapter
Configuration Reference
# .tinman/config.yaml
mode: lab  # lab, shadow, or production

database:
  url: sqlite:///tinman.db
  pool_size: 10

models:
  default: openai
  providers:
    openai:
      api_key: ${OPENAI_API_KEY}
      model: gpt-4-turbo-preview
      temperature: 0.7

research:
  max_hypotheses_per_run: 10
  max_experiments_per_hypothesis: 3
  default_runs_per_experiment: 5

experiments:
  max_parallel: 5
  default_timeout_seconds: 300
  cost_limit_usd: 10.0

risk:
  auto_approve_safe: true
  block_on_destructive: true

approval:
  mode: interactive  # interactive, async, auto_approve, auto_reject
  timeout_seconds: 300
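The `api_key: ${OPENAI_API_KEY}` entries suggest environment-variable interpolation at config load time. A minimal sketch of that expansion follows; `expand_env` is a hypothetical helper for illustration, not part of Tinman's API:

```python
import os
import re

# Matches ${VAR_NAME} placeholders as used in .tinman/config.yaml values.
_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace ${VAR} placeholders with values from the process environment.

    Unset variables expand to an empty string; a real loader might
    instead raise an error to surface missing credentials early.
    """
    return _VAR.sub(lambda m: os.environ.get(m.group(1), ""), value)
```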
Architecture
tinman/
├── agents/ # Autonomous research agents
│ ├── hypothesis_engine.py
│ ├── experiment_architect.py
│ ├── experiment_executor.py
│ ├── failure_discovery.py
│ ├── intervention_engine.py
│ └── simulation_engine.py
├── config/ # Configuration and modes
│ ├── modes.py # LAB/SHADOW/PRODUCTION
│ └── settings.py
├── core/ # Infrastructure
│ ├── approval_gate.py
│ ├── control_plane.py
│ ├── risk_policy.py
│ ├── cost_tracker.py
│ └── tools.py
├── db/ # Persistence
│ ├── connection.py
│ ├── models.py
│ └── audit.py
├── integrations/ # External connections
│ ├── model_client.py
│ ├── pipeline_adapter.py
│ ├── gateway_plugin/ # Real-time monitoring
│ └── *_client.py # Provider clients
├── memory/ # Knowledge graph
│ ├── graph.py
│ ├── models.py
│ └── repository.py
├── taxonomy/ # Failure classification
│ ├── failure_types.py
│ └── classifiers.py
├── reporting/ # Report generation
└── tui/ # Terminal UI
Documentation
| Document | Description |
|---|---|
| QUICKSTART.md | Get running in 5 minutes |
| CONCEPTS.md | Core mental model and abstractions |
| ARCHITECTURE.md | System design and data flow |
| TAXONOMY.md | Complete failure classification guide |
| MODES.md | Operating mode behavior matrix |
| HITL.md | Human-in-the-loop approval system |
| INTEGRATION.md | Pipeline integration patterns |
| CONFIGURATION.md | All configuration options |
Live docs: oliveskin.github.io/Agent-Tinman
Use Cases
Research Lab — Run in LAB mode against development deployments to discover failures before production.
Shadow Monitoring — Deploy in SHADOW mode alongside production to observe real traffic and surface emergent failure modes.
Production Protection — Run in PRODUCTION mode with human approval gates to actively protect against discovered patterns.
Compliance & Audit — Use the memory graph to demonstrate due diligence: what was discovered, what was applied, what the outcomes were.
Real-Time Gateway Monitoring — Connect to AI gateway WebSockets for instant event analysis as failures happen.
Philosophy
Tinman embodies a research methodology, not just a tool:
- Systematic curiosity — Continuously ask "what could go wrong?" rather than "does this work?"
- Hypothesis-driven — Every test has a reason. No random fuzzing.
- Human oversight — Autonomy where safe, human judgment where it matters.
- Temporal knowledge — Not just "what failed" but "what did we know, when?"
- Continuous learning — Each cycle informs the next. Knowledge compounds.
Contributing
We welcome contributions. See CONTRIBUTING.md for:
- Development setup
- Code style (ruff, mypy)
- Testing requirements
- PR process
License
Apache 2.0 — See LICENSE
Tinman is a public good.
Not monetized, not proprietary—just a crystallized methodology for systematic AI reliability research.
File details
Details for the file agenttinman-0.2.1.tar.gz.
File metadata
- Download URL: agenttinman-0.2.1.tar.gz
- Size: 667.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 00108e783909af51164f23074f0e956b9f247d391c6ff226ed4641c8ffc6ac99 |
| MD5 | 1005fadba7afcd26665d8ebb44957071 |
| BLAKE2b-256 | 1214d0492256839f2b7ac97474f961c2a7051a3edd5202c776e965fcabb28c72 |
Provenance
The following attestation bundles were made for agenttinman-0.2.1.tar.gz:

Publisher: publish.yml on oliveskin/Agent-Tinman
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agenttinman-0.2.1.tar.gz
- Subject digest: 00108e783909af51164f23074f0e956b9f247d391c6ff226ed4641c8ffc6ac99
- Sigstore transparency entry: 887632363
- Permalink: oliveskin/Agent-Tinman@82a01617963d96411b0ad029537f2143766f940c
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/oliveskin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@82a01617963d96411b0ad029537f2143766f940c
- Trigger Event: push
File details
Details for the file agenttinman-0.2.1-py3-none-any.whl.
File metadata
- Download URL: agenttinman-0.2.1-py3-none-any.whl
- Size: 260.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b65078fbda519a65de5aea42a9790619b0c7a7b59886ab76a7fa539b11c36b5 |
| MD5 | 42b5ef25c1fb67272e7f1db775384316 |
| BLAKE2b-256 | 010f649092314f637bd8838275413b132c5860cae0b5255b6f249709c77ed37b |
Provenance
The following attestation bundles were made for agenttinman-0.2.1-py3-none-any.whl:

Publisher: publish.yml on oliveskin/Agent-Tinman
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agenttinman-0.2.1-py3-none-any.whl
- Subject digest: 9b65078fbda519a65de5aea42a9790619b0c7a7b59886ab76a7fa539b11c36b5
- Sigstore transparency entry: 887632447
- Permalink: oliveskin/Agent-Tinman@82a01617963d96411b0ad029537f2143766f940c
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/oliveskin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@82a01617963d96411b0ad029537f2143766f940c
- Trigger Event: push