Data agent with Python-native tools (no bash)

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

Project description

data-harness

(data + ReAct — a controlled data-agent SDK for Python workflows)

A data-native agent SDK for Python — built around controlled execution, handle-based state, provider adapters, sessions, subagents, and reconstructable runs.

Most agent frameworks hand the model a shell and call it a day. data-harness takes a different approach: the model operates through a constrained Python interpreter, with data stored in a session cache and exposed as named handles. No bash. Explicit state. Logs that can reconstruct what happened.

data-harness began as an installable reference implementation for harness design. It is now developing into the full SDK/framework track. A separate learn-data-harness repository will be created after the SDK stabilises to extract the basic principles without async, production sandboxing, or SDK-heavy features.

The design is covered in a three-part series:

Why no bash?

Giving an agent shell access is the path of least resistance, but it creates real problems in production: unpredictable side effects, security exposure, and behaviour that's hard to reproduce. data-harness deliberately constrains the model to Python only — which turns out to be enough for most data workloads and forces cleaner tool design.

Core design decisions

Each decision here is intentional. Understanding them is the point.

Handle/snapshot pattern Large objects (DataFrames, arrays, query results) live in a SessionCache, not in message history. The model only sees a compact snapshot — shape, columns, a few sample rows. It accesses the data by writing Python against the handle name. This keeps context lean without hiding data from the model.

Prefix-stable system prompt The system prompt never changes between turns. Reminders, state, and nags are appended to the conversation suffix. This is a KV-cache discipline: a stable prefix means the provider can cache it, which reduces latency and cost on long runs.

Progressive connector disclosure Data connectors (databases, APIs, warehouses) are registered but hidden from the tool list until explicitly loaded. A shorter tool list means the model makes better routing decisions. Connectors are only visible when relevant.

Subagent isolation Spawned subagents get a fresh adapter and a fresh cache. State is transferred explicitly via input_handles. No implicit shared state. This makes subagent behaviour reproducible and debuggable.

Suffix-only nag reminders The planner escalates reminders at 4 / 8 / 12 turns without progress. These are always appended to the suffix, never inserted into the prefix, so the KV cache is never busted by reminder text.

JSONL turn logging Every turn is logged to a .jsonl file from the start. Not bolted on later. Each line is a complete turn record including latency, token counts, and cache hit/miss. Reproducibility is a first-class concern.

Install

# requires Python 3.10+ and uv
uv sync

Quick start

Agent needs a provider adapter. The adapter is the boundary between the provider SDK and the harness: it turns Anthropic/OpenAI responses into data-harness's normalised Message, ToolUseBlock, and token-count types. It is explicit on purpose so the harness is not tied to one model provider, and tests can swap in FakeAdapter without touching the loop.

For Anthropic:

from data_harness import Agent
from data_harness.providers.anthropic import AnthropicAdapter

adapter = AnthropicAdapter(model="claude-sonnet-4-6")
agent = Agent(adapter=adapter, system="You are a data analyst.")

result = agent.run("Compute the mean of [1, 2, 3, 4, 5] and print it.")
print(result)

For OpenAI, install the optional extra and change only the adapter:

pip install "data-harness[openai]"

from data_harness.providers.openai import OpenAIAdapter

adapter = OpenAIAdapter(model="gpt-4o-mini")

Run the minimal Anthropic example:

uv run python examples/quickstart.py

examples/quickstart.py requires ANTHROPIC_API_KEY when run as a script. Tests import build_agent() and drive it with FakeAdapter, so the example stays covered without token spend.

Chat sessions

Agent.run() is still the simple one-shot path: it starts a fresh message history each time. For chatbot or workbench applications, create a session and ask follow-up questions on it:

from data_harness import Agent
from data_harness.providers.openai import OpenAIAdapter

adapter = OpenAIAdapter(model="gpt-4o-mini")
agent = Agent(adapter=adapter, system="You are a data analyst.")

session = agent.session()
session.put("uploaded_data", df)

print(session.ask("What columns are in the uploaded data?"))
print(session.ask("Which numeric column has the highest average?"))

The session keeps one Harness, one message history, and one SessionCache. This is the path to use when a UI needs uploaded artefacts and conversation follow-up to stay in scope.

Connector example

Connector helpers keep the quick path small while preserving progressive disclosure. Connector tools start hidden; the model must call load_connectors before it can use them.

from data_harness import Agent
from data_harness.providers.anthropic import AnthropicAdapter

adapter = AnthropicAdapter(model="claude-sonnet-4-6")
agent = Agent(adapter=adapter, system="You are a data analyst.")

market_data = agent.connector(
    "market_data",
    description="Market data tools.",
)


def fetch_ohlcv(symbol: str) -> list[dict]:
    return [{"symbol": symbol, "close": 101.2}]


market_data.tool(
    fetch_ohlcv,
    description="Fetch OHLCV data for a ticker.",
)

result = agent.run("Load market_data and inspect AAPL.")
print(result)

What `Agent` composes

Agent is a thin composition layer over the lower-level primitives:

A provider adapter translates model-provider SDK objects into the harness's normalised response types.
Harness owns the ReAct loop, messages, dispatch, reminders, and JSONL logging.
SessionCache stores large values as handles plus compact snapshots.
AgentSession keeps a chat-style harness and cache alive across follow-up questions.
python_interpreter is the controlled execution surface; there is no bash tool.
list_variables exposes cache handles without dumping raw payloads.
ConnectorRegistry keeps connector tools hidden until loaded.
Planner reminders and subagents are opt-in helpers, not a second runtime.

For explicit wiring, read examples/advanced_wiring.py. The future learn-data-harness repository will provide the smaller, linear teaching guide once this SDK surface has stabilised.

Run the advanced example - it loads a checked-in FRED unemployment-rate sample, runs analysis, uses subagents and the planner (requires ANTHROPIC_API_KEY):

uv run python examples/advanced_wiring.py

Run tests:

uv run pytest tests/ -v
uv run pytest tests/smoke_tests.py -m live -v  # requires OPENAI_API_KEY

Project structure

data_harness/
  loop.py          # Harness: the core ReAct loop
  cache.py         # SessionCache: handle/snapshot storage
  providers/       # Normalised adapter interface (Anthropic and OpenAI)
  tools/
    interpreter.py # Sandboxed Python executor
    connectors.py  # Progressive connector registry
    planner.py     # Plan/nag tool
    subagent.py    # Isolated subagent spawning
    variables.py   # list_variables tool
  types.py         # Shared types: Message, ToolSpec, ContentBlock
  logger.py        # JSONL turn logging
  observe.py       # Latency measurement
examples/
  quickstart.py        # Minimal Agent path
  advanced_wiring.py   # Explicit Harness wiring
  data/                # Small public sample data for the advanced demo

Sandbox disclaimer

The Python interpreter uses AST checks and restricted globals to reduce accidental misuse. It is not a container sandbox and should not be treated as safe for untrusted input.

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

maxkhor

Release history Release notifications | RSS feed

0.4.0

May 14, 2026

0.3.0

May 14, 2026

0.2.0

May 13, 2026

This version

0.1.3

May 13, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

data_harness-0.1.3.tar.gz (134.6 kB view details)

Uploaded May 13, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

data_harness-0.1.3-py3-none-any.whl (36.3 kB view details)

Uploaded May 13, 2026 Python 3

File details

Details for the file data_harness-0.1.3.tar.gz.

File metadata

Download URL: data_harness-0.1.3.tar.gz
Upload date: May 13, 2026
Size: 134.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for data_harness-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`5eb0190a5d47ff416684348668daf07706dae887fd8ffe937ab543c6d4e06b67`
MD5	`510f21440e03802f0770653cb7f5c414`
BLAKE2b-256	`00d8fb66d4a11e67344128a3105e9e17492902f39774dfae12c74d9a63442381`

See more details on using hashes here.

File details

Details for the file data_harness-0.1.3-py3-none-any.whl.

File metadata

Download URL: data_harness-0.1.3-py3-none-any.whl
Upload date: May 13, 2026
Size: 36.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for data_harness-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a23d8978829b409e32992502e3ec95ff1eeb085436a1edb494ae282f444674fc`
MD5	`86a72de57cb6ad51314cc8053a7d505c`
BLAKE2b-256	`c6519e98183f189082551c80a8a5e10e175c2b7f31a1409af8fab8504bceb8b1`

See more details on using hashes here.

data-harness 0.1.3

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Project description

data-harness

Why no bash?

Core design decisions

Install

Quick start

Chat sessions

Connector example

What `Agent` composes

Project structure

Sandbox disclaimer

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

data-harness 0.1.3

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Project description

data-harness

Why no bash?

Core design decisions

Install

Quick start

Chat sessions

Connector example

What Agent composes

Project structure

Sandbox disclaimer

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

What `Agent` composes