A harness for agentic data science — run coding agents with domain skills, parallel subagents, and frozen Docker environments
decision-lab
A harness for agentic data science.
Coding agents write good code. They make bad analytical decisions. decision-lab gives you the tools to fix the second part.
decision-lab runs your analysis multiple ways — different models, different assumptions — and checks whether they converge. If they converge on the same answer, you can trust it. If they don't, it tells you what it doesn't know and what experiments would resolve the uncertainty. You package the prompts, domain skills, and environment into a decision-pack, point it at your data, and get back reports, figures, and recommendations that hold up to scrutiny.
Why
There are many ways to analyze a dataset. Most of them are wrong. An unsupervised agent picks one path through the analytical space and commits to it. If that path happens to be wrong, you get a nice-looking report with bad conclusions. Nobody notices for months.
We tested this on marketing mix modeling. We gave vanilla Claude Code and our MMM agent the same adversarial dataset where no valid inference was possible. Claude Code fit a model and recommended budget reallocations. Our agent tried 11 approaches, found that none of the models converged, said so, and recommended experiments to collect better data.
decision-lab (dlab) is the framework we built to make agents behave like that.
How it works
You package everything an agent needs into a decision-pack: agent prompts, domain skills, tools, and a locked environment. The agent explores multiple approaches instead of committing to the first one that runs, and consolidates the results into a report.
Skills constrain the agent to methodologically sound paths — mandatory diagnostics, preferred model structures, sensible defaults. Browse and install validated data science skills from Decision Hub.
Parallel subagents fan out with different approaches to the same problem (different data prep, different model structures). If results converge across approaches, you have evidence the conclusions are robust. If they diverge, the agent flags the disagreement and identifies what drives it. Supports running compute-heavy tasks on Modal.
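The convergence check at the heart of this fan-out can be sketched as a comparison of per-approach estimates. A minimal illustration in Python — the function name and the relative-spread threshold are assumptions for the sketch, not decision-lab's actual API:

```python
def check_convergence(estimates, rel_tol=0.15):
    """Flag whether independent analytical approaches agree.

    estimates: mapping of approach name -> point estimate of the same
    quantity (e.g. a channel's ROI under different model specs).
    Returns (converged, spread), where spread is the relative range.
    """
    values = list(estimates.values())
    center = sum(values) / len(values)
    if center == 0:
        return False, float("inf")
    spread = (max(values) - min(values)) / abs(center)
    return spread <= rel_tol, spread

# Three model specs land close together -> evidence the conclusion is robust
ok, spread = check_convergence({"adstock_a": 1.02, "adstock_b": 0.97, "saturated": 1.05})
print(ok)  # True

# Wildly divergent estimates -> report the disagreement instead of an answer
ok, _ = check_convergence({"adstock_a": 0.2, "adstock_b": 1.9, "saturated": 0.9})
print(ok)  # False
```

A real consolidator would compare full posteriors rather than point estimates, but the decision rule is the same: agreement earns trust, divergence earns a flag.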
Locked environments. Reproducibility matters. Library APIs change constantly, LLMs are trained on old versions, and skills are tuned to specific packages. decision-packs lock dependencies so the agent codes against the right API every time. By default, sessions run in a Docker container with pinned dependencies. Don't want Docker? The agent will set up the environment locally before running.
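A locked environment is conceptually just a Dockerfile that installs exact pins. A hypothetical sketch — the file contents are illustrative, not the MMM pack's actual environment:

```dockerfile
# docker/Dockerfile — illustrative sketch of a locked environment
FROM python:3.11-slim
COPY requirements.txt /tmp/requirements.txt
# requirements.txt carries exact pins (e.g. pymc==5.16.2) so the agent
# always codes against the API surface the skills were tuned for
RUN pip install --no-cache-dir -r /tmp/requirements.txt
```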
Domain expertise is loaded through decision-packs — pluggable configurations that specialize the agent for a specific analytical domain. The first decision-pack targets Bayesian marketing mix modeling. Finance, forecasting, and other domains can be added by writing a new pack.
Install
Requires Docker and Python 3.10+
pip install dlab-cli
Quick start
echo "ANTHROPIC_API_KEY=your-key-here" >> .env
# Run the MMM decision-pack on the included example dataset
dlab --dpack decision-packs/mmm \
--data decision-packs/mmm/example-data/example_dataset.csv \
--env-file .env \
--work-dir ./mmm-run \
--prompt "Analyze our marketing spend and recommend budget allocation"
# Watch it work
dlab connect ./mmm-run
Or build your own decision-pack. Ask Claude to scaffold one for you:
dhub install pymc-labs/decision-lab
claude
# > "Create a decision-pack for time series forecasting with statsforecast"
What's a decision-pack?
A directory with everything an agent needs: system prompts, domain skills, tools, a locked environment, and permissions.
my-dpack/
config.yaml # Name, model, hooks
docker/
Dockerfile # Locked environment
requirements.txt # Pinned dependencies
opencode/
opencode.json # Permissions
agents/
orchestrator.md # Main agent system prompt
tools/ # Custom tools
skills/ # Domain knowledge
parallel_agents/ # Fan-out configs
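For concreteness, a minimal config.yaml might look like the following — field names beyond the name/model/hooks mentioned above are guesses, not the real schema:

```yaml
# my-dpack/config.yaml — hypothetical example; see the decision-packs
# guide for the actual schema
name: my-dpack
model: claude-sonnet-4-5
hooks:
  pre_run: docker/say_hi.sh
  post_run: docker/print_result.sh
```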
See the poem decision-pack for a fully annotated example showing how all the pieces connect. Here's what happens when you run it:
dlab --dpack decision-packs/poem --env-file .env --prompt "Write me a poem about the ocean"
- dlab builds the Docker image from docker/Dockerfile (cached after first run)
- The pre-run hook say_hi.sh runs inside the container
- The orchestrator (literary-agent.md) starts and calls the terrible poet (popo-poet.md) via the task tool
- The orchestrator reads the terrible poet's attempt, decides it's bad, and spawns 3 parallel poet instances (poet.md) with different styles via the parallel-agents tool
- Each instance writes summary.md. A consolidator (auto-generated from poet.yaml) compares them
- The orchestrator picks the best poem and writes final_poem.md
- The post-run hook print_result.sh prints it to the terminal
The session directory ends up with parallel instance outputs, logs, and the final poem — all browsable with dlab connect or dlab timeline.
Features
Run sessions
dlab --dpack PATH --data PATH --prompt TEXT --env-file .env
Builds the Docker image (cached between runs), starts the container, runs pre-run hooks, launches the agent, runs post-run hooks, fixes file ownership, and stops the container. Without --work-dir, sessions are auto-numbered by dpack name (dlab-mmm-workdir-001, dlab-mmm-workdir-002, ...) and can be resumed with --continue-dir.
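The auto-numbering scheme described above can be sketched as a small helper — illustrative logic only, not dlab's actual implementation:

```python
import re
from pathlib import Path

def next_work_dir(base: Path, dpack_name: str) -> Path:
    """Pick the next auto-numbered session directory under `base`,
    e.g. dlab-mmm-workdir-001, dlab-mmm-workdir-002, ... (sketch of
    the naming scheme, not dlab's real code)."""
    pattern = re.compile(rf"dlab-{re.escape(dpack_name)}-workdir-(\d+)$")
    taken = [int(m.group(1)) for p in base.iterdir() if (m := pattern.match(p.name))]
    return base / f"dlab-{dpack_name}-workdir-{max(taken, default=0) + 1:03d}"
```

Each dpack gets its own counter, so MMM runs and poem runs number independently.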
Live monitoring
dlab connect ./mmm-run
A Textual TUI that shows live log events, agent status, cost tracking, and artifacts as the session runs. Browse between the orchestrator, parallel instances, and consolidator. Works with both running and completed sessions.
https://github.com/user-attachments/assets/24976838-3427-4cab-9351-2fc0b28e8f29
Execution timeline
dlab timeline ./mmm-run
Displays a Gantt chart of the session with timing, cost breakdown per agent, and idle periods. Shows the orchestrator, all parallel instances, and consolidators on a single timeline.
Creation wizards
dlab create-dpack # Interactive wizard to scaffold a new decision-pack
dlab create-parallel-agent # Wizard to add parallel agent configs to an existing decision-pack
The decision-pack wizard walks through 8 screens: name, container setup (package manager + base image), features (Decision Hub, Modal, Python library), model selection, permissions, directory skeletons, skill search, and review. Supports conda, pip, uv, and pixi.
https://github.com/user-attachments/assets/58c566f6-1d98-4825-aa7a-47dfd93bb2dc
Install as shortcut
dlab install ./my-dpack
# Now run directly:
my-dpack --data ./data --prompt "..."
Creates a wrapper script in ~/.local/bin/ so you can run a decision-pack by name instead of passing --dpack every time.
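Conceptually, the generated wrapper is a tiny shell script that forwards its arguments back to dlab. A hypothetical generator for such a script — the real wrapper's contents may differ:

```python
from pathlib import Path

def wrapper_script(dpack_path: str) -> str:
    """Return the contents of a shortcut script like the one
    `dlab install` drops into ~/.local/bin/ (illustrative sketch)."""
    return (
        "#!/usr/bin/env sh\n"
        # Resolve to an absolute path so the shortcut works from any cwd,
        # and pass all user arguments through to dlab untouched
        f'exec dlab --dpack "{Path(dpack_path).resolve()}" "$@"\n'
    )

print(wrapper_script("./my-dpack"))
```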
Decision Hub integration
decision-packs work with Decision Hub (hub.decision.ai), a registry of validated skills for data science and AI. Agents can search and install skills from the hub at runtime, giving them access to domain knowledge they weren't originally packaged with.
# Install the Decision Hub CLI as a skill in your decision-pack
dhub install pymc-labs/dhub-cli --agent opencode
The hub has 2,200+ skills from 38 organizations with automated evals that verify skills actually improve agent performance.
Environment variable forwarding
All environment variables starting with DLAB_ are automatically forwarded from the host to the Docker container. decision-packs use these for runtime configuration:
# MMM decision-pack: fit models locally instead of on Modal
DLAB_FIT_MODEL_LOCALLY=1 dlab --dpack mmm --data ./data --prompt "..."
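The forwarding rule is simple to picture: filter the host environment by prefix and turn the matches into `docker run -e` flags. A sketch of that rule, not dlab's real code:

```python
import os

def forwarded_env(environ=os.environ):
    """Collect DLAB_-prefixed variables destined for the container."""
    return {k: v for k, v in environ.items() if k.startswith("DLAB_")}

def docker_env_flags(env: dict) -> list:
    """Render them as `docker run -e KEY=VALUE` arguments."""
    flags = []
    for k, v in env.items():
        flags += ["-e", f"{k}={v}"]
    return flags

# Host-only variables like HOME are filtered out; DLAB_ ones pass through
print(docker_env_flags(forwarded_env({"DLAB_FIT_MODEL_LOCALLY": "1", "HOME": "/root"})))
# ['-e', 'DLAB_FIT_MODEL_LOCALLY=1']
```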
CLI reference
dlab --dpack PATH --data PATH --prompt TEXT # Run a session
dlab connect WORK_DIR # Live TUI monitor
dlab timeline [WORK_DIR] # Execution Gantt chart
dlab create-dpack [OUTPUT_DIR] # Interactive wizard
dlab create-parallel-agent [DPACK_DIR] # Parallel agent wizard
dlab install DPACK_PATH # Create shortcut command
Docs
| Guide | What it covers |
|---|---|
| CLI Reference | All commands, flags, env var forwarding |
| decision-packs | Config format, hooks, permissions, Modal integration |
| Parallel Agents | Fan-out architecture, YAML config, consolidator |
| Docker | Image building, container lifecycle, volume mounts |
| Sessions | Work directories, state management, resuming runs |
| Log Processing | NDJSON log format, event types, TUI/timeline parsing |
| Installation | Setup, prerequisites, development install |
Built by PyMC Labs
dlab is developed by PyMC Labs, the team behind PyMC and pymc-marketing.
License
Apache 2.0