dagr-py

The identification contract engine for the LLM era.

██████╗  █████╗  ██████╗  ██████╗ ███████╗██████╗
██╔══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔════╝██╔══██╗
██║  ██║███████║██║  ███╗██║  ███╗█████╗  ██████╔╝
██║  ██║██╔══██║██║   ██║██║   ██║██╔══╝  ██╔══██╗
██████╔╝██║  ██║╚██████╔╝╚██████╔╝███████╗██║  ██║
╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝

Your AI agent doesn't test for parallel trends. DAGger does.

Quickstart · Why DAGger? · Architecture · MCP Server · References

The Problem

Modern AI tooling has made econometric execution trivially easy, and has made catastrophic failures of causal validity invisible.

Ask an AI agent to run a DiD analysis. It will produce a beautifully formatted coefficient table with stars, clustered standard errors, and a significant p-value. What it will never do: test whether parallel trends hold, check for anticipation effects, or verify that your instrument has a strong first stage.

The output looks like science. It is causal fraud.

This isn't a model capability failure — it's an architectural one. There is no software primitive that makes "I must validate my identification strategy before I can estimate" a programmable constraint rather than a vague checklist item in a methods section.

DAGger is that primitive.

The Solution

import dagr as dg

# 1. Declare your identification strategy — before touching data
contract = dg.DiffInDiffContract(
    estimand=dg.Estimand.ATT,
    outcome_var="log_employment",
    treatment_var="min_wage_increase",
    time_var="year",
    unit_var="county_fips",
    assumptions=frozenset([
        dg.Assumption.PARALLEL_TRENDS,
        dg.Assumption.NO_ANTICIPATION,
    ]),
    pre_periods=(-4, -3, -2, -1),
    post_periods=(0, 1, 2, 3),
)

# 2. Run the preflight battery — or don't estimate
with dg.AuditLedger(contract=contract, experiment_id="min_wage_2024",
                    ledger_path="artifacts/ledger.jsonld") as ledger:
    preflight = contract.validate(data, verbose=True)
    ledger.attach_preflight(preflight)
    preflight.assert_valid()          # raises IdentificationError if INVALID

    # 3. Estimate — @requires_contract is satisfied by the AuditLedger context
    results = contract.build_estimator(data).fit()
    ledger.attach_results(results)

# 4. Quantify robustness
report = ledger.generate_report()
print(report.sensitivity.rr_breakdown_m_bar)   # breakdown M* for Rambachan-Roth
print(report.sensitivity.oster_delta)          # Oster delta for selection on unobservables

The preflight renders this in your terminal:

+----------------------------------+-------------------+-----------+-----------+
| Assumption                       | Status            | Statistic |  p-value  |
+----------------------------------+-------------------+-----------+-----------+
| parallel_trends                  |  VALID            |  F=0.421  |  p=0.657  |
| no_anticipation                  |  VALID            |  F=0.183  |  p=0.831  |
| no_differential_attrition        |  VALID            |    -      |    -      |
+----------------------------------+-------------------+-----------+-----------+

VERDICT: VALID | Pass rate: 100% | Contract: eg:3f4a8c2b1d...
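
How per-assumption results could roll up into a single verdict and a hard gate can be sketched in plain Python. This is an illustration, not DAGger's code: the "worst status wins" rule, the `CheckResult` and `SuiteResult` classes, and the pass-rate definition are assumptions. Only the four status names, `IdentificationError`, and `assert_valid` appear in this README.

```python
from dataclasses import dataclass
from enum import IntEnum


class Status(IntEnum):
    # Ordered from best to worst so that max() returns the worst status.
    VALID = 0
    VALID_CONDITIONAL = 1
    FRAGILE = 2
    INVALID = 3


class IdentificationError(RuntimeError):
    """Raised when estimation must not proceed."""


@dataclass(frozen=True)
class CheckResult:
    assumption: str
    status: Status


@dataclass(frozen=True)
class SuiteResult:
    checks: tuple[CheckResult, ...]

    @property
    def verdict(self) -> Status:
        # Assumed aggregation rule: the suite is only as good as its worst check.
        return max(check.status for check in self.checks)

    @property
    def pass_rate(self) -> float:
        passed = sum(check.status is Status.VALID for check in self.checks)
        return passed / len(self.checks)

    def assert_valid(self) -> None:
        # Hard gate: refuse to continue if any assumption is INVALID.
        failed = [c.assumption for c in self.checks if c.status is Status.INVALID]
        if failed:
            raise IdentificationError(f"Violated assumptions: {', '.join(failed)}")


suite = SuiteResult((
    CheckResult("parallel_trends", Status.VALID),
    CheckResult("no_anticipation", Status.VALID),
))
suite.assert_valid()  # passes silently
print(suite.verdict.name, f"{suite.pass_rate:.0%}")  # VALID 100%
```

Raising an exception rather than returning a flag is what makes the gate hard to skip: code after `assert_valid()` does not run unless the caller explicitly catches the error.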

Why DAGger?

DAG (Directed Acyclic Graph) is the mathematical foundation of causal inference: Pearl's do-calculus, structural causal models, identification theory. -ger is the agent suffix of logger and debugger. DAGger is the tool that brings DAG-rigour to production pipelines.

The mypy of causal inference.

Three principles:

1. Contracts before estimation. An IdentificationContract is a Pydantic v2 model that declares your entire causal strategy — estimand, assumptions, sensitivity analyses — in a typed, serializable, content-addressed document. You cannot call .fit() without one.

2. Tests, not checklists. Every declared assumption is backed by a published, peer-reviewed statistical test. Parallel trends uses an event-study pre-trend F-test, with Rambachan-Roth (2023) bounds quantifying robustness to violations. First stage uses the Olea-Pflueger (2013) effective F, not the Staiger-Stock rule of thumb. The tests are the contract.

3. Machine-readable by default. Every result is a Pydantic model with semantic field names and paired interpretation strings. The audit ledger is SHA-256 content-addressed JSON-LD. The MCP server exposes everything to LLM agents natively. Provenance is not an afterthought — it's the architecture.
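
The "frozen, content-addressed" idea behind these principles can be illustrated without DAGger itself. The sketch below is not the library's implementation: `ToyContract` and `content_id` are hypothetical names, and it uses a standard-library frozen dataclass where DAGger uses a Pydantic v2 model. It hashes a canonical JSON serialization, so two contracts with the same declarations share an ID and any change yields a new one.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class ToyContract:
    """Hypothetical stand-in for an identification contract (not dagr's class)."""

    estimand: str
    outcome_var: str
    treatment_var: str
    assumptions: frozenset[str]

    def content_id(self) -> str:
        payload = asdict(self)
        # Sets have no stable order, so sort before serializing.
        payload["assumptions"] = sorted(self.assumptions)
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = ToyContract("ATT", "log_employment", "min_wage_increase",
                       frozenset({"parallel_trends", "no_anticipation"}))
reordered = ToyContract("ATT", "log_employment", "min_wage_increase",
                        frozenset({"no_anticipation", "parallel_trends"}))
changed = ToyContract("ATE", "log_employment", "min_wage_increase",
                      frozenset({"parallel_trends", "no_anticipation"}))

print(original.content_id() == reordered.content_id())  # True
print(original.content_id() == changed.content_id())    # False

try:
    original.estimand = "ATE"  # frozen dataclasses reject mutation
except FrozenInstanceError:
    print("contract is immutable")
```

Sorting keys and set members before hashing is the design choice that matters here: without a canonical form, semantically identical contracts could hash differently.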

Quickstart

Install

pip install dagr-py
# With estimators (pyfixest, linearmodels, doubleml):
pip install "dagr-py[estimators]"
# With R bridge (HonestDiD, rdrobust, synthdid):
pip install "dagr-py[r-bridge]"

DiD in 6 steps

import dagr as dg
import polars as pl

# Step 1: Load your panel data
data = pl.read_parquet("county_employment_panel.parquet")

# Step 2: Declare the identification contract (pre-registration)
contract = dg.DiffInDiffContract(
    estimand=dg.Estimand.ATT,
    outcome_var="log_employment",
    treatment_var="min_wage_increase",
    time_var="year",
    unit_var="county_fips",
    assumptions=frozenset([
        dg.Assumption.PARALLEL_TRENDS,
        dg.Assumption.NO_ANTICIPATION,
    ]),
    pre_periods=(-4, -3, -2, -1),
    post_periods=(0, 1, 2, 3),
)
contract.to_file("artifacts/contract.json")   # pre-registration artifact

# Step 3: Validate assumptions — the preflight battery
with dg.AuditLedger(contract=contract, experiment_id="min_wage_2024",
                    ledger_path="artifacts/ledger.jsonld") as ledger:
    preflight = contract.validate(data)        # runs all declared tests
    ledger.attach_preflight(preflight)
    preflight.assert_valid()                   # hard gate: stops here if INVALID

    # Step 4: Estimate
    results = contract.build_estimator(data).fit()
    ledger.attach_results(results)

    # Step 5: Sensitivity analysis
    rr = dg.RambachanRothSensitivity(results=results)
    rr_report = rr.compute(pre_period_max_abs=0.03)
    ledger.attach_sensitivity(rr_report)

# Step 6: The machine-readable report
report = ledger.generate_report()
print(report.model_dump_json(indent=2))        # LLM-consumable, SHA-256 content-addressed

Instrumental Variables

import dagr as dg

contract = dg.IVContract(
    estimand=dg.Estimand.LATE,
    outcome_var="earnings",
    treatment_var="years_education",
    time_var="birth_cohort",
    unit_var="individual_id",
    assumptions=frozenset([
        dg.Assumption.INSTRUMENT_RELEVANCE,    # tested: Olea-Pflueger effective F
        dg.Assumption.INSTRUMENT_EXCLUSION,    # tested: reduced-form plausibility
    ]),
    instruments=("compulsory_schooling_law",),
    endogenous_vars=("years_education",),
    estimator_preference="2SLS",
    pre_periods=(-3, -2, -1),
    post_periods=(0, 1, 2),
)
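
The `INSTRUMENT_RELEVANCE` check is about first-stage strength. As a rough illustration of the quantity involved, and not DAGger's validator, the sketch below computes a heteroskedasticity-robust (HC0) first-stage F for a single instrument using only the standard library. The function name and the toy data are hypothetical. The Olea-Pflueger effective F handles the multi-instrument case and supplies its own critical values.

```python
def robust_first_stage_f(instrument: list[float], endogenous: list[float]) -> float:
    """Robust F for the slope in: endogenous = a + b * instrument + error."""
    n = len(instrument)
    z_bar = sum(instrument) / n
    x_bar = sum(endogenous) / n
    z_dev = [z - z_bar for z in instrument]
    s_zz = sum(d * d for d in z_dev)

    # OLS slope and intercept of the first-stage regression.
    slope = sum(d * (x - x_bar) for d, x in zip(z_dev, endogenous)) / s_zz
    intercept = x_bar - slope * z_bar
    residuals = [x - intercept - slope * z for z, x in zip(instrument, endogenous)]

    # HC0 (White) variance of the slope estimate.
    var_slope = sum((d * e) ** 2 for d, e in zip(z_dev, residuals)) / s_zz**2
    return slope**2 / var_slope


# Toy data: a binary instrument and a deterministic "noise" pattern.
z = [float(i % 2) for i in range(40)]
noise = [((i * 7) % 5 - 2) / 10 for i in range(40)]
x_strong = [1.0 + 2.0 * zi + ei for zi, ei in zip(z, noise)]  # instrument moves x
x_weak = [1.0 + ei for ei in noise]                           # instrument is irrelevant

print(f"strong first stage: F = {robust_first_stage_f(z, x_strong):.1f}")
print(f"weak first stage:   F = {robust_first_stage_f(z, x_weak):.3f}")
```

With one instrument the F statistic is just the squared robust t-statistic on the instrument, which is why a single regression suffices here.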

Architecture

+-----------------------------------------------------------------------+
|                            DAGR STACK                                 |
+-----------------------------------------------------------------------+
|  Human Researcher (Python API)  |  AI Agent (MCP Tool Call)          |
|               |                 |           |                         |
|               v                             v                         |
|  +--------------------------------------------------+                |
|  |              dagr.contracts                      |                |
|  |  DiffInDiffContract  |  IVContract               |                |
|  |  Pydantic v2, frozen, content-addressed          |                |
|  +---------------------+----------------------------+                |
|                        | .validate(data)                             |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |              dagr.validators                     |                |
|  |  TWFE event-study  |  Olea-Pflueger F            |                |
|  |  Rambachan-Roth    |  Sargan-Hansen J            |                |
|  +---------------------+----------------------------+                |
|                        | ValidationSuiteResult                       |
|                        | VALID / VALID_CONDITIONAL                   |
|                        | FRAGILE / INVALID                           |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |    contract.build_estimator(data).fit()          |                |
|  |  TWFE (pyfixest)  |  2SLS / LIML / GMM-IV        |                |
|  |  Callaway-Sant'Anna  |  AIPW (doubly-robust)     |                |
|  +---------------------+----------------------------+                |
|                        | EconGuardResults                            |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |             dagr.sensitivity                     |                |
|  |  Rambachan-Roth (2023)  |  Oster delta (2019)    |                |
|  |  Spec Curve             |  Rosenbaum bounds      |                |
|  +---------------------+----------------------------+                |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |              dagr.ledger                         |                |
|  |  AuditLedger — SHA-256 content-addressed         |                |
|  |  IdentificationReport — LLM-optimised JSON       |                |
|  |  MCP Server — 4 tools, Claude/GPT-4 native       |                |
|  +--------------------------------------------------+                |
+-----------------------------------------------------------------------+

MCP Server

DAGger exposes its full validation and sensitivity stack as an MCP server that any LLM agent can call natively.

dagr serve --port 8080

Four tools:

Tool	Description
`run_identification_preflight`	Validate contract assumptions against data. Returns `ValidationSuiteResult`.
`compute_sensitivity`	Rambachan-Roth bounds or Oster delta. Returns `SensitivityReport`.
`generate_identification_report`	Full audit report from a ledger file.
`validate_did_contract`	Flat-parameter convenience tool for LLM agents.

The IdentificationReport is designed for LLM consumption: semantic field names, paired interpretation strings, controlled-vocabulary verdicts.

{
  "schema_version": "dagr/v1",
  "overall_verdict": "valid",
  "identification": {
    "strategy": "difference_in_differences",
    "estimand": "average_treatment_effect_on_treated",
    "status": "valid",
    "recommendation": "Identification is valid. Proceed with estimation.",
    "failed_assumptions": []
  },
  "sensitivity": {
    "rr_breakdown_m_bar": 1.43,
    "rr_verdict": "valid",
    "oster_delta": 2.14,
    "oster_verdict": "valid"
  },
  "audit_hash": "sha256:3f4a8c2b1d..."
}
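
Because the verdicts use a controlled vocabulary, a downstream agent or CI job can gate on the report with a few lines of parsing. The helper below is a sketch: `may_report_estimate` is a hypothetical name, and the rule it applies (a `valid` overall verdict, a `valid` identification status, and an empty `failed_assumptions` list) is an assumption. The field names are taken from the example above.

```python
import json


def may_report_estimate(report_json: str) -> bool:
    """Return True only when the report clears every identification check."""
    report = json.loads(report_json)
    identification = report.get("identification", {})
    # Fail closed: a missing field counts as a failure.
    return (
        report.get("overall_verdict") == "valid"
        and identification.get("status") == "valid"
        and identification.get("failed_assumptions") == []
    )


sample = json.dumps({
    "schema_version": "dagr/v1",
    "overall_verdict": "valid",
    "identification": {
        "strategy": "difference_in_differences",
        "status": "valid",
        "failed_assumptions": [],
    },
})
print(may_report_estimate(sample))  # True
```

Comparing against exact strings, rather than searching the interpretation text, keeps the gate stable even if the human-readable wording changes.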

Feature Matrix

Feature	DAGger	Naive LLM	Manual Checklist
Assumption validation	Automated	None	Manual
Fails on violation	Hard gate	Silent	Sometimes
Parallel trends	Event-study F + max|beta|	-	Visual
Weak instruments	Olea-Pflueger (2013)	-	Rule-of-thumb
Rambachan-Roth bounds	Python + R bridge	-	-
Oster delta	Analytic (3dp verified)	-	-
Audit trail	SHA-256 JSON-LD	-	Notes
LLM-readable output	Pydantic + MCP	Unstructured	-
Pre-registration	OSF JSON-LD	-	Manual

What DAGger Catches

The demo notebook notebooks/01_the_llm_got_it_wrong.ipynb walks through a real case:

1. An AI agent produces a "significant" employment effect of a minimum wage increase
2. DAGger runs the preflight: the pre-trend F-test fails
3. Rambachan-Roth bounds show the CI crosses zero at M-bar = 0.38
4. The corrected analysis on properly identified data: VALID verdict

[AI result]    ATT = -0.047***   SE = 0.018   p = 0.009   <- looks correct
[DAGger]       Pre-trend F(3,...) = 8.41   p = 0.002      <- assumption violated
               Rambachan-Roth: breakdown M* = 0.38        <- not robust
               Verdict: INVALID. Do not use this estimate.
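
The intuition behind a breakdown value can be shown with a back-of-the-envelope calculation. The sketch below is a deliberately crude approximation, not the Rambachan-Roth procedure and not DAGger's implementation: it considers a single post-treatment period, treats the largest pre-period violation as known rather than estimated, and widens a normal confidence interval by M-bar times that violation. The function name and the example numbers are hypothetical.

```python
def approx_breakdown_m_bar(
    att: float,
    se: float,
    max_pre_violation: float,
    z: float = 1.96,
) -> float:
    """Smallest M-bar at which att +/- (z*se + M-bar*max_pre_violation) covers zero."""
    if max_pre_violation <= 0:
        raise ValueError("max_pre_violation must be positive")
    # Distance between the conventional confidence interval and zero.
    slack = abs(att) - z * se
    # If the interval already covers zero, no violation is needed to break it.
    return max(slack, 0.0) / max_pre_violation


# Hypothetical inputs: estimate -0.05, standard error 0.01,
# largest pre-period violation 0.02.
print(round(approx_breakdown_m_bar(-0.05, 0.01, 0.02), 2))  # 1.52
```

Read the result as: post-treatment violations of parallel trends would have to be about 1.5 times the worst pre-period violation before this simplified interval includes zero. A value below 1 means a violation no larger than what was already observed pre-treatment would overturn the result.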

Quality

Metric	Value
Test suite	330+ tests
Type checking	`mypy --strict` (zero errors)
Linting	`ruff` (zero violations)
Coverage	>= 80%
Build	`uv build` + `twine check` (passed)
License	Apache 2.0
Python	3.12+

Installation Options

# Core (contracts, validators, sensitivity, ledger, MCP, CLI)
pip install dagr-py

# With real estimators (pyfixest, linearmodels, doubleml)
pip install "dagr-py[estimators]"

# With R bridge (HonestDiD LP bounds, rdrobust, synthdid)
pip install "dagr-py[r-bridge]"

# Full installation
pip install "dagr-py[estimators,r-bridge]"

References

DAGger implements or wraps published statistical methods. All implementations cite their source paper and include analytic test cases with known expected values.

Callaway, Brantly and Pedro H.C. Sant'Anna. 2021. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, 225(2), 200-230.

Montiel Olea, José Luis and Carolin Pflueger. 2013. "A Robust Test for Weak Instruments." Journal of Business & Economic Statistics, 31(3), 358-369.

Oster, Emily. 2019. "Unobservable Selection and Coefficient Stability: Theory and Evidence." Journal of Business & Economic Statistics, 37(2), 187-204.

Rambachan, Ashesh and Jonathan Roth. 2023. "A More Credible Approach to Parallel Trends." The Review of Economic Studies, 90(5), 2555-2591.

Rosenbaum, Paul R. 2002. Observational Studies (2nd ed.). Springer. Chapter 4.

Simonsohn, Uri, Joseph P. Simmons, and Leif D. Nelson. 2020. "Specification Curve Analysis." Nature Human Behaviour, 4, 1208-1214.

Contributing

See CONTRIBUTING.md. We especially welcome:

- New validators with cited source papers and analytic test cases
- RDDContract implementation (good first issue)
- Callaway-Sant'Anna doubly-robust with cross-fitting
- Spanish/Portuguese translations of interpretation strings

Critical rule: Any modification to a statistical validator must cite the source paper and include an analytic test with a known expected value. Statistical correctness is not negotiable.

DAGger — Because causal validity should be a compiler error, not a footnote.

Version 0.1.0, released Jun 2, 2026.
