The identification contract engine for the LLM era.

██████╗  █████╗  ██████╗  ██████╗ ███████╗██████╗
██╔══██╗██╔══██╗██╔════╝ ██╔════╝ ██╔════╝██╔══██╗
██║  ██║███████║██║  ███╗██║  ███╗█████╗  ██████╔╝
██║  ██║██╔══██║██║   ██║██║   ██║██╔══╝  ██╔══██╗
██████╔╝██║  ██║╚██████╔╝╚██████╔╝███████╗██║  ██║
╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚══════╝╚═╝  ╚═╝

Your AI agent doesn't test for parallel trends. DAGger does.

PyPI version · Python 3.12+ · Ruff · Apache 2.0

Quickstart · Why DAGger? · Architecture · MCP Server · References


The Problem

Modern AI tooling has made econometric execution trivially easy, and it has made failures of causal validity both invisible and catastrophic.

Ask an AI agent to run a DiD analysis. It will produce a beautifully formatted coefficient table with stars, clustered standard errors, and a significant p-value. What it will never do: test whether parallel trends hold, check for anticipation effects, or verify that your instrument has a strong first stage.

The output looks like science. It is causal fraud.

This isn't a model capability failure — it's an architectural one. There is no software primitive that makes "I must validate my identification strategy before I can estimate" a programmable constraint rather than a vague checklist item in a methods section.

DAGger is that primitive.


The Solution

import dagr as dg

# 1. Declare your identification strategy — before touching data
contract = dg.DiffInDiffContract(
    estimand=dg.Estimand.ATT,
    outcome_var="log_employment",
    treatment_var="min_wage_increase",
    time_var="year",
    unit_var="county_fips",
    assumptions=frozenset([
        dg.Assumption.PARALLEL_TRENDS,
        dg.Assumption.NO_ANTICIPATION,
    ]),
    pre_periods=(-4, -3, -2, -1),
    post_periods=(0, 1, 2, 3),
)

# 2. Run the preflight battery, or don't estimate.
#    `data` is your panel DataFrame (loaded as in the Quickstart).
with dg.AuditLedger(contract=contract, experiment_id="min_wage_2024",
                    ledger_path="artifacts/ledger.jsonld") as ledger:
    preflight = contract.validate(data, verbose=True)
    ledger.attach_preflight(preflight)
    preflight.assert_valid()          # raises IdentificationError if INVALID

    # 3. Estimate — @requires_contract is satisfied by the AuditLedger context
    results = contract.build_estimator(data).fit()
    ledger.attach_results(results)

# 4. Quantify robustness
report = ledger.generate_report()
print(report.sensitivity.rr_breakdown_m_bar)   # breakdown M* for Rambachan-Roth
print(report.sensitivity.oster_delta)          # Oster delta for selection on unobservables

The preflight renders this in your terminal:

+----------------------------------+-------------------+-----------+-----------+
| Assumption                       | Status            | Statistic |  p-value  |
+----------------------------------+-------------------+-----------+-----------+
| parallel_trends                  |  VALID            |  F=0.421  |  p=0.657  |
| no_anticipation                  |  VALID            |  F=0.183  |  p=0.831  |
| no_differential_attrition        |  VALID            |    -      |    -      |
+----------------------------------+-------------------+-----------+-----------+

VERDICT: VALID | Pass rate: 100% | Contract: eg:3f4a8c2b1d...
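
For intuition, the F statistic reported for parallel_trends can be read as a joint test that all pre-period event-study coefficients are zero. The sketch below is not DAGger's implementation. It assumes the pre-period estimates are independent (a diagonal covariance matrix) and uses a large-sample chi-square reference distribution; the names pretrend_wald and chi2_sf and the input numbers are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable.

    Uses the series expansion of the regularized lower incomplete
    gamma function, which is adequate for moderate values of x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = a
    for _ in range(1000):
        n += 1.0
        term *= z / n
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def pretrend_wald(coefs, ses):
    """Joint test that every pre-period coefficient equals zero.

    ASSUMPTION: estimates are independent, so the Wald statistic is
    just the sum of squared t-ratios. Real event-study estimates are
    correlated and need the full covariance matrix.
    """
    q = len(coefs)
    wald = sum((b / s) ** 2 for b, s in zip(coefs, ses))
    return {
        "F": wald / q,                       # Wald statistic per restriction
        "df": q,
        "p_value": chi2_sf(wald, q),         # large-sample approximation
        "max_abs_beta": max(abs(b) for b in coefs),
    }


# Hypothetical pre-period coefficients and standard errors.
result = pretrend_wald([0.004, -0.006, 0.003], [0.01, 0.01, 0.01])
print(result)
```

A large p-value here only says the pre-trends are not detectably different from zero; it does not prove parallel trends, which is why the sensitivity step exists.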

Why DAGger?

DAG (Directed Acyclic Graph) is the mathematical foundation of causal inference: Pearl's do-calculus, structural causal models, identification theory. The -ger ending echoes everyday developer tools such as logger and debugger. DAGger is the tool that brings DAG rigour to production pipelines.

The mypy of causal inference.

Three principles:

1. Contracts before estimation. An IdentificationContract is a Pydantic v2 model that declares your entire causal strategy — estimand, assumptions, sensitivity analyses — in a typed, serializable, content-addressed document. You cannot call .fit() without one.

2. Tests, not checklists. Every declared assumption is backed by a statistical test drawn from the peer-reviewed literature. Parallel trends uses an event-study pre-trend F-test, with Rambachan-Roth (2023) bounds quantifying robustness to violations. First stage uses the Olea-Pflueger (2013) effective F, not the Staiger-Stock rule of thumb. The tests are the contract.

3. Machine-readable by default. Every result is a Pydantic model with semantic field names and paired interpretation strings. The audit ledger is SHA-256 content-addressed JSON-LD. The MCP server exposes everything to LLM agents natively. Provenance is not an afterthought — it's the architecture.
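
To make "frozen, content-addressed" concrete, here is a minimal standard-library sketch. It is an illustration of the idea, not DAGger's code: ToyContract and content_address are hypothetical names, and a frozen dataclass stands in for the Pydantic v2 model.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ToyContract:
    """Hypothetical stand-in for an identification contract."""
    estimand: str
    outcome_var: str
    treatment_var: str
    assumptions: tuple[str, ...]


def content_address(contract: ToyContract) -> str:
    """Hash a canonical JSON form so equal contracts get equal IDs."""
    payload = asdict(contract)
    # Sort assumptions so declaration order does not change the hash.
    payload["assumptions"] = sorted(payload["assumptions"])
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = ToyContract("ATT", "log_employment", "min_wage_increase",
                ("parallel_trends", "no_anticipation"))
b = ToyContract("ATT", "log_employment", "min_wage_increase",
                ("no_anticipation", "parallel_trends"))
print(content_address(a) == content_address(b))
```

Because the instance is frozen and the hash covers every field, changing the outcome variable or dropping an assumption after the fact yields a different address, which is what makes the contract usable as a pre-registration artifact.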


Quickstart

Install

pip install dagr-py
# With estimators (pyfixest, linearmodels, doubleml):
pip install "dagr-py[estimators]"
# With R bridge (HonestDiD, rdrobust, synthdid):
pip install "dagr-py[r-bridge]"

DiD in 6 steps

import dagr as dg
import polars as pl

# Step 1: Load your panel data
data = pl.read_parquet("county_employment_panel.parquet")

# Step 2: Declare the identification contract (pre-registration)
contract = dg.DiffInDiffContract(
    estimand=dg.Estimand.ATT,
    outcome_var="log_employment",
    treatment_var="min_wage_increase",
    time_var="year",
    unit_var="county_fips",
    assumptions=frozenset([
        dg.Assumption.PARALLEL_TRENDS,
        dg.Assumption.NO_ANTICIPATION,
    ]),
    pre_periods=(-4, -3, -2, -1),
    post_periods=(0, 1, 2, 3),
)
contract.to_file("artifacts/contract.json")   # pre-registration artifact

# Step 3: Validate assumptions — the preflight battery
with dg.AuditLedger(contract=contract, experiment_id="min_wage_2024",
                    ledger_path="artifacts/ledger.jsonld") as ledger:
    preflight = contract.validate(data)        # runs all declared tests
    ledger.attach_preflight(preflight)
    preflight.assert_valid()                   # hard gate: stops here if INVALID

    # Step 4: Estimate
    results = contract.build_estimator(data).fit()
    ledger.attach_results(results)

    # Step 5: Sensitivity analysis
    rr = dg.RambachanRothSensitivity(results=results)
    rr_report = rr.compute(pre_period_max_abs=0.03)
    ledger.attach_sensitivity(rr_report)

# Step 6: The machine-readable report
report = ledger.generate_report()
print(report.model_dump_json(indent=2))        # LLM-consumable, SHA-256 content-addressed
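
The hard gate in Step 3 is the core idea: estimation code is unreachable unless validation has passed. A minimal sketch of that pattern follows. The exception name mirrors the one in the comments above, but ToyPreflight, requires_preflight, and toy_estimate are hypothetical and say nothing about how DAGger wires this up internally.

```python
from dataclasses import dataclass
from functools import wraps


class IdentificationError(RuntimeError):
    """Raised when a declared assumption fails its test."""


@dataclass(frozen=True)
class ToyPreflight:
    """Hypothetical preflight result: assumption name -> status string."""
    statuses: dict

    def assert_valid(self) -> None:
        failed = sorted(n for n, s in self.statuses.items() if s == "invalid")
        if failed:
            raise IdentificationError("failed assumptions: " + ", ".join(failed))


def requires_preflight(estimate):
    """Refuse to run the wrapped estimator unless the preflight passes."""
    @wraps(estimate)
    def gated(preflight, *args, **kwargs):
        preflight.assert_valid()          # raises before any estimation happens
        return estimate(*args, **kwargs)
    return gated


@requires_preflight
def toy_estimate(values):
    """Placeholder estimator: a plain mean."""
    return sum(values) / len(values)


ok = ToyPreflight({"parallel_trends": "valid", "no_anticipation": "valid"})
print(toy_estimate(ok, [1.0, 2.0, 3.0]))
```

Passing a preflight with any "invalid" status raises IdentificationError, so a failing pipeline stops loudly instead of emitting a coefficient table.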

Instrumental Variables

contract = dg.IVContract(
    estimand=dg.Estimand.LATE,
    outcome_var="earnings",
    treatment_var="years_education",
    time_var="birth_cohort",
    unit_var="individual_id",
    assumptions=frozenset([
        dg.Assumption.INSTRUMENT_RELEVANCE,    # tested: Olea-Pflueger effective F
        dg.Assumption.INSTRUMENT_EXCLUSION,    # tested: reduced-form plausibility
    ]),
    instruments=("compulsory_schooling_law",),
    endogenous_vars=("years_education",),
    estimator_preference="2SLS",
    pre_periods=(-3, -2, -1),
    post_periods=(0, 1, 2),
)
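
For a feel of what the relevance check measures: with a single instrument, as in the contract above, the effective F is generally understood to reduce to the heteroskedasticity-robust first-stage F. The sketch below computes that quantity with an HC0 variance in pure Python. It is an illustration under that single-instrument assumption, not the general Olea-Pflueger statistic and not DAGger's implementation; first_stage_robust_f and the toy data are hypothetical.

```python
def first_stage_robust_f(z, d):
    """Robust F for regressing treatment d on one instrument z plus intercept.

    ASSUMPTIONS: exactly one instrument, no additional controls, and an
    HC0 variance estimate (no small-sample correction).
    """
    n = len(z)
    z_bar = sum(z) / n
    d_bar = sum(d) / n
    zc = [zi - z_bar for zi in z]                      # demeaned instrument
    s_zz = sum(v * v for v in zc)
    slope = sum(v * (di - d_bar) for v, di in zip(zc, d)) / s_zz
    resid = [(di - d_bar) - slope * v for v, di in zip(zc, d)]
    var_hc0 = sum((v * e) ** 2 for v, e in zip(zc, resid)) / s_zz ** 2
    return slope, slope ** 2 / var_hc0


# Toy data: a binary instrument that shifts the treatment by about 2 units.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [10, 11, 9, 10, 12, 13, 11, 12]
slope, f_stat = first_stage_robust_f(z, d)
print(slope, f_stat)
```

The statistic is then compared against a weak-instrument critical value rather than read on its own; which threshold to use depends on the tolerated bias and is left to the validator.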

Architecture

+-----------------------------------------------------------------------+
|                            DAGR STACK                                 |
+-----------------------------------------------------------------------+
|  Human Researcher (Python API)  |  AI Agent (MCP Tool Call)          |
|               |                 |           |                         |
|               v                             v                         |
|  +--------------------------------------------------+                |
|  |              dagr.contracts                      |                |
|  |  DiffInDiffContract  |  IVContract               |                |
|  |  Pydantic v2, frozen, content-addressed          |                |
|  +---------------------+----------------------------+                |
|                        | .validate(data)                             |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |              dagr.validators                     |                |
|  |  TWFE event-study  |  Olea-Pflueger F            |                |
|  |  Rambachan-Roth    |  Sargan-Hansen J            |                |
|  +---------------------+----------------------------+                |
|                        | ValidationSuiteResult                       |
|                        | VALID / VALID_CONDITIONAL                   |
|                        | FRAGILE / INVALID                           |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |    contract.build_estimator(data).fit()          |                |
|  |  TWFE (pyfixest)  |  2SLS / LIML / GMM-IV        |                |
|  |  Callaway-Sant'Anna  |  AIPW (doubly-robust)     |                |
|  +---------------------+----------------------------+                |
|                        | EconGuardResults                            |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |             dagr.sensitivity                     |                |
|  |  Rambachan-Roth (2023)  |  Oster delta (2019)    |                |
|  |  Spec Curve             |  Rosenbaum bounds      |                |
|  +---------------------+----------------------------+                |
|                        v                                             |
|  +--------------------------------------------------+                |
|  |              dagr.ledger                         |                |
|  |  AuditLedger — SHA-256 content-addressed         |                |
|  |  IdentificationReport — LLM-optimised JSON       |                |
|  |  MCP Server — 4 tools, Claude/GPT-4 native       |                |
|  +--------------------------------------------------+                |
+-----------------------------------------------------------------------+
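
The four statuses in the diagram form a controlled vocabulary. One plausible way to roll per-assumption statuses up into a single verdict is "worst status wins"; the sketch below assumes that rule, which is a guess for illustration rather than DAGger's documented behaviour. Verdict and summarize are hypothetical names.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Statuses from the architecture diagram, ordered by severity."""
    VALID = 0
    VALID_CONDITIONAL = 1
    FRAGILE = 2
    INVALID = 3


def summarize(statuses: dict[str, Verdict]) -> tuple[Verdict, float]:
    """Return (overall verdict, share of assumptions that are VALID).

    ASSUMPTION: the overall verdict is the most severe individual status.
    """
    overall = max(statuses.values())
    passed = sum(1 for v in statuses.values() if v == Verdict.VALID)
    return overall, passed / len(statuses)


overall, pass_rate = summarize({
    "parallel_trends": Verdict.VALID,
    "no_anticipation": Verdict.FRAGILE,
})
print(overall.name, pass_rate)
```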

MCP Server

DAGger exposes its full validation and sensitivity stack as an MCP server that any LLM agent can call natively.

dagr serve --port 8080

Four tools:

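+--------------------------------+----------------------------------------------------------------------------+
| Tool                           | Description                                                                |
+--------------------------------+----------------------------------------------------------------------------+
| run_identification_preflight   | Validate contract assumptions against data. Returns ValidationSuiteResult. |
| compute_sensitivity            | Rambachan-Roth bounds or Oster delta. Returns SensitivityReport.           |
| generate_identification_report | Full audit report from a ledger file.                                      |
| validate_did_contract          | Flat-parameter convenience tool for LLM agents.                            |
+--------------------------------+----------------------------------------------------------------------------+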

The IdentificationReport is designed for LLM consumption: semantic field names, paired interpretation strings, controlled-vocabulary verdicts.

{
  "schema_version": "dagr/v1",
  "overall_verdict": "valid",
  "identification": {
    "strategy": "difference_in_differences",
    "estimand": "average_treatment_effect_on_treated",
    "status": "valid",
    "recommendation": "Identification is valid. Proceed with estimation.",
    "failed_assumptions": []
  },
  "sensitivity": {
    "rr_breakdown_m_bar": 1.43,
    "rr_verdict": "valid",
    "oster_delta": 2.14,
    "oster_verdict": "valid"
  },
  "audit_hash": "sha256:3f4a8c2b1d..."
}
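
Because the verdicts are controlled vocabulary, a caller can gate on them without parsing prose. A minimal sketch of how an agent or CI job might consume the report above, using only the field names shown in that JSON; may_proceed and its strict policy (everything must be "valid") are assumptions for illustration.

```python
import json

# The example report from above, with the truncated hash left as shown.
REPORT_JSON = """
{
  "schema_version": "dagr/v1",
  "overall_verdict": "valid",
  "identification": {
    "strategy": "difference_in_differences",
    "estimand": "average_treatment_effect_on_treated",
    "status": "valid",
    "recommendation": "Identification is valid. Proceed with estimation.",
    "failed_assumptions": []
  },
  "sensitivity": {
    "rr_breakdown_m_bar": 1.43,
    "rr_verdict": "valid",
    "oster_delta": 2.14,
    "oster_verdict": "valid"
  },
  "audit_hash": "sha256:3f4a8c2b1d..."
}
"""


def may_proceed(report: dict) -> bool:
    """Hypothetical policy: quote the estimate only if everything is valid."""
    ident = report["identification"]
    return (
        report["overall_verdict"] == "valid"
        and ident["status"] == "valid"
        and not ident["failed_assumptions"]
    )


report = json.loads(REPORT_JSON)
print(may_proceed(report))
```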

Feature Matrix

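+-----------------------+---------------------------+--------------+------------------+
| Feature               | DAGger                    | Naive LLM    | Manual Checklist |
+-----------------------+---------------------------+--------------+------------------+
| Assumption validation | Automated                 | None         | Manual           |
| Fails on violation    | Hard gate                 | Silent       | Sometimes        |
| Parallel trends       | Event-study F + max|beta| | -            | Visual           |
| Weak instruments      | Olea-Pflueger (2013)      | -            | Rule-of-thumb    |
| Rambachan-Roth bounds | Python + R bridge         | -            | -                |
| Oster delta           | Analytic (3dp verified)   | -            | -                |
| Audit trail           | SHA-256 JSON-LD           | -            | Notes            |
| LLM-readable output   | Pydantic + MCP            | Unstructured | -                |
| Pre-registration      | OSF JSON-LD               | -            | Manual           |
+-----------------------+---------------------------+--------------+------------------+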

What DAGger Catches

The demo notebook notebooks/01_the_llm_got_it_wrong.ipynb walks through a real case:

  1. An AI agent produces a "significant" employment effect of a minimum wage increase
  2. DAGger runs the preflight — the pre-trend F-test fails
  3. Rambachan-Roth bounds show the CI crosses zero at M-bar = 0.38
  4. The corrected analysis on properly identified data: VALID verdict

[AI result]    ATT = -0.047***   SE = 0.018   p = 0.009   <- looks correct
[DAGger]       Pre-trend F(3,...) = 8.41   p = 0.002      <- assumption violated
               Rambachan-Roth: breakdown M* = 0.38        <- not robust
               Verdict: INVALID. Do not use this estimate.
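
A back-of-envelope way to think about a breakdown value: ask how large a parallel-trends violation, measured in multiples M of the largest pre-period deviation, would be needed before the confidence interval covers zero. The sketch below is a deliberate simplification and not the Rambachan-Roth procedure (which constructs bias-aware confidence sets). It assumes the bias is bounded by M times max_pre_abs and simply widens a conventional interval by that bound. breakdown_m and the max_pre_abs value are hypothetical, and the result is not meant to reproduce the M* reported above.

```python
def breakdown_m(att: float, se: float, max_pre_abs: float,
                z_crit: float = 1.96) -> float:
    """Smallest M at which the widened interval first includes zero.

    ASSUMPTION: |bias| <= M * max_pre_abs, so the interval is
    att +/- (z_crit * se + M * max_pre_abs).
    """
    slack = abs(att) - z_crit * se     # distance from zero left after sampling noise
    if slack <= 0:
        return 0.0                     # already insignificant with no violation
    return slack / max_pre_abs


# ATT and SE taken from the [AI result] line; max_pre_abs is illustrative.
m_star = breakdown_m(att=-0.047, se=0.018, max_pre_abs=0.03)
print(round(m_star, 2))
```

A breakdown value below 1 means a post-period violation smaller than what was already observed in the pre-period would be enough to overturn the result.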

Quality

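+---------------+-------------------------------+
| Metric        | Value                         |
+---------------+-------------------------------+
| Test suite    | 330+ tests                    |
| Type checking | mypy --strict (zero errors)   |
| Linting       | ruff (zero violations)        |
| Coverage      | >= 80%                        |
| Build         | uv build + twine check PASSED |
| License       | Apache 2.0                    |
| Python        | 3.12+                         |
+---------------+-------------------------------+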

Installation Options

# Core (contracts, validators, sensitivity, ledger, MCP, CLI)
pip install dagr-py

# With real estimators (pyfixest, linearmodels, doubleml)
pip install "dagr-py[estimators]"

# With R bridge (HonestDiD LP bounds, rdrobust, synthdid)
pip install "dagr-py[r-bridge]"

# Full installation
pip install "dagr-py[estimators,r-bridge]"

References

DAGger implements or wraps published statistical methods. All implementations cite their source paper and include analytic test cases with known expected values.

Callaway, Brantly and Pedro H.C. Sant'Anna. 2021. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, 225(2), 200-230.

Montiel Olea, José Luis and Carolin Pflueger. 2013. "A Robust Test for Weak Instruments." Journal of Business & Economic Statistics, 31(3), 358-369.

Oster, Emily. 2019. "Unobservable Selection and Coefficient Stability: Theory and Evidence." Journal of Business & Economic Statistics, 37(2), 187-204.

Rambachan, Ashesh and Jonathan Roth. 2023. "A More Credible Approach to Parallel Trends." The Review of Economic Studies, 90(5), 2555-2591.

Rosenbaum, Paul R. 2002. Observational Studies (2nd ed.). Springer. Chapter 4.

Simonsohn, Uri, Joseph P. Simmons, and Leif D. Nelson. 2020. "Specification Curve Analysis." Nature Human Behaviour, 4, 1208-1214.


Contributing

See CONTRIBUTING.md. We especially welcome:

  • New validators with cited source papers and analytic test cases
  • RDDContract implementation (good first issue)
  • Callaway-Sant'Anna doubly-robust with cross-fitting
  • Spanish/Portuguese translations of interpretation strings

Critical rule: Any modification to a statistical validator must cite the source paper and include an analytic test with a known expected value. Statistical correctness is not negotiable.


DAGger — Because causal validity should be a compiler error, not a footnote.
