DoubleML build, estimation, plotting, and utility pipelines.

DML Pipeline

This repo is a small framework for running DoubleML on administrative-style program data. It separates project-specific choices from reusable pipeline code: you edit project_configuration/, then run the pipeline in dml_code/.

The repo is currently filled with a synthetic example so you can run the whole flow before replacing it with real project data.

Mental Model

The workflow has two main steps:

  1. Build an analysis dataset. Start from a databank and program file, join them, construct event-time variables, and write processed panels to data/build_output/.
  2. Estimate DML effects. Read a YAML experiment, resolve its program, covariates, filters, and models from the registries, fit the models, then write estimation and prediction logs to outputs/raw/.
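
The event-time construction in step 1 can be pictured with a small sketch. This is not the repo's implementation: the helper `add_relative_time` and the column names `year`, `enroll_year`, and `rel_year` are hypothetical, and the real build step is configured through `build_spec.py` and may operate on data frames rather than lists of dictionaries.

```python
def add_relative_time(rows, year_col="year", enroll_col="enroll_year", out_col="rel_year"):
    """Return copies of rows with event time = calendar year - enrollment year.

    Rows with no enrollment year (for example, never-enrolled controls) get None.
    All column names here are illustrative assumptions.
    """
    out = []
    for row in rows:
        enroll = row.get(enroll_col)
        rel = None if enroll is None else row[year_col] - enroll
        out.append({**row, out_col: rel})
    return out


# A toy person-year panel: person 1 enrolls in 2019, person 2 never enrolls.
panel = [
    {"person_id": 1, "year": 2018, "enroll_year": 2019},
    {"person_id": 1, "year": 2019, "enroll_year": 2019},
    {"person_id": 1, "year": 2020, "enroll_year": 2019},
    {"person_id": 2, "year": 2019, "enroll_year": None},
]
panel_with_event_time = add_relative_time(panel)
```

Negative values mark pre-enrollment periods, zero marks the enrollment year, and positive values mark post-enrollment periods.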

After estimation, scripts in project_scripts/ turn the raw logs into plots and tables.

project_configuration/ + data/build/
        |
        v
dml_code.pipeline.step1_build
        |
        v
data/build_output/
        |
        v
dml_code.pipeline.step2_estimate
        |
        v
outputs/raw/ -> outputs/plots/ and outputs/tables/

Run The Example

python project_scripts/generate_example.py
python -m dml_code.pipeline.step1_build example_program
python -m dml_code.pipeline.step2_estimate synthetic_example
python project_scripts/plot_example.py

The first command creates synthetic input data in data/build/. Step 1 writes processed panels to data/build_output/. Step 2 writes estimation and prediction logs to outputs/raw/. The plotting script writes diagnostics to outputs/plots/ and outputs/tables/.
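
If you prefer to drive the four commands from a single script, a wrapper along these lines may help. It is a sketch only: the helper names `example_commands` and `run_all` are hypothetical, and it assumes you run it from the repository root with the package importable.

```python
import subprocess
import sys


def example_commands(program="example_program", experiment="synthetic_example"):
    """Build the four example commands, using the current Python interpreter."""
    py = sys.executable
    return [
        [py, "project_scripts/generate_example.py"],
        [py, "-m", "dml_code.pipeline.step1_build", program],
        [py, "-m", "dml_code.pipeline.step2_estimate", experiment],
        [py, "project_scripts/plot_example.py"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed earlier step.
        subprocess.run(cmd, check=True)


# To execute the full flow from the repository root:
# run_all(example_commands())
```

Stopping at the first failure matters here because each step reads what the previous one wrote.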

What You Edit

Most project setup happens in project_configuration/.

  • project_configuration/build_spec.py: define the databank files, columns to carry through, relative-time columns to generate, and any generated features created after panel construction.
  • project_configuration/registries/programs.py: define each program's source file, treatment column, enrollment-year column, and program-specific columns.
  • project_configuration/registries/covariate_sets.py: name reusable covariate lists and mark categorical covariates for dummy encoding.
  • project_configuration/registries/filter_sets.py: name reusable Polars filters for estimation samples.
  • project_configuration/registries/models.py: name outcome and propensity learners.
  • project_configuration/estimation_experiments/*.yaml: choose combinations of programs, outcomes, covariates, filters, models, and control sampling rates to estimate.
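
To make the registry idea concrete, here is one way a name-to-definition registry could look. Everything in it is an assumption for illustration: the classes `ProgramSpec` and `CovariateSet`, the `resolve` helper, the field names, the file path, and the column names are hypothetical and are not taken from the repo. Polars filter sets and learner objects are left out to keep the sketch dependency-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramSpec:
    """Hypothetical program entry: where the data lives and which columns matter."""
    source_file: str
    treatment_col: str
    enroll_year_col: str
    extra_cols: tuple = ()


@dataclass(frozen=True)
class CovariateSet:
    """Hypothetical covariate set; `categorical` marks columns to dummy-encode."""
    columns: tuple
    categorical: tuple = ()

    def __post_init__(self):
        # Catch typos early: a categorical column must also be a covariate.
        unknown = set(self.categorical) - set(self.columns)
        if unknown:
            raise ValueError(f"categorical covariates not in columns: {sorted(unknown)}")


PROGRAMS = {
    "example_program": ProgramSpec(
        source_file="data/build/example_program.parquet",  # illustrative path
        treatment_col="treated",
        enroll_year_col="enroll_year",
        extra_cols=("site_id",),
    ),
}

COVARIATE_SETS = {
    "baseline": CovariateSet(
        columns=("age", "region", "prior_earnings"),
        categorical=("region",),
    ),
}


def resolve(registry, name, kind):
    """Look up a registry entry by name, with a readable error for unknown names."""
    try:
        return registry[name]
    except KeyError:
        known = ", ".join(sorted(registry))
        raise KeyError(f"unknown {kind} {name!r}; known: {known}") from None
```

With this layout, an experiment file only needs to mention names such as `example_program` or `baseline`, and the definitions stay in one place.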

The pipeline code in dml_code/ is meant to stay reusable.

  • dml_code/pipeline/: runnable steps, step1_build.py and step2_estimate.py.
  • dml_code/src/: shared helpers for building, estimating, paths, outputs, and logging.

project_scripts/ is for ad hoc project work tied to particular runs: generating example data, viewing outputs, making plots, running diagnostics, and writing small experiment-specific analyses.

How To Add A Real Project

  1. Put the source Parquet files under data/, or point project_configuration/ at their actual locations.
  2. Update project_configuration/build_spec.py with the databank files and feature-generation logic.
  3. Add program definitions in project_configuration/registries/programs.py.
  4. Add covariate sets, filters, and models in the registry files.
  5. Create or copy a YAML file in project_configuration/estimation_experiments/.
  6. Run step 1 for a program, then step 2 for an experiment.

Example:

python -m dml_code.pipeline.step1_build my_program
python -m dml_code.pipeline.step2_estimate my_experiment
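
An experiment chooses combinations of programs, outcomes, covariates, filters, models, and control sampling rates. The sketch below shows how such a specification could expand into individual runs. The key names, the values, and the `expand` helper are hypothetical, the real YAML schema is not shown here, and taking the full cross product is an assumption about how combinations are formed.

```python
from itertools import product

# A hypothetical experiment, written as the dict a YAML file might load into.
experiment = {
    "program": ["my_program"],
    "outcome": ["earnings", "employment"],
    "covariate_set": ["baseline"],
    "filter_set": ["all"],
    "model": ["lasso", "boosting"],
    "control_sampling_rate": [0.1, 1.0],
}


def expand(spec):
    """Return one dict per combination of the listed choices (full cross product)."""
    keys = list(spec)
    return [dict(zip(keys, combo)) for combo in product(*(spec[k] for k in keys))]


runs = expand(experiment)
```

Here the lists have lengths 1, 2, 1, 1, 2, and 2, so `runs` holds 8 run specifications. Counting runs this way before launching step 2 gives a rough sense of how long an experiment will take.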

As in the example, keep follow-up work on particular runs in project_scripts/: plots and tables, diagnostics, robustness checks, and other exploratory analyses.

Where Results Go

  • data/build/: input data used by the example.
  • data/build_output/: processed analysis datasets created by step 1.
  • outputs/raw/: machine-readable estimation, prediction, and diagnostic logs.
  • outputs/plots/: generated figures.
  • outputs/tables/: generated tables.
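
A quick way to check what a run produced is to count the files in each of these directories. The helper below is a sketch using only the standard library. The name `summarize_outputs` is hypothetical, and the directory list simply mirrors the bullets above.

```python
from pathlib import Path

# Directories listed above, relative to the repository root.
OUTPUT_DIRS = (
    "data/build",
    "data/build_output",
    "outputs/raw",
    "outputs/plots",
    "outputs/tables",
)


def summarize_outputs(root):
    """Map each directory to its file count, or None if it does not exist yet."""
    root = Path(root)
    summary = {}
    for rel in OUTPUT_DIRS:
        directory = root / rel
        if directory.is_dir():
            # rglob("*") walks subdirectories too; count regular files only.
            summary[rel] = sum(1 for p in directory.rglob("*") if p.is_file())
        else:
            summary[rel] = None
    return summary


# From the repository root:
# print(summarize_outputs("."))
```

A `None` entry usually means the step that writes to that directory has not been run yet.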
