DoubleML build, estimation, plotting, and utility pipelines.
DML Pipeline
This repo is a small framework for running DoubleML on administrative-style
program data. It separates project-specific choices from reusable pipeline code:
you edit project_configuration/, then run the pipeline in dml_code/.
The repo currently ships with a synthetic example, so you can run the whole flow before replacing it with real project data.
Mental Model
The workflow has two main steps:
- Build an analysis dataset. Start from a databank and program file,
  join them, construct event-time variables, and write processed panels to
  data/build_output/.
- Estimate DML effects. Read a YAML experiment, resolve its program,
  covariates, filters, and models from the registries, then write logs to
  outputs/raw/.
After estimation, scripts can turn the raw logs into plots and tables.
project_configuration/ + data/build/
|
v
dml_code.pipeline.step1_build
|
v
data/build_output/
|
v
dml_code.pipeline.step2_estimate
|
v
outputs/raw/ -> outputs/plots/ and outputs/tables/
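To make the "construct event-time variables" part of step 1 more concrete, here is a minimal sketch of a relative-time computation. It assumes a panel with one row per person-year and an enrollment year per person. The column names (`year`, `enrollment_year`, `rel_time`) and the helper `add_relative_time` are hypothetical and are not taken from dml_code; the real build step may well do this differently, for instance with Polars expressions.

```python
from typing import Optional


def add_relative_time(
    rows: list[dict],
    year_col: str = "year",
    enroll_col: str = "enrollment_year",
    out_col: str = "rel_time",
) -> list[dict]:
    """Return copies of rows with out_col = calendar year - enrollment year.

    Rows without an enrollment year (e.g. never-enrolled controls) get None.
    All column names here are illustrative assumptions.
    """
    result = []
    for row in rows:
        enroll: Optional[int] = row.get(enroll_col)
        rel = None if enroll is None else row[year_col] - enroll
        # Build a new dict so the input rows are left unmodified.
        result.append({**row, out_col: rel})
    return result


panel = [
    {"person_id": 1, "year": 2018, "enrollment_year": 2019},
    {"person_id": 1, "year": 2019, "enrollment_year": 2019},
    {"person_id": 1, "year": 2020, "enrollment_year": 2019},
    {"person_id": 2, "year": 2019, "enrollment_year": None},
]
panel_with_rel = add_relative_time(panel)
print([r["rel_time"] for r in panel_with_rel])  # [-1, 0, 1, None]
```

Negative values mark pre-enrollment periods and zero marks the enrollment year, which is the usual convention for event-time panels.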
Run The Example
python project_scripts/generate_example.py
python -m dml_code.pipeline.step1_build example_program
python -m dml_code.pipeline.step2_estimate synthetic_example
python project_scripts/plot_example.py
The first command creates synthetic input data in data/build/. Step 1 writes
processed panels to data/build_output/. Step 2 writes estimation and
prediction logs to outputs/raw/. The plotting script writes diagnostics to
outputs/plots/ and outputs/tables/.
What You Edit
Most project setup happens in project_configuration/.
- project_configuration/build_spec.py: define the databank files, columns to carry through, relative-time columns to generate, and any generated features created after panel construction.
- project_configuration/registries/programs.py: define each program: its source file, treatment column, enrollment-year column, and program-specific columns.
- project_configuration/registries/covariate_sets.py: name reusable covariate lists and mark categorical covariates for dummy encoding.
- project_configuration/registries/filter_sets.py: name reusable Polars filters for estimation samples.
- project_configuration/registries/models.py: name outcome and propensity learners.
- project_configuration/estimation_experiments/*.yaml: choose combinations of programs, outcomes, covariates, filters, models, and control sampling rates to estimate.
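The idea of an experiment file that picks combinations of registry entries can be sketched as follows. This is an illustration only: the registry contents, the experiment keys, and the `expand` helper are all assumptions, the parsed YAML is written as a plain dict, and the actual schema used by dml_code is not shown in this README.

```python
from itertools import product

# Hypothetical registries: each maps a short name to a definition,
# mirroring how the registry files are described above.
COVARIATE_SETS = {
    "baseline": ["age", "region"],
    "extended": ["age", "region", "prior_earnings"],
}
FILTER_SETS = {"all": None, "adults": "age >= 18"}
MODELS = {"lasso": "LassoCV", "forest": "RandomForest"}

# Stand-in for a parsed YAML experiment (the keys are assumptions).
experiment = {
    "programs": ["example_program"],
    "outcomes": ["earnings"],
    "covariates": ["baseline", "extended"],
    "filters": ["adults"],
    "models": ["lasso", "forest"],
    "control_sampling_rates": [0.1],
}


def expand(experiment: dict) -> list[dict]:
    """Validate registry names, then return one dict per combination."""
    registries = {
        "covariates": COVARIATE_SETS,
        "filters": FILTER_SETS,
        "models": MODELS,
    }
    for key, registry in registries.items():
        unknown = [name for name in experiment[key] if name not in registry]
        if unknown:
            raise KeyError(f"unknown {key}: {unknown}")
    keys = list(experiment)
    # Cartesian product over every list in the experiment.
    combos = product(*(experiment[k] for k in keys))
    return [dict(zip(keys, combo)) for combo in combos]


runs = expand(experiment)
print(len(runs))  # 1 * 1 * 2 * 1 * 2 * 1 = 4
```

Failing early on an unknown name keeps a typo in the experiment file from surfacing only after a long estimation run has started.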
The pipeline code in dml_code/ is meant to stay reusable.
- dml_code/pipeline/: runnable steps, step1_build.py and step2_estimate.py.
- dml_code/src/: shared helpers for building, estimating, paths, outputs, and logging.
project_scripts/ is for ad hoc project work tied to particular runs:
generating example data, viewing outputs, making plots, running diagnostics,
and writing small experiment-specific analyses.
How To Add A Real Project
- Put source parquet files somewhere under data/ or point
  project_configuration/ at their real locations.
- Update project_configuration/build_spec.py with the databank files and
  feature-generation logic.
- Add program definitions in project_configuration/registries/programs.py.
- Add covariate sets, filters, and models in the registry files.
- Create or copy a YAML file in project_configuration/estimation_experiments/.
- Run step 1 for a program, then step 2 for an experiment.
Example:
python -m dml_code.pipeline.step1_build my_program
python -m dml_code.pipeline.step2_estimate my_experiment
Use project_scripts/ for project-specific follow-up work: viewing outputs
from particular runs, making plots and tables, running diagnostics, robustness
checks, and other exploratory analyses.
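If you run both steps often, a small driver script in project_scripts/ can chain them. The sketch below only reuses the two module invocations shown in this README; the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes the script is run from the repo root with the same Python interpreter that has dml_code available.

```python
import subprocess
import sys


def build_commands(program: str, experiment: str) -> list[list[str]]:
    """Return the step 1 and step 2 command lines, in run order."""
    return [
        [sys.executable, "-m", "dml_code.pipeline.step1_build", program],
        [sys.executable, "-m", "dml_code.pipeline.step2_estimate", experiment],
    ]


def run_pipeline(program: str, experiment: str) -> None:
    """Run step 1, then step 2, stopping at the first failure."""
    for cmd in build_commands(program, experiment):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so step 2 never runs against a failed build.
        subprocess.run(cmd, check=True)


# Show what would be executed without actually running anything.
for cmd in build_commands("my_program", "my_experiment"):
    print("python " + " ".join(cmd[1:]))
```

Calling `run_pipeline("my_program", "my_experiment")` would then execute both steps in order.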
Where Results Go
- data/build/: input data used by the example.
- data/build_output/: processed analysis datasets created by step 1.
- outputs/raw/: machine-readable estimation, prediction, and diagnostic logs.
- outputs/plots/: generated figures.
- outputs/tables/: generated tables.
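A quick way to check that a run produced something in each location is to count files per directory. This is a small convenience sketch, not part of dml_code: the helper `summarize_outputs` is hypothetical, and it assumes only the directory names listed above, relative to the repo root.

```python
from pathlib import Path

# Directory names as listed in this README, relative to the repo root.
OUTPUT_DIRS = [
    "data/build",
    "data/build_output",
    "outputs/raw",
    "outputs/plots",
    "outputs/tables",
]


def summarize_outputs(root: Path) -> dict[str, int]:
    """Count files (recursively) under each expected directory.

    Directories that do not exist yet are reported with a count of 0.
    """
    summary = {}
    for rel in OUTPUT_DIRS:
        directory = root / rel
        if directory.is_dir():
            summary[rel] = sum(1 for p in directory.rglob("*") if p.is_file())
        else:
            summary[rel] = 0
    return summary


for rel, count in summarize_outputs(Path(".")).items():
    print(f"{rel}: {count} file(s)")
```

A count of zero for data/build_output/ or outputs/raw/ after a run suggests the corresponding step did not complete.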