
Inga 因果

inga is a toolkit for generating and inspecting synthetic tabular datasets. It constructs arbitrarily complex Structural Causal Models (SCMs), draws samples from them, and computes causal effects and causal biases conditioned on observed variables and outcomes. All computed quantities are stored and made available for causally-informed pre-training of tabular models.

[Figure: flow from X to Y]

Causal Effect and Causal Bias

The current scope of this repository is restricted to SCMs with continuous variables. Let $V_i$ denote a generic scalar variable in the SCM, and let $U_{V_i} \sim 𝒩(0,1)$ be its corresponding exogenous noise, such that

$$ V_i := f_{V_i}(\mathrm{Pa}(V_i), U_{V_i}) := \bar{f}_{V_i}(\mathrm{Pa}(V_i)) + \sigma_{V_i} U_{V_i}. $$

Here, $\mathrm{Pa}(V_i)$ denotes the set of parents of $V_i$ in the DAG, $\bar f_{V_i}$ captures the deterministic structural component, and $\sigma_{V_i}$ controls the scale of the exogenous noise.
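The additive-noise form above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not inga's API: `sample_variable` and the sinusoidal mean functions are made up here for demonstration.

```python
import math
import random

def sample_variable(f_mean, parents, sigma, rng):
    """Sample V_i = f_mean(Pa(V_i)) + sigma * U, with U ~ N(0, 1)."""
    u = rng.gauss(0.0, 1.0)  # exogenous noise U_{V_i}
    return f_mean(parents) + sigma * u

# Tiny chain Z -> X -> Y with sinusoidal means (an arbitrary illustrative choice).
rng = random.Random(0)
z = sample_variable(lambda p: 0.0, {}, 1.0, rng)
x = sample_variable(lambda p: math.sin(p["Z"]), {"Z": z}, 1.0, rng)
y = sample_variable(lambda p: math.sin(p["X"]), {"X": x}, 1.0, rng)
```

Each variable is its deterministic mean of the parents plus independent Gaussian noise, exactly matching the structural equation above.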

In particular, let $X$ denote a treatment variable, $Y$ an outcome, and $𝒪$ a set of observed variables. Under mild regularity assumptions (Detommaso et al.), the causal effect and causal bias for a given treatment value $x$ and observation vector $o$ are defined as

$$
\begin{aligned}
𝒞_X(x, o)
&:= 𝔼\big[\nabla_x f_Y^x \,\big|\, x, o\big], \\
ℬ_X(x, o)
&:= -\sum_{V_i \in \{X\}\cup 𝒪}
\frac{1}{\sigma_{V_i}} 𝔼\Big[
\Big(
\nabla_{V_i} f_Y^{x,o} - (f_Y^{x,o} - 𝔼[Y \mid x, o])\, U_{V_i}
\Big)
\nabla_x (f_{V_i}^{x,o} - v_i)
\,\Big|\, x, o
\Big].
\end{aligned}
$$

Here, $f_{V_i}^{a}$ denotes the structural function $f_{V_i}$ under intervention $A=a$. All expectations are taken with respect to the posterior distribution $p(U \mid x, o)$, where $U$ is the vector of all exogenous noise variables.

inga approximates this posterior using a robust Laplace approximation, enabling scalable computation in high-dimensional settings and across batches of observations $(x, o)$.

One can show that the association between treatment $X$ and outcome $Y$ decomposes into causal effect and causal bias:

$$ 𝒜_X(x, o) := \nabla_x 𝔼[Y \mid x, o] = 𝒞_X(x, o) + ℬ_X(x, o). $$
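For intuition, consider a fully linear SCM with a confounder (an illustrative example, not taken from the inga documentation): $Z := U_Z$, $X := aZ + U_X$, $Y := bX + cZ + U_Y$, with all noise scales equal to one and $𝒪 = \{X\}$. Since $Z \mid X = x$ is Gaussian with mean $ax/(a^2+1)$, the decomposition works out to

$$ \begin{aligned} 𝒞_X(x) &= b, \\ ℬ_X(x) &= c\, \nabla_x 𝔼[Z \mid x] = \frac{ca}{a^2 + 1}, \\ 𝒜_X(x) &= b + \frac{ca}{a^2 + 1}. \end{aligned} $$

The bias term is exactly the contribution of the backdoor path $X \leftarrow Z \rightarrow Y$, which vanishes when $a = 0$ or $c = 0$.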

Causally Consistent Pre-Training

Causal effect and causal bias provide a granular characterization of how information propagates from observed variables to the outcome within the DAG.

Standard point-estimation models aim to approximate the conditional expectation $𝔼[Y \mid x, o]$, but they do not distinguish between contributions arising from causal pathways and those arising from non-causal (e.g., confounding or purely statistical) dependencies. As a result, the underlying data-generating process is often unidentifiable, which can lead to suboptimal generalization and brittleness under distribution shift.

Consider an encoder model $z := h(o)$ and a prediction head $\hat{y}(z)$. Introduce two additional heads, $\hat{c}_j(z)$ and $\hat{b}_j(z)$, intended to learn the causal effect and causal bias from $O_j$ (treated as the treatment variable) to $Y$. We say that the model is causally consistent for $O_j$ if

$$ \begin{aligned} \nabla_{o_j} \hat{y} &= \hat{c}_j + \hat{b}_j, \\ \hat{c}_j &= 𝒞_{O_j}(o_j, o), \\ \hat{b}_j &= ℬ_{O_j}(o_j, o). \end{aligned} $$

inga enables causally consistent pre-training by generating synthetic datasets that include the full set of causal effects $𝒞_{O_j}(o_j, o)$ and causal biases $ℬ_{O_j}(o_j, o)$. These quantities can be incorporated directly into training objectives, encouraging models to learn representations that respect the causal structure of the data-generating process.
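A training objective along these lines might be sketched as follows. This is a hypothetical PyTorch sketch, not inga's API: `CausallyConsistentModel`, `causal_consistency_loss`, and the target tensors `c` and `b` (per-feature causal effects and biases, as inga's generated datasets are described to contain) are illustrative names.

```python
import torch
from torch import nn

class CausallyConsistentModel(nn.Module):
    """Encoder with an outcome head plus per-feature effect and bias heads."""

    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU())
        self.y_head = nn.Linear(hidden, 1)             # predicts E[Y | o]
        self.c_head = nn.Linear(hidden, num_features)  # causal effect per O_j
        self.b_head = nn.Linear(hidden, num_features)  # causal bias per O_j

    def forward(self, o):
        z = self.encoder(o)
        return self.y_head(z).squeeze(-1), self.c_head(z), self.b_head(z)

def causal_consistency_loss(model, o, y, c, b):
    o = o.requires_grad_(True)
    y_hat, c_hat, b_hat = model(o)
    # Gradient of the predicted outcome w.r.t. each observed input o_j.
    grad_y = torch.autograd.grad(y_hat.sum(), o, create_graph=True)[0]
    mse = nn.functional.mse_loss
    return (
        mse(y_hat, y)                    # standard prediction loss
        + mse(c_hat, c) + mse(b_hat, b)  # match generated causal targets
        + mse(grad_y, c_hat + b_hat)     # consistency: grad(y_hat) = c_hat + b_hat
    )
```

The last term enforces the decomposition $\nabla_{o_j} \hat{y} = \hat{c}_j + \hat{b}_j$ directly on the model's input gradients, while the middle terms anchor the effect and bias heads to the dataset's precomputed causal quantities.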

A Small Benchmark

The small benchmark causal_consistency_benchmark.py demonstrates this intuition. A simple MLP encoder is attached to three linear heads that predict outcomes, causal effects, and causal biases, respectively. The model is trained and tested individually on splits of 30 randomly generated synthetic datasets.

+--------------------+----------------+-------------------+-------------------------+
| method_type        | prediction_mae | causal_effect_mae | prediction_win_fraction |
+--------------------+----------------+-------------------+-------------------------+
| standard           | 0.7909 [0.31]  | 0.3353 [0.45]     | 0.0667                  |
| l2                 | 0.7868 [0.31]  | 0.3141 [0.46]     | 0.0667                  |
| causal_consistency | 0.7694 [0.31]  | 0.0461 [0.21]     | 0.8667                  |
+--------------------+----------------+-------------------+-------------------------+

The table shows that the model trained with causal consistency not only provides far more reliable causal-effect estimates, but also reduces the generalization error on ~87% of the datasets. Results can be reproduced by running uv run python examples/causal_consistency_benchmark.py.

How To:

Install

Clone the repository:

git clone https://github.com/gianlucadetommaso/inga.git
cd inga

Sync dependencies:

uv sync

Run scripts, for example:

uv run python -m examples.explore

Create a DAG

You can create an SCM and draw its DAG as follows:

from inga.scm import SCM, Variable

scm = SCM(
    variables=[
        Variable(name="Z"),
        Variable(name="X", parent_names=["Z"]),
        Variable(name="Y", parent_names=["Z", "X"]),
    ]
)

scm.draw(output_path="YOUR_DAG.png")

Create a SCM

The class Variable defines a variable $V_i$ in the DAG, but leaves the mean function $\bar f_{V_i}$ unspecified. To complete the SCM and compute causal quantities, you must subclass it and define the mean function. For example:

import torch
from torch import Tensor
from inga.scm import Variable

class MyVariable(Variable):
    def f_mean(self, parents: dict[str, Tensor]) -> Tensor:
        # Mean function: sum of sines of the parent values.
        f_mean: Tensor | float = 0.0

        for parent in parents.values():
            f_mean = f_mean + torch.sin(parent)

        return f_mean

A built-in Variable subclass with a defined mean function is LinearVariable. Now, let's update the SCM using our newly defined variable class!

from inga.scm import SCM

scm = SCM(
    variables=[
        MyVariable(name="Z", sigma=1.0),
        MyVariable(name="X", sigma=1.0, parent_names=["Z"]),
        MyVariable(name="Y", sigma=1.0, parent_names=["Z", "X"]),
    ]
)

Compute causal effect and causal bias

We are ready to compute causal effect and causal bias. We need to specify the treatment variable, the outcome variable, and the observed variables. Note: the treatment should always be observed, while the outcome should never be. Here is an example:

from torch import Tensor

treatment_name, outcome_name = "X", "Y"
observed = {"X": Tensor([1.])}

scm.posterior.fit(observed)

causal_effect = scm.causal_effect(
    observed=observed, 
    treatment_name=treatment_name, 
    outcome_name=outcome_name
)
causal_bias = scm.causal_bias(
    observed=observed, 
    treatment_name=treatment_name, 
    outcome_name=outcome_name
)

Explore the dataset

You can investigate the dataset interactively by exporting the SCM to HTML:

scm.export_html(
    output_path="YOUR_SCM.html",
    observed_ranges={"X": (-2.0, 2.0)}
)

Run uv run python examples/explore.py to check out an example of this!

Generate, save and load SCM datasets

Now that we have constructed our SCM, let's generate, save, and load an SCM dataset.

from inga.scm import CausalQueryConfig, load_scm_dataset

dataset = scm.generate_dataset(
    num_samples=128,
    seed=123,
    queries=[
        CausalQueryConfig(
            treatment_name="X",
            outcome_name="Y",
            observed_names=["X"],
        ),
    ],
)

dataset_path = "YOUR_DATASET.json"
dataset.save(dataset_path)
loaded_dataset = load_scm_dataset(dataset_path)

Cite Inga

If you use inga in academic work, you can cite it with the following BibTeX entry (and optionally replace year and note with the exact release tag/commit and access date you used):

@software{detommaso_inga,
  author = {Detommaso, Gianluca},
  title = {Inga: Causal Synthetic Tabular Data Toolkit},
  url = {https://github.com/gianlucadetommaso/inga},
  year = {2026},
  note = {GitHub repository}
}
