Inga 因果
inga is a toolkit for generating and inspecting synthetic tabular datasets. It constructs arbitrarily complex Structural Causal Models (SCMs), draws samples from them, and computes causal effects and causal biases conditioned on observed variables and outcomes. All computed quantities are stored and made available for causally-informed pre-training of tabular models.
Causal Effect and Causal Bias
The current scope of this repository is restricted to SCMs with continuous variables. Let $V_i$ denote a generic scalar variable in the SCM, and let $U_{V_i} \sim 𝒩(0,1)$ be its corresponding exogenous noise, such that
$$ V_i := f_{V_i}(\mathrm{Pa}(V_i), U_{V_i}) := \bar{f}_{V_i}(\mathrm{Pa}(V_i)) + \sigma_{V_i} U_{V_i}. $$
Here, $\mathrm{Pa}(V_i)$ denotes the set of parents of $V_i$ in the DAG, $\bar f_{V_i}$ captures the deterministic structural component, and $\sigma_{V_i}$ controls the scale of the exogenous noise.
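To make the additive-noise form above concrete, here is a minimal sketch of how a single node could be sampled; the names sample_node and f_mean are illustrative assumptions, not inga's API:

import torch

def sample_node(f_mean, parents: dict[str, torch.Tensor], sigma: float, num_samples: int) -> torch.Tensor:
    # V_i := f_mean(Pa(V_i)) + sigma * U_i, with exogenous noise U_i ~ N(0, 1)
    u = torch.randn(num_samples)
    mean = f_mean(parents) if parents else torch.zeros(num_samples)
    return mean + sigma * u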
In particular, let $X$ denote a treatment variable, $Y$ an outcome, and $𝒪$ a set of observed variables. Under mild regularity assumptions (Detommaso et al.), the causal effect and causal bias for a given treatment value $x$ and observation vector $o$ are defined as
$$
\begin{aligned}
𝒞_X(x, o) &:= 𝔼\big[\nabla_x f_Y^x \,\big|\, x, o\big], \\
ℬ_X(x, o) &:= -\sum_{V_i \in \{X\}\cup 𝒪} \frac{1}{\sigma_{V_i}} 𝔼\Big[ \Big( \nabla_{V_i} f_Y^{x,o} - (f_Y^{x,o} - 𝔼[Y \mid x, o])\, U_{V_i} \Big) \nabla_x (f_{V_i}^{x,o} - v_i) \,\Big|\, x, o \Big].
\end{aligned}
$$
Here, $f_{V_i}^{a}$ denotes the structural function $f_{V_i}$ under intervention $A=a$. All expectations are taken with respect to the posterior distribution $p(U \mid x, o)$, where $U$ is the vector of all exogenous noise variables.
inga approximates this posterior using a robust Laplace approximation, enabling scalable computation in high-dimensional settings and across batches of observations $(x, o)$.
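For intuition, a Laplace approximation fits a Gaussian centered at the posterior mode, with covariance given by the inverse Hessian of the negative log joint at that mode. Below is a minimal PyTorch sketch of this idea; the function name and the neg_log_joint argument are assumptions for illustration, not inga's internal API:

import torch

def laplace_approximation(neg_log_joint, u_init, num_steps: int = 200, lr: float = 0.05):
    # Find the MAP of the exogenous noise U given (x, o) by gradient descent.
    u = u_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([u], lr=lr)
    for _ in range(num_steps):
        optimizer.zero_grad()
        neg_log_joint(u).backward()
        optimizer.step()
    u_map = u.detach()
    # The Gaussian covariance is the inverse Hessian of the negative log joint at the MAP.
    hessian = torch.autograd.functional.hessian(neg_log_joint, u_map)
    covariance = torch.linalg.inv(hessian)
    return u_map, covariance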
One can show that the association between treatment $X$ and outcome $Y$ decomposes into causal effect and causal bias:
$$ 𝒜_X(x, o) := \nabla_x 𝔼[Y \mid x, o] = 𝒞_X(x, o) + ℬ_X(x, o). $$
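As a concrete illustration of this decomposition (a worked example, not taken from the repository), consider a linear Gaussian SCM with a confounder $Z$:

$$ Z := U_Z, \qquad X := aZ + \sigma_X U_X, \qquad Y := bX + cZ + \sigma_Y U_Y. $$

Observing only the treatment, $X = x$, the structural gradient gives $𝒞_X(x, o) = b$, while $𝔼[Z \mid x] = \tfrac{a}{a^2 + \sigma_X^2}\, x$ yields

$$ 𝒜_X(x, o) = b + \frac{ac}{a^2 + \sigma_X^2}, \qquad ℬ_X(x, o) = \frac{ac}{a^2 + \sigma_X^2}, $$

so the association differs from the causal effect by exactly the confounding term.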
Causally Consistent Pre-Training
Causal effect and causal bias provide a granular characterization of how information propagates from observed variables to the outcome within the DAG.
Standard point-estimation models aim to approximate the conditional expectation $𝔼[Y \mid x, o]$, but they do not distinguish between contributions arising from causal pathways and those arising from non-causal (e.g., confounding or purely statistical) dependencies. As a result, the underlying data-generating process is often unidentifiable, which can lead to suboptimal generalization and brittleness under distribution shift.
Consider an encoder model $z := h(o)$ and a prediction head $\hat{y}(z)$. Introduce two additional heads, $\hat{c}_j(z)$ and $\hat{b}_j(z)$, intended to learn the causal effect and causal bias from $O_j$ (treated as the treatment variable) to $Y$. We say that the model is causally consistent for $O_j$ if
$$ \begin{aligned} \nabla_{o_j} \hat{y} &= \hat{c}_j + \hat{b}_j, \\ \hat{c}_j &= 𝒞_{O_j}(o_j, o), \\ \hat{b}_j &= ℬ_{O_j}(o_j, o). \end{aligned} $$
inga enables causally consistent pre-training by generating synthetic datasets that include the full set of causal effects $𝒞_{O_j}(o_j, o)$ and causal biases $ℬ_{O_j}(o_j, o)$. These quantities can be incorporated directly into training objectives, encouraging models to learn representations that respect the causal structure of the data-generating process.
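As a rough sketch of how these targets could enter a training objective (hypothetical code, not inga's training loop; the model architecture, head names, and unit loss weights are illustrative assumptions):

import torch
from torch import nn

class CausallyConsistentModel(nn.Module):
    def __init__(self, num_features: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU())
        self.y_head = nn.Linear(hidden, 1)             # predicts E[Y | o]
        self.c_head = nn.Linear(hidden, num_features)  # predicts causal effects C_{O_j}
        self.b_head = nn.Linear(hidden, num_features)  # predicts causal biases B_{O_j}

    def forward(self, o: torch.Tensor):
        z = self.encoder(o)
        return self.y_head(z).squeeze(-1), self.c_head(z), self.b_head(z)

def causal_consistency_loss(model, o, y, c, b):
    # c and b are the causal effects and biases stored in the synthetic dataset.
    o = o.requires_grad_(True)
    y_hat, c_hat, b_hat = model(o)
    # Gradient of the prediction with respect to each observed feature.
    dy_do = torch.autograd.grad(y_hat.sum(), o, create_graph=True)[0]
    return (
        nn.functional.mse_loss(y_hat, y)                 # standard point prediction
        + nn.functional.mse_loss(c_hat, c)               # match causal effects
        + nn.functional.mse_loss(b_hat, b)               # match causal biases
        + nn.functional.mse_loss(dy_do, c_hat + b_hat)   # consistency: gradient = effect + bias
    )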
A Small Benchmark
The small benchmark causal_consistency_benchmark.py demonstrates this intuition. A simple MLP encoder is attached to three linear heads that predict outcomes, causal effects, and causal biases, respectively. The model is trained and tested individually on splits of 30 randomly generated synthetic datasets.
+--------------------+----------------+-------------------+-------------------------+
| method_type        | prediction_mae | causal_effect_mae | prediction_win_fraction |
+--------------------+----------------+-------------------+-------------------------+
| standard           | 0.7909 [0.31]  | 0.3353 [0.45]     | 0.0667                  |
| l2                 | 0.7868 [0.31]  | 0.3141 [0.46]     | 0.0667                  |
| causal_consistency | 0.7694 [0.31]  | 0.0461 [0.21]     | 0.8667                  |
+--------------------+----------------+-------------------+-------------------------+
The table shows that the model trained with causal consistency not only provides far more reliable causal effect estimates, but also reduces the generalization error on ~87% of the datasets. Results can be reproduced by running uv run python examples/causal_consistency_benchmark.py.
How To:
Install
Clone the repository:
git clone https://github.com/gianlucadetommaso/inga.git
cd inga
Sync dependencies:
uv sync
Run scripts, for example:
uv run python -m examples.explore
Create a DAG
You can create and draw the DAG of an SCM as follows:
from inga.scm import SCM, Variable
scm = SCM(
    variables=[
        Variable(name="Z"),
        Variable(name="X", parent_names=["Z"]),
        Variable(name="Y", parent_names=["Z", "X"]),
    ]
)

scm.draw(output_path="YOUR_DAG.png")
Create an SCM
The class Variable defines a variable $V_i$ in the DAG, but leaves the mean function $\bar f_{V_i}$ unspecified. To complete the SCM and compute causal quantities, you must create a subclass that defines the mean function. For example:
import torch
from torch import Tensor
from inga.scm import Variable
class MyVariable(Variable):
    def f_mean(self, parents: dict[str, Tensor]) -> Tensor:
        f_mean: Tensor | float = 0.0
        for parent in parents.values():
            f_mean = f_mean + torch.sin(parent)
        return f_mean
An example of a built-in Variable with a defined mean function is LinearVariable. Now, let's update the SCM using our newly defined variable class!
from inga.scm import SCM
scm = SCM(
    variables=[
        MyVariable(name="Z", sigma=1.0),
        MyVariable(name="X", sigma=1.0, parent_names=["Z"]),
        MyVariable(name="Y", sigma=1.0, parent_names=["Z", "X"]),
    ]
)
Compute causal effect and causal bias
We are now ready to compute the causal effect and causal bias. We need to define the treatment variable, the outcome variable, and the observed variables. Note: the treatment should always be observed, while the outcome should never be. Here is an example:
from torch import Tensor
treatment_name, outcome_name = "X", "Y"
observed = {"X": Tensor([1.])}
scm.posterior.fit(observed)
causal_effect = scm.causal_effect(
    observed=observed,
    treatment_name=treatment_name,
    outcome_name=outcome_name
)

causal_bias = scm.causal_bias(
    observed=observed,
    treatment_name=treatment_name,
    outcome_name=outcome_name
)
Explore the dataset
You can investigate the dataset interactively by exporting the SCM to HTML:
scm.export_html(
    output_path="YOUR_SCM.html",
    observed_ranges={"X": (-2.0, 2.0)}
)
Run uv run python examples/explore.py to check out an example of this!
Generate, save and load SCM datasets
Now that we have constructed our SCM, let's generate, save, and load an SCM dataset.
from inga.scm import CausalQueryConfig, load_scm_dataset
dataset = scm.generate_dataset(
    num_samples=128,
    seed=123,
    queries=[
        CausalQueryConfig(
            treatment_name="X",
            outcome_name="Y",
            observed_names=["X"],
        ),
    ],
)
dataset_path = "YOUR_DATASET.json"
dataset.save(dataset_path)
loaded_dataset = load_scm_dataset(dataset_path)
Cite Inga
If you use inga in academic work, you can cite it with the following BibTeX entry (and optionally replace year and note with the exact release tag/commit and access date you used):
@software{detommaso_inga,
author = {Detommaso, Gianluca},
title = {Inga: Causal Synthetic Tabular Data Toolkit},
url = {https://github.com/gianlucadetommaso/inga},
year = {2026},
note = {GitHub repository}
}