Time-series Double Machine Learning for Fuzzy Cognitive Maps
causal_mm
Time-series Double Machine Learning for Fuzzy Cognitive Maps (FCMs). Estimates directed edge strengths, bounds them to [-1, +1] for visualization, and writes everything back into a single fcm_project.json bundle.
Why this exists
- Small, time-dependent datasets: forward-chaining DML handles temporal dependence and avoids look-ahead bias.
- Prior knowledge first: uses your drawn FCM as structural prior; estimates "how strong" each edge is instead of discovering a spaghetti graph.
- Stakeholder-friendly outputs: scaled weights in [-1, +1], optional uncertainty, and an adjacency matrix block ready for downstream tools.
What it does
- Load one `fcm_project.json` that holds `meta`, `model`, `timeseries`, `settings`, `estimates`, and optional `results`.
- Per edge: build lagged controls, run forward-fold DML, compute `tau_raw`, then scale with `tanh(alpha_scale * tau_raw)` to [-1, +1].
- Optional block bootstrap: confidence intervals (`ci_low`, `ci_high`), standard errors, and sign stability.
- Persist results back into the JSON (`estimates` + `results.adjacency_matrix`) and stamp `meta.weights_computed_at` / `meta.weights_method`.
- Refresh-only helper: `recompute_adjacency` rewrites just the stored adjacency matrix from existing estimates.
- Compare models: `scripts/compare_mental_models.py` reads the stored adjacency matrices and reports similarity + Generalized Distance Ratio (GDR).
Single-file project format (input/output)
Top-level keys in fcm_project.json:
- `meta`: project metadata; updated with computation timestamp/method.
- `model`: `concepts` (integer IDs) and `edges` (`source`, `target`, optional `stakeholder_weight`).
- `timeseries`: `index` array; `data` keyed by stringified concept IDs; arrays aligned to `index`.
- `settings`: passthrough config (currently optional, preserved).
- `estimates`: filled after estimation; keys like `"src->tgt"`.
- `results` (optional): `adjacency_matrix` with `concept_ids`, `matrix`, `weight_type`.

Rules: concept IDs are integers everywhere; JSON keys use their string form; all timeseries arrays share the same length; `results.adjacency_matrix` order must match `concept_ids` so comparison scripts align correctly.
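A minimal sketch of the bundle layout as a Python dict, with made-up concept IDs and values (actual field contents are project-specific); the checks at the end mirror the alignment rules above.

```python
import json

# Illustrative fcm_project.json bundle; only the key layout follows the
# format described above, the values are invented for this sketch.
project = {
    "meta": {"name": "demo"},
    "model": {
        "concepts": [1, 2],
        "edges": [{"source": 1, "target": 2, "stakeholder_weight": 0.5}],
    },
    "timeseries": {
        "index": [2019, 2020, 2021, 2022],
        # keys are stringified concept IDs; arrays align with "index"
        "data": {"1": [0.1, 0.3, 0.2, 0.4], "2": [1.0, 0.9, 1.1, 1.2]},
    },
    "settings": {},
    "estimates": {},
}

# Sanity checks matching the rules above
n = len(project["timeseries"]["index"])
assert all(len(v) == n for v in project["timeseries"]["data"].values())
assert all(str(c) in project["timeseries"]["data"] for c in project["model"]["concepts"])
print(json.dumps(project)[:40])
```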
Installation
From PyPI (recommended)
pip install causal-mm
With optional econml backends:
pip install causal-mm[econml]
From GitHub
pip install git+https://github.com/skp703/causal-mm.git
From source (development)
git clone https://github.com/skp703/causal-mm.git
cd causal-mm
pip install -e .
GUI
Launch the interactive visualizer (part of the deployed package):
causal-mm-gui
Or from the source repository:
streamlit run gui/app.py
Using conda
conda env create -f environment.yml
conda activate causal-mm
pip install -e .
CLI
causal-mm-run --input path/to/model.fcm_project.json --output path/to/output.fcm_project.json \
--max-lag 3 --n-folds 3 --min-train-size 10 --alpha-scale 1.0 \
--outcome-model ridge --treatment-model ridge \
[--controls-selection all|connected] \
[--bootstrap --n-bootstrap 200 --block-size 5 --bs-n-jobs 1 --random-state 123] \
[--use-econml --econml-estimator linear_dml|causal_forest|ortho_forest]
Common variants:
- Smaller data, fewer controls: `--controls-selection connected`
- Uncertainty: add `--bootstrap --n-bootstrap 500 --block-size 5`
- Nonlinear nuisances: `--outcome-model random_forest --treatment-model random_forest`
- EconML backend: `--use-econml --econml-estimator causal_forest`
Defaults and how changing them affects results
- `max_lag` (default 3): higher captures longer memory but increases features and variance; lower is safer for very short series. Must keep `min_train_size > max_lag`.
- `include_self_lags` (default True, Python-only): turning it off removes autoregressive terms for the target; speeds up and reduces variance but risks omitting genuine inertia.
- `drop_initial_na` (default True, Python-only): keeps only rows with full lag history; set False to retain the earliest rows at the cost of introducing NaNs you must handle downstream.
- `controls_selection` (default `all`): `all` guards against omitted confounders; `connected` keeps only parents + self lags, reducing variance for tiny samples but assumes your graph is correct.
- `outcome_model` / `treatment_model` (default `ridge`): `ridge` is fast and stable. `random_forest` / `gbm` capture nonlinearity but can overfit small data. `lasso` yields sparse linear models; `linear` is plain OLS.
- `n_folds` (default 3): more folds reduce bias but shrink train windows; with short series prefer 3. If `n_folds` is too high relative to series length, you will drop too much data.
- `min_train_size` (default 10): enforces enough history before the first fold. Increase for more stable nuisance models; decrease only if data are extremely short.
- `alpha_scale` (default 1.0): larger values push `tanh` toward +/-1 faster (good for visualization saturation); smaller keeps weights more linear and comparable across runs.
- `random_state` (default 123): controls fold splits and bootstrap resampling; change it for robustness checks, or unset it in Python to allow full randomness.
- `bootstrap` (default off): enable to get `tau_se`, `ci_low`/`ci_high`, `sign_stability`. Adds roughly `n_bootstrap` times the compute cost.
- `n_bootstrap` (default 200): more draws tighten CIs at higher cost; use 50-100 for quick smoke tests, 500+ for final reports if time allows.
- `block_size` (default 5): should match the dependence length (e.g., 5 for annual data with medium persistence). Too small underestimates uncertainty; too large inflates variance.
- `bs_n_jobs` (default 1): increase for parallel bootstrap on multicore machines; beware memory use.
- `use_econml` (default False) and `econml_estimator` (default `linear_dml`): switch to leverage econml backends (`causal_forest`, `ortho_forest`) when you need built-in heterogeneity handling or alternative orthogonalization; compute cost rises notably for forests.
Changing options via CLI: pass the flag shown above. Changing Python-only options (include_self_lags, drop_initial_na, custom hyperparameters) requires constructing the config objects directly:
from causal_mm.config import LagConfig, MLModelConfig, DMLConfig
lag_cfg = LagConfig(max_lag=2, include_self_lags=False, drop_initial_na=True)
dml_cfg = DMLConfig(
lag_config=lag_cfg,
outcome_model=MLModelConfig("random_forest", {"n_estimators": 300, "random_state": 42}),
treatment_model=MLModelConfig("random_forest", {"n_estimators": 300, "random_state": 42}),
n_folds=4,
min_train_size=20,
alpha_scale=0.8,
controls_selection="connected",
)
Python API
from pathlib import Path
from causal_mm.config import LagConfig, MLModelConfig, DMLConfig, BootstrapConfig
from causal_mm.pipeline import run_estimation
input_path = Path("data/models/my_model.fcm_project.json")
output_path = Path("data/models/my_model_with_estimates.fcm_project.json")
lag_cfg = LagConfig(max_lag=3)
dml_cfg = DMLConfig(
lag_config=lag_cfg,
outcome_model=MLModelConfig("ridge", {"alpha": 1.0}),
treatment_model=MLModelConfig("ridge", {"alpha": 1.0}),
n_folds=3,
min_train_size=10,
alpha_scale=1.0,
)
bs_cfg = BootstrapConfig(n_bootstrap=200, block_size=5, random_state=123, n_jobs=1)
run_estimation(
input_path=input_path,
output_path=output_path,
dml_config=dml_cfg,
bootstrap_config=bs_cfg,
)
How the estimator works (and why)
- Lag all concepts up to `max_lag`; drop initial rows if requested to avoid NA leakage.
- Select controls: `all` (default) keeps lags of every concept (more robust to omitted variables); `connected` keeps only parent + self lags (lower variance on tiny datasets).
- Forward-chaining cross-fitting (`n_folds`, `min_train_size`) prevents peeking into the future.
- Fit outcome and treatment models per fold; residualize; accumulate `tau_raw = sum(Y_tilde * T_tilde) / sum(T_tilde^2)`.
- Scale with `tanh(alpha_scale * tau_raw)` so weights stay in [-1, +1] for FCM visualization.
- Uncertainty (optional): block bootstrap resamples contiguous blocks to preserve time dependence; reports `tau_se`, percentile `ci_low`/`ci_high`, and `sign_stability`.
- Metadata stamped: `lag_used`, `n_obs`, `computed_at`, `method` (`dml` or `bootstrap-dml`).
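The residualize-and-accumulate step can be sketched in numpy on synthetic data; plain OLS stands in for the configurable nuisance models here, and the fold boundaries are illustrative, not the package's internals.

```python
import numpy as np

rng = np.random.default_rng(123)
n, p = 60, 4
X = rng.normal(size=(n, p))                                # lagged controls
T = X @ rng.normal(size=p) + rng.normal(size=n)            # treatment series
Y = 0.7 * T + X @ rng.normal(size=p) + rng.normal(size=n)  # outcome series

num = den = 0.0
# Forward-chaining folds: fit nuisances on [0, start), residualize on [start, stop)
for start, stop in [(20, 33), (33, 46), (46, 60)]:
    beta_y = np.linalg.lstsq(X[:start], Y[:start], rcond=None)[0]
    beta_t = np.linalg.lstsq(X[:start], T[:start], rcond=None)[0]
    y_res = Y[start:stop] - X[start:stop] @ beta_y
    t_res = T[start:stop] - X[start:stop] @ beta_t
    num += float(y_res @ t_res)
    den += float(t_res @ t_res)

tau_raw = num / den                            # should be close to the true 0.7
scaled_weight = float(np.tanh(1.0 * tau_raw))  # alpha_scale = 1.0
print(round(tau_raw, 3), round(scaled_weight, 3))
```

Because every fold trains strictly on earlier observations, no future value leaks into the residuals, which is what makes the estimate valid under temporal dependence.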
Outputs written to the project file
estimates["src->tgt"]:tau_raw,tau_se,ci_low,ci_high,sign_stability,scaled_weight,n_obs,lag_used,status,error_message,computed_at,method.meta.weights_computed_at(UTC ISO8601) andmeta.weights_method.results.adjacency_matrix:concept_ids(order) +matrixofscaled_weightvalues (weight_typeset toscaled_weight).
Comparing multiple mental models
python scripts/compare_mental_models.py data/models/a.json data/models/b.json
python scripts/compare_mental_models.py data/models/*.json --format csv --output temp/compare.csv
python scripts/compare_mental_models.py data/models/*.json --format json --precision 6
Reads each file's results.adjacency_matrix, aligns by concept ID, and reports similarity and GDR. Use CSV/JSON for dashboards.
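The alignment step can be sketched as follows; the similarity measure here (mean absolute weight difference) is a simple stand-in, not the script's exact formula.

```python
import numpy as np

def align(adj_a, adj_b):
    """Reorder two stored adjacency blocks to a shared, sorted concept-ID order."""
    common = sorted(set(adj_a["concept_ids"]) & set(adj_b["concept_ids"]))
    def reorder(adj):
        idx = [adj["concept_ids"].index(c) for c in common]
        return np.asarray(adj["matrix"], dtype=float)[np.ix_(idx, idx)]
    return common, reorder(adj_a), reorder(adj_b)

# Two toy results.adjacency_matrix blocks with the same IDs in different order
a = {"concept_ids": [1, 2], "matrix": [[0.0, 0.6], [0.1, 0.0]]}
b = {"concept_ids": [2, 1], "matrix": [[0.0, 0.2], [0.5, 0.0]]}
ids, A, B = align(a, b)
sim = float(np.mean(np.abs(A - B)))  # mean absolute weight difference
print(ids, round(sim, 3))
```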
Metrics and diagnostics available in code
- `generalized_distance_ratio` (modified GDR) with tunable penalties.
- `graph_complexity_metrics`: N, edge count, density, hierarchy index.
- `concept_centrality_metrics`: in-degree, out-degree, total centrality.
- `score_models` / `best_model`: time-series CV for trying multiple ML models.
- `compare_weights`, `weight_distance`: adjacency-level comparisons.
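The degree-style centralities have a standard adjacency-matrix reading, sketched below; the rows-are-sources orientation and the use of absolute weights are common FCM conventions assumed here, not necessarily the package's exact definitions.

```python
import numpy as np

# Signed adjacency matrix: rows = source concept, cols = target concept
W = np.array([[0.0, 0.6, -0.3],
              [0.0, 0.0, 0.4],
              [0.2, 0.0, 0.0]])

out_deg = np.abs(W).sum(axis=1)   # influence each concept exerts
in_deg = np.abs(W).sum(axis=0)    # influence each concept receives
total = out_deg + in_deg          # "total centrality"
density = np.count_nonzero(W) / (W.shape[0] * (W.shape[0] - 1))
print(out_deg, in_deg, total, round(density, 2))
```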
Simulation
`simulation.py` builds an adjacency matrix from `scaled_weight` and simulates FCM dynamics using tanh, logistic, or identity activations.
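One standard FCM update rule is x_{t+1} = f(Wᵀ x_t), iterated from an initial state; the sketch below uses the tanh activation mentioned above, with a rows-are-sources orientation assumed for this example.

```python
import numpy as np

W = np.array([[0.0, 0.7],    # concept 1 -> concept 2 with weight 0.7
              [-0.3, 0.0]])  # concept 2 -> concept 1 with weight -0.3
x = np.array([1.0, 0.0])     # initial activation: concept 1 "switched on"
for _ in range(20):
    x = np.tanh(W.T @ x)     # squashing keeps every state in [-1, +1]
print(np.round(x, 4))
```

With these weights the negative feedback loop damps the system toward zero; stronger weights or a different activation can instead settle on a non-trivial fixed point or cycle.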
Data requirements and pitfalls
- Timeseries arrays must align; concept IDs must match between `model.edges` and `timeseries.data` (stringified).
- `min_train_size` must exceed `max_lag` so folds have usable history.
- Block bootstrap is recommended whenever you care about uncertainty in time-dependent data.
- Cycles are fine: time lags break simultaneity (A_t depends on B_{t-1}, B_t on A_{t-1}).
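The block bootstrap idea is to resample whole contiguous blocks so short-range time dependence survives resampling; a minimal index-generation sketch, not necessarily the package's exact scheme:

```python
import numpy as np

def block_bootstrap_indices(n, block_size, rng):
    """Draw random contiguous blocks and concatenate them to length n."""
    n_blocks = int(np.ceil(n / block_size))
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    idx = np.concatenate([np.arange(s, s + block_size) for s in starts])
    return idx[:n]  # trim the final block to the series length

rng = np.random.default_rng(123)
idx = block_bootstrap_indices(30, 5, rng)
print(len(idx), idx[:5])
```

Each bootstrap replicate reruns the estimator on a series indexed this way, which is what yields `tau_se`, the percentile CIs, and `sign_stability`.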
Interactive GUI
The gui/ directory contains a Streamlit-based visualizer for inspecting, analyzing, and creating fcm_project.json files in the browser.
Install GUI dependencies
pip install causal-mm[gui]
# or
pip install -r gui/requirements.txt
Launch
streamlit run gui/app.py
Opens at http://localhost:8501 with pages for:
| Page | What it shows |
|---|---|
| 🕸️ Network | Interactive directed graph (colored nodes, weighted edges, adjacency heatmap) |
| ⚖️ Weights | Edge-weight table, forest plot with 95% CI, sign-stability bars |
| 📈 Time Series | Multi-line Plotly charts, z-score toggle, correlation matrix |
| 📊 Metrics | Graph complexity KPIs, concept centrality, node-role distribution |
| ✏️ Editor | Create/edit concepts, edges, import CSV time series, export JSON |
See gui/README.md for full documentation.
Repository layout
- `src/causal_mm/`: core package (config, data, fcm, dml, bootstrap, metrics, io, pipeline, simulation).
- `gui/`: Streamlit interactive visualizer (network graph, weights, time series, metrics, editor).
- `scripts/`: utilities (e.g., `compare_mental_models.py`).
- `tests/`: unit tests.
- Packaging: `pyproject.toml`; console entry point `causal-mm-run`.
FAQ -- Why FCM instead of full causal discovery?
- Data efficiency: testing every possible edge on short time-series yields noise; using the FCM as prior constrains the search.
- Regularization through expertise: arrows you drew act as structural prior; estimator asks "how strong," not "does it exist."
- Trust: outputs stay interpretable and aligned with stakeholder mental models.
Tests
pip install -e .
pytest