Combines Large Language Models (LLMs) with Bayesian causal inference to discover causal relationships and associations from observational data and domain knowledge
Causal Inference Framework for AWS (causalif)
Table of Contents
- Overview
- Logical Flow
- Why Hill Climb and BDeu Score?
- Prerequisites
- Installation
- Usage Examples
- Architecture
- Limitations
- Contributing
- License
Overview
CausalIF combines LLMs with Bayesian causal inference to discover causal relationships from both qualitative documents and quantitative data. It leverages:
- Background Knowledge: LLM's pre-trained causal understanding
- Document Knowledge: Domain documents via RAG retrieval
- Bayesian Structure Learning: Hill Climbing + BDeu scoring for causal orientation
- Do-Calculus: Interventional queries via pgmpy's do-operator (causalif_intervene)
Best used as a tool in agentic systems for interpreting causal relationships.
GitHub: awslabs/causalif | PyPI: causalif
The direct, indirect, and independent association algorithm (causalif_1_edge_existence_verification) is inspired by the LACR 1 algorithm: https://arxiv.org/html/2402.15301v2
Note: This is an experimental project. The quality of its analysis depends on the quality of the RAG documents, the model's knowledge, and the size of the data.
Ideal Use Cases
CausalIF works best when you have both qualitative domain knowledge and quantitative observational data.
What You Need:
- Qualitative: Documents with formulae, relationships, and domain expertise
- Quantitative: Observational data (even if noisy)
Example: Financial institution analyzing derived metrics using research papers + historical market data.
When to Use:
✅ Rich document corpus + observational data
✅ Understanding derived metrics
✅ "What causes what" questions
When Not to Use:
⚠️ No domain documents
⚠️ Real-time requirements
⚠️ <100 data samples
⚠️ Purely experimental data (use RCTs)
Logical Flow
CausalIF implements a 3-stage algorithm:
Stage 1: Edge Existence (CausalIF 1)
Goal: Identify direct causal associations
5 Phases:
- Document Retrieval: Get k_documents from RAG per edge
- Association Verification: LLM votes per edge (1 background-knowledge (BG) vote + k document (DOC) votes) → Associated/Independent/Unknown
- Type Classification: Direct/Indirect/Unknown for associated edges
- Rechecker: Validate intermediaries are in variable set V; reclassify if not
- Vote Scoring: Direct: +1, Indirect/Independent: -1, Unknown: 0 → Keep the edge if its total score S > 0
Output: Skeleton graph with only direct associations
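The vote scoring phase can be illustrated with a short sketch. This is not the library's implementation: the function names and the example votes are hypothetical, and only the scoring rule (Direct +1, Indirect/Independent -1, Unknown 0, keep if S > 0) comes from the description above.

```python
# Score contributed by each vote label, as listed in the Vote Scoring phase.
VOTE_SCORES = {"direct": 1, "indirect": -1, "independent": -1, "unknown": 0}


def score_edge(votes):
    """Return (S, keep) for one candidate edge given its list of vote labels."""
    total = sum(VOTE_SCORES[vote] for vote in votes)
    return total, total > 0


def build_skeleton(votes_by_edge):
    """Keep only the edges whose total vote score S is positive."""
    return [edge for edge, votes in votes_by_edge.items() if score_edge(votes)[1]]


# Hypothetical votes: 1 background vote + 3 document votes per edge.
votes_by_edge = {
    ("sleep_hours", "productivity"): ["direct", "direct", "unknown", "indirect"],
    ("exercise_minutes", "productivity"): ["indirect", "indirect", "direct", "unknown"],
    ("stress_level", "sleep_hours"): ["direct", "independent", "unknown", "unknown"],
}

skeleton = build_skeleton(votes_by_edge)
print(skeleton)
```

Only the first edge survives here: its votes sum to +1, while the other two sum to -1 and 0.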
Stage 2: Causal Orientation (CausalIF 2)
Goal: Determine causal direction (A → B or A ← B) and validate edge robustness
Process:
- Hill Climbing + BDeu: Orient skeleton edges using PriorWeightedBDeu scoring on observational data
- Bootstrap Stability: Resample data N times (default 50), re-run Hill Climb on each resample, compute per-edge directed stability (% of resamples where exact edge direction appeared)
- Pruning: Remove edges with directed stability below threshold (default 70%)
Output: Directed Acyclic Graph (DAG) with bootstrap-validated edges
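The bootstrap validation step can be sketched independently of any structure-learning library. In the sketch below, `learn_structure` is a hypothetical callable standing in for the Hill Climb run, and `toy_learner` is a made-up stand-in used only to make the example runnable; neither is part of the causalif API.

```python
import random
from collections import Counter


def bootstrap_directed_stability(rows, learn_structure, iterations=50, seed=0):
    """Fraction of resamples in which each directed edge (cause, effect) appears.

    `learn_structure` is a hypothetical callable: rows -> iterable of (cause, effect).
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(iterations):
        # Resample with replacement, keeping the original sample size.
        resample = [rng.choice(rows) for _ in rows]
        counts.update(set(learn_structure(resample)))
    return {edge: n / iterations for edge, n in counts.items()}


def prune_unstable(stability, threshold=0.7):
    """Keep edges whose directed stability is at or above the threshold."""
    return sorted(edge for edge, share in stability.items() if share >= threshold)


def toy_learner(resample):
    """Made-up stand-in for a structure learner, for illustration only."""
    edges = [("stress_level", "productivity")]  # found in every resample
    mean_sleep = sum(row["sleep_hours"] for row in resample) / len(resample)
    if mean_sleep > 6.8:
        edges.append(("sleep_hours", "productivity"))  # found only in some resamples
    return edges


rows = [{"sleep_hours": h} for h in [7, 6, 8, 5, 7, 9, 6, 8, 7, 5]]
stability = bootstrap_directed_stability(rows, toy_learner, iterations=50)
print(stability)
print(prune_unstable(stability, threshold=0.7))
```

Because stability is counted per directed edge, A → B and B → A are tracked separately, which matches the "exact edge direction" wording above.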
Stage 3: Causal Inference (Optional)
Goal: Quantify causal effects, enable interventional queries, and annotate edges with probabilities
Process: Fit CPDs → Compute Average Treatment Effects (ATE) for all edges → Direction analysis → Prune negligible edges → Enable do-operator queries
Enable with: enable_causal_estimate=True
Edge Probability Labels: After fitting the causal model, the do-operator computes P(effect | do(cause=value)) for every directed edge. Each edge is annotated with:
- P=value: The ATE (max probability shift) — how much the effect's distribution changes when you intervene on the cause
- ↑: Directly related (increasing cause increases effect)
- ↓: Inversely related (increasing cause decreases effect)
- →: Neutral (no significant directional shift)
Edge Pruning: Edges with ATE < 0.01 are removed — if the do-operator shows no measurable interventional effect, the edge is considered noise from structure learning.
Direction Fallback: For categorical effects where numeric direction can't be computed, the system compares most-likely states under low vs high intervention to determine if the distribution shifts.
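As a rough illustration of the edge labels, the sketch below computes a "max probability shift" and a direction arrow from interventional distributions. The exact definitions used by the library are not given here, so this assumes the shift is the largest absolute change in any effect-state probability between two interventions, and that direction is read from the expected level of an ordered low/medium/high effect. All function names and numbers are hypothetical.

```python
from itertools import combinations

# Assumed ordering of discretized effect states (hypothetical).
LEVELS = {"low": 0, "medium": 1, "high": 2}


def max_probability_shift(interventional):
    """Largest absolute change in any effect-state probability across interventions.

    `interventional` maps each intervened cause value to P(effect | do(cause=value)).
    """
    shift = 0.0
    for dist_a, dist_b in combinations(interventional.values(), 2):
        for state in set(dist_a) | set(dist_b):
            shift = max(shift, abs(dist_a.get(state, 0.0) - dist_b.get(state, 0.0)))
    return shift


def expected_level(dist):
    """Expected position of the effect on the ordered low/medium/high scale."""
    return sum(LEVELS[state] * prob for state, prob in dist.items())


def direction_label(interventional, tol=1e-9):
    """Arrow for how the effect moves when the cause goes from low to high."""
    delta = expected_level(interventional["high"]) - expected_level(interventional["low"])
    if delta > tol:
        return "↑"
    if delta < -tol:
        return "↓"
    return "→"


# Hypothetical P(productivity | do(stress_level=value)).
stress_on_productivity = {
    "low": {"low": 0.1, "medium": 0.3, "high": 0.6},
    "high": {"low": 0.5, "medium": 0.3, "high": 0.2},
}

ate = max_probability_shift(stress_on_productivity)
label = direction_label(stress_on_productivity)
keep = ate >= 0.01  # edges below the 0.01 cutoff would be pruned
print(f"P={ate:.2f} {label} keep={keep}")
```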
The do-operator uses pgmpy's backdoor adjustment. It only works in the causal direction (ancestor → descendant); querying the reverse returns a helpful error with a suggestion.
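For readers unfamiliar with backdoor adjustment, the formula it relies on is P(Y | do(X=x)) = Σ_z P(Y | X=x, Z=z) · P(Z=z), where Z is a set of variables blocking the backdoor paths. The sketch below applies that formula by hand to made-up probabilities for a binary confounder Z; it does not call pgmpy and does not show how causalif invokes it.

```python
def backdoor_adjust(p_y_given_xz, p_z, x):
    """P(Y=1 | do(X=x)) = sum over z of P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * prob_z for z, prob_z in p_z.items())


# Hypothetical model: Z -> X, Z -> Y, X -> Y, all variables binary.
p_z = {0: 0.6, 1: 0.4}
p_y_given_xz = {
    (0, 0): 0.2,
    (0, 1): 0.5,
    (1, 0): 0.4,
    (1, 1): 0.7,
}

p_do_x1 = backdoor_adjust(p_y_given_xz, p_z, x=1)
p_do_x0 = backdoor_adjust(p_y_given_xz, p_z, x=0)
effect = p_do_x1 - p_do_x0
print(round(p_do_x1, 2), round(p_do_x0, 2), round(effect, 2))
```

Averaging over P(Z) rather than P(Z | X) is what separates the interventional quantity from the ordinary conditional probability P(Y | X).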
Why Hill Climb and BDeu Score?
Hill Climbing
Local search algorithm that iteratively improves graph structure. Advantages: incorporates prior knowledge, computationally efficient (10-20 variables), interpretable steps.
BDeu Score
Bayesian scoring function measuring how well a graph explains data. Advantages: combines priors with data, score equivalence, built-in regularization.
CausalIF Enhancement: Score(G) = BDeu(G | Data) + λ × Prior(G | LLM), validated by bootstrap stability
Implements Bayesian inference: P(G | Data, LLM) ∝ P(Data | G) × P(G | LLM)
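Taking logarithms of the proportionality above gives an additive score of the same shape as Score(G), with λ weighting the prior term. The sketch below shows why the prior matters: BDeu is score-equivalent, so two orientations of a single edge receive the same data score and only the prior can separate them. How causalif actually builds Prior(G | LLM) is not described here, so the per-edge factorization, the 0.5 default, and all numbers are assumptions for illustration.

```python
import math


def prior_weighted_score(data_score, graph_edges, edge_prior, lam=1.0):
    """Score(G) = data_score(G) + lam * log Prior(G), in log space.

    Assumption for this sketch: Prior(G) factorizes into per-edge
    probabilities; edges without a prior get a neutral 0.5.
    """
    log_prior = sum(math.log(edge_prior.get(edge, 0.5)) for edge in graph_edges)
    return data_score + lam * log_prior


# Two Markov-equivalent orientations receive the same BDeu score
# (score equivalence), so only the prior can separate them.
bdeu = -120.0  # hypothetical log-score from the data
edge_prior = {
    ("stress_level", "productivity"): 0.8,  # hypothetical prior weight
    ("productivity", "stress_level"): 0.2,
}
candidates = {
    "stress_level -> productivity": [("stress_level", "productivity")],
    "productivity -> stress_level": [("productivity", "stress_level")],
}

scores = {
    name: prior_weighted_score(bdeu, edges, edge_prior)
    for name, edges in candidates.items()
}
best = max(scores, key=scores.get)
print(best, scores)
```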
Prerequisites
- AWS Bedrock Knowledge Base: Setup guide
- LLM Model: Any LangChain-compatible LLM (Bedrock, OpenAI, etc.)
- Observational Data: Pandas DataFrame with 100+ samples
Quick Setup
from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever
from langchain_aws import ChatBedrockConverse

# Retriever
retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="your-kb-id",
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 20}},
)

# LLM
model = ChatBedrockConverse(
    model_id="global.anthropic.claude-sonnet-4-6",
    temperature=0.0,
    region_name="us-west-2",
)
Installation
pip install causalif
Usage Examples
Basic Usage
import causalif  # needed for the causalif.causalif(...) calls below
from causalif import set_causalif_engine, causalif_tool, visualize_causalif_results
from langchain_aws import ChatBedrockConverse
import pandas as pd
# 1. Prepare your data (toy example; real analyses need 100+ samples, see Prerequisites)
df = pd.DataFrame({
    'sleep_hours': [7, 6, 8, 5, 7, 9, 6, 8, 7, 5],
    'exercise_minutes': [30, 20, 45, 10, 35, 60, 25, 50, 40, 15],
    'stress_level': [5, 7, 3, 8, 4, 2, 6, 3, 5, 8],
    'productivity': [8, 6, 9, 4, 7, 10, 6, 9, 8, 5]
})
# 2. Initialize LLM
model = ChatBedrockConverse(model_id="<model_id>", temperature=0.0, region_name="<region_id>")
# 3. Configure the Causalif engine
set_causalif_engine(
    model=model,  # LangChain-compatible LLM from step 2
    retriever_tool=retriever_tool,  # your RAG retriever tool
    dataframe=df,  # observational data from step 1
    max_degrees=None,  # None = no filtering (show entire graph), or set to an int (e.g., 2) to filter
    max_parallel_queries=50,  # adjustable, but the code is tested with 50
    excluded_target_columns=None,  # list of factors that should not be target columns
    excluded_related_columns=None,  # list of factors that should not be related columns
    related_factors=None,  # custom related factors, appended to the dataframe columns (mostly derived columns from documents)
    selected_dataframe_columns=None,  # list of dataframe columns to analyze if you do not want the whole dataframe
    enable_causal_estimate=True,  # causal inference to find upstream or downstream direct effects of the target factor
    domains=<list of industry domains>,  # treat as mandatory so the model applies adequate background knowledge
    bootstrap_iterations=50,  # number of bootstrap resamples for edge stability validation (0 to disable)
    bootstrap_threshold=0.7,  # prune edges with directed stability below this threshold
)
# 4. Run causal analysis
result = causalif.causalif("<query>") # example: Why is interest_rate so low in week 3?
# 5. Visualize results
fig = visualize_causalif_results(result)
fig.show()
Query Formats
Causalif supports natural language queries in various formats. The <target_factor> is the column or factor whose dependencies with other variables you want to analyze:
"""
Allowed query formats (where <target_factor> is the variable to analyze):
1. why (is|are) <target_factor> so (low|high|poor|bad|good)
2. what (causes|affects|influences) <target_factor>
3. <target_factor> (is|are) too (low|high)
4. analyze the causes (of|for) <target_factor>
5. dependencies (of|for) <target_factor>
6. factors (affecting|influencing) <target_factor>
"""
# Format 1: Why questions
result = causalif.causalif("Why is stress_level so high?")
result = causalif.causalif("Why are sales so low?")
# Format 2: What causes questions
result = causalif.causalif("What causes low productivity?")
result = causalif.causalif("What affects customer satisfaction?")
# Format 3: Direct statements
result = causalif.causalif("productivity is too low")
result = causalif.causalif("revenue is too high")
# Format 4: Analysis requests
result = causalif.causalif("analyze the causes of high stress_level")
result = causalif.causalif("analyze the causes for poor performance")
# Format 5: Dependency queries
result = causalif.causalif("dependencies of productivity")
result = causalif.causalif("dependencies for stock_price")
# Format 6: Factor influence queries
result = causalif.causalif("factors affecting sleep_hours")
result = causalif.causalif("factors influencing market_volatility")
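To make the six formats concrete, here is a minimal sketch of how a target factor could be pulled out of such a query with regular expressions. It is not the parser used by causalif: `extract_target_factor` and `QUERY_PATTERNS` are hypothetical names, and the real library may normalize queries differently (for example, by stripping modifiers such as "low" or "high").

```python
import re

# One pattern per documented query format; each captures the target factor.
QUERY_PATTERNS = [
    r"why (?:is|are) (?P<target>.+?) so (?:low|high|poor|bad|good)",
    r"what (?:causes|affects|influences) (?P<target>.+)",
    r"(?P<target>.+?) (?:is|are) too (?:low|high)",
    r"analyze the causes (?:of|for) (?P<target>.+)",
    r"dependencies (?:of|for) (?P<target>.+)",
    r"factors (?:affecting|influencing) (?P<target>.+)",
]


def extract_target_factor(query):
    """Return the captured target factor, or None if no format matches."""
    text = query.strip().rstrip("?.!").lower()
    for pattern in QUERY_PATTERNS:
        match = re.match(pattern, text)
        if match:
            return match.group("target").strip()
    return None


for query in [
    "Why is stress_level so high?",
    "What affects customer satisfaction?",
    "productivity is too low",
    "dependencies for stock_price",
    "factors influencing market_volatility",
]:
    print(query, "->", extract_target_factor(query))
```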
Interventional Queries (do-operator)
Once the causal model is fitted (enable_causal_estimate=True and a causal discovery query has been run), you can ask interventional questions using causalif_intervene:
from causalif import causalif_intervene
"""
Allowed intervention formats (where X is cause, Y is effect):
1. what happens to Y if X is (high|low|medium)
2. what would Y be if X is (high|low|medium)
3. how does Y change if X is (high|low|medium)
4. effect of setting X to (high|low|medium) on Y
5. what happens to Y if X is (high|low|medium) and Z is (high|low|medium)
"""
# Format 1: What happens questions
result = causalif_intervene("what happens to asp if our_price is high")
print(result['summary'])
# Format 2: What would questions
result = causalif_intervene("what would productivity be if stress_level is low")
# Format 3: How does questions
result = causalif_intervene("how does revenue change if marketing_spend is high")
# Format 4: Effect of setting
result = causalif_intervene("effect of setting interest_rate to low on bond_price")
# Format 5: Multiple interventions
result = causalif_intervene("what happens to Y if X is low and Z is high")
Note: The do-operator only works in the causal direction. If A → B in the graph, you can query do(A) on B, but not do(B) on A.
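The direction rule amounts to an ancestor check on the DAG. The sketch below shows one way to test it with a plain depth-first search; the function names and the returned messages are hypothetical and do not reproduce the library's actual error text.

```python
def is_ancestor(edges, cause, effect):
    """True if a directed path cause -> ... -> effect exists in the DAG."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    stack, seen = [cause], set()
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child == effect:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def check_intervention(edges, cause, effect):
    """Classify a do(cause) query on effect as valid, reversed, or unconnected."""
    if is_ancestor(edges, cause, effect):
        return "ok"
    if is_ancestor(edges, effect, cause):
        return f"reversed: try do({effect}) on {cause}"
    return "no directed path"


# Hypothetical learned DAG: sleep_hours -> stress_level -> productivity.
dag = [("sleep_hours", "stress_level"), ("stress_level", "productivity")]

print(check_intervention(dag, "sleep_hours", "productivity"))
print(check_intervention(dag, "productivity", "sleep_hours"))
```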
Visualization Features
The interactive visualization includes:
- Node Colors: Degree of separation from target factor (red = direct, blue = distant)
- Edge Colors: Same color scheme as nodes
- Arrows: Direction of causality
- Hover Information: Detailed relationship information
- Interactive: Zoom, pan, and click for details
fig = visualize_causalif_results(result)
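The degree of separation behind the node colors (and the max_degrees filter) can be pictured as a breadth-first search outward from the target factor. The sketch below assumes distance is measured on the undirected version of the graph, which may differ from what causalif does internally; the function names are hypothetical.

```python
from collections import deque


def degrees_of_separation(edges, target):
    """Breadth-first distance of every reachable node from the target (undirected)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


def filter_by_degree(edges, target, max_degrees=None):
    """Keep edges whose endpoints both lie within max_degrees of the target."""
    if max_degrees is None:
        return list(edges)  # None = no filtering
    distance = degrees_of_separation(edges, target)
    unreachable = float("inf")
    return [
        (a, b)
        for a, b in edges
        if distance.get(a, unreachable) <= max_degrees
        and distance.get(b, unreachable) <= max_degrees
    ]


# Hypothetical graph around the target factor "productivity".
graph = [
    ("exercise_minutes", "sleep_hours"),
    ("sleep_hours", "stress_level"),
    ("stress_level", "productivity"),
]

print(degrees_of_separation(graph, "productivity"))
print(filter_by_degree(graph, "productivity", max_degrees=2))
```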
Architecture
Layers: Agent → CausalIF Tool → Engine → Knowledge (RAG + LLM) → Data
Components:
causalif/
├── core.py # Data structures
├── engine.py # CausalIF algorithm
├── prompts.py # LLM prompts
├── tool.py # API & LangChain integration
└── visualization.py # Plotly graphs
Limitations
Not ideal for: Pure quantitative data or feedback-loop driven inference. Built for hybrid qualitative + quantitative analysis.
Data: At least 100 samples recommended; at most 10-20 variables per run; complexity is O(n² × k)
LLM: May hallucinate, reflects training biases, 2-5 calls per variable pair
Assumptions: DAG structure (no cycles), no unmeasured confounders, conditional independence
Do-operator: Only works in causal direction (ancestor → descendant), not reverse
Mitigation: Use max_degrees for filtering, temperature=0 for consistency, validate with domain expertise
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
Reporting Issues
Please report bugs and feature requests on GitHub Issues.
License
This project is licensed under the Apache-2.0 License. See LICENSE for details.
Version History
- v0.1.9.8: Do-operator (ATE) probabilities and direction labels on all graph edges, pgmpy 1.1+ API migration, adaptive edge pruning (ATE < 0.01 removed), improved node spacing in visualization.
- v0.1.9.7: Improved numerical stability in discretization pipeline, refined prior contribution diagnostics, and adaptive graph visualization for larger causal structures.
- v0.1.9.6: Bootstrap stability validation in CausalIF 2 (resample + re-run Hill Climb, prune edges below 70% directed stability).
- v0.1.9.5: LACR 1 direct/indirect association algorithm, do-operator with direction analysis, interventional queries via causalif_intervene.
- v0.1.9: Removed LLM-based causal directions, introduced Bayesian-based causal direction with Hill Climb search and immediate upstream/downstream effects. Hybrid graph with associations and causal directions.
- v0.1.6: Removed directed graph dependencies, added example notebook.
- v0.1.5: README updates.
- v0.1.4: Base version with complete Causalif algorithm.
Support
- Documentation: GitHub README
- Issues: GitHub Issues
- Email: bossubhr@amazon.co.uk
Acknowledgments
Built with:
- LangChain - LLM orchestration
- NetworkX - Graph algorithms
- Plotly - Interactive visualization
- AWS Bedrock - LLM and RAG infrastructure