Combines Large Language Models (LLMs) with Bayesian causal inference to discover causal relationships and associations from observational data and domain knowledge.

Causal Inference Framework for AWS (causalif)

Overview
Logical Flow
Why Hill Climb and BDeu Score?
Prerequisites
Installation
Usage Examples
Architecture
Limitations
Contributing
License

Overview

CausalIF combines LLMs with Bayesian causal inference to discover causal relationships from both qualitative documents and quantitative data. It leverages:

Background Knowledge: LLM's pre-trained causal understanding
Document Knowledge: Domain documents via RAG retrieval
Bayesian Structure Learning: Hill Climbing + BDeu scoring for causal orientation
Do-Calculus: Interventional queries via pgmpy's do-operator (causalif_intervene)

Best used as a tool in agentic systems for interpreting causal relationships.

GitHub: awslabs/causalif | PyPI: causalif

The direct, indirect, and independent association algorithm (causalif_1_edge_existence_verification) is inspired by the LACR 1 algorithm: https://arxiv.org/html/2402.15301v2

Note: This is an experimental project. The quality of its analysis depends on the quality of the RAG documents, the model's knowledge, and the size of the data.

Ideal Use Cases

CausalIF works best when you have both qualitative domain knowledge and quantitative observational data.

What You Need:

Qualitative: Documents with formulae, relationships, and domain expertise
Quantitative: Observational data (even if noisy)

Example: Financial institution analyzing derived metrics using research papers + historical market data.

When to Use:

✅ Rich document corpus + observational data
✅ Understanding derived metrics
✅ "What causes what" questions

When Not to Use:

⚠️ No domain documents
⚠️ Real-time requirements
⚠️ <100 data samples
⚠️ Purely experimental data (use RCTs)

Logical Flow

CausalIF implements a 3-stage algorithm:

Stage 1: Edge Existence (CausalIF 1)

Goal: Identify direct causal associations

5 Phases:

Document Retrieval: Get k_documents from RAG per edge
Association Verification: LLM votes (1 BG + k DOC votes per edge) → Associated/Independent/Unknown
Type Classification: Direct/Indirect/Unknown for associated edges
Rechecker: Validate intermediaries are in variable set V; reclassify if not
Vote Scoring: Direct: +1, Indirect/Independent: -1, Unknown: 0 → Keep if S > 0

Output: Skeleton graph with only direct associations
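
The Vote Scoring phase above can be sketched in a few lines. This is a minimal illustration of the stated weights and the S > 0 rule, not causalif's internal code; the names VOTE_WEIGHTS, edge_score, and keep_edge are hypothetical.

```python
# Vote weights as listed in the Vote Scoring phase.
VOTE_WEIGHTS = {"direct": 1, "indirect": -1, "independent": -1, "unknown": 0}


def edge_score(votes):
    """Sum the weights of all votes cast for one candidate edge."""
    return sum(VOTE_WEIGHTS[vote] for vote in votes)


def keep_edge(votes):
    """An edge stays in the skeleton only if its score S is strictly positive."""
    return edge_score(votes) > 0


# One background vote plus k = 4 document votes for a hypothetical edge.
votes = ["direct", "direct", "indirect", "unknown", "direct"]
print(edge_score(votes), keep_edge(votes))  # 2 True
```

A tie (S = 0) drops the edge, so one Direct vote cancelled by one Indirect vote is not enough to keep it.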

Stage 2: Causal Orientation (CausalIF 2)

Goal: Determine causal direction (A → B or B → A) and validate edge robustness

Process:

Hill Climbing + BDeu: Orient skeleton edges using PriorWeightedBDeu scoring on observational data
Bootstrap Stability: Resample data N times (default 50), re-run Hill Climb on each resample, compute per-edge directed stability (% of resamples where exact edge direction appeared)
Pruning: Remove edges with directed stability below threshold (default 70%)

Output: Directed Acyclic Graph (DAG) with bootstrap-validated edges
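
The bootstrap step can be sketched as follows. This is a simplified illustration under stated assumptions: learn_edges is a hypothetical stand-in for one Hill Climb + BDeu run that returns directed edges, the toy learner and data are invented for the example, and the function names are not part of causalif.

```python
import random
from collections import Counter


def bootstrap_edge_stability(rows, learn_edges, n_iterations=50, seed=0):
    """Return {(cause, effect): fraction of resamples containing that directed edge}."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_iterations):
        # Resample rows with replacement, keeping the original sample size.
        sample = [rng.choice(rows) for _ in rows]
        # Count each directed edge at most once per resample.
        counts.update(set(learn_edges(sample)))
    return {edge: count / n_iterations for edge, count in counts.items()}


def prune_unstable(stability, threshold=0.7):
    """Keep only edges whose directed stability meets the threshold."""
    return {edge for edge, share in stability.items() if share >= threshold}


def toy_learner(sample):
    # Hypothetical learner: always returns stress -> productivity, and adds
    # sleep -> stress only when the resample's mean sleep is at least 7 hours.
    edges = [("stress", "productivity")]
    if sum(row["sleep"] for row in sample) / len(sample) >= 7:
        edges.append(("sleep", "stress"))
    return edges


rows = [{"sleep": hours} for hours in (5, 6, 7, 8, 9)]
stability = bootstrap_edge_stability(rows, toy_learner)
print(prune_unstable(stability))
```

Because stability is counted per direction, an edge that flips between A → B and B → A across resamples scores low in both directions and is pruned.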

Stage 3: Causal Inference (Optional)

Goal: Quantify causal effects, enable interventional queries, and annotate edges with probabilities

Process: Fit CPDs → Compute Average Treatment Effects (ATE) for all edges → Direction analysis → Prune negligible edges → Enable do-operator queries

Enable with: enable_causal_estimate=True

Edge Probability Labels: After fitting the causal model, the do-operator computes P(effect | do(cause=value)) for every directed edge. Each edge is annotated with:

P=value: The ATE (max probability shift) — how much the effect's distribution changes when you intervene on the cause
↑: Directly related (increasing cause increases effect)
↓: Inversely related (increasing cause decreases effect)
→: Neutral (no significant directional shift)

Edge Pruning: Edges with ATE < 0.01 are removed — if the do-operator shows no measurable interventional effect, the edge is considered noise from structure learning.
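
As a rough sketch of how such a label could be derived from two interventional distributions: the code below assumes numeric effect states, takes the "max probability shift" to be the largest absolute change in any state's probability, and judges direction by the change in expected value. These choices and the function names are assumptions for illustration; the library's exact computation may differ.

```python
def max_probability_shift(p_low, p_high):
    """Largest absolute change in any effect-state probability."""
    states = set(p_low) | set(p_high)
    return max(abs(p_high.get(s, 0.0) - p_low.get(s, 0.0)) for s in states)


def expected_value(dist):
    """Expected value of a distribution over numeric states."""
    return sum(state * prob for state, prob in dist.items())


def label_edge(p_low, p_high, prune_below=0.01, tol=1e-9):
    """Return an edge label such as 'P=0.50 ↑', or None if the edge is pruned."""
    ate = max_probability_shift(p_low, p_high)
    if ate < prune_below:
        return None  # no measurable interventional effect
    delta = expected_value(p_high) - expected_value(p_low)
    arrow = "↑" if delta > tol else "↓" if delta < -tol else "→"
    return f"P={ate:.2f} {arrow}"


# Hypothetical P(effect | do(cause=low)) and P(effect | do(cause=high)).
p_low = {0: 0.7, 1: 0.2, 2: 0.1}
p_high = {0: 0.2, 1: 0.3, 2: 0.5}
print(label_edge(p_low, p_high))  # P=0.50 ↑
```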

Direction Fallback: For categorical effects where numeric direction can't be computed, the system compares most-likely states under low vs high intervention to determine if the distribution shifts.

The do-operator uses pgmpy's backdoor adjustment. It only works in the causal direction (ancestor → descendant); querying the reverse returns a helpful error with a suggestion.

Factor Descriptions (Recommended)

After Bayesian orientation, the LLM verifies edge directions using domain semantics (Step 2b). For this to work accurately, the LLM needs to understand what each column name means. Pass a factor_descriptions string with column definitions:

# Factor Definitions
- product_cogs_amt: cost of goods sold for the product in USD
- asin_weight_kg: physical weight of the product in kilograms
- gms: gross merchandise sales value
- asp: average selling price per unit
- ship_method: shipping method used (ground, air, express)
- weather_condition: weather at time of shipment (clear, rain, snow)

Store this file in S3 and pass the URI directly:

set_causalif_engine(
    ...
    factor_descriptions="s3://my-data-bucket/causalif/factor_descriptions.md",
)

You can also pass the content inline as a string. If factor_descriptions is not provided, a warning is emitted that causal directions may be misrepresented due to ambiguous column names.
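
Before passing the file, it can help to check that every DataFrame column has a definition. The helper below is a hypothetical pre-flight check, not part of causalif; it assumes the "- name: description" line format shown above.

```python
def parse_factor_descriptions(text):
    """Map each '- name: description' line to {name: description}."""
    factors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("- ") or ":" not in line:
            continue  # skip headings and blank lines
        name, description = line[2:].split(":", 1)
        factors[name.strip()] = description.strip()
    return factors


def undescribed_columns(columns, factors):
    """Return the columns that have no definition."""
    return [column for column in columns if column not in factors]


text = """# Factor Definitions
- gms: gross merchandise sales value
- asp: average selling price per unit
"""
factors = parse_factor_descriptions(text)
print(undescribed_columns(["gms", "asp", "our_price"], factors))  # ['our_price']
```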

Why Hill Climb and BDeu Score?

Hill Climbing

Local search algorithm that iteratively improves graph structure. Advantages: incorporates prior knowledge, computationally efficient for graphs of 10-20 variables, interpretable steps.

BDeu Score

Bayesian scoring function measuring how well a graph explains data. Advantages: combines priors with data, score equivalence, built-in regularization.

CausalIF Enhancement: Score(G) = BDeu(G | Data) + λ × Prior(G | LLM), validated by bootstrap stability

Read in log space (BDeu as a log marginal likelihood, Prior as a log prior, λ = 1), this corresponds to Bayesian inference: P(G | Data, LLM) ∝ P(Data | G) × P(G | LLM)
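
A toy illustration of why the prior term matters: because BDeu is score-equivalent, two orientations of the same edge can receive the same data score, and the LLM prior then decides between them. The numbers below are invented, both terms are assumed to be log-scores, and the function names are hypothetical.

```python
def combined_score(data_score, prior_score, lam=1.0):
    """Score(G) = BDeu(G | Data) + lam * Prior(G | LLM), with both terms as log-scores."""
    return data_score + lam * prior_score


def best_graph(candidates, lam=1.0):
    """Return the candidate graph with the highest combined score."""
    return max(
        candidates,
        key=lambda g: combined_score(candidates[g]["data"], candidates[g]["prior"], lam),
    )


# Hypothetical log-scores: identical data fit, different LLM prior.
candidates = {
    "productivity -> stress": {"data": -120.4, "prior": -2.3},
    "stress -> productivity": {"data": -120.4, "prior": -0.2},
}
print(best_graph(candidates, lam=1.0))  # stress -> productivity
```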

Prerequisites

AWS Bedrock Knowledge Base: Setup guide
LLM Model: Any LangChain-compatible LLM (Bedrock, OpenAI, etc.)
Observational Data: Pandas DataFrame with 100+ samples

Quick Setup

from langchain_aws.retrievers import AmazonKnowledgeBasesRetriever
from langchain_aws import ChatBedrockConverse

# Retriever
retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id="your-kb-id",
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 20}}
)

# LLM
model = ChatBedrockConverse(
    model_id="global.anthropic.claude-sonnet-4-6",
    temperature=0.0,
    region_name="us-west-2"
)

Installation

pip install causalif

Usage Examples

Basic Usage

import causalif  # module import so the causalif.causalif(...) calls below resolve
from causalif import set_causalif_engine, causalif_tool, visualize_causalif_results
from langchain_aws import ChatBedrockConverse
import pandas as pd

# 1. Prepare your data (toy values for illustration; 100+ samples are recommended)
df = pd.DataFrame({
    'sleep_hours': [7, 6, 8, 5, 7, 9, 6, 8, 7, 5],
    'exercise_minutes': [30, 20, 45, 10, 35, 60, 25, 50, 40, 15],
    'stress_level': [5, 7, 3, 8, 4, 2, 6, 3, 5, 8],
    'productivity': [8, 6, 9, 4, 7, 10, 6, 9, 8, 5]
})

# 2. Initialize LLM
model = ChatBedrockConverse(model_id="<model_id>", temperature=0.0, region_name="<region_id>")

# 3. Configure Causalif engine

# Factor descriptions stored in S3 (recommended for accurate causal direction detection).
# Pass the S3 URI directly; causalif fetches it automatically.
factor_descriptions = "s3://my-bucket/causalif/factor_descriptions.md"

set_causalif_engine(
    model=model,
    retriever_tool=retriever_tool,  # your RAG retriever tool (not defined in this snippet)
    dataframe=df,
    max_degrees=None,  # None = no filtering (show entire graph), or an int (e.g., 2) to filter
    max_parallel_queries=50,  # configurable; the code is tested with 50
    excluded_target_columns=None,  # list of factors that shouldn't be target columns
    excluded_related_columns=None,  # list of factors that shouldn't be related columns
    related_factors=None,  # custom related factors (appended to the dataframe columns), mostly derived columns from documents
    selected_dataframe_columns=None,  # list of dataframe columns to analyze if you don't want the whole dataframe
    enable_causal_estimate=True,  # causal inference to find upstream or downstream direct effects of the target factor
    domains=["<industry_domain>"],  # list of industry domains; treat as mandatory so the model applies adequate background knowledge
    bootstrap_iterations=50,  # number of bootstrap resamples for edge stability validation (0 to disable)
    bootstrap_threshold=0.7,  # prune edges with directed stability below this threshold
    factor_descriptions=factor_descriptions,  # column definitions for LLM direction verification
)

# 4. Run causal analysis
result = causalif.causalif("Why is productivity so low?")

# 5. Visualize results
fig = visualize_causalif_results(result)
fig.show()

Query Formats

Causalif supports natural language queries in various formats. The <target_factor> is the column or factor whose dependencies with other variables you want to analyze:

"""
Allowed query formats (where <target_factor> is the variable to analyze):

1. why (is|are) <target_factor> so (low|high|poor|bad|good)
2. what (causes|affects|influences) <target_factor>
3. <target_factor> (is|are) too (low|high)
4. analyze the causes (of|for) <target_factor>
5. dependencies (of|for) <target_factor>
6. factors (affecting|influencing) <target_factor>
"""

# Format 1: Why questions
result = causalif.causalif("Why is stress_level so high?")
result = causalif.causalif("Why are sales so low?")

# Format 2: What causes questions
result = causalif.causalif("What causes low productivity?")
result = causalif.causalif("What affects customer satisfaction?")

# Format 3: Direct statements
result = causalif.causalif("productivity is too low")
result = causalif.causalif("revenue is too high")

# Format 4: Analysis requests
result = causalif.causalif("analyze the causes of high stress_level")
result = causalif.causalif("analyze the causes for poor performance")

# Format 5: Dependency queries
result = causalif.causalif("dependencies of productivity")
result = causalif.causalif("dependencies for stock_price")

# Format 6: Factor influence queries
result = causalif.causalif("factors affecting sleep_hours")
result = causalif.causalif("factors influencing market_volatility")
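
To see how the six formats above pin down <target_factor>, here is an illustrative regex matcher. It is a sketch of the documented grammar only; causalif's actual parser may be more lenient or work differently, and QUERY_PATTERNS and extract_target are hypothetical names.

```python
import re

# One pattern per documented format; <target_factor> is captured as "target".
QUERY_PATTERNS = [
    r"why (?:is|are) (?P<target>.+?) so (?:low|high|poor|bad|good)",
    r"what (?:causes|affects|influences) (?P<target>.+)",
    r"(?P<target>.+?) (?:is|are) too (?:low|high)",
    r"analyze the causes (?:of|for) (?P<target>.+)",
    r"dependencies (?:of|for) (?P<target>.+)",
    r"factors (?:affecting|influencing) (?P<target>.+)",
]


def extract_target(query):
    """Return the target factor if the query fits a documented format, else None."""
    text = query.strip().rstrip("?.!").strip()
    for pattern in QUERY_PATTERNS:
        match = re.fullmatch(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group("target")
    return None


print(extract_target("Why is stress_level so high?"))    # stress_level
print(extract_target("dependencies for stock_price"))    # stock_price
print(extract_target("tell me about productivity"))      # None
```

Under this strict reading, a query such as "analyze the causes of high stress_level" yields "high stress_level" as the target, so qualifiers are safest left out of formats that do not list them.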

Interventional Queries (do-operator)

Once the causal model is fitted (enable_causal_estimate=True and a causal discovery query has been run), you can ask interventional questions using causalif_intervene:

from causalif import causalif_intervene

"""
Allowed intervention formats (where X is cause, Y is effect):

1. what happens to Y if X is (high|low|medium)
2. what would Y be if X is (high|low|medium)
3. how does Y change if X is (high|low|medium)
4. effect of setting X to (high|low|medium) on Y
5. what happens to Y if X is (high|low|medium) and Z is (high|low|medium)
"""

# Format 1: What happens questions
result = causalif_intervene("what happens to asp if our_price is high")
print(result['summary'])

# Format 2: What would questions
result = causalif_intervene("what would productivity be if stress_level is low")

# Format 3: How does questions
result = causalif_intervene("how does revenue change if marketing_spend is high")

# Format 4: Effect of setting
result = causalif_intervene("effect of setting interest_rate to low on bond_price")

# Format 5: Multiple interventions
result = causalif_intervene("what happens to Y if X is low and Z is high")

Note: The do-operator only works in the causal direction. If A → B in the graph, you can query do(A) on B, but not do(B) on A.
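
The direction rule amounts to an ancestor check on the learned DAG. The sketch below shows that check on a plain edge list; it is an illustration of the rule, not causalif's implementation, and is_ancestor is a hypothetical helper.

```python
def is_ancestor(edges, cause, effect):
    """True if a directed path cause -> ... -> effect exists in the edge list."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    stack, seen = [cause], set()
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child == effect:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


edges = [("sleep_hours", "stress_level"), ("stress_level", "productivity")]
print(is_ancestor(edges, "sleep_hours", "productivity"))  # True: do(sleep_hours) on productivity is valid
print(is_ancestor(edges, "productivity", "sleep_hours"))  # False: reverse direction
```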

Visualization Features

The interactive visualization includes:

Node Colors: Degree of separation from target factor (red = direct, blue = distant)
Edge Colors: Same color scheme as nodes
Arrows: Direction of causality
Hover Information: Detailed relationship information
Interactive: Zoom, pan, and click for details

fig = visualize_causalif_results(result)

Architecture

Overall Architecture

Layers: Agent → CausalIF Tool → Engine → Knowledge (RAG + LLM) → Data

Components:

causalif/
├── core.py           # Data structures
├── engine.py         # CausalIF algorithm
├── prompts.py        # LLM prompts
├── tool.py           # API & LangChain integration
└── visualization.py  # Plotly graphs

Limitations

Not ideal for: Pure quantitative data or feedback-loop driven inference. Built for hybrid qualitative + quantitative analysis.

Data: At least 100 samples recommended; at most 10-20 variables per run. Complexity is O(n² × k) for n variables and k documents retrieved per edge.

LLM: May hallucinate, reflects training biases, 2-5 calls per variable pair

Assumptions: DAG structure (no cycles), no unmeasured confounders, conditional independence

Do-operator: Only works in causal direction (ancestor → descendant), not reverse

Mitigation: Use max_degrees for filtering, temperature=0 for consistency, validate with domain expertise
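
To make the O(n² × k) growth concrete, the arithmetic below counts candidate edges and Stage 1 votes. It assumes every unordered variable pair is examined once with 1 background vote plus k document votes; the actual number of LLM calls depends on the pipeline, and edge_workload is a hypothetical helper.

```python
from math import comb


def edge_workload(n_variables, k_documents):
    """Return (candidate pairs, total Stage 1 votes) under the stated assumption."""
    pairs = comb(n_variables, 2)          # unordered variable pairs
    votes = pairs * (1 + k_documents)     # 1 background + k document votes per pair
    return pairs, votes


print(edge_workload(10, 5))  # (45, 270)
print(edge_workload(20, 5))  # (190, 1140)
```

Doubling the variable count from 10 to 20 roughly quadruples the work, which is why max_degrees filtering and the 10-20 variable guideline matter.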

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Reporting Issues

Please report bugs and feature requests on GitHub Issues.

License

This project is licensed under the Apache-2.0 License. See LICENSE for details.

Version History

v0.1.10: Benchmarking module (causalif.benchmarks) for accuracy evaluation against ground-truth DAGs. Includes F1/precision/recall/SHD metrics, standard benchmark networks (ASIA, Sachs, ALARM), synthetic data generation, baseline comparison (PC, Hill Climb-BDeu), and sensitivity analysis. Average directed F1: 0.87 across 7 benchmarks.
v0.1.9.9: Dropdown filter for verified/unverified edges (hides arrows and labels when filtering), dashed edge labels for ATE=0 or failed edges, improved arrow positioning (closer to target nodes), spring layout restored for reliable arrow rendering on all edges.
v0.1.9.8: Do-operator (ATE) probabilities and direction labels on all graph edges, pgmpy 1.1+ API migration, adaptive edge pruning (ATE < 0.01 removed), LLM-based causal direction verification (Step 2b), factor_descriptions parameter for column definitions, improved node spacing in visualization.
v0.1.9.7: Improved numerical stability in discretization pipeline, refined prior contribution diagnostics, and adaptive graph visualization for larger causal structures.
v0.1.9.6: Bootstrap stability validation in CausalIF 2 (resample + re-run Hill Climb, prune edges below 70% directed stability).
v0.1.9.5: LACR 1 direct/indirect association algorithm, do-operator with direction analysis, interventional queries via causalif_intervene.
v0.1.9: Removed LLM-based causal directions, introduced Bayesian-based causal direction with Hill Climb search and immediate upstream/downstream effects. Hybrid graph with associations and causal directions.
v0.1.6: Removed directed graph dependencies, added example notebook.
v0.1.5: README updates.
v0.1.4: Base version with complete Causalif algorithm.

Support

Documentation: GitHub README
Issues: GitHub Issues
Email: bossubhr@amazon.co.uk

Acknowledgments

Built with:

LangChain - LLM orchestration
NetworkX - Graph algorithms
Plotly - Interactive visualization
AWS Bedrock - LLM and RAG infrastructure

Release history

0.1.10 (this version): Jun 4, 2026
0.1.9.9: May 26, 2026
0.1.9.8: May 16, 2026
0.1.9.7: Mar 20, 2026
0.1.9.6: Mar 19, 2026
0.1.9.5: Mar 3, 2026
0.1.9.4: Mar 3, 2026
0.1.9.3: Feb 23, 2026
0.1.9.2: Feb 23, 2026
0.1.9.1: Feb 20, 2026
0.1.6: Oct 21, 2025
0.1.5: Oct 20, 2025
0.1.4: Oct 20, 2025
0.1.3: Oct 20, 2025
0.1.0: Oct 20, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

causalif-0.1.10-py3-none-any.whl (85.9 kB), uploaded Jun 4, 2026, Python 3

File details

Details for the file causalif-0.1.10-py3-none-any.whl.

File metadata

Download URL: causalif-0.1.10-py3-none-any.whl
Upload date: Jun 4, 2026
Size: 85.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for causalif-0.1.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`173390f1cc03e7e3bb57b658ed4666fb2ae49b9ae869fe228cee69b148040cb3`
MD5	`c06228b5d98e664cc546422bd63adfc5`
BLAKE2b-256	`2695e8428a3295286ef76f3ae7d29fe67cc4ccd9fe5a4cec9d4bcfef2880068d`
