A library for automated causal inference

CAIS - Causal AI Scientist

Requires Python 3.10+. License: MIT.

Causal AI Scientist (CAIS) is an LLM-powered tool that generates data-driven answers to natural language causal queries. It takes three inputs: a natural language query (for example, "Does participating in a job training program lead to higher income?"), a dataset, and a description of that dataset. CAIS frames a causal estimation problem by selecting appropriate treatment and outcome variables. It then selects a suitable estimation method, implements it, runs diagnostic tests, and interprets the numerical results in the context of the original query.

🚀 Quick Start

Installation

pip install causal_agent

Basic Usage

from causal_agent import run_causal_analysis

# Run causal analysis with a simple question
result = run_causal_analysis(
    query="What is the effect of education on income?",
    dataset_path="your_data.csv",
    dataset_description="Dataset containing education and income data"
)

print(f"Causal effect: {result['results']['results']['effect_estimate']}")
print(f"Method used: {result['results']['results']['method_used']}")
print(f"Explanation: {result['explanation']}")

Command Line Interface

# Single analysis
causal_agent run dataset.csv "What is the effect of treatment on outcome?"

# Batch analysis
causal_agent batch metadata.csv data_folder/ results.json

🔧 Setup

1. Configure LLM Provider

Set your API key for your preferred LLM provider:

import os

# OpenAI (default)
os.environ["OPENAI_API_KEY"] = "your-api-key"

# Or use Anthropic (uncomment to switch)
# os.environ["LLM_PROVIDER"] = "anthropic"
# os.environ["ANTHROPIC_API_KEY"] = "your-api-key"

# Or use Google Gemini (uncomment to switch)
# os.environ["LLM_PROVIDER"] = "gemini"
# os.environ["GOOGLE_API_KEY"] = "your-api-key"
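
It can help to confirm that the key for the selected provider is set before starting an analysis. The following is a minimal sketch of such a check. The helper `missing_llm_key` is hypothetical and not part of causal_agent, and the provider string "openai" is an assumption, since the text above only says that OpenAI is the default.

```python
import os

# Mapping assumed from the environment variables shown above; the "openai"
# provider name is a guess, since the README only says OpenAI is the default.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def missing_llm_key(env=None):
    """Return the name of the unset API-key variable, or None if it is set.

    Hypothetical helper, not part of causal_agent.
    """
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    key_var = PROVIDER_KEY_VARS[provider]
    return None if env.get(key_var) else key_var
```

Calling `missing_llm_key()` with no arguments inspects the real environment; passing a plain dictionary makes the check easy to try without touching `os.environ`.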

2. Prepare Your Data

  • CSV format with clear column names
  • Include relevant variables for causal analysis
  • Ensure sufficient sample size (typically >100 observations)
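
The checklist above can be partly automated before a dataset is handed to CAIS. This is a rough sketch using only the standard library. The function `check_dataset` and its `min_rows` threshold are illustrative assumptions, not part of causal_agent, and the check for missing values is an extra that goes slightly beyond the three bullets.

```python
import csv
import io


def check_dataset(csv_file, min_rows=100):
    """Return a list of human-readable problems found in a CSV file object.

    Illustrative helper, not part of causal_agent.
    """
    reader = csv.DictReader(csv_file)
    columns = reader.fieldnames or []
    rows = list(reader)
    problems = []
    # Flag blank or duplicated column names
    if any(not (name or "").strip() for name in columns):
        problems.append("blank column name")
    if len(set(columns)) != len(columns):
        problems.append("duplicate column names")
    # The guideline asks for more than min_rows observations
    if len(rows) <= min_rows:
        problems.append(f"only {len(rows)} rows (want more than {min_rows})")
    # Count empty cells per column
    for name in columns:
        empty = sum(1 for row in rows if not (row.get(name) or "").strip())
        if empty:
            problems.append(f"{empty} missing values in {name!r}")
    return problems


# Tiny in-memory example; real use would pass open("your_data.csv", newline="")
sample = io.StringIO("education,income\n12,30000\n16,\n")
print(check_dataset(sample, min_rows=1))
```

An empty list means none of these basic checks failed; it does not say whether the data supports a credible causal estimate.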

📊 What CAIS Does

  1. Parses your natural language causal question
  2. Analyzes your dataset structure and variables
  3. Selects the most appropriate causal inference method:
    • Randomized Controlled Trials (RCT)
    • Difference-in-Differences (DiD)
    • Instrumental Variables (IV)
    • Regression Discontinuity Design (RDD)
    • Propensity Score Matching/Weighting
    • Linear Regression with controls
    • And more...
  4. Executes the analysis with proper diagnostics
  5. Interprets results in the context of your original question

🎯 Example Use Cases

Education Research

result = run_causal_analysis(
    query="Does smaller class size improve student test scores?",
    dataset_path="education_data.csv",
    dataset_description="Student data with class sizes and test scores"
)

Healthcare

result = run_causal_analysis(
    query="What is the effect of the new treatment on patient recovery time?",
    dataset_path="clinical_trial_data.csv",
    dataset_description="Randomized trial data comparing treatments"
)

Economics

result = run_causal_analysis(
    query="How does minimum wage increase affect employment?",
    dataset_path="employment_data.csv",
    dataset_description="Employment data before and after policy change"
)

📈 Advanced Features

Batch Processing

Process multiple datasets at once:

import pandas as pd

# Create metadata file
metadata = pd.DataFrame({
    'natural_language_query': [
        'Effect of education on income',
        'Impact of training on employment'
    ],
    'data_files': ['education.csv', 'training.csv'],
    'data_description': ['Education dataset', 'Training program data']
})

# Save metadata to CSV file first
metadata.to_csv('metadata.csv', index=False)

# Run batch analysis using CLI
# causal_agent batch metadata.csv ./data/ results.json
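
Before launching a batch run, it may be worth confirming that every file listed in the metadata is present in the data folder. The sketch below assumes the `data_files` column name used above; `missing_data_files` is a hypothetical helper, not part of causal_agent.

```python
import csv
from pathlib import Path


def missing_data_files(metadata_csv, data_folder, column="data_files"):
    """List files named in the metadata CSV that are absent from data_folder.

    Hypothetical helper, not part of causal_agent.
    """
    folder = Path(data_folder)
    with open(metadata_csv, newline="") as handle:
        names = [row[column] for row in csv.DictReader(handle)]
    return [name for name in names if not (folder / name).is_file()]
```

For example, `missing_data_files("metadata.csv", "./data/")` returns an empty list when all referenced files are in place.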

Custom LLM Configuration

import os

# Use different models
os.environ["LLM_MODEL"] = "gpt-4o-mini"  # Faster, cheaper
# os.environ["LLM_MODEL"] = "gpt-4"      # More accurate
# os.environ["LLM_MODEL"] = "claude-3-haiku-20240307"  # Anthropic

🔍 Understanding Results

CAIS returns structured results including:

  • Effect Estimate: The causal effect size
  • Standard Error: Uncertainty in the estimate
  • Confidence Interval: Range of plausible values
  • Method Used: Which causal inference technique was applied
  • Variables Identified: Treatment, outcome, and control variables
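  • Explanation: Plain-language interpretation of results

from causal_agent import run_causal_analysis

result = run_causal_analysis(
    query="What is the effect of education on income?",
    dataset_path="your_data.csv",
    dataset_description="Dataset containing education and income data"
)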

# Access key results
effect = result['results']['results']['effect_estimate']
method = result['results']['results']['method_used']
variables = result['results']['variables']
explanation = result['explanation']

print(f"Using {method}, we found that {variables['treatment_variable']} "
      f"has an effect of {effect} on {variables['outcome_variable']}")
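
The nested lookups above raise `KeyError` if a field is absent. As a defensive sketch, the helper below flattens the result into a small dictionary and returns `None` for anything missing. It assumes the key layout shown in the access pattern above; `summarize_result` is hypothetical and not part of causal_agent.

```python
def summarize_result(result):
    """Flatten the nested result dictionary into a small summary dict.

    The key layout mirrors the access pattern shown above; any key that is
    absent comes back as None instead of raising KeyError.
    Hypothetical helper, not part of causal_agent.
    """
    outer = result.get("results") or {}
    inner = outer.get("results") or {}
    variables = outer.get("variables") or {}
    return {
        "effect": inner.get("effect_estimate"),
        "method": inner.get("method_used"),
        "treatment": variables.get("treatment_variable"),
        "outcome": variables.get("outcome_variable"),
        "explanation": result.get("explanation"),
    }
```

Calling `summarize_result(result)` on the output of `run_causal_analysis` gives a flat dictionary that is easier to log or write to a table.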

🛠️ Supported Methods

CAIS automatically selects from:

  • Experimental Methods: RCT analysis
  • Quasi-Experimental: DiD, RDD, IV
  • Observational: Propensity scoring, backdoor adjustment
  • Machine Learning: Causal forests, double ML (coming soon)

📚 Best Practices

Writing Good Causal Questions

  • Good: "What is the causal effect of education on income?"
  • Good: "Does job training increase employment rates?"
  • Avoid: "Are education and income related?" (correlation, not causation)

Dataset Requirements

  • Clear variable names
  • Sufficient sample size
  • Relevant control variables
  • Clean data (handle missing values)

Providing Context

Include dataset descriptions with:

  • Variable definitions
  • Data collection method
  • Time period covered
  • Known confounders
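
One way to make sure a description covers all four items is to assemble it from named parts. This is only a sketch: `build_dataset_description` is a hypothetical helper, and the plain-text layout it produces is an assumption rather than a format that CAIS requires.

```python
def build_dataset_description(variables, collection, period, confounders=()):
    """Assemble a dataset description string from its four components.

    Hypothetical helper, not part of causal_agent.
    """
    lines = ["Variables:"]
    # One line per variable definition
    lines += [f"- {name}: {meaning}" for name, meaning in variables.items()]
    lines.append(f"Data collection: {collection}")
    lines.append(f"Time period: {period}")
    if confounders:
        lines.append("Known confounders: " + ", ".join(confounders))
    return "\n".join(lines)


# Illustrative values only
description = build_dataset_description(
    variables={"educ": "years of schooling", "income": "annual income"},
    collection="household survey",
    period="2010-2015",
    confounders=["age", "region"],
)
print(description)
```

The resulting string can be passed as the `dataset_description` argument of `run_causal_analysis`.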

🔄 Migration from Previous Versions

If you're upgrading from the old cais package, see our Migration Guide for step-by-step instructions.

Quick update:

pip uninstall cais
pip install causal-agent

Then update your imports:

# Old
from cais import run_causal_analysis

# New  
from causal_agent import run_causal_analysis

📄 License

MIT License - see LICENSE for details.

Citation

If you use CAIS in your research, please cite:

@software{causal_agent2025,
  title={CAIS: Causal AI Scientist for Automated Causal Inference},
  author={Verma, Vishal and Acharya, Sawal and Simko, Samuel and Bhardwaj, Devansh and Haghighat, Anahita and Jin, Zhijing},
  year={2025},
  url={https://github.com/causalNLP/causal-agent}
}

Get started with causal inference in minutes, not hours! 🎉
