A library for automated causal inference
CAIS - Causal AI Scientist
Causal AI Scientist (CAIS) is an LLM-powered tool for generating data-driven answers to natural language causal queries. It takes three inputs: a natural language query (for example, "Does participating in a job training program lead to higher income?"), an accompanying dataset, and a description of that dataset. CAIS then frames a suitable causal estimation problem by selecting appropriate treatment and outcome variables. It selects a suitable method for causal effect estimation, implements it, runs diagnostic tests, and interprets the numerical results in the context of the original query.
🚀 Quick Start
Installation
pip install causal_agent
Basic Usage
from causal_agent import run_causal_analysis
# Run causal analysis with a simple question
result = run_causal_analysis(
    query="What is the effect of education on income?",
    dataset_path="your_data.csv",
    dataset_description="Dataset containing education and income data"
)
print(f"Causal effect: {result['results']['results']['effect_estimate']}")
print(f"Method used: {result['results']['results']['method_used']}")
print(f"Explanation: {result['explanation']}")
Command Line Interface
# Single analysis
causal_agent run dataset.csv "What is the effect of treatment on outcome?"
# Batch analysis
causal_agent batch metadata.csv data_folder/ results.json
🔧 Setup
1. Configure LLM Provider
Set your API key for your preferred LLM provider:
import os
# OpenAI (default)
os.environ["OPENAI_API_KEY"] = "your-api-key"
# Or use Anthropic (uncomment both lines)
# os.environ["LLM_PROVIDER"] = "anthropic"
# os.environ["ANTHROPIC_API_KEY"] = "your-api-key"
# Or use Google Gemini (uncomment both lines)
# os.environ["LLM_PROVIDER"] = "gemini"
# os.environ["GOOGLE_API_KEY"] = "your-api-key"
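Before running an analysis, it can help to confirm that the key for the selected provider is actually set. The following is a minimal sketch, not part of CAIS: the helper name `missing_llm_config` is hypothetical, and treating an unset `LLM_PROVIDER` as `"openai"` is an assumption based on OpenAI being described as the default.

```python
import os

# Environment variable names are taken from the setup snippet above.
# The "openai" provider string is an assumption; the README only says
# that OpenAI is the default.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}

def missing_llm_config(env=None):
    """Return a list of problems with the LLM settings (empty if none)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").lower()
    key_var = PROVIDER_KEY_VARS.get(provider)
    if key_var is None:
        return [f"Unknown LLM_PROVIDER: {provider!r}"]
    if not env.get(key_var):
        return [f"{key_var} is not set for provider {provider!r}"]
    return []

# Example with an explicit mapping instead of the real environment.
demo_problems = missing_llm_config({"LLM_PROVIDER": "anthropic"})
```

Calling `missing_llm_config()` with no arguments checks the real process environment.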
2. Prepare Your Data
- CSV format with clear column names
- Include relevant variables for causal analysis
- Ensure sufficient sample size (typically >100 observations)
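The checklist above can be turned into a quick pre-flight check. This is a sketch using only the standard library; `check_csv` is a hypothetical helper, not a CAIS function, and the 100-row threshold simply mirrors the rule of thumb in the list.

```python
import csv
import io

def check_csv(file_obj, min_rows=100):
    """Return a list of warnings about a CSV dataset (empty if none)."""
    reader = csv.DictReader(file_obj)
    columns = reader.fieldnames or []
    rows = list(reader)
    warnings = []
    if len(rows) <= min_rows:
        warnings.append(f"Only {len(rows)} rows; more than {min_rows} is suggested.")
    # Blank headers make it hard to identify treatment and outcome variables.
    if any(not (c or "").strip() for c in columns):
        warnings.append("At least one column has no name.")
    # Count empty cells per column so missing values can be handled up front.
    for col in columns:
        n_missing = sum(1 for r in rows if not (r.get(col) or "").strip())
        if n_missing:
            warnings.append(f"Column {col!r} has {n_missing} missing value(s).")
    return warnings

# Tiny made-up sample; a real file would be opened with open(path, newline="").
sample = io.StringIO("education,income\n12,30000\n,45000\n16,\n")
sample_warnings = check_csv(sample)
```

For a file on disk, pass an open handle: `with open("your_data.csv", newline="") as f: print(check_csv(f))`.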
📊 What CAIS Does
- Parses your natural language causal question
- Analyzes your dataset structure and variables
- Selects the most appropriate causal inference method:
  - Randomized Controlled Trials (RCT)
  - Difference-in-Differences (DiD)
  - Instrumental Variables (IV)
  - Regression Discontinuity Design (RDD)
  - Propensity Score Matching/Weighting
  - Linear Regression with controls
  - And more...
- Executes the analysis with proper diagnostics
- Interprets results in the context of your original question
🎯 Example Use Cases
Education Research
result = run_causal_analysis(
    query="Does smaller class size improve student test scores?",
    dataset_path="education_data.csv",
    dataset_description="Student data with class sizes and test scores"
)
Healthcare
result = run_causal_analysis(
    query="What is the effect of the new treatment on patient recovery time?",
    dataset_path="clinical_trial_data.csv",
    dataset_description="Randomized trial data comparing treatments"
)
Economics
result = run_causal_analysis(
    query="How does minimum wage increase affect employment?",
    dataset_path="employment_data.csv",
    dataset_description="Employment data before and after policy change"
)
📈 Advanced Features
Batch Processing
Process multiple datasets at once:
import pandas as pd
# Create metadata file
metadata = pd.DataFrame({
    'natural_language_query': [
        'Effect of education on income',
        'Impact of training on employment'
    ],
    'data_files': ['education.csv', 'training.csv'],
    'data_description': ['Education dataset', 'Training program data']
})
# Save metadata to CSV file first
metadata.to_csv('metadata.csv', index=False)
# Run batch analysis using CLI
# causal_agent batch metadata.csv ./data/ results.json
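A batch run is easier to debug if the metadata file is checked first. The sketch below is an assumption-laden helper, not part of CAIS: `check_metadata` is a hypothetical name, and it assumes that the three column names from the example above are required and that each `data_files` entry is a file name relative to the data folder.

```python
import csv
import tempfile
from pathlib import Path

# Column names copied from the metadata example above; whether the CLI
# requires exactly these columns is an assumption.
REQUIRED_COLUMNS = ("natural_language_query", "data_files", "data_description")

def check_metadata(metadata_path, data_folder):
    """Return a list of problems found in a batch metadata CSV."""
    problems = []
    with open(metadata_path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing_cols = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing_cols:
            return [f"Missing columns: {missing_cols}"]
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            file_name = row["data_files"] or ""
            if not (Path(data_folder) / file_name).is_file():
                problems.append(f"Line {line_no}: {file_name} not found")
    return problems

# Demo in a temporary folder: only education.csv exists.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "education.csv").write_text("education,income\n12,30000\n")
    meta = tmp / "metadata.csv"
    meta.write_text(
        "natural_language_query,data_files,data_description\n"
        "Effect of education on income,education.csv,Education dataset\n"
        "Impact of training on employment,training.csv,Training program data\n"
    )
    demo_problems = check_metadata(meta, tmp)
```

In the demo, the second data row points at a file that was never created, so it is the only entry reported.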
Custom LLM Configuration
import os
# Choose a different model
os.environ["LLM_MODEL"] = "gpt-4o-mini"  # Faster, cheaper
# os.environ["LLM_MODEL"] = "gpt-4"  # More accurate
# os.environ["LLM_MODEL"] = "claude-3-haiku-20240307"  # Anthropic
🔍 Understanding Results
CAIS returns structured results including:
- Effect Estimate: The causal effect size
- Standard Error: Uncertainty in the estimate
- Confidence Interval: Range of plausible values
- Method Used: Which causal inference technique was applied
- Variables Identified: Treatment, outcome, and control variables
- Explanation: Plain-language interpretation of results
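# query, dataset_path, and description are strings you define beforehand
result = run_causal_analysis(
    query=query,
    dataset_path=dataset_path,
    dataset_description=description,
)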
# Access key results
effect = result['results']['results']['effect_estimate']
method = result['results']['results']['method_used']
variables = result['results']['variables']
explanation = result['explanation']
print(f"Using {method}, we found that {variables['treatment_variable']} "
      f"has an effect of {effect} on {variables['outcome_variable']}")
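Because the result is a nested dictionary, a missing key raises `KeyError` if an analysis did not complete. The following sketch reads the same keys shown above with `dict.get` so that it degrades gracefully. `summarize_result` is a hypothetical helper, and the mock dictionary, including the `"linear_regression"` method label, is made up for illustration rather than real CAIS output.

```python
def summarize_result(result):
    """Build a one-line summary from a CAIS-style result dictionary."""
    outer = result.get("results", {})
    inner = outer.get("results", {})
    variables = outer.get("variables", {})
    effect = inner.get("effect_estimate")
    if effect is None:
        return "No effect estimate was returned."
    method = inner.get("method_used", "an unspecified method")
    treatment = variables.get("treatment_variable", "the treatment")
    outcome = variables.get("outcome_variable", "the outcome")
    return f"Using {method}, {treatment} has an estimated effect of {effect} on {outcome}."

# Hand-written stand-in for a real result; all values are invented.
mock_result = {
    "results": {
        "results": {"effect_estimate": 1.5, "method_used": "linear_regression"},
        "variables": {
            "treatment_variable": "education",
            "outcome_variable": "income",
        },
    },
    "explanation": "Illustrative explanation text.",
}
summary = summarize_result(mock_result)
```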
🛠️ Supported Methods
CAIS automatically selects from:
- Experimental Methods: RCT analysis
- Quasi-Experimental: DiD, RDD, IV
- Observational: Propensity scoring, backdoor adjustment
- Machine Learning: Causal forests, double ML (coming soon)
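To make one of the quasi-experimental methods concrete, here is the textbook 2x2 difference-in-differences calculation on group means. It illustrates the idea only; it is not how CAIS implements DiD, and the numbers are invented.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 difference-in-differences on group means."""
    # Change over time in the treated group.
    treated_change = mean(treated_post) - mean(treated_pre)
    # Change over time in the control group (the counterfactual trend).
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Made-up outcomes before and after a policy change.
effect = did_estimate(
    treated_pre=[10, 12, 11],
    treated_post=[15, 17, 16],
    control_pre=[9, 11, 10],
    control_post=[11, 13, 12],
)
print(effect)  # 3
```

The treated group rises by 5 and the control group by 2, so the estimate attributes the remaining 3 to the treatment, under the usual parallel-trends assumption.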
📚 Best Practices
Writing Good Causal Questions
- ✅ Good: "What is the causal effect of education on income?"
- ✅ Good: "Does job training increase employment rates?"
- ❌ Avoid: "Are education and income related?" (correlation, not causation)
Dataset Requirements
- Clear variable names
- Sufficient sample size
- Relevant control variables
- Clean data (handle missing values)
Providing Context
Include dataset descriptions with:
- Variable definitions
- Data collection method
- Time period covered
- Known confounders
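One way to keep descriptions consistent across datasets is to assemble them from the four items above. This is only a sketch: `build_dataset_description` is a hypothetical helper, the layout of the text is an assumption (CAIS takes a free-text description), and the example values are invented.

```python
def build_dataset_description(variables, collection_method, time_period, confounders):
    """Combine context about a dataset into one description string."""
    lines = ["Variables:"]
    lines += [f"- {name}: {definition}" for name, definition in variables.items()]
    lines.append(f"Data collection: {collection_method}")
    lines.append(f"Time period: {time_period}")
    lines.append("Known confounders: " + (", ".join(confounders) or "none listed"))
    return "\n".join(lines)

# Invented example values.
description = build_dataset_description(
    variables={
        "education": "years of schooling completed",
        "income": "annual income in USD",
    },
    collection_method="household survey",
    time_period="2015-2020",
    confounders=["age", "parental income"],
)
```

The resulting string can be passed as `dataset_description` to `run_causal_analysis`.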
🔄 Migration from Previous Versions
If you're upgrading from the old cais package, see our Migration Guide for step-by-step instructions.
Quick update:
pip uninstall cais
pip install causal-agent
Then update your imports:
# Old
from cais import run_causal_analysis
# New
from causal_agent import run_causal_analysis
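If some environments have not been upgraded yet, a small fallback import lets the same script run with either package. This is a sketch that assumes both packages expose `run_causal_analysis` at the top level, as the two import lines above suggest; `import_run_causal_analysis` is a hypothetical helper.

```python
import importlib

def import_run_causal_analysis(module_names=("causal_agent", "cais")):
    """Return run_causal_analysis from the first importable package."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # Not installed; try the next candidate.
        return getattr(module, "run_causal_analysis")
    raise ImportError(f"None of {module_names} is installed")
```

Usage: `run_causal_analysis = import_run_causal_analysis()`. Once every environment uses `causal_agent`, the plain import is simpler.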
🤝 Support
- Documentation: GitHub README
- Migration Guide: MIGRATION.md
- Issues: GitHub Issues
- Examples: Check the test examples
📄 License
MIT License - see LICENSE for details.
Citation
If you use CAIS in your research, please cite:
@software{causal_agent2025,
  title={CAIS: Causal AI Scientist for Automated Causal Inference},
  author={Verma, Vishal and Acharya, Sawal and Simko, Samuel and Bhardwaj, Devansh and Haghighat, Anahita and Jin, Zhijing},
  year={2025},
  url={https://github.com/causalNLP/causal-agent}
}
Get started with causal inference in minutes, not hours! 🎉
File details
Details for the file causal_agent-0.1.2.tar.gz.
File metadata
- Download URL: causal_agent-0.1.2.tar.gz
- Upload date:
- Size: 294.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69e098b6799c5517c51198c03257afe96802d7d910d26434406788b591240896 |
| MD5 | 2f6096c9f97e1389a84770a007aeb87f |
| BLAKE2b-256 | fa93195945b3e5b975a5fe44ce2aca0707f09432c64d4857442c65536d722977 |
File details
Details for the file causal_agent-0.1.2-py3-none-any.whl.
File metadata
- Download URL: causal_agent-0.1.2-py3-none-any.whl
- Upload date:
- Size: 341.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4df126fa48571ca28e6444d48a011b4b2a6ebe00508552e2a041fb462a21d7bd |
| MD5 | a0d58a0236298b18116a25b820502587 |
| BLAKE2b-256 | 35daf628128b826c66e52d42a5036ced933cab4f591004411b13dea99bbc6ec3 |