Incorporating LLM and human knowledge into causal discovery
Project description
causaliq-knowledge
The CausalIQ Knowledge project takes a novel approach to causal discovery, combining traditional statistical structure-learning algorithms with the contextual understanding and reasoning capabilities of Large Language Models (LLMs). This integration enables more interpretable, domain-aware, and human-friendly causal discovery workflows. It is part of the CausalIQ ecosystem for intelligent causal discovery.
Status
🚧 Active Development - this repository is currently in active development, which involves:
- Adding new knowledge features, in particular knowledge from LLMs
- Migrating functionality that provides knowledge based on standard reference networks from the legacy monolithic discovery repo
- Ensuring CausalIQ development standards are met
Quick Start
```python
from causaliq_knowledge.llm import LLMKnowledge

# Query an LLM about a potential causal relationship
knowledge = LLMKnowledge(models=["groq/llama-3.1-8b-instant"])
result = knowledge.query_edge("smoking", "lung_cancer")

print(f"Exists: {result.exists}, Direction: {result.direction}")
print(f"Confidence: {result.confidence}")
print(f"Reasoning: {result.reasoning}")
```
Features
✅ Currently implemented releases:
- v0.1.0 - Foundation LLM [January 2026]: Foundation release establishing the LLM client infrastructure for causal graph generation.
- v0.2.0 - Additional LLMs [January 2026]: Expanded LLM provider support from 2 to 7 providers.
- v0.3.0 - LLM Caching [January 2026]: SQLite-based response caching with CLI tools for cache management.
- v0.4.0 - Graph Generation [February 2026]: CLI tools and CausalIQ workflows for LLM-generated causal graphs.
- v0.5.0 - Workflow Integration [February 2026]: Integration into CausalIQ Workflows, including writing results to the cache.
- v0.6.0 - PDG Generation [April 2026]: PDG output with separate existence and orientation probabilities, multi-sampling support, and improved LLM response handling.
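The separation of existence and orientation probabilities introduced with PDG output in v0.6.0 can be illustrated with a minimal sketch; the class and field names below are assumptions for illustration, not the package's actual API:

```python
from dataclasses import dataclass

@dataclass
class PDGEdge:
    """Illustrative PDG edge: existence and orientation scored separately."""
    source: str
    target: str
    p_exists: float   # probability the edge exists at all
    p_forward: float  # probability of orientation source -> target,
                      # conditional on the edge existing

edge = PDGEdge("smoking", "lung_cancer", p_exists=0.95, p_forward=0.9)

# Joint probability of the directed edge smoking -> lung_cancer
p_directed = edge.p_exists * edge.p_forward
```

Keeping the two probabilities separate lets downstream consumers distinguish "is there a dependency at all?" from "which way does it point?".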
🛣️ Upcoming Releases (speculative)
- v0.7.0 - LLM Provider Cost Tracking: Query LLM provider APIs for usage and cost statistics.
- v0.8.0 - Enhanced LLM Context: Background literature supplied to LLMs.
- v0.9.0 - Legacy Reference: Derive knowledge from reference networks and migrate functionality from the legacy discovery repo.
Implementation Approach
Technology Stack
- Vendor-Specific API Clients: Direct integration with LLM providers using httpx
- Pydantic: Structured response validation
- Click: Command-line interface
Why Vendor-Specific APIs (not LiteLLM/LangChain)?
We use direct vendor-specific API clients rather than wrapper libraries:
| Aspect | Direct APIs | Wrapper Libraries |
|---|---|---|
| Reliability | ✅ Full control | ❌ Wrapper bugs |
| Dependencies | ✅ Minimal (httpx) | ❌ Heavy (~50-100MB) |
| Debugging | ✅ Clear traces | ❌ Abstraction layers |
| Maintenance | ✅ We control | ❌ Wait for updates |
This approach keeps the package lightweight, reliable, and easy to debug.
Supported LLM Providers
| Provider | Client | Models | Free Tier |
|---|---|---|---|
| Groq | GroqClient | llama-3.1-8b-instant | ✅ Generous |
| Google Gemini | GeminiClient | gemini-2.5-flash | ✅ Generous |
| OpenAI | OpenAIClient | gpt-4o-mini | ❌ Paid |
| Anthropic | AnthropicClient | claude-sonnet-4-20250514 | ❌ Paid |
| DeepSeek | DeepSeekClient | deepseek-chat | ✅ Low cost |
| Mistral | MistralClient | mistral-small-latest | ❌ Paid |
| Ollama | OllamaClient | llama3 | ✅ Free (local) |
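Model specifications such as `"groq/llama-3.1-8b-instant"` in the Quick Start pair a provider prefix with a model name. A hypothetical sketch of how such a spec could be dispatched to the clients in the table above (the dispatch table and function are illustrative, not the package's API):

```python
# Hypothetical mapping from provider prefix to client class name
PROVIDER_CLIENTS = {
    "groq": "GroqClient",
    "gemini": "GeminiClient",
    "openai": "OpenAIClient",
    "anthropic": "AnthropicClient",
    "deepseek": "DeepSeekClient",
    "mistral": "MistralClient",
    "ollama": "OllamaClient",
}

def resolve_model(spec: str) -> tuple[str, str]:
    """Split a 'provider/model' spec and validate the provider."""
    provider, _, model = spec.partition("/")
    if provider not in PROVIDER_CLIENTS or not model:
        raise ValueError(f"Unknown model spec: {spec!r}")
    return PROVIDER_CLIENTS[provider], model
```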
Upcoming Key Innovations
🧠 LLMs support Causal Discovery and Inference
- Initially, LLMs will work with graph averaging to resolve uncertain edges, using entropy to identify edges whose existence or direction is uncertain
- Integration into structure learning algorithms to provide knowledge for "uncertain" areas of the graph
- LLMs analyse the learning process and its errors to suggest improved algorithms
- LLMs preprocess text and visual data so they can be used as inputs to structure learning
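The entropy-based selection of uncertain edges mentioned above can be sketched as follows; the function names and threshold are illustrative assumptions:

```python
import math

def edge_entropy(probs: list[float]) -> float:
    """Shannon entropy (bits) of a distribution over edge states,
    e.g. P(no edge), P(a -> b), P(b -> a) from graph averaging."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def is_uncertain(probs: list[float], threshold: float = 0.8) -> bool:
    """Flag edges whose state distribution is high-entropy, so an
    LLM can be queried to help resolve them."""
    return edge_entropy(probs) > threshold
```

A near-uniform distribution over the three edge states is flagged for an LLM query, while a sharply peaked one is left to the statistical result.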
🤝 Human Engagement
- Natural language constraints: Specify domain knowledge in plain English
- Expert knowledge incorporation, converting expert understanding into algorithmic constraints
- LLMs convert natural language questions into causal queries
- Interactive causal discovery, where structure learning or LLMs identify areas of causal uncertainty and test causal hypotheses through dialogue
🪟 Transparency and interpretability
- LLMs interpret the structure learning process and its outputs, including their uncertainties
- LLMs interpret causal inference results, including their uncertainties
- Contextual graph interpretation to explain variable meanings and relationships
- Uncertainty communication with clear explanation of confidence levels and limitations
- Report generation including automated research summaries and methodology descriptions
🔒 Stability and reproducibility
- Cache queries and responses so that experiments are stable and repeatable even if LLMs themselves are not
- Stable (seeded) randomisation of, for example, data sub-sampling
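Stable randomisation can be achieved with a locally seeded RNG, as in this sketch (the function is illustrative, not the package's API):

```python
import random

def stable_subsample(rows: list, fraction: float, seed: int = 42) -> list:
    """Deterministic sub-sample: the same seed always selects the same
    rows, so experiments remain repeatable across runs."""
    rng = random.Random(seed)  # local RNG; global random state untouched
    k = int(len(rows) * fraction)
    return rng.sample(rows, k)

# Identical seeds yield identical samples
sample_a = stable_subsample(list(range(100)), 0.1, seed=7)
sample_b = stable_subsample(list(range(100)), 0.1, seed=7)
```

Using a local `random.Random` instance, rather than the module-level functions, keeps the sub-sampling reproducible even if other code reseeds the global generator.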
💰 Efficient use of LLM resources (important for an independent researcher)
- Cache queries and results so that knowledge can be re-used
- Evaluation and development of simple context-adapted LLMs
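A cache along the lines of the SQLite-based response caching delivered in v0.3.0 might be sketched as follows; the schema, class, and key derivation are illustrative assumptions, not the package's actual design:

```python
import hashlib
import json
import sqlite3

class LLMCache:
    """Illustrative SQLite cache keyed by a hash of model + prompt."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)"
        )

    def _key(self, model: str, prompt: str) -> str:
        # Hash model and prompt together so the same prompt sent to
        # different models is cached separately
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get(self, model: str, prompt: str):
        row = self.db.execute(
            "SELECT value FROM cache WHERE key = ?", (self._key(model, prompt),)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, model: str, prompt: str, response) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)",
            (self._key(model, prompt), json.dumps(response)),
        )
        self.db.commit()
```

Serving repeated queries from such a cache both stabilises experiments (the same prompt always yields the same stored response) and avoids paying for duplicate API calls.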
Upcoming Integration with CausalIQ Ecosystem
- 🔍 CausalIQ Discovery makes use of this package to learn more accurate graphs.
- 🧪 CausalIQ Analysis uses this package to explain the learning process and to intelligently combine and explain results.
- 🔮 CausalIQ Predict uses this package to explain predictions made by learnt models.
Documentation
- User Guide - Getting started
- Architecture Overview - Design and components
- LLM Integration Design - Detailed LLM design
- Roadmap - Release planning
Supported Python Versions: 3.9, 3.10, 3.11, 3.12, 3.13
Default Python Version: 3.11
File details
Details for the file causaliq_knowledge-0.6.0.tar.gz.
File metadata
- Download URL: causaliq_knowledge-0.6.0.tar.gz
- Upload date:
- Size: 67.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 537c09a8ee1ac428d6dd2339ad5cbc4f629c8f34336a055d1556b172c3638157 |
| MD5 | 04c12ad4ae415ac7740725f53720fbed |
| BLAKE2b-256 | 37a8ab65c4262e00d3c6c7ef473a5d4cce0fb7b4262ad8a34a98ee4205c938c1 |
File details
Details for the file causaliq_knowledge-0.6.0-py3-none-any.whl.
File metadata
- Download URL: causaliq_knowledge-0.6.0-py3-none-any.whl
- Upload date:
- Size: 82.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2124b41f0f45d2fde87e3ecaed4b673cbe9539e15c0c2f221ac9486644d765d1 |
| MD5 | d0fb47c0a21abdb435f5cc08367f091b |
| BLAKE2b-256 | 4f68c60df2a29a5d329442abcd2eb12bf2f59bd07c5d2fc849e1e414fba34df7 |