
Incorporating LLM and human knowledge into causal discovery


causaliq-knowledge


The CausalIQ Knowledge project takes a novel approach to causal discovery by combining traditional statistical structure-learning algorithms with the contextual understanding and reasoning capabilities of Large Language Models (LLMs). This integration enables more interpretable, domain-aware, and human-friendly causal discovery workflows. It is part of the CausalIQ ecosystem for intelligent causal discovery.

Status

🚧 Active Development - this repository is currently in active development, which involves:

  • Adding new knowledge features, in particular knowledge from LLMs
  • Migrating functionality that provides knowledge based on standard reference networks from the legacy monolithic discovery repo
  • Ensuring CausalIQ development standards are met

Quick Start

```python
from causaliq_knowledge.llm import LLMKnowledge

# Query an LLM about a potential causal relationship
knowledge = LLMKnowledge(models=["groq/llama-3.1-8b-instant"])
result = knowledge.query_edge("smoking", "lung_cancer")

print(f"Exists: {result.exists}, Direction: {result.direction}")
print(f"Confidence: {result.confidence}")
print(f"Reasoning: {result.reasoning}")
```

Features

Currently implemented releases:

  • Release v0.1.0 - Foundation LLM: Simple queries to one or two LLMs about edge existence and orientation, to support graph averaging
  • Release v0.2.0 - Additional LLMs: Support for 7 LLM providers (Groq, Gemini, OpenAI, Anthropic, DeepSeek, Mistral, Ollama)
  • Release v0.3.0 - LLM Caching: SQLite-based response caching with CLI tools for cache management
  • Release v0.4.0 - Graph Generation: CLI and CausalIQ workflow action for LLM-generated causal graphs

Planned:

  • Release v0.5.0 - Graph Caching: Save generated graphs to workflow caches
  • Release v0.6.0 - LLM Cost Tracking: Query LLM provider APIs for usage and cost statistics
  • Release v0.7.0 - LLM Context: Additional query context such as variable descriptions, roles, and relevant literature
  • Release v0.8.0 - Algorithm Integration: Integration into structure learning algorithms

Implementation Approach

Technology Stack

  • Vendor-Specific API Clients: Direct integration with LLM providers using httpx
  • Pydantic: Structured response validation
  • Click: Command-line interface
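To illustrate how Pydantic fits in, the sketch below validates a parsed LLM response against a structured result model. The `EdgeResult` model and its field names are hypothetical, modelled on the attributes (`exists`, `direction`, `confidence`, `reasoning`) shown in the Quick Start; the package's actual models may differ.

```python
from typing import Optional

from pydantic import BaseModel, Field

class EdgeResult(BaseModel):
    """Hypothetical structured result for a single edge query."""
    exists: bool
    direction: Optional[str] = None  # e.g. "smoking -> lung_cancer"
    confidence: float = Field(ge=0.0, le=1.0)  # out-of-range values raise a ValidationError
    reasoning: str = ""

# Validate a raw LLM response that has already been parsed from JSON
raw = {
    "exists": True,
    "direction": "smoking -> lung_cancer",
    "confidence": 0.9,
    "reasoning": "Strong epidemiological evidence.",
}
result = EdgeResult(**raw)
```

Validation at the boundary means malformed LLM output fails fast with a clear error rather than propagating into downstream graph logic.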

Why Vendor-Specific APIs (not LiteLLM/LangChain)?

We use direct vendor-specific API clients rather than wrapper libraries:

| Aspect       | Direct APIs       | Wrapper Libraries     |
|--------------|-------------------|-----------------------|
| Reliability  | ✅ Full control    | ❌ Wrapper bugs        |
| Dependencies | ✅ Minimal (httpx) | ❌ Heavy (~50-100 MB)  |
| Debugging    | ✅ Clear traces    | ❌ Abstraction layers  |
| Maintenance  | ✅ We control      | ❌ Wait for updates    |

This approach keeps the package lightweight, reliable, and easy to debug.
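A minimal sketch of what a direct vendor client can look like, assuming an OpenAI-compatible chat-completions endpoint (shown here for Groq). The URL, payload shape, and function names are illustrative and are not the package's actual client code.

```python
import os

# Groq exposes an OpenAI-compatible chat-completions endpoint
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """Build a single-turn chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # deterministic output aids caching and reproducibility
    }

def query_llm(model: str, prompt: str) -> str:
    """Send one query directly with httpx: no wrapper library in between."""
    import httpx  # the only third-party HTTP dependency

    headers = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}
    resp = httpx.post(GROQ_URL, json=build_request(model, prompt),
                      headers=headers, timeout=30.0)
    resp.raise_for_status()  # surface HTTP errors with a clear traceback
    return resp.json()["choices"][0]["message"]["content"]
```

Because the request and response handling are plain `httpx` calls, a failure produces a short, readable traceback instead of one routed through several wrapper abstraction layers.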

Supported LLM Providers

| Provider      | Client          | Models                   | Free Tier      |
|---------------|-----------------|--------------------------|----------------|
| Groq          | GroqClient      | llama-3.1-8b-instant     | ✅ Generous     |
| Google Gemini | GeminiClient    | gemini-2.5-flash         | ✅ Generous     |
| OpenAI        | OpenAIClient    | gpt-4o-mini              | ❌ Paid         |
| Anthropic     | AnthropicClient | claude-sonnet-4-20250514 | ❌ Paid         |
| DeepSeek      | DeepSeekClient  | deepseek-chat            | ✅ Low cost     |
| Mistral       | MistralClient   | mistral-small-latest     | ❌ Paid         |
| Ollama        | OllamaClient    | llama3                   | ✅ Free (local) |

Upcoming Key Innovations

🧠 LLMs support Causal Discovery and Inference

  • Initially, LLMs will work with graph averaging to resolve uncertain edges (using entropy to identify edges whose existence or direction is uncertain)
  • Integration into structure learning algorithms to provide knowledge for "uncertain" areas of the graph
  • LLMs analyse the learning process and its errors to suggest improved algorithms
  • LLMs preprocess text and visual data so they can be used as inputs to structure learning
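The entropy-based selection of uncertain edges mentioned above can be sketched as follows. The edge frequencies and the threshold are illustrative: edges that appear in roughly half of the averaged graphs have near-maximal entropy and are the ones worth referring to an LLM.

```python
import math

def edge_entropy(p: float) -> float:
    """Shannon entropy (bits) of a Bernoulli edge-presence probability."""
    if p in (0.0, 1.0):
        return 0.0  # edge presence is certain either way
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Fraction of averaged/bootstrap graphs in which each edge appeared
edge_freqs = {
    ("smoking", "lung_cancer"): 0.95,  # near-certain edge, low entropy
    ("asia", "bronchitis"): 0.50,      # maximally uncertain, entropy = 1 bit
}

# Edges whose entropy exceeds a threshold are referred to the LLM
uncertain = [e for e, p in edge_freqs.items() if edge_entropy(p) > 0.8]
```

This keeps LLM usage focused and cheap: confident edges are accepted from the statistical averaging alone, and only genuinely ambiguous ones consume API queries.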

🤝 Human Engagement

  • Natural language constraints: Specify domain knowledge in plain English
  • Expert knowledge incorporation by converting expert understanding into algorithmic constraints
  • LLMs convert natural language questions to causal queries
  • Interactive causal discovery where structure learning or LLMs identify areas of causal uncertainty and can test causal hypotheses through dialogue
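One common way to make expert statements algorithmic is to translate them into required and forbidden edge sets, a form of knowledge that many constraint-aware structure-learning algorithms accept. The `GraphConstraints` class below is a hypothetical sketch of that representation, not the package's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class GraphConstraints:
    """Hypothetical constraint set derived from expert statements."""
    required: set = field(default_factory=set)   # edges that must appear
    forbidden: set = field(default_factory=set)  # edges that must not appear

# e.g. an LLM could map "smoking can cause lung cancer, but not vice versa" to:
constraints = GraphConstraints(
    required={("smoking", "lung_cancer")},
    forbidden={("lung_cancer", "smoking")},
)
```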

🪟 Transparency and interpretability

  • LLMs interpret structure learning process and outputs, including their uncertainties
  • LLMs interpret causal inference results including uncertainties
  • Contextual graph interpretation to explain variable meanings and relationships
  • Uncertainty communication with clear explanation of confidence levels and limitations
  • Report generation including automated research summaries and methodology descriptions

🔒 Stability and reproducibility

  • Cache queries and responses so that experiments are stable and repeatable even if LLMs themselves are not
  • Stable randomisation of, for example, data sub-sampling
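A minimal sketch of how SQLite-backed response caching makes experiments repeatable: each (model, prompt) pair is hashed to a deterministic key, so a repeated query is served from the cache instead of re-hitting a non-deterministic LLM. The table schema and function names are illustrative, not the package's actual cache layout.

```python
import hashlib
import sqlite3

def cache_key(model: str, prompt: str) -> str:
    """Deterministic key, so identical queries always map to the same row."""
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

conn = sqlite3.connect(":memory:")  # the real cache would be a file on disk
conn.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, response TEXT)")

def get_or_query(model, prompt, query_fn):
    """Return the cached response if present, otherwise query and store."""
    key = cache_key(model, prompt)
    row = conn.execute("SELECT response FROM cache WHERE key = ?", (key,)).fetchone()
    if row:
        return row[0]
    response = query_fn(model, prompt)
    conn.execute("INSERT INTO cache VALUES (?, ?)", (key, response))
    return response

# First call reaches the (stubbed) LLM; the second is served from the cache
calls = []
fake_llm = lambda m, p: calls.append(p) or "yes"
first = get_or_query("groq/llama-3.1-8b-instant", "smoking -> lung_cancer?", fake_llm)
second = get_or_query("groq/llama-3.1-8b-instant", "smoking -> lung_cancer?", fake_llm)
```

Rerunning an experiment against a populated cache therefore reproduces the original responses exactly, even if the underlying LLM's answers have since drifted.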

💰 Efficient use of LLM resources (important as an independent researcher)

  • Cache queries and results so that knowledge can be re-used
  • Evaluation and development of simple context-adapted LLMs

Upcoming Integration with CausalIQ Ecosystem

  • 🔍 CausalIQ Discovery makes use of this package to learn more accurate graphs.
  • 🧪 CausalIQ Analysis uses this package to explain the learning process and to intelligently combine and explain results.
  • 🔮 CausalIQ Predict uses this package to explain predictions made by learnt models.

Documentation


Supported Python Versions: 3.9, 3.10, 3.11, 3.12, 3.13
Default Python Version: 3.11

Download files


Source Distribution

causaliq_knowledge-0.4.0.tar.gz (71.7 kB)


Built Distribution


causaliq_knowledge-0.4.0-py3-none-any.whl (91.5 kB)


File details

Details for the file causaliq_knowledge-0.4.0.tar.gz.

File metadata

  • Download URL: causaliq_knowledge-0.4.0.tar.gz
  • Upload date:
  • Size: 71.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for causaliq_knowledge-0.4.0.tar.gz

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | 5ca66de0acd4338b929cccf5b9776ef0cf78d32e7e5634e112661a1664ff5b2c |
| MD5         | e65cb198ecd28e4c46272bf383a51907 |
| BLAKE2b-256 | aca51b8be1ad4bfda4512cde1ba8f1f1b2059e5bd5f06106cb06c8e9c526ad83 |


File details

Details for the file causaliq_knowledge-0.4.0-py3-none-any.whl.

File hashes

Hashes for causaliq_knowledge-0.4.0-py3-none-any.whl

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | e2ac9e43edaa2fe49472d9a1fb6da642ee0c5dd92f5cee7dd69b5004faab72aa |
| MD5         | 82365b6c39d726d37c88510f57be3907 |
| BLAKE2b-256 | 3c9ec5b93169e3592f4676c335bb3086304139023ad4be83e21f302a79290cfa |

