Incorporating LLM and human knowledge into causal discovery

causaliq-knowledge

The CausalIQ Knowledge project combines traditional statistical structure learning algorithms with the contextual understanding and reasoning capabilities of Large Language Models (LLMs). The aim is to make causal discovery workflows more interpretable, domain-aware, and human-friendly. It is part of the CausalIQ ecosystem for intelligent causal discovery.

Status

🚧 Active Development - this repository is currently in active development, which involves:

Adding new knowledge features, in particular knowledge from LLMs
Migrating functionality that provides knowledge based on standard reference networks from the legacy monolithic discovery repo
Ensuring CausalIQ development standards are met

Quick Start

from causaliq_knowledge.llm import LLMKnowledge

# Query an LLM about a potential causal relationship
knowledge = LLMKnowledge(models=["groq/llama-3.1-8b-instant"])
result = knowledge.query_edge("smoking", "lung_cancer")

print(f"Exists: {result.exists}, Direction: {result.direction}")
print(f"Confidence: {result.confidence}")
print(f"Reasoning: {result.reasoning}")

Features

✅ Currently implemented releases:

v0.1.0 - Foundation LLM [January 2026]: Foundation release establishing LLM client infrastructure for causal graph generation.
v0.2.0 - Additional LLMs [January 2026]: Expanded LLM provider support from 2 to 7 providers.
v0.3.0 - LLM Caching [January 2026]: SQLite-based response caching with CLI tools for cache management.
v0.4.0 - Graph Generation [February 2026]: CLI tools and CausalIQ workflows for LLM-generated causal graphs.
v0.5.0 - Workflow Integration [February 2026]: Integration into CausalIQ Workflows including writing results to cache.
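
The response caching introduced in v0.3.0 can be pictured with a minimal sketch. This is not the package's actual cache: the table layout, the `ResponseCache` class, its method names and the `cached_query` helper are all hypothetical, and only the standard-library `sqlite3` module is used. The idea is to key each response on a hash of the model name and prompt, so that a repeated query is served from disk rather than sent to the LLM again.

```python
import hashlib
import json
import sqlite3


class ResponseCache:
    """Hypothetical SQLite cache keyed on a hash of (model, prompt)."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(key TEXT PRIMARY KEY, response TEXT NOT NULL)"
        )

    @staticmethod
    def make_key(model, prompt):
        # Serialise deterministically so the same query gives the same key.
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, model, prompt):
        row = self.conn.execute(
            "SELECT response FROM responses WHERE key = ?",
            (self.make_key(model, prompt),),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, model, prompt, response):
        self.conn.execute(
            "INSERT OR REPLACE INTO responses (key, response) VALUES (?, ?)",
            (self.make_key(model, prompt), json.dumps(response)),
        )
        self.conn.commit()


def cached_query(cache, model, prompt, call_llm):
    """Return the cached response if present, otherwise call the LLM once."""
    cached = cache.get(model, prompt)
    if cached is not None:
        return cached
    response = call_llm(model, prompt)
    cache.put(model, prompt, response)
    return response
```

Passing the LLM call in as a function (`call_llm`) keeps the sketch independent of any particular provider client.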

🛣️ Upcoming Releases (speculative)

Release v0.6.0 - Statistical Fusion: Support knowledge requirements for fusing LLM knowledge and statistical graph averaging.
Release v0.7.0 - LLM Provider Cost Tracking: Query LLM provider APIs for usage and cost statistics.
Release v0.8.0 - Enhanced LLM Context: Supply background literature to LLMs.
Release v0.9.0 - Legacy Reference: Support for deriving knowledge from reference networks and migration of functionality from the legacy discovery repo.

Implementation Approach

Technology Stack

Vendor-Specific API Clients: Direct integration with LLM providers using httpx
Pydantic: Structured response validation
Click: Command-line interface
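
Structured response validation can be illustrated without any dependencies. The package lists Pydantic for this job; the sketch below swaps in a standard-library dataclass so that it runs anywhere. The field names follow the result object shown in the Quick Start (`exists`, `direction`, `confidence`, `reasoning`), while the direction labels, the [0, 1] confidence range and the function name `parse_edge_response` are assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical direction labels; the package may use different values.
ALLOWED_DIRECTIONS = ("a_to_b", "b_to_a", "undirected", None)


@dataclass(frozen=True)
class EdgeResult:
    exists: bool
    direction: object  # one of ALLOWED_DIRECTIONS
    confidence: float
    reasoning: str


def parse_edge_response(raw_text):
    """Parse an LLM's JSON reply and reject values outside the expected shape."""
    data = json.loads(raw_text)  # raises json.JSONDecodeError on malformed JSON
    exists = data["exists"]
    if not isinstance(exists, bool):
        raise ValueError("'exists' must be a boolean")
    direction = data.get("direction")
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unexpected direction: {direction!r}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must lie in [0, 1]")
    return EdgeResult(exists, direction, confidence, str(data.get("reasoning", "")))
```

Validating at the boundary means that the rest of a workflow can rely on the types of the result rather than re-checking free-form LLM output.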

Why Vendor-Specific APIs (not LiteLLM/LangChain)?

We use direct vendor-specific API clients rather than wrapper libraries:

Aspect	Direct APIs	Wrapper Libraries
Reliability	✅ Full control	❌ Wrapper bugs
Dependencies	✅ Minimal (httpx)	❌ Heavy (~50-100MB)
Debugging	✅ Clear traces	❌ Abstraction layers
Maintenance	✅ We control	❌ Wait for updates

This approach keeps the package lightweight, reliable, and easy to debug.

Supported LLM Providers

Provider	Client	Models	Free Tier
Groq	`GroqClient`	llama-3.1-8b-instant	✅ Generous
Google Gemini	`GeminiClient`	gemini-2.5-flash	✅ Generous
OpenAI	`OpenAIClient`	gpt-4o-mini	❌ Paid
Anthropic	`AnthropicClient`	claude-sonnet-4-20250514	❌ Paid
DeepSeek	`DeepSeekClient`	deepseek-chat	✅ Low cost
Mistral	`MistralClient`	mistral-small-latest	❌ Paid
Ollama	`OllamaClient`	llama3	✅ Free (local)
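
The Quick Start passes a model as `"groq/llama-3.1-8b-instant"`, which suggests a `provider/model` naming scheme. The following sketch shows how such an identifier could be routed to one of the client classes in the table above. It is an illustration only: the function `split_model_id` is hypothetical, and every prefix other than `groq` is an assumption rather than something the package documents here.

```python
# Client class names come from the provider table; the prefix strings other
# than "groq" are assumptions made for this sketch.
PROVIDER_CLIENTS = {
    "groq": "GroqClient",
    "gemini": "GeminiClient",
    "openai": "OpenAIClient",
    "anthropic": "AnthropicClient",
    "deepseek": "DeepSeekClient",
    "mistral": "MistralClient",
    "ollama": "OllamaClient",
}


def split_model_id(model_id):
    """Split 'provider/model' and return (client class name, model name)."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider/model', got {model_id!r}")
    if provider not in PROVIDER_CLIENTS:
        raise ValueError(f"unknown provider: {provider!r}")
    return PROVIDER_CLIENTS[provider], model
```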

Upcoming Key Innovations

🧠 LLMs support Causal Discovery and Inference

Initially, LLMs will work with graph averaging to resolve uncertain edges, using entropy to identify edges whose existence or direction is uncertain
Integration into structure learning algorithms to provide knowledge for "uncertain" areas of the graph
LLMs analyse the learning process and its errors to suggest improved algorithms
LLMs preprocess text and visual data so that they can be used as inputs to structure learning
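
The entropy criterion in the first item of this list can be sketched as follows. The assumption here is that graph averaging yields, for each pair of variables, the relative frequency of three states (edge one way, edge the other way, no edge); the Shannon entropy of that distribution is then a natural uncertainty score, and pairs above a threshold are the ones worth sending to an LLM. The function names, the threshold and the example frequencies are hypothetical, not taken from the package.

```python
import math


def edge_entropy(probabilities):
    """Shannon entropy (in bits) of a distribution over edge states."""
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probabilities:
        if p > 0:
            q = p / total  # normalise in case the inputs are raw counts
            entropy -= q * math.log2(q)
    return entropy


def uncertain_edges(edge_states, threshold=1.0):
    """Return node pairs whose entropy exceeds the threshold, most uncertain first."""
    scored = [(edge_entropy(p), pair) for pair, p in edge_states.items()]
    return [pair for h, pair in sorted(scored, reverse=True) if h > threshold]


# Hypothetical averaged frequencies: (a -> b, b -> a, no edge)
averaged = {
    ("smoking", "lung_cancer"): (0.95, 0.03, 0.02),
    ("x", "y"): (0.40, 0.35, 0.25),
}
print(uncertain_edges(averaged))
```

With three states the entropy ranges from 0 bits (all graphs agree) to about 1.585 bits (all three states equally likely), so a threshold of 1.0 selects only pairs on which the averaged graphs disagree substantially.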

🤝 Human Engagement

Natural language constraints: Specify domain knowledge in plain English
Expert knowledge incorporation by converting expert understanding into algorithmic constraints
LLMs convert natural language questions to causal queries
Interactive causal discovery where structure learning or LLMs identify areas of causal uncertainty and can test causal hypotheses through dialogue

🪟 Transparency and interpretability

LLMs interpret structure learning process and outputs, including their uncertainties
LLMs interpret causal inference results including uncertainties
Contextual graph interpretation to explain variable meanings and relationships
Uncertainty communication with clear explanation of confidence levels and limitations
Report generation including automated research summaries and methodology descriptions

🔒 Stability and reproducibility

Cache queries and responses so that experiments are stable and repeatable even if LLMs themselves are not
Stable randomisation of, for example, data sub-sampling
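
Stable randomisation of sub-sampling can be achieved by giving each draw its own seeded generator. The sketch below is an assumption about how this could look, not the package's implementation; `stable_subsample` is a hypothetical name. Using a private `random.Random` instance keeps the result independent of any other code that touches the global random state, although the exact rows drawn may still differ between Python versions.

```python
import random


def stable_subsample(rows, size, seed):
    """Draw the same sub-sample every time for a given seed."""
    rng = random.Random(seed)  # private generator, isolated from global state
    return rng.sample(list(rows), size)
```

Recording the seed alongside the cached LLM responses would then be enough to repeat an experiment on the same sub-sample.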

💰 Efficient use of LLM resources (important as an independent researcher)

Cache queries and results so that knowledge can be re-used
Evaluation and development of simple context-adapted LLMs

Upcoming Integration with CausalIQ Ecosystem

🔍 CausalIQ Discovery makes use of this package to learn more accurate graphs.
🧪 CausalIQ Analysis uses this package to explain the learning process, intelligently combine and explain results.
🔮 CausalIQ Predict uses this package to explain predictions made by learnt models.

Documentation

User Guide - Getting started
Architecture Overview - Design and components
LLM Integration Design - Detailed LLM design
Roadmap - Release planning

Supported Python Versions: 3.9, 3.10, 3.11, 3.12, 3.13
Default Python Version: 3.11

Release history

0.6.0 (Apr 10, 2026)
0.5.0 (Feb 20, 2026, this version)
0.4.0 (Feb 4, 2026)
0.3.0 (Jan 27, 2026)
0.2.0 (Jan 10, 2026)
0.1.0 (Jan 7, 2026)

Download files

Source Distribution

causaliq_knowledge-0.5.0.tar.gz (64.2 kB), uploaded Feb 20, 2026

Built Distribution

causaliq_knowledge-0.5.0-py3-none-any.whl (79.5 kB), uploaded Feb 20, 2026, Python 3

File details

Details for the file causaliq_knowledge-0.5.0.tar.gz.

File metadata

Download URL: causaliq_knowledge-0.5.0.tar.gz
Upload date: Feb 20, 2026
Size: 64.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for causaliq_knowledge-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`f7f3be2c437625a41300596d0db39a9f58d546c6a30bb19f50a627c209b30316`
MD5	`64018ede95208b1814592c15b9fa2b69`
BLAKE2b-256	`6988dd9fc1f99204ad5b63b810c16a810e5a97a050318aa843d324f286a9428a`

File details

Details for the file causaliq_knowledge-0.5.0-py3-none-any.whl.

File metadata

Download URL: causaliq_knowledge-0.5.0-py3-none-any.whl
Upload date: Feb 20, 2026
Size: 79.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for causaliq_knowledge-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c840a68da2e737c3ac41c506568ae086c1b17baf7cacbe807466324e11a142c6`
MD5	`bc9ba9778299a93a8af927d3579c4594`
BLAKE2b-256	`ebc74716c9716174ab54c20134facf2210dc43d75ef1f776875a728fb822b8d9`
