Binary Knowledge Graph Format with Embedded Inference for AI Applications
dotcausal
The Knowledge Graph Format for AI
The .causal format is a binary knowledge graph format with embedded deterministic inference. It solves the fundamental problem of AI-assisted discovery: LLMs hallucinate, databases don't reason.
Why .causal?
The Problem
| Technology | What it does | What's missing |
|---|---|---|
| SQLite | Stores facts | No reasoning - only returns explicit matches |
| Vector RAG | Finds similar text | No logic - returns relevance, not causality |
| LLMs | Reasons creatively | Hallucination risk - invents plausible but false connections |
Example: If Paper A says "COVID → damages mitochondria" and Paper B says "mitochondrial damage → fatigue", a SQL query for "COVID → fatigue" returns nothing. The connection exists but is invisible.
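To make the gap concrete, here is a minimal sketch using Python's standard-library `sqlite3` module. The single table `triplets(cause, mechanism, outcome)` is a hypothetical schema chosen for illustration, not dotcausal's own pipeline schema. A direct lookup for COVID → fatigue finds nothing, and the two-hop chain only appears when the query spells out the join explicitly.

```python
import sqlite3

# Hypothetical schema for illustration only (not dotcausal's own format).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triplets (cause TEXT, mechanism TEXT, outcome TEXT)")
conn.executemany(
    "INSERT INTO triplets VALUES (?, ?, ?)",
    [
        ("COVID", "damages", "mitochondria"),   # Paper A
        ("mitochondria", "causes", "fatigue"),  # Paper B
    ],
)

# Direct lookup: no stored row links COVID to fatigue.
direct = conn.execute(
    "SELECT * FROM triplets WHERE cause = ? AND outcome = ?",
    ("COVID", "fatigue"),
).fetchall()
print(direct)  # []

# The connection only surfaces if the query author writes the join by hand.
two_hop = conn.execute(
    """
    SELECT a.cause, a.mechanism, a.outcome, b.mechanism, b.outcome
    FROM triplets AS a
    JOIN triplets AS b ON a.outcome = b.cause
    WHERE a.cause = ? AND b.outcome = ?
    """,
    ("COVID", "fatigue"),
).fetchall()
print(two_hop)  # [('COVID', 'damages', 'mitochondria', 'causes', 'fatigue')]
```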
The Solution
.causal pre-computes all transitive chains at storage time:
COVID → damages → mitochondria (explicit, Paper A)
mitochondria → causes → fatigue (explicit, Paper B)
─────────────────────────────────────────────────────
COVID → indirectly causes → fatigue (INFERRED, deterministic)
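The pre-computation step can be sketched as a transitive closure over (trigger, mechanism, outcome) triplets. This is a minimal illustration, not dotcausal's implementation: it assumes exact entity matches, gives every derived link the fixed mechanism label `indirectly causes` (mirroring the example above), and carries the sources of both input triplets forward so each inference keeps its provenance. The `Triplet` type and `infer_chains` function are hypothetical names.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    trigger: str
    mechanism: str
    outcome: str
    sources: tuple[str, ...]  # provenance: papers backing this link
    inferred: bool = False


def infer_chains(explicit: list[Triplet], max_rounds: int = 10) -> list[Triplet]:
    """Repeatedly join A->B with B->C until no new (trigger, outcome) pair appears."""
    known = {(t.trigger, t.outcome): t for t in explicit}
    for _ in range(max_rounds):
        new = {}
        for left in known.values():
            for right in known.values():
                key = (left.trigger, right.outcome)
                if left.outcome != right.trigger or key in known or key in new:
                    continue
                if left.trigger == right.outcome:
                    continue  # skip trivial cycles such as A -> A
                new[key] = Triplet(
                    trigger=left.trigger,
                    mechanism="indirectly causes",
                    outcome=right.outcome,
                    sources=left.sources + right.sources,  # keep full provenance
                    inferred=True,
                )
        if not new:
            break
        known.update(new)
    return [t for t in known.values() if t.inferred]


facts = [
    Triplet("COVID", "damages", "mitochondria", ("paper_A.pdf",)),
    Triplet("mitochondria", "causes", "fatigue", ("paper_B.pdf",)),
]
for t in infer_chains(facts):
    print(t.trigger, "→", t.mechanism, "→", t.outcome, t.sources)
# COVID → indirectly causes → fatigue ('paper_A.pdf', 'paper_B.pdf')
```

Doing this once at write time is what lets later reads return inferred links without re-running the join.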
Zero hallucination. Every inference has full provenance back to source papers.
Key Features
| Feature | Benefit |
|---|---|
| ~30-40x faster queries | 1.1 ms vs 41.5 ms (SQLite), because inference is pre-computed |
| 50-200% fact amplification | Weak signals become visible through transitive chains |
| ~60-80% smaller files | MessagePack + entity deduplication |
| Zero hallucination | Pure deterministic logic, full provenance |
| Edge AI ready | Small enough for mobile/offline (air-gapped privacy) |
| Auto-threshold | Self-adapting fuzzy matching based on entity characteristics |
Installation
pip install dotcausal
Quick Start
Python API
from dotcausal import CausalWriter, CausalReader
# Create a knowledge graph
writer = CausalWriter()
writer.add_triplet(
    trigger="SARS-CoV-2",
    mechanism="damages",
    outcome="mitochondria",
    confidence=0.9,
    source="paper_A.pdf"
)
writer.add_triplet(
    trigger="mitochondrial dysfunction",
    mechanism="causes",
    outcome="chronic fatigue",
    confidence=0.85,
    source="paper_B.pdf"
)
writer.save("knowledge.causal")
# Query with inference amplification
reader = CausalReader("knowledge.causal")
stats = reader.get_stats()
print(f"Explicit: {stats['explicit_triplets']}")
print(f"Inferred: {stats['inferred_triplets']}")
print(f"Amplification: {stats['amplification_percent']}%")
# Search
results = reader.search("fatigue")
for r in results:
    # Each result says whether it was stated in a source or derived by inference
    tag = "[INFERRED]" if r['is_inferred'] else "[EXPLICIT]"
    print(f"{tag} {r['trigger']} → {r['mechanism']} → {r['outcome']}")
Command Line
# Show statistics
dotcausal stats knowledge.causal
# Query the graph
dotcausal query knowledge.causal "COVID" --limit 10
# Convert SQLite to .causal
dotcausal convert pipeline.db output.causal
# Export to JSON
dotcausal export knowledge.causal -o output.json
# Validate integrity
dotcausal validate knowledge.causal
The 3-Pass Inference Engine
| Pass | Method | What it finds |
|---|---|---|
| 1 | Exact keyword | A→activates→B + B→activates→C = A→activates→C |
| 2 | Semantic direction | positive×negative = negative chain |
| 3 | Jaro-Winkler fuzzy | "COVID-19" ↔ "SARS-CoV-2" (auto-threshold) |
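Pass 2's rule can be read as multiplying signs along a chain: an even number of negative (inhibiting) links yields a positive effect, an odd number a negative one. The sketch below assumes a small hand-written polarity table (`POLARITY`, hypothetical); dotcausal's actual mechanism vocabulary and how it classifies each mechanism are not documented here.

```python
# Hypothetical polarity table: +1 for promoting mechanisms, -1 for inhibiting ones.
POLARITY = {
    "activates": 1,
    "causes": 1,
    "increases": 1,
    "inhibits": -1,
    "reduces": -1,
    "blocks": -1,
}


def chain_direction(mechanisms: list[str]) -> str:
    """Compose mechanism signs along a chain: positive × negative = negative."""
    sign = 1
    for m in mechanisms:
        if m not in POLARITY:
            raise KeyError(f"no polarity known for mechanism {m!r}")
        sign *= POLARITY[m]
    return "positive" if sign > 0 else "negative"


print(chain_direction(["activates", "inhibits"]))  # negative
print(chain_direction(["inhibits", "reduces"]))    # positive
```

Raising on unknown mechanisms, rather than guessing a sign, keeps the composition deterministic in the sense the README describes.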
Auto-threshold calibration (v0.2.0+): The fuzzy matching threshold automatically adapts based on entity characteristics:
- Short medical terms → strict (0.88)
- Long scientific phrases → loose (0.72)
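Pass 3 relies on Jaro-Winkler similarity. The sketch below implements the standard Jaro-Winkler measure (prefix scale 0.1, common prefix capped at 4 characters) and pairs it with a length-based threshold that uses the two endpoints quoted above (0.88 strict, 0.72 loose). The 12- and 40-character cutoffs, the linear interpolation between them, and the helper names are assumptions for illustration; dotcausal's actual calibration rule is not described here.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1 = [False] * len1
    used2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # A character only matches inside the sliding window around position i.
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    mismatched, j = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if s1[i] != s2[j]:
                mismatched += 1
            j += 1
    transpositions = mismatched / 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Boost Jaro similarity for strings sharing a common prefix (up to 4 chars)."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1 - sim)


# Endpoints from the text; the length cutoffs are hypothetical.
STRICT, LOOSE = 0.88, 0.72
SHORT_LEN, LONG_LEN = 12, 40


def auto_threshold(a: str, b: str) -> float:
    """Strict for short terms, loose for long phrases, linear in between (assumed rule)."""
    n = max(len(a), len(b))
    if n <= SHORT_LEN:
        return STRICT
    if n >= LONG_LEN:
        return LOOSE
    frac = (n - SHORT_LEN) / (LONG_LEN - SHORT_LEN)
    return STRICT - frac * (STRICT - LOOSE)


def fuzzy_match(a: str, b: str) -> bool:
    return jaro_winkler(a.lower(), b.lower()) >= auto_threshold(a, b)


print(fuzzy_match("mitochondria", "mitochondrial"))  # True
print(fuzzy_match("fatigue", "mitochondria"))        # False
```

Loosening the threshold for long phrases reflects that a few differing characters matter less in a 40-character phrase than in a short acronym.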
Use Cases
LLM Grounding (GraphRAG)
# Instead of asking an LLM to find connections (hallucination risk),
# query the deterministic graph and feed results to the LLM
chains = reader.search("drug_X", field="trigger")
# LLM now synthesizes based on verified facts, not guessing
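Feeding graph results to an LLM usually means serialising them into the prompt with their explicit/inferred status intact. The sketch below assumes results are dicts with the `trigger`, `mechanism`, `outcome` and `is_inferred` keys used in the Quick Start; the function name `build_grounding_prompt`, the prompt wording and the sample entities (`enzyme_Y`, `symptom_Z`) are hypothetical, and the LLM call itself is left out.

```python
def build_grounding_prompt(question: str, results: list[dict]) -> str:
    """Turn graph search results into a fact list the LLM must stay within."""
    lines = []
    for r in results:
        tag = "INFERRED" if r["is_inferred"] else "EXPLICIT"
        lines.append(f"- [{tag}] {r['trigger']} → {r['mechanism']} → {r['outcome']}")
    facts = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        "Answer using only the facts below. "
        "If they do not support an answer, say so.\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}"
    )


# Result dicts shaped like the Quick Start's search output (assumed shape).
sample = [
    {"trigger": "drug_X", "mechanism": "inhibits", "outcome": "enzyme_Y", "is_inferred": False},
    {"trigger": "drug_X", "mechanism": "indirectly causes", "outcome": "symptom_Z", "is_inferred": True},
]
print(build_grounding_prompt("What does drug_X affect?", sample))
```

Keeping the `[INFERRED]` tag in the prompt lets the model distinguish stated facts from derived chains when it explains its answer.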
Edge AI / Privacy
The format is compact enough (~3-5 MB for thousands of papers) to run entirely on-device: no cloud dependency and no data leaving the device. Suitable for:
- Personal health knowledge graphs
- Offline scientific assistants
- Air-gapped research environments
Hypothesis Discovery
Weak signals (3 mentions) become visible convergence points (21+ mentions) after inference. In the Long COVID analysis, this revealed 3 new hypothesis candidates that were invisible in the SQLite database.
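One way to read "convergence point" is an outcome entity that many chains lead to. The sketch below counts, per outcome, how many links point at it before and after a transitive closure; the letter-named toy graph is made up and says nothing about the Long COVID numbers quoted above, and the `transitive_pairs` helper is hypothetical.

```python
from collections import Counter

# Toy (trigger, outcome) pairs; entity names are placeholders.
explicit = [
    ("A", "B"),
    ("B", "C"),
    ("D", "B"),
    ("E", "B"),
    ("C", "F"),
]


def transitive_pairs(pairs):
    """Return all (a, d) reachable via one or more hops, excluding self-loops."""
    closure = set(pairs)
    while True:
        new = {
            (a, d)
            for a, b in closure
            for c, d in closure
            if b == c and a != d and (a, d) not in closure
        }
        if not new:
            return closure
        closure |= new


before = Counter(outcome for _, outcome in explicit)
after = Counter(outcome for _, outcome in transitive_pairs(explicit))
print(before["F"], "->", after["F"])  # 1 -> 5
```

Here `F` is named by a single explicit link, but every upstream entity reaches it once chains are followed, so it stands out only after inference.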
File Format
┌─────────────────────────────────────┐
│ HEADER (64 bytes)                   │
│ Magic: "CAUSAL01" | Version | CRC   │
├─────────────────────────────────────┤
│ ENTITIES - Deduplicated dictionary  │
├─────────────────────────────────────┤
│ TRIPLETS - Explicit facts + metadata│
├─────────────────────────────────────┤
│ RULES - Inference rules             │
├─────────────────────────────────────┤
│ CLUSTERS - Semantic groupings       │
├─────────────────────────────────────┤
│ GAPS - Identified knowledge gaps    │
└─────────────────────────────────────┘
- Encoding: MessagePack (binary) with JSON fallback
- Integrity: xxhash64 CRC verification
- Compression: ~4.7:1 vs JSON through entity deduplication
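Entity deduplication can be illustrated by storing each distinct string once and referring to it by index. This sketch uses the standard-library `json` module rather than MessagePack, and the `{"entities": ..., "triplets": ...}` layout, the helper names and the toy triplets are assumptions, not the actual ENTITIES/TRIPLETS section encoding shown above.

```python
import json


def deduplicate(triplets: list[tuple[str, str, str]]) -> dict:
    """Replace every entity/mechanism string with an index into one shared table."""
    index: dict[str, int] = {}
    encoded = []
    for triplet in triplets:
        ids = []
        for term in triplet:
            if term not in index:
                index[term] = len(index)  # first occurrence gets the next free id
            ids.append(index[term])
        encoded.append(ids)
    return {"entities": list(index), "triplets": encoded}


def expand(table: dict) -> list[tuple[str, str, str]]:
    """Inverse of deduplicate(): rebuild the original string triplets."""
    names = table["entities"]
    return [tuple(names[i] for i in ids) for ids in table["triplets"]]


# Toy data with repeated entities.
raw = [
    ("protein alpha", "activates", "protein beta"),
    ("protein beta", "activates", "protein gamma"),
    ("protein alpha", "inhibits", "protein gamma"),
]
packed = deduplicate(raw)
print(len(json.dumps(raw)), "bytes as plain JSON")
print(len(json.dumps(packed)), "bytes with an entity table")
```

The saving grows with how often the same entities recur, which is why a corpus of many papers about overlapping entities compresses far better than this three-row example.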
Citation
If you use .causal in your research, please cite:
@article{foss2026causal,
author = {Foss, David Tom},
title = {The .causal Format: Deterministic Inference for AI-Assisted Hypothesis Amplification},
journal = {Zenodo},
year = {2026},
doi = {10.5281/zenodo.18326222}
}
Links
- Homepage: dotcausal.com
- Whitepaper: Zenodo DOI 10.5281/zenodo.18326222
- GitHub: github.com/DT-Foss/dotcausal
- PyPI: pypi.org/project/dotcausal
License
MIT License - see LICENSE for details.
"The era of probabilistic guessing is ending; the era of deterministic discovery has begun."
Download files
File details
Details for the file dotcausal-0.2.1.tar.gz.
File metadata
- Download URL: dotcausal-0.2.1.tar.gz
- Size: 22.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84bd7070a90fe97afb3de8b21246d2ff7c0bd65dcddf7b50ee8bbb2f5bf7b851 |
| MD5 | cb7236ff4d5a0cd874e272d7a3ceee9e |
| BLAKE2b-256 | 2647983dba02daa92d80b93a6393f90a995417a6b6b86c26921c3f6362166cb5 |
File details
Details for the file dotcausal-0.2.1-py3-none-any.whl.
File metadata
- Download URL: dotcausal-0.2.1-py3-none-any.whl
- Size: 23.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f773fae43e89f2ff3dd7090db4d973af500609d6077f34d2ecbb6f2d0d734e9 |
| MD5 | 4c2301dbc241699a7617d97b560edb11 |
| BLAKE2b-256 | f0cecc20bca6949b39cdc50ea3fc18fa5d15f44e69956c08985f2bc02419c3fd |