
Binary Knowledge Graph Format with Embedded Inference for AI Applications


dotcausal

The Knowledge Graph Format for AI

License: MIT

The .causal format is a binary knowledge graph format with embedded deterministic inference. It addresses a fundamental problem of AI-assisted discovery: LLMs hallucinate, and databases don't reason.

Why .causal?

The Problem

| Technology | What it does | What's missing |
|---|---|---|
| SQLite | Stores facts | No reasoning: returns only explicit matches |
| Vector RAG | Finds similar text | No logic: returns relevance, not causality |
| LLMs | Reason creatively | Hallucination risk: invent plausible but false connections |

Example: If Paper A says "COVID → damages mitochondria" and Paper B says "mitochondrial damage → fatigue", a SQL query for "COVID → fatigue" returns nothing. The connection exists but is invisible.
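A minimal sketch of that failure using Python's standard sqlite3 module; the `facts` table and its column names are illustrative, not dotcausal's SQLite schema:

```python
import sqlite3

# In-memory table holding only the two explicit facts from the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (cause TEXT, mechanism TEXT, outcome TEXT, paper TEXT)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?)",
    [
        ("COVID", "damages", "mitochondria", "Paper A"),
        ("mitochondria", "causes", "fatigue", "Paper B"),
    ],
)

# A direct lookup can only return rows that were stored explicitly.
rows = conn.execute(
    "SELECT * FROM facts WHERE cause = ? AND outcome = ?",
    ("COVID", "fatigue"),
).fetchall()
print(rows)  # → []
```

The two rows share the entity "mitochondria", but no stored row says COVID leads to fatigue, so the lookup comes back empty.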

The Solution

.causal pre-computes all transitive chains at storage time:

COVID → damages → mitochondria  (explicit, Paper A)
mitochondria → causes → fatigue  (explicit, Paper B)
─────────────────────────────────────────────────────
COVID → indirectly causes → fatigue  (INFERRED, deterministic)
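The idea can be sketched in plain Python as a fixed-point transitive closure computed once, when the graph is written. `Triplet`, `infer_chains`, and the amplification formula (inferred ÷ explicit × 100) are assumptions for illustration, not dotcausal's internals; a real engine would also compare mechanisms and entity variants, as described under The 3-Pass Inference Engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    trigger: str
    mechanism: str
    outcome: str
    sources: tuple  # provenance: papers that support this fact
    inferred: bool = False


def infer_chains(explicit):
    """Hypothetical write-time inference: add A -> C for every chain A -> B -> C.

    Each inferred triplet carries the combined sources of the chain that produced it.
    """
    known = {(t.trigger, t.outcome): t for t in explicit}
    changed = True
    while changed:  # repeat until no new (trigger, outcome) pair appears
        changed = False
        for first in list(known.values()):
            for second in list(known.values()):
                key = (first.trigger, second.outcome)
                if first.outcome != second.trigger or key in known:
                    continue
                if first.trigger == second.outcome:
                    continue  # skip self-loops A -> A
                known[key] = Triplet(
                    trigger=first.trigger,
                    mechanism="indirectly causes",
                    outcome=second.outcome,
                    sources=tuple(dict.fromkeys(first.sources + second.sources)),
                    inferred=True,
                )
                changed = True
    return [t for t in known.values() if t.inferred]


explicit = [
    Triplet("COVID", "damages", "mitochondria", ("Paper A",)),
    Triplet("mitochondria", "causes", "fatigue", ("Paper B",)),
]
new_facts = infer_chains(explicit)
for t in new_facts:
    print(f"{t.trigger} → {t.mechanism} → {t.outcome}  (INFERRED, from {', '.join(t.sources)})")
# → COVID → indirectly causes → fatigue  (INFERRED, from Paper A, Paper B)

amplification_percent = 100 * len(new_facts) / len(explicit)
print(f"Amplification: {amplification_percent:.0f}%")  # → Amplification: 50%
```

Because the chain is materialized at write time, a later query for COVID → fatigue is a plain lookup. The nested loop is quadratic per pass, which is acceptable for a sketch but not for large graphs.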

Zero hallucination. Every inference has full provenance back to source papers.

Key Features

| Feature | Benefit |
|---|---|
| ~30-40x faster queries | 1.1 ms vs 41.5 ms (SQLite), from pre-computed inference |
| 50-200% fact amplification | Weak signals become visible through transitive chains |
| ~60-80% smaller files | MessagePack + entity deduplication |
| Zero hallucination | Pure deterministic logic, full provenance |
| Edge AI ready | Small enough for mobile/offline use (air-gapped privacy) |
| Auto-threshold | Self-adapting fuzzy matching based on entity characteristics |

Installation

pip install dotcausal

Quick Start

Python API

from dotcausal import CausalWriter, CausalReader

# Create a knowledge graph
writer = CausalWriter()
writer.add_triplet(
    trigger="SARS-CoV-2",
    mechanism="damages",
    outcome="mitochondria",
    confidence=0.9,
    source="paper_A.pdf"
)
writer.add_triplet(
    trigger="mitochondrial dysfunction",
    mechanism="causes",
    outcome="chronic fatigue",
    confidence=0.85,
    source="paper_B.pdf"
)
writer.save("knowledge.causal")

# Query with inference amplification
reader = CausalReader("knowledge.causal")
stats = reader.get_stats()
print(f"Explicit: {stats['explicit_triplets']}")
print(f"Inferred: {stats['inferred_triplets']}")
print(f"Amplification: {stats['amplification_percent']}%")

# Search
results = reader.search("fatigue")
for r in results:
    tag = "[INFERRED]" if r['is_inferred'] else "[EXPLICIT]"
    print(f"{tag} {r['trigger']} → {r['mechanism']} → {r['outcome']}")

Command Line

# Show statistics
dotcausal stats knowledge.causal

# Query the graph
dotcausal query knowledge.causal "COVID" --limit 10

# Convert SQLite to .causal
dotcausal convert pipeline.db output.causal

# Export to JSON
dotcausal export knowledge.causal -o output.json

# Validate integrity
dotcausal validate knowledge.causal

The 3-Pass Inference Engine

| Pass | Method | What it finds |
|---|---|---|
| 1 | Exact keyword | A→activates→B + B→activates→C = A→activates→C |
| 2 | Semantic direction | positive × negative = negative chain |
| 3 | Jaro-Winkler fuzzy | "COVID-19" ↔ "SARS-CoV-2" (auto-threshold) |
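Pass 2's sign rule can be illustrated by treating each mechanism as +1 (increases its target) or −1 (decreases it) and multiplying along the chain. The `POLARITY` table and `chain_polarity` function are hypothetical; how dotcausal actually classifies mechanism verbs is not documented here.

```python
# Hypothetical polarity lexicon: +1 if the mechanism increases its target, -1 if it decreases it.
POLARITY = {
    "activates": +1,
    "increases": +1,
    "causes": +1,
    "inhibits": -1,
    "reduces": -1,
    "suppresses": -1,
}


def chain_polarity(mechanisms):
    """Multiply edge signs: an odd number of negative edges makes the whole chain negative."""
    sign = 1
    for m in mechanisms:
        if m not in POLARITY:
            raise ValueError(f"unknown mechanism polarity: {m!r}")
        sign *= POLARITY[m]
    return "positive" if sign > 0 else "negative"


print(chain_polarity(["activates", "inhibits"]))   # → negative
print(chain_polarity(["inhibits", "suppresses"]))  # → positive
```

Raising on unknown verbs, rather than guessing a sign, keeps the rule deterministic: a chain is only labeled when every edge has a known direction.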

Auto-threshold calibration (v0.2.0+): the fuzzy matching threshold adapts automatically to entity characteristics:

  • Short medical terms → strict (0.88)
  • Long scientific phrases → loose (0.72)
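Pass 3 and the auto-threshold can be sketched with a from-scratch Jaro-Winkler similarity (standard definition: prefix scale 0.1, common prefix capped at 4 characters). The length cutoffs (12 and 30 characters), the linear interpolation between 0.88 and 0.72, and the case folding are assumptions of this sketch; the source only states the two endpoint thresholds.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: shared characters within a window, penalized for transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not used2[j] and s2[j] == ch:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the matched characters that appear in a different order.
    out_of_order, j = 0, 0
    for i in range(len1):
        if used1[i]:
            while not used2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings that share a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def auto_threshold(a: str, b: str, short: int = 12, long: int = 30) -> float:
    """HYPOTHETICAL rule: 0.88 for short terms, 0.72 for long phrases, linear in between."""
    avg_len = (len(a) + len(b)) / 2
    if avg_len <= short:
        return 0.88
    if avg_len >= long:
        return 0.72
    return 0.88 - (avg_len - short) / (long - short) * (0.88 - 0.72)


def is_same_entity(a: str, b: str) -> bool:
    # Case-insensitive comparison is another assumption of this sketch.
    return jaro_winkler(a.lower(), b.lower()) >= auto_threshold(a, b)


print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # → 0.961
```

With these assumed cutoffs, "mitochondria" and "mitochondrial dysfunction" score about 0.90 against a threshold of about 0.82 and would be merged, while "COVID" and "fatigue" would not. Loosening the threshold for long phrases compensates for the way Jaro-Winkler penalizes length differences.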

Use Cases

LLM Grounding (GraphRAG)

from dotcausal import CausalReader

# Instead of asking an LLM to find connections (hallucination risk),
# query the deterministic graph and feed results to the LLM
reader = CausalReader("knowledge.causal")
chains = reader.search("drug_X", field="trigger")
# LLM now synthesizes based on verified facts, not guessing
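One way to feed search results to an LLM is to render them as a numbered, tagged fact list with an instruction to stay within it. `build_grounded_prompt` and the prompt wording are hypothetical; the sketch only assumes result dicts with the `trigger`, `mechanism`, `outcome`, and `is_inferred` keys used in the Quick Start, and it uses hard-coded stand-in results so it runs without a .causal file.

```python
def build_grounded_prompt(question, results, max_facts=20):
    """Render graph search results as a numbered fact list for an LLM prompt.

    `results` is assumed to be a list of dicts shaped like reader.search() output.
    """
    lines = []
    for i, r in enumerate(results[:max_facts], start=1):
        tag = "INFERRED" if r["is_inferred"] else "EXPLICIT"
        lines.append(f"{i}. [{tag}] {r['trigger']} → {r['mechanism']} → {r['outcome']}")
    facts = "\n".join(lines) if lines else "(no matching facts)"
    return (
        "Answer using ONLY the facts below. Cite fact numbers, "
        "and say so if the facts are insufficient.\n\n"
        f"Facts:\n{facts}\n\nQuestion: {question}"
    )


# Stand-in for reader.search(...) output.
results = [
    {"trigger": "drug_X", "mechanism": "inhibits", "outcome": "enzyme_Y", "is_inferred": False},
    {"trigger": "drug_X", "mechanism": "indirectly causes", "outcome": "symptom_Z", "is_inferred": True},
]
prompt = build_grounded_prompt("What downstream effects does drug_X have?", results)
print(prompt)
```

Keeping the [EXPLICIT]/[INFERRED] tags in the prompt lets the model, and the reader of its answer, distinguish stated facts from derived chains.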

Edge AI / Privacy

The format is compact enough (~3-5 MB for thousands of papers) to run entirely on-device, with no cloud dependency and no data leakage. Well suited to:

  • Personal health knowledge graphs
  • Offline scientific assistants
  • Air-gapped research environments

Hypothesis Discovery

After inference, weak signals (3 mentions) become visible convergence points (21+ mentions). In the Long COVID analysis, this surfaced 3 new hypothesis candidates that direct SQLite queries had not revealed.

File Format

┌─────────────────────────────────────┐
│ HEADER (64 bytes)                   │
│ Magic: "CAUSAL01" | Version | CRC   │
├─────────────────────────────────────┤
│ ENTITIES - Deduplicated dictionary  │
├─────────────────────────────────────┤
│ TRIPLETS - Explicit facts + metadata│
├─────────────────────────────────────┤
│ RULES - Inference rules             │
├─────────────────────────────────────┤
│ CLUSTERS - Semantic groupings       │
├─────────────────────────────────────┤
│ GAPS - Identified knowledge gaps    │
└─────────────────────────────────────┘
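The header's job (identify the file, then verify it before trusting its contents) can be sketched with the standard struct module. The byte layout below (8-byte magic, 32-bit version, 64-bit checksum, 44 reserved bytes) is a hypothetical split of the 64 bytes, not the published spec. Because xxhash64 requires the third-party xxhash package, a BLAKE2b digest truncated to 8 bytes stands in for it.

```python
import hashlib
import struct

MAGIC = b"CAUSAL01"
# HYPOTHETICAL layout: 8-byte magic, uint32 version, uint64 checksum, 44 reserved bytes = 64 bytes.
HEADER = struct.Struct("<8sIQ44x")


def checksum64(payload: bytes) -> int:
    # Stand-in for xxhash64: stdlib BLAKE2b truncated to 8 bytes.
    return int.from_bytes(hashlib.blake2b(payload, digest_size=8).digest(), "little")


def pack_file(version: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size header carrying its checksum."""
    return HEADER.pack(MAGIC, version, checksum64(payload)) + payload


def unpack_file(data: bytes):
    """Check magic and checksum before returning (version, payload)."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for header")
    magic, version, stored = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")
    payload = data[HEADER.size:]
    if checksum64(payload) != stored:
        raise ValueError("checksum mismatch: file is corrupted")
    return version, payload


blob = pack_file(2, b"entities+triplets+rules...")
print(unpack_file(blob))  # → (2, b'entities+triplets+rules...')
```

A fixed-size header means a reader can reject a wrong or damaged file after reading 64 bytes plus one checksum pass, without decoding any section.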
  • Encoding: MessagePack (binary) with JSON fallback
  • Integrity: xxHash64 checksum verification
  • Compression: ~4.7:1 vs JSON through entity deduplication
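Entity deduplication amounts to dictionary encoding: every distinct string is stored once and triplets refer to it by index. This stdlib-only sketch measures the effect with JSON rather than MessagePack; `dedupe` and `restore` are hypothetical helpers, and for simplicity mechanisms share the same dictionary as entities. On a four-triplet toy input the saving is small; it grows as the same entities recur across many papers.

```python
import json

triplets = [
    ("SARS-CoV-2", "damages", "mitochondria"),
    ("mitochondria", "causes", "chronic fatigue"),
    ("SARS-CoV-2", "activates", "inflammation"),
    ("inflammation", "causes", "chronic fatigue"),
]


def dedupe(triplets):
    """Store each distinct string once; triplets become tuples of integer indices."""
    index, entities = {}, []

    def entity_id(s):
        if s not in index:
            index[s] = len(entities)
            entities.append(s)
        return index[s]

    encoded = [tuple(entity_id(part) for part in t) for t in triplets]
    return entities, encoded


def restore(entities, encoded):
    """Invert dedupe(): look each index back up in the entity dictionary."""
    return [tuple(entities[i] for i in t) for t in encoded]


entities, encoded = dedupe(triplets)
naive = json.dumps([list(t) for t in triplets])
compact = json.dumps({"entities": entities, "triplets": encoded})
print(f"naive JSON: {len(naive)} bytes, deduplicated JSON: {len(compact)} bytes")
```

The encoding is lossless: `restore(entities, encoded)` returns the original triplets.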

Citation

If you use .causal in your research, please cite:

@article{foss2026causal,
  author = {Foss, David Tom},
  title = {The .causal Format: Deterministic Inference for AI-Assisted Hypothesis Amplification},
  journal = {Zenodo},
  year = {2026},
  doi = {10.5281/zenodo.18326222}
}


License

MIT License - see LICENSE for details.


"The era of probabilistic guessing is ending; the era of deterministic discovery has begun."



Download files


Source Distribution

dotcausal-0.2.1.tar.gz (22.3 kB)

Built Distribution

dotcausal-0.2.1-py3-none-any.whl (23.1 kB)

File details

Details for the file dotcausal-0.2.1.tar.gz.

File metadata

  • Download URL: dotcausal-0.2.1.tar.gz
  • Size: 22.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for dotcausal-0.2.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84bd7070a90fe97afb3de8b21246d2ff7c0bd65dcddf7b50ee8bbb2f5bf7b851 |
| MD5 | cb7236ff4d5a0cd874e272d7a3ceee9e |
| BLAKE2b-256 | 2647983dba02daa92d80b93a6393f90a995417a6b6b86c26921c3f6362166cb5 |


File details

Details for the file dotcausal-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: dotcausal-0.2.1-py3-none-any.whl
  • Size: 23.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.7

File hashes

Hashes for dotcausal-0.2.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f773fae43e89f2ff3dd7090db4d973af500609d6077f34d2ecbb6f2d0d734e9 |
| MD5 | 4c2301dbc241699a7617d97b560edb11 |
| BLAKE2b-256 | f0cecc20bca6949b39cdc50ea3fc18fa5d15f44e69956c08985f2bc02419c3fd |

