
Shared snapshot infrastructure for KG modules: zero dependencies, stdlib only.

kg-snapshot — Shared Snapshot Infrastructure for the KGRAG Framework

Author: Eric G. Suchanek, PhD

Flux-Frontiers, Liberty TWP, OH


Overview

kg-snapshot is a zero-dependency, stdlib-only package providing the canonical snapshot infrastructure shared across all KGRAG domain knowledge graph packages — code-kg, doc-kg, diary-kg, ftree-kg, metabo-kg, and others.

It was extracted from kg-rag to break a structural circular dependency: domain KG packages need to subclass SnapshotManager to capture domain-specific metrics, but they cannot depend on kg-rag if kg-rag itself depends on them.

By depending only on the Python standard library, kg-snapshot can sit at the base of the entire KGRAG dependency tree with no conflicts.


What's Inside

| Class | Purpose |
|---|---|
| Snapshot | Point-in-time metrics dataclass, keyed by git tree hash; holds a free-form metrics dict plus vs_previous / vs_baseline deltas |
| SnapshotManifest | JSON manifest index; tracks all snapshots with fast lookup by key |
| SnapshotManager | Captures, persists, retrieves, compares, and diffs snapshots; subclass to add domain-specific delta fields |

All three are importable directly from kg_snapshot:

from kg_snapshot import Snapshot, SnapshotManifest, SnapshotManager
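
To make the shape of these objects concrete, here is a minimal stdlib-only sketch of a snapshot record with a JSON round-trip and a key-indexed manifest. It is a stand-in, not the package's actual `Snapshot` class: only `metrics`, `vs_previous`, and `vs_baseline` are named in this README, while `key`, `version`, `timestamp`, and the `to_json` / `from_json` helpers are assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SnapshotSketch:
    """Stand-in for a snapshot record; NOT the real kg_snapshot.Snapshot."""

    key: str        # assumed field: the git tree hash
    version: str    # assumed field
    timestamp: str  # assumed field: ISO-8601 string
    metrics: dict[str, Any] = field(default_factory=dict)
    vs_previous: dict[str, Any] | None = None
    vs_baseline: dict[str, Any] | None = None

    def to_json(self) -> str:
        # asdict() recursively converts the dataclass into plain dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SnapshotSketch":
        return cls(**json.loads(text))


snap = SnapshotSketch(
    key="abc123",
    version="1.0.0",
    timestamp="2025-01-01T00:00:00+00:00",
    metrics={"total_nodes": 500, "total_edges": 800},
)
restored = SnapshotSketch.from_json(snap.to_json())

# A manifest can be as simple as a dict keyed by snapshot key for fast lookup.
manifest = {snap.key: {"version": snap.version, "timestamp": snap.timestamp}}
```

Keeping the record a plain dataclass is what lets the whole thing stay within the standard library: `dataclasses.asdict` and `json` are enough for persistence.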

Design

Free-form metrics

Snapshot.metrics is a plain dict so each domain stores whatever fields it needs without touching shared code:

# code-kg stores node/edge counts by kind
metrics = {"total_nodes": 342, "total_edges": 5711, "node_counts": {"function": 70, ...}}

# doc-kg stores coverage and chunk info
metrics = {"total_nodes": 800, "coverage_score": 0.91, "chunk_count": 640}

# metabo-kg stores pathway and kinetic parameter counts
metrics = {"total_nodes": 500, "pathway_count": 50, "kinetic_params": 1200}

The only required keys are total_nodes and total_edges — used for universal delta computation.
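
Because the shared delta logic relies on those two keys, a domain package may want to check for them before capturing. The helper below is a hypothetical sketch (neither `REQUIRED_KEYS` nor `missing_required_keys` is part of kg-snapshot) showing one way to do that:

```python
REQUIRED_KEYS = ("total_nodes", "total_edges")


def missing_required_keys(metrics: dict) -> list[str]:
    """Return the required keys that are absent from a metrics dict."""
    return [k for k in REQUIRED_KEYS if k not in metrics]


complete = {"total_nodes": 342, "total_edges": 5711, "node_counts": {"function": 70}}
incomplete = {"total_nodes": 800, "coverage_score": 0.91}

print(missing_required_keys(complete))    # []
print(missing_required_keys(incomplete))  # ['total_edges']
```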

Subclass for domain deltas

Override _compute_delta_from_metrics to add domain-specific delta fields:

from kg_snapshot import SnapshotManager

class MyKGSnapshotManager(SnapshotManager):
    def _compute_delta_from_metrics(self, new_m, old_m):
        base = super()._compute_delta_from_metrics(new_m, old_m)
        base["coverage_delta"] = new_m.get("coverage", 0) - old_m.get("coverage", 0)
        return base
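
To see the override pattern run end to end without the package installed, the sketch below swaps in a stand-in base class. The delta key names `nodes_delta` and `edges_delta` are assumptions; the real `SnapshotManager` may name its universal delta fields differently.

```python
class BaseManagerSketch:
    """Stand-in for SnapshotManager; only the delta hook is modelled."""

    def _compute_delta_from_metrics(self, new_m: dict, old_m: dict) -> dict:
        # Assumed key names for the universal node/edge deltas.
        return {
            "nodes_delta": new_m.get("total_nodes", 0) - old_m.get("total_nodes", 0),
            "edges_delta": new_m.get("total_edges", 0) - old_m.get("total_edges", 0),
        }


class CoverageManagerSketch(BaseManagerSketch):
    def _compute_delta_from_metrics(self, new_m: dict, old_m: dict) -> dict:
        # Start from the shared deltas, then add the domain-specific field.
        base = super()._compute_delta_from_metrics(new_m, old_m)
        base["coverage_delta"] = round(
            new_m.get("coverage", 0) - old_m.get("coverage", 0), 6
        )
        return base


old = {"total_nodes": 500, "total_edges": 800, "coverage": 0.80}
new = {"total_nodes": 520, "total_edges": 790, "coverage": 0.87}
delta = CoverageManagerSketch()._compute_delta_from_metrics(new, old)
```

Calling `super()` first means a domain subclass never has to re-implement the node and edge arithmetic; it only appends its own fields.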

Dedup — no-op snapshot suppression

save_snapshot() compares the incoming snapshot's version and metrics against the latest manifest entry. If nothing changed, the existing entry is refreshed in-place (tree hash, timestamp, and branch updated; old JSON file replaced) rather than growing history with a no-op snapshot.

Override _metrics_changed to define your own threshold:

class ThresholdManager(SnapshotManager):
    def _metrics_changed(self, new: dict, old: dict) -> bool:
        # Only record if node count shifts by more than 5
        return abs(new.get("total_nodes", 0) - old.get("total_nodes", 0)) > 5

Pass force=True to bypass dedup and always write a new history entry:

mgr.save_snapshot(snapshot, force=True)
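
The dedup decision described in this section can be sketched with an in-memory history list. This is an illustration of the described behaviour under stated assumptions, not the package's implementation: `DedupSketch`, `save`, and `make_entry` are hypothetical names, and no JSON files are written.

```python
from __future__ import annotations


class DedupSketch:
    """In-memory stand-in for the dedup decision; no files are written."""

    def __init__(self) -> None:
        self.history: list[dict] = []

    def _metrics_changed(self, new: dict, old: dict) -> bool:
        # Assumed default: any difference in the metrics counts as a change.
        return new != old

    def save(self, entry: dict, force: bool = False) -> str:
        latest = self.history[-1] if self.history else None
        unchanged = (
            latest is not None
            and latest["version"] == entry["version"]
            and not self._metrics_changed(entry["metrics"], latest["metrics"])
        )
        if unchanged and not force:
            # Refresh the latest entry in place instead of growing history.
            self.history[-1] = entry
            return "refreshed"
        self.history.append(entry)
        return "appended"


def make_entry(key: str, nodes: int) -> dict:
    return {
        "key": key,
        "version": "1.0.0",
        "metrics": {"total_nodes": nodes, "total_edges": 800},
    }


demo = DedupSketch()
results = [
    demo.save(make_entry("tree-a", 500)),              # first snapshot
    demo.save(make_entry("tree-b", 500)),              # same metrics, new tree hash
    demo.save(make_entry("tree-c", 500), force=True),  # forced past dedup
    demo.save(make_entry("tree-d", 501)),              # metrics changed
]
print(results)  # ['appended', 'refreshed', 'appended', 'appended']
```

Routing the comparison through `_metrics_changed` is what makes a threshold subclass possible: the save path stays the same and only the definition of "changed" varies.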

Git helpers included

SnapshotManager provides _get_current_tree_hash() and _get_current_branch() as static methods (@staticmethod), so subclasses inherit them for free with no duplication across repos.


Quick Start

Install

# From PyPI
pip install kg-snapshot

# From source (editable, for local development)
pip install -e /path/to/kg_snapshot

Or in a Poetry project's pyproject.toml:

[tool.poetry.dependencies]
kg-snapshot = {path = "../kg_snapshot", develop = true}

Capture and save a snapshot

from kg_snapshot import SnapshotManager

mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")

# Capture — graph_stats_dict from your KG's stats() method
# Any additional kwargs are merged into the metrics dict
snapshot = mgr.capture(
    version="1.0.0",
    graph_stats_dict={"total_nodes": 500, "total_edges": 800},
    coverage=0.87,
)
mgr.save_snapshot(snapshot)
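
The merge of extra keyword arguments into the metrics dict can be pictured as below. `build_metrics` is a hypothetical helper, and the precedence shown (keyword arguments win on a key clash) is an assumption rather than documented behaviour.

```python
def build_metrics(graph_stats_dict: dict, **extra) -> dict:
    """Merge extra keyword metrics into a copy of the graph stats."""
    merged = dict(graph_stats_dict)  # copy so the caller's dict is not mutated
    merged.update(extra)             # assumed: kwargs win on key clashes
    return merged


stats = {"total_nodes": 500, "total_edges": 800}
metrics = build_metrics(stats, coverage=0.87)
print(metrics)  # {'total_nodes': 500, 'total_edges': 800, 'coverage': 0.87}
```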

Query snapshots

# Load a specific snapshot by key, or the latest one
snap = mgr.load_snapshot("latest")
print(snap.metrics["total_nodes"])
print(snap.vs_previous)   # delta from previous snapshot (backfilled on load)

# List in reverse chronological order
for entry in mgr.list_snapshots(limit=10):
    print(entry["timestamp"], entry["metrics"]["total_nodes"])

# Diff two snapshots; key_a and key_b are the keys (git tree hashes) of existing snapshots
diff = mgr.diff_snapshots(key_a, key_b)
print(diff["delta"])
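
As a rough picture of what the listing and diff calls do, the sketch below works on plain manifest-style dicts. The helper names, the entry layout, and the diff direction (second minus first) are all assumptions for illustration; consult the package source for the real return shapes.

```python
from __future__ import annotations


def list_recent(entries: list[dict], limit: int | None = None) -> list[dict]:
    """Newest first; ISO-8601 strings in one timezone sort chronologically."""
    ordered = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
    return ordered if limit is None else ordered[:limit]


def numeric_diff(metrics_a: dict, metrics_b: dict) -> dict:
    """Return b minus a for every numeric key present in both dicts."""
    return {
        k: metrics_b[k] - metrics_a[k]
        for k in metrics_a
        if k in metrics_b
        and isinstance(metrics_a[k], (int, float))
        and isinstance(metrics_b[k], (int, float))
    }


entries = [
    {"key": "t1", "timestamp": "2025-01-01T00:00:00+00:00",
     "metrics": {"total_nodes": 500, "total_edges": 800}},
    {"key": "t3", "timestamp": "2025-03-01T00:00:00+00:00",
     "metrics": {"total_nodes": 540, "total_edges": 790}},
    {"key": "t2", "timestamp": "2025-02-01T00:00:00+00:00",
     "metrics": {"total_nodes": 520, "total_edges": 810}},
]

recent = list_recent(entries, limit=2)
diff = numeric_diff(entries[0]["metrics"], entries[1]["metrics"])
print([e["key"] for e in recent])  # ['t3', 't2']
print(diff)                        # {'total_nodes': 40, 'total_edges': -10}
```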

Dependency Graph

kg-snapshot   (zero deps — stdlib only)
    ▲
    ├── kg-rag        (re-exports for backwards compat)
    ├── code-kg       (CodeKGSnapshotManager subclass)
    ├── doc-kg        (DocKGSnapshotManager subclass)
    ├── diary-kg      (DiarySnapshotManager subclass)
    ├── ftree-kg      (FtreeSnapshotManager subclass)
    └── metabo-kg     (SnapshotManager subclass)

kg-rag re-exports Snapshot, SnapshotManifest, and SnapshotManager from kg_snapshot via a thin compatibility shim — all existing from kg_rag.snapshots import ... call-sites continue to work unchanged.


Requirements

  • Python ≥ 3.12, < 3.14
  • No third-party dependencies (stdlib only: dataclasses, json, pathlib, subprocess, datetime, importlib.metadata)

Development

git clone https://github.com/Flux-Frontiers/kg_snapshot.git
cd kg_snapshot
poetry install
poetry run pytest tests/ -v

Installing the KG-aware pre-commit hook

The standard pre-commit install stub is replaced by a wrapper that rebuilds CodeKG and DocKG indices, saves snapshots, and then runs the pre-commit framework checks — mirroring the hook used in code-kg and doc-kg:

bash scripts/install-hooks.sh

Re-run the script after any pre-commit install that overwrites the stub. To skip the KG rebuild for a quick fixup commit, set CODEKG_SKIP_SNAPSHOT:

CODEKG_SKIP_SNAPSHOT=1 git commit ...

Running the full KGRAG test suite

The scripts/run_tests.sh script runs all snapshot-related tests across every domain package in dependency order:

bash scripts/run_tests.sh

| Phase | What it does |
|---|---|
| 1 | kg_snapshot base tests: 22 tests, no domain deps required |
| 2 | Domain subclass tests in each repo's own venv |
| 3 | Import chain smoke-test per repo |
| 4 | Load real on-disk snapshots from built KG instances |

Project Structure

kg_snapshot/
├── README.md
├── SNAPSHOTS.md              # Full extraction handoff notes
├── pyproject.toml
├── src/
│   └── kg_snapshot/
│       ├── __init__.py       # Public API: Snapshot, SnapshotManifest, SnapshotManager
│       └── snapshots.py      # Full implementation (stdlib only)
├── tests/
│   └── test_snapshot_base.py # 22 tests — round-trip, deltas, dedup, manifest, git helpers
└── scripts/
    ├── run_tests.sh          # Full KGRAG-wide snapshot test runner
    ├── install-hooks.sh      # Installs KG-aware pre-commit hook
    └── pre-commit-hook       # Versioned hook: rebuild CodeKG+DocKG, snapshot, pre-commit

Related Projects

  • KGRAG — Unified orchestration layer (re-exports kg-snapshot for compatibility)
  • CodeKG — Structural knowledge graph for Python codebases
  • DocKG — Semantic knowledge graph for document corpora
  • MetaKG — Metabolic pathway knowledge graph

License

Elastic License 2.0 — see LICENSE.

Free to use, modify, and distribute. You may not offer the software as a hosted or managed service to third parties. Internal commercial use is permitted.
