Shared snapshot infrastructure for KG modules, with zero third-party dependencies (stdlib only).
kg-snapshot — Shared Snapshot Infrastructure for the KGRAG Framework
Author: Eric G. Suchanek, PhD
Flux-Frontiers, Liberty TWP, OH
Overview
kg-snapshot is a zero-dependency, stdlib-only package providing the canonical snapshot
infrastructure shared across all KGRAG domain knowledge graph packages — code-kg, doc-kg,
diary-kg, ftree-kg, metabo-kg, and others.
It was extracted from kg-rag to break a structural circular dependency: domain KG packages
need to subclass SnapshotManager to capture domain-specific metrics, but they cannot depend
on kg-rag if kg-rag itself depends on them.
By depending only on the Python standard library, kg-snapshot can sit at the base of the
entire KGRAG dependency tree with no conflicts.
What's Inside
| Class | Purpose |
|---|---|
| Snapshot | Point-in-time metrics dataclass keyed by git tree hash; holds a free-form metrics dict plus vs_previous / vs_baseline deltas |
| SnapshotManifest | JSON manifest index; tracks all snapshots with fast lookup by key |
| SnapshotManager | Captures, persists, retrieves, compares, and diffs snapshots; subclass it to add domain-specific delta fields |
All three are importable directly from kg_snapshot:
```python
from kg_snapshot import Snapshot, SnapshotManifest, SnapshotManager
```
Design
Free-form metrics
Snapshot.metrics is a plain dict so each domain stores whatever fields it needs
without touching shared code:
```python
# code-kg stores node/edge counts by kind (node_counts abbreviated here)
metrics = {"total_nodes": 342, "total_edges": 5711, "node_counts": {"function": 70}}
# doc-kg stores coverage and chunk info
metrics = {"total_nodes": 800, "coverage_score": 0.91, "chunk_count": 640}
# metabo-kg stores pathway and kinetic parameter counts
metrics = {"total_nodes": 500, "pathway_count": 50, "kinetic_params": 1200}
```
The only required keys are total_nodes and total_edges — used for universal delta computation.
Subclass for domain deltas
Override _compute_delta_from_metrics to add domain-specific delta fields:
```python
from kg_snapshot import SnapshotManager


class MyKGSnapshotManager(SnapshotManager):
    def _compute_delta_from_metrics(self, new_m, old_m):
        # Start from the universal delta (total_nodes / total_edges) ...
        base = super()._compute_delta_from_metrics(new_m, old_m)
        # ... then add a domain-specific field.
        base["coverage_delta"] = new_m.get("coverage", 0) - old_m.get("coverage", 0)
        return base
```
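The super() chaining in this override can be tried without kg_snapshot installed. The sketch below uses a hypothetical stand-in base class whose universal delta covers only total_nodes and total_edges; the output key names nodes and edges are assumptions, not the library's.

```python
from typing import Any


class BaseDeltaSketch:
    """Hypothetical stand-in for SnapshotManager's delta hook (not the real class)."""

    def _compute_delta_from_metrics(
        self, new_m: dict[str, Any], old_m: dict[str, Any]
    ) -> dict[str, Any]:
        # Universal fields only; missing keys count as 0.
        # The output key names are assumptions made for this sketch.
        return {
            "nodes": new_m.get("total_nodes", 0) - old_m.get("total_nodes", 0),
            "edges": new_m.get("total_edges", 0) - old_m.get("total_edges", 0),
        }


class CoverageDeltaSketch(BaseDeltaSketch):
    def _compute_delta_from_metrics(self, new_m, old_m):
        # Reuse the universal delta, then layer on a domain field.
        base = super()._compute_delta_from_metrics(new_m, old_m)
        base["coverage_delta"] = new_m.get("coverage", 0) - old_m.get("coverage", 0)
        return base
```

Because the subclass extends the dict returned by super() instead of rebuilding it, changes to the shared base fields reach every domain package automatically.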
Dedup — no-op snapshot suppression
save_snapshot() compares the incoming snapshot's version and metrics against
the latest manifest entry. If neither has changed, that entry is refreshed in
place (its tree hash, timestamp, and branch are updated and its old JSON file is
replaced) instead of a no-op snapshot being appended to the history.
Override _metrics_changed to define your own threshold:
```python
from kg_snapshot import SnapshotManager


class ThresholdManager(SnapshotManager):
    def _metrics_changed(self, new: dict, old: dict) -> bool:
        # Only record a new entry if the node count shifts by more than 5
        return abs(new.get("total_nodes", 0) - old.get("total_nodes", 0)) > 5
```
Pass force=True to bypass dedup and always write a new history entry:
```python
mgr.save_snapshot(snapshot, force=True)
```
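The dedup rules above can be modelled in a few lines. This is a sketch under stated assumptions: history is an in-memory list of entry dicts rather than the real manifest and JSON files, and save_with_dedup and default_metrics_changed are hypothetical names. The metrics_changed parameter plays the role of an overridden _metrics_changed.

```python
from collections.abc import Callable
from typing import Any


def default_metrics_changed(new: dict[str, Any], old: dict[str, Any]) -> bool:
    # Default policy in this sketch: any difference counts as a change.
    return new != old


def save_with_dedup(
    history: list[dict[str, Any]],
    entry: dict[str, Any],
    *,
    force: bool = False,
    metrics_changed: Callable[[dict, dict], bool] = default_metrics_changed,
) -> str:
    """Append `entry` to `history`, or refresh the latest entry in place.

    Hypothetical model of save_snapshot(): each entry is a dict with
    "version", "metrics", "tree_hash", "timestamp", and "branch" keys.
    Returns "appended" or "refreshed".
    """
    if history and not force:
        latest = history[-1]
        same_version = latest["version"] == entry["version"]
        if same_version and not metrics_changed(entry["metrics"], latest["metrics"]):
            # No-op snapshot: update the identity fields, keep history length.
            for k in ("tree_hash", "timestamp", "branch"):
                latest[k] = entry[k]
            return "refreshed"
    # Changed version or metrics, empty history, or force=True: new entry.
    history.append(entry)
    return "appended"
```

Because the refresh path mutates the latest entry instead of appending, repeated commits that leave the graph unchanged do not grow the history.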
Git helpers included
SnapshotManager provides _get_current_tree_hash() and _get_current_branch() as
static methods (@staticmethod), so subclasses inherit them without duplicating the git plumbing in each repo.
Quick Start
Install
```bash
# From PyPI
pip install kg-snapshot

# From source (editable, for local development)
pip install -e /path/to/kg_snapshot
```
Or in a Poetry project's pyproject.toml:
```toml
[tool.poetry.dependencies]
kg-snapshot = {path = "../kg_snapshot", develop = true}
```
Capture and save a snapshot
```python
from kg_snapshot import SnapshotManager

mgr = SnapshotManager(".mykg/snapshots", package_name="my-kg")

# Capture: graph_stats_dict comes from your KG's stats() method.
# Any additional kwargs are merged into the metrics dict.
snapshot = mgr.capture(
    version="1.0.0",
    graph_stats_dict={"total_nodes": 500, "total_edges": 800},
    coverage=0.87,
)
mgr.save_snapshot(snapshot)
```
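The kwargs merging described in the comments can be pictured as a plain dict merge. build_metrics is a hypothetical helper; the required-key check and the rule that extra kwargs override colliding stats keys are choices made for this sketch, not documented behavior of capture().

```python
from typing import Any

# Keys the base delta logic relies on (per the Design section).
REQUIRED_KEYS = ("total_nodes", "total_edges")


def build_metrics(graph_stats_dict: dict[str, Any], **extra: Any) -> dict[str, Any]:
    """Hypothetical model of how capture() could assemble Snapshot.metrics."""
    missing = [k for k in REQUIRED_KEYS if k not in graph_stats_dict]
    if missing:
        # Validation is an assumption of this sketch, not documented behavior.
        raise KeyError(f"graph_stats_dict is missing required keys: {missing}")
    # Build a new dict so the caller's stats are not mutated;
    # extra kwargs override colliding stats keys in this sketch.
    return {**graph_stats_dict, **extra}


metrics = build_metrics({"total_nodes": 500, "total_edges": 800}, coverage=0.87)
# metrics == {"total_nodes": 500, "total_edges": 800, "coverage": 0.87}
```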
Query snapshots
```python
# Load a specific snapshot by key, or the latest one
snap = mgr.load_snapshot("latest")
print(snap.metrics["total_nodes"])
print(snap.vs_previous)  # delta from previous snapshot (backfilled on load)

# List in reverse chronological order
for entry in mgr.list_snapshots(limit=10):
    print(entry["timestamp"], entry["metrics"]["total_nodes"])

# Diff two snapshots; key_a and key_b are placeholders for two saved snapshot keys
diff = mgr.diff_snapshots(key_a, key_b)
print(diff["delta"])
```
Dependency Graph
```text
kg-snapshot  (zero deps, stdlib only)
    ▲
    ├── kg-rag      (re-exports for backwards compat)
    ├── code-kg     (CodeKGSnapshotManager subclass)
    ├── doc-kg      (DocKGSnapshotManager subclass)
    ├── diary-kg    (DiarySnapshotManager subclass)
    ├── ftree-kg    (FtreeSnapshotManager subclass)
    └── metabo-kg   (SnapshotManager subclass)
```
kg-rag re-exports Snapshot, SnapshotManifest, and SnapshotManager from kg_snapshot
through a thin compatibility shim, so all existing `from kg_rag.snapshots import ...`
call sites continue to work unchanged.
Requirements
- Python ≥ 3.12, < 3.14
- No third-party dependencies (stdlib only: dataclasses, json, pathlib, subprocess, datetime, importlib.metadata)
Development
```bash
git clone https://github.com/Flux-Frontiers/kg_snapshot.git
cd kg_snapshot
poetry install
poetry run pytest tests/ -v
```
Installing the KG-aware pre-commit hook
The stub written by the standard `pre-commit install` is replaced by a wrapper that
rebuilds the CodeKG and DocKG indices, saves snapshots, and then runs the pre-commit
framework checks, mirroring the hook used in code-kg and doc-kg:

```bash
bash scripts/install-hooks.sh
```

Re-run the installer after any `pre-commit install` that overwrites the stub. To skip
the KG rebuild for a quick fixup commit, set:

```bash
CODEKG_SKIP_SNAPSHOT=1 git commit ...
```
Running the full KGRAG test suite
The scripts/run_tests.sh script runs all snapshot-related tests across every domain package
in dependency order:
```bash
bash scripts/run_tests.sh
```
| Phase | What it does |
|---|---|
| 1 | kg_snapshot base tests — 22 tests, no domain deps required |
| 2 | Domain subclass tests in each repo's own venv |
| 3 | Import chain smoke-test per repo |
| 4 | Load real on-disk snapshots from built KG instances |
Project Structure
```text
kg_snapshot/
├── README.md
├── SNAPSHOTS.md               # Full extraction handoff notes
├── pyproject.toml
├── src/
│   └── kg_snapshot/
│       ├── __init__.py        # Public API: Snapshot, SnapshotManifest, SnapshotManager
│       └── snapshots.py       # Full implementation (stdlib only)
├── tests/
│   └── test_snapshot_base.py  # 22 tests: round-trip, deltas, dedup, manifest, git helpers
└── scripts/
    ├── run_tests.sh           # Full KGRAG-wide snapshot test runner
    ├── install-hooks.sh       # Installs KG-aware pre-commit hook
    └── pre-commit-hook        # Versioned hook: rebuild CodeKG+DocKG, snapshot, pre-commit
```
Related Projects
- KGRAG — Unified orchestration layer (re-exports kg-snapshot for compatibility)
- CodeKG — Structural knowledge graph for Python codebases
- DocKG — Semantic knowledge graph for document corpora
- MetaKG — Metabolic pathway knowledge graph
License
Elastic License 2.0 — see LICENSE.
Free to use, modify, and distribute. You may not offer the software to third parties as a hosted or managed service. Internal commercial use is permitted.
Download files
File details
Details for the file kg_snapshot-0.3.0.tar.gz.
File metadata
- Download URL: kg_snapshot-0.3.0.tar.gz
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.13 Darwin/25.4.0

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5588e756fe6378a40577cf32f6b4a7416658aba9dff397623d5613a86f44a73f |
| MD5 | 71c05778a2efa3cab0eae27c7320ddcc |
| BLAKE2b-256 | 15be6f28c22196a07566dc94031724875bb623f3e99aa30adae399eb6ca4f931 |
File details
Details for the file kg_snapshot-0.3.0-py3-none-any.whl.
File metadata
- Download URL: kg_snapshot-0.3.0-py3-none-any.whl
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.12.13 Darwin/25.4.0

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 46f5cb57eb7ac66e3d4c64b46c2a403268c65e7ffa2bd45766a871b06d5bd69b |
| MD5 | c477872acb02d217e6eda1001095a0f1 |
| BLAKE2b-256 | 53b622a2c7aae76de449cb53e454ec37aabda8cbb996599c5746ac64077967a5 |