Content-addressed SPICE kernel database for deduplication and metakernel rewriting across missions
spice-kernel-db
The problem
NAIF SPICE kernel archives for different missions often ship identical copies of generic kernels (naif0012.tls, pck00011.tpc, de432s.bsp, etc.). If you work with multiple missions, you end up storing the same files many times. And when you want to use a metakernel (.tm) from one mission, it expects all kernels to live under a specific directory tree — even if you already have the files downloaded for another mission.
What this tool does
- Deduplication: Identifies identical kernel files across missions using SHA-256 hashing. Same content = same hash, regardless of filename (handles cases like jup365.bsp vs jup365_19900101_20500101.bsp).
- Metakernel rewriting: Rewrites .tm files for local use with minimal edits: only PATH_VALUES is changed, everything else (header comments, KERNELS_TO_LOAD entries) stays identical to the original. A symlink tree bridges the gap between where the metakernel expects files and where they actually live on disk.
- Mission-aware resolution: When resolving kernel paths, prefers copies from the same mission. Falls back to other missions with a clear warning.
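The deduplication idea can be illustrated with a minimal standard-library sketch. This is not the package's own code: sha256_of and find_duplicates are hypothetical helper names, and the file contents in the demo are made-up stand-ins rather than real kernel data.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large kernels need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each digest to the files sharing it; keep digests seen more than once."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


# Demo with stand-in files (hypothetical layout, not real kernel data).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "generic").mkdir()
    (root / "JUICE").mkdir()
    (root / "generic" / "jup365.bsp").write_bytes(b"same bytes")
    (root / "JUICE" / "jup365_19900101_20500101.bsp").write_bytes(b"same bytes")
    (root / "JUICE" / "other.bsp").write_bytes(b"different bytes")
    duplicates = find_duplicates(root)
    duplicate_names = [sorted(p.name for p in paths) for paths in duplicates.values()]
```

Because the digest depends only on file content, the two differently named files land in the same group while other.bsp does not.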
Documentation
Full documentation (motivation, design, usage guide, API reference) is in the docs/ directory and built with Quarto:
cd docs
quarto preview
Installation
pip install spice-kernel-db
Or from source:
git clone https://github.com/michaelaye/spice-kernel-db
cd spice-kernel-db
pip install -e ".[dev]"
Quick start
Python API
from spice_kernel_db import KernelDB
db = KernelDB("~/.spice_kernels.duckdb")
# Register your local kernel trees
db.scan_directory("/data/spice/generic_kernels", mission="generic")
db.scan_directory("/data/spice/JUICE/kernels")
db.scan_directory("/data/spice/MRO/kernels")
# See what's duplicated
db.report_duplicates()
# Check a metakernel: what do you already have, what's missing?
result = db.check_metakernel(
    "juice_crema_5_1_150lb_23_1_a3.tm",
    mission="JUICE",
)
# result["found"], result["missing"], result["warnings"]
# Rewrite metakernel for local use (creates symlink tree,
# only changes PATH_VALUES in the .tm file)
db.rewrite_metakernel(
    "juice_crema_5_1_150lb_23_1_a3.tm",
    output="juice_local.tm",
    mission="JUICE",
    link_root="/data/spice/unified_kernels",
)
# Replace duplicate files with symlinks to save disk space
db.deduplicate_with_symlinks(dry_run=True) # preview first
db.deduplicate_with_symlinks(dry_run=False) # do it
CLI
# Scan directories
spice-kernel-db scan /data/spice/generic_kernels --mission generic
spice-kernel-db scan /data/spice/JUICE/kernels
spice-kernel-db scan /data/spice/MRO/kernels
# View stats and duplicates
spice-kernel-db stats
spice-kernel-db duplicates
# Check metakernel availability
spice-kernel-db check juice_crema_5_1.tm --mission JUICE
# Rewrite metakernel (creates symlink tree + rewritten .tm)
spice-kernel-db rewrite juice_crema_5_1.tm -o juice_local.tm
# Resolve a single kernel filename
spice-kernel-db resolve naif0012.tls --mission JUICE
# Deduplicate (dry run by default)
spice-kernel-db dedup
spice-kernel-db dedup --execute
How it works
Content-addressed storage
Every kernel file is identified by its SHA-256 hash. The DuckDB database has two core tables:
- kernels: (sha256, filename, kernel_type, size_bytes), one row per unique file content
- locations: (sha256, abs_path, mission, source_url), one row per place a file exists on disk
When jup365.bsp (from generic_kernels) and jup365_19900101_20500101.bsp (from JUICE) have identical content, they share the same sha256 in kernels but have separate entries in locations.
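To make the two-table layout concrete, here is a rough sketch that uses Python's built-in sqlite3 as a stand-in for DuckDB, so it runs without extra dependencies. The column names come from the description above; the column types, the placeholder digest, the kernel_type value and the example paths are assumptions for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE kernels (
        sha256      TEXT PRIMARY KEY,  -- one row per unique file content
        filename    TEXT,
        kernel_type TEXT,
        size_bytes  INTEGER
    );
    CREATE TABLE locations (
        sha256     TEXT REFERENCES kernels(sha256),
        abs_path   TEXT,               -- one row per place the file exists on disk
        mission    TEXT,
        source_url TEXT
    );
    """
)

digest = "ab" * 32  # placeholder value, not a real SHA-256 digest
con.execute(
    "INSERT INTO kernels VALUES (?, ?, ?, ?)",
    (digest, "jup365.bsp", "spk", 1024),
)
con.executemany(
    "INSERT INTO locations VALUES (?, ?, ?, ?)",
    [
        (digest, "/data/spice/generic_kernels/spk/jup365.bsp", "generic", None),
        (digest, "/data/spice/JUICE/kernels/spk/jup365_19900101_20500101.bsp", "JUICE", None),
    ],
)

# One content row, two on-disk locations that share its hash.
kernel_count = con.execute("SELECT COUNT(*) FROM kernels").fetchone()[0]
rows = con.execute(
    "SELECT k.filename, l.mission, l.abs_path "
    "FROM kernels k JOIN locations l USING (sha256) ORDER BY l.mission"
).fetchall()
```

The join returns both locations for the single kernels row, which is the property that lets differently named copies be treated as one file.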
Mission-aware resolution
resolve_kernel("naif0012.tls", preferred_mission="JUICE") follows this priority:
- Exact filename match in JUICE locations → ✅ no warning
- Exact filename match in any other mission → ⚠️ warning
- Fuzzy match (same hash, different name) in JUICE → ⚠️ warning
- Fuzzy match in any other mission → ⚠️ warning
- Not found → returns None
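The priority order above can be sketched as a small pure-Python function. This is only an illustration under stated assumptions: the Location record, the resolve function, the name_to_hash lookup (filename to known hash) and the warning texts are all hypothetical, and the package's real resolve_kernel may differ in its return type and details.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Location:
    sha256: str
    abs_path: str
    mission: str


def resolve(filename, preferred_mission, locations, name_to_hash):
    """Return (path, warning) following the listed priority, or None if not found."""

    def basename(loc):
        return PurePosixPath(loc.abs_path).name

    exact = [loc for loc in locations if basename(loc) == filename]
    wanted = name_to_hash.get(filename)  # hash previously seen under this name
    fuzzy = [
        loc for loc in locations
        if wanted is not None and loc.sha256 == wanted and basename(loc) != filename
    ]

    def split(candidates):
        mine = [c for c in candidates if c.mission == preferred_mission]
        other = [c for c in candidates if c.mission != preferred_mission]
        return mine, other

    exact_mine, exact_other = split(exact)
    fuzzy_mine, fuzzy_other = split(fuzzy)
    tiers = [
        (exact_mine, None),
        (exact_other, "exact name, other mission"),
        (fuzzy_mine, "same content, different name"),
        (fuzzy_other, "same content, different name, other mission"),
    ]
    for candidates, warning in tiers:
        if candidates:
            hit = candidates[0]
            note = None if warning is None else f"{filename}: {warning} ({hit.mission})"
            return hit.abs_path, note
    return None


# Demo records; hashes and paths are made up.
locations = [
    Location("h1", "/data/spice/generic_kernels/lsk/naif0012.tls", "generic"),
    Location("h1", "/data/spice/JUICE/kernels/lsk/naif0012.tls", "JUICE"),
    Location("h2", "/data/spice/JUICE/kernels/spk/jup365_19900101_20500101.bsp", "JUICE"),
]
name_to_hash = {"naif0012.tls": "h1", "jup365.bsp": "h2"}

same_mission = resolve("naif0012.tls", "JUICE", locations, name_to_hash)
other_mission = resolve("naif0012.tls", "MRO", locations, name_to_hash)
fuzzy_hit = resolve("jup365.bsp", "JUICE", locations, name_to_hash)
missing = resolve("de432s.bsp", "JUICE", locations, name_to_hash)
```

Expressing the tiers as an ordered list keeps the priority visible in one place, so changing the order means reordering four entries.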
Minimal metakernel edits
The rewrite_metakernel() method:
- Parses the original .tm file
- For each $KERNELS/type/filename entry, resolves the filename to a local path
- Creates a symlink at link_root/type/filename → actual file location
- Writes a new .tm where only PATH_VALUES is changed to point to link_root
The KERNELS_TO_LOAD list, PATH_SYMBOLS, and all comments/header text remain identical to the original.
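A minimal sketch of the same idea, using only the standard library, might look like the following. It is not the package's implementation: rewrite_path_values and build_link_tree are hypothetical names, the sample metakernel is invented, and the regular expressions assume a single PATH_VALUES entry, a path symbol named KERNELS, and no mention of PATH_VALUES in the header comments.

```python
import re
import tempfile
from pathlib import Path

PATH_VALUES_RE = re.compile(r"(PATH_VALUES\s*=\s*\()[^)]*(\))")
ENTRY_RE = re.compile(r"'\$KERNELS/([^']+)'")


def rewrite_path_values(text, link_root):
    """Swap only the PATH_VALUES value; every other character is left untouched."""
    return PATH_VALUES_RE.sub(
        lambda m: f"{m.group(1)} '{link_root}' {m.group(2)}", text, count=1
    )


def build_link_tree(text, link_root, resolve):
    """Create link_root/type/filename symlinks for each $KERNELS entry that resolves."""
    created = []
    for rel in ENTRY_RE.findall(text):
        target = resolve(Path(rel).name)
        if target is None:
            continue  # missing kernel: left for the caller to report
        link = Path(link_root) / rel
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace a stale link from an earlier run
        link.symlink_to(target)
        created.append(link)
    return created


# Invented sample metakernel; raw string so the backslashes stay literal.
ORIGINAL = r"""KPL/MK

Header comment that must survive unchanged.

\begindata
  PATH_VALUES     = ( '..' )
  PATH_SYMBOLS    = ( 'KERNELS' )
  KERNELS_TO_LOAD = (
                      '$KERNELS/lsk/naif0012.tls'
                    )
\begintext
"""

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "store" / "naif0012.tls"
    store.parent.mkdir()
    store.write_text("stand-in content")
    link_root = Path(tmp) / "unified_kernels"
    links = build_link_tree(
        ORIGINAL, link_root, lambda name: store if name == store.name else None
    )
    rewritten = rewrite_path_values(ORIGINAL, str(link_root))
    link_ok = links[0].resolve() == store.resolve()
    changed = [
        (old, new)
        for old, new in zip(ORIGINAL.splitlines(), rewritten.splitlines())
        if old != new
    ]
```

Comparing the original and rewritten text line by line shows a single differing line, the PATH_VALUES assignment. Creating symlinks may require extra privileges on Windows.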
Database location
By default the database lives at ~/.spice_kernels.duckdb. Override with --db on the CLI or the db_path constructor argument.
Dependencies
- Python ≥ 3.10
- DuckDB ≥ 1.0
License
MIT