A static analysis tool for preventing temporal causality leaks in seismology ML.
seismic-linter
Stop publishing 99% accurate models that fail in production.
seismic-linter automatically detects temporal causality violations in earthquake forecasting and seismology machine learning pipelines. It catches the silent bugs that let your model "cheat" by using future data during training, which produces papers with impressive results that fail in real-time deployment.
The Problem
Earthquake forecasting is especially prone to one ML pathology: temporal data leakage. When you normalize magnitudes using global statistics, split data with shuffle=True, or fit transformers before the temporal split, your model implicitly "knows" about future earthquakes. The result is artificially high accuracy that evaporates in production.
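To make the leak concrete, here is a minimal, library-free sketch. The toy catalog and the helper names (`chronological_split`, `fit_scaler`, `transform`) are made up for illustration and are not part of seismic-linter. It splits events by time and computes normalization statistics from the training period only, next to the leaky global version.

```python
from statistics import mean, stdev

# Hypothetical toy catalog: (event_time, magnitude), already sorted by time.
catalog = [
    (1, 2.0), (2, 2.2), (3, 1.8), (4, 2.1), (5, 1.9),
    (6, 2.0), (7, 4.5), (8, 5.0), (9, 4.8), (10, 5.2),
]


def chronological_split(events, train_fraction):
    """Split time-sorted events so every training event precedes every test event."""
    cut = int(len(events) * train_fraction)
    return events[:cut], events[cut:]


def fit_scaler(events):
    """Compute normalization statistics from the given events only."""
    mags = [m for _, m in events]
    return mean(mags), stdev(mags)


def transform(events, mu, sigma):
    """Apply previously fitted statistics without refitting."""
    return [(t, (m - mu) / sigma) for t, m in events]


train, test = chronological_split(catalog, train_fraction=0.6)

# Leaky: the statistics include the test period (the large events at t >= 7).
leaky_mu, leaky_sigma = fit_scaler(catalog)

# Causal: the statistics come from the training period only.
mu, sigma = fit_scaler(train)
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)
```

In this toy data the global mean is pulled upward by the later, larger events, so a model trained on globally normalized magnitudes has already "seen" the test period.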
The Solution
seismic-linter provides:
- 🔍 Static analysis - Scan your Python code for leakage patterns before running
- ⚡ Runtime validation - Decorators (@verify_monotonicity) and integrity checks
- 🧪 Pytest Integration - Use validate_split_integrity(train_df, test_df) after splitting. See docs/api.md for full API.
- 📋 Pre-commit hooks - Block leaky code from entering your repository
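The runtime checks above boil down to two conditions: timestamps never go backwards, and the training period ends before the test period begins. The sketch below illustrates both using only the standard library. The names `require_monotonic_times` and `check_split_is_chronological` are hypothetical stand-ins, not the package's `@verify_monotonicity` or `validate_split_integrity`; see docs/api.md for the real signatures.

```python
from datetime import datetime
from functools import wraps


def require_monotonic_times(func):
    """Hypothetical decorator: reject a list of timestamps that ever goes backwards."""
    @wraps(func)
    def wrapper(times, *args, **kwargs):
        for earlier, later in zip(times, times[1:]):
            if later < earlier:
                raise ValueError(f"timestamps not monotonic: {later} follows {earlier}")
        return func(times, *args, **kwargs)
    return wrapper


def check_split_is_chronological(train_times, test_times):
    """Hypothetical check: every training timestamp must precede every test timestamp."""
    if not train_times or not test_times:
        raise ValueError("both splits must be non-empty")
    if max(train_times) >= min(test_times):
        raise ValueError("training data overlaps the test period")


@require_monotonic_times
def count_events(times):
    return len(times)


times = [datetime(2020, 1, day) for day in (1, 2, 3, 4)]
check_split_is_chronological(times[:2], times[2:])  # passes silently
n_events = count_events(times)
```

Raising an exception rather than returning a boolean means the same check can be called directly inside a pytest test function.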
The GitHub Action runs in a Linux container; Windows runners are not supported.
Detected Rules
| Rule ID | Description | Severity |
|---|---|---|
| T001 | Global statistics (mean/std) computed without temporal context | ⚠️ Warning |
| T002 | Model .fit() called on potentially leaky data (e.g., raw df) | ℹ️ Info |
| T003 | train_test_split with shuffle=True (random split) | ❌ Error |
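As an illustration of how a rule like T003 could be checked statically, the following sketch walks a module's syntax tree with Python's `ast` module and records calls to `train_test_split` that pass `shuffle=True`. This is an assumption about the general technique, not seismic-linter's actual implementation; `ShuffledSplitFinder` and `find_shuffled_splits` are hypothetical names.

```python
import ast


class ShuffledSplitFinder(ast.NodeVisitor):
    """Collect line numbers of train_test_split(...) calls that pass shuffle=True."""

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        func = node.func
        # Handle both `train_test_split(...)` and `module.train_test_split(...)`.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name == "train_test_split":
            for kw in node.keywords:
                if (
                    kw.arg == "shuffle"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True
                ):
                    self.findings.append(node.lineno)
        # Keep walking so nested calls are inspected too.
        self.generic_visit(node)


def find_shuffled_splits(source):
    """Parse source code (without executing it) and return offending line numbers."""
    finder = ShuffledSplitFinder()
    finder.visit(ast.parse(source))
    return finder.findings


sample = (
    "a, b = train_test_split(df, shuffle=True)\n"
    "c, d = train_test_split(df, shuffle=False)\n"
    "e, f = model_selection.train_test_split(df, test_size=0.2, shuffle=True)\n"
)
findings = find_shuffled_splits(sample)  # lines 1 and 3
```

This sketch only catches an explicit `shuffle=True`. scikit-learn's `train_test_split` shuffles by default, so a stricter check could also flag calls that omit the keyword.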
Configuration
Configuration is loaded from the pyproject.toml of the first path specified in the CLI arguments (or current directory if none).
Inline suppressions are supported using # seismic-linter: ignore rule_id (applies to current line only):
df['norm'] = (df['mag'] - df['mag'].mean()) / df['mag'].std() # seismic-linter: ignore T001
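For intuition, a line-scoped suppression like the one above can be modeled as a filter over findings. The sketch below is a hypothetical model of that behaviour, not the tool's parser: the regular expression, `suppressed_rule`, and `filter_findings` are assumptions, and the sketch handles a single rule ID per comment.

```python
import re

# Assumed comment shape, taken from the example above: "# seismic-linter: ignore T001".
IGNORE_PATTERN = re.compile(r"#\s*seismic-linter:\s*ignore\s+(T\d{3})")


def suppressed_rule(line):
    """Return the rule ID suppressed on this line, or None if there is no ignore comment."""
    match = IGNORE_PATTERN.search(line)
    return match.group(1) if match else None


def filter_findings(source, findings):
    """Drop (lineno, rule_id) findings whose own line carries a matching ignore comment."""
    lines = source.splitlines()
    return [
        (lineno, rule)
        for lineno, rule in findings
        if suppressed_rule(lines[lineno - 1]) != rule
    ]


source = (
    "mu = df['mag'].mean()  # seismic-linter: ignore T001\n"
    "sigma = df['mag'].std()\n"
)
kept = filter_findings(source, [(1, "T001"), (2, "T001")])
```

Because the comment is matched per line, the finding on the second line survives. A regex-based scan like this would also match the marker inside a string literal, which a tokenizer-based parser would avoid.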
Note: When using the github output format, paths are relative to the current working directory where possible.
Quick Example
# ❌ This will trigger a warning
df['normalized'] = (df['magnitude'] - df['magnitude'].mean()) / df['magnitude'].std()
# ✅ This passes validation
df['normalized'] = df.groupby('station')['magnitude'].transform(
    lambda x: x - x.rolling(window=100).mean()
)
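The idea behind the passing example is that each value is normalized using only a trailing window of the series. The library-free sketch below spells that out for a single series. `trailing_zscores` is a hypothetical helper, not part of seismic-linter. Unlike the pandas rolling window above, which includes the current row, it uses only values strictly before the current one, and it also divides by the window's standard deviation.

```python
from collections import deque
from statistics import mean, pstdev


def trailing_zscores(values, window):
    """Z-score each value against up to `window` values strictly before it."""
    history = deque(maxlen=window)  # oldest values fall out automatically
    scores = []
    for value in values:
        if len(history) < 2:
            scores.append(None)  # not enough past data to estimate a spread
        else:
            mu = mean(history)
            sigma = pstdev(history)
            scores.append((value - mu) / sigma if sigma > 0 else 0.0)
        history.append(value)  # the current value becomes visible only to later rows
    return scores


original = trailing_zscores([1.0, 3.0, 2.0, 6.0], window=2)
perturbed = trailing_zscores([1.0, 3.0, 2.0, 100.0], window=2)
```

Changing the final value leaves every earlier score untouched, which is the property a causal transform needs. For multi-station data the same function would be applied per station, as the groupby in the example does.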
Source Distribution
File details
Details for the file seismic_linter-0.2.0.tar.gz.
File metadata
- Download URL: seismic_linter-0.2.0.tar.gz
- Upload date:
- Size: 26.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7fb4887efe75e14eebf8558a4ee8e2c9bf23907fdbdfa358492365daff0cfab7 |
| MD5 | ae067d27c5129b5bd98188c8310d2a39 |
| BLAKE2b-256 | 0fcbd0d399e8dbc24c734127bd5ff28232aac540ac8e66099c7841dd48085133 |