
A static analysis tool for preventing temporal causality leaks in seismology ML.


seismic-linter

License: MIT

Stop publishing 99% accurate models that fail in production.

seismic-linter automatically detects temporal causality violations in earthquake forecasting and seismology machine learning pipelines. It catches the silent bugs that let a model "cheat" by using future data during training, producing papers with impressive results that fail completely in real-time deployment.

The Problem

Earthquake forecasting suffers from a unique ML pathology: temporal data leakage. When you normalize magnitudes using global statistics, split data with shuffle=True, or fit transformers before temporal splitting, your model implicitly "knows" about future earthquakes. This creates artificially high accuracy that evaporates in production.
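To make the shuffle=True case concrete, here is a small standard-library sketch (not part of seismic-linter) that splits 100 synthetic, time-ordered events two ways and counts how many training events occur after the earliest test event. The 80/20 ratio, the fixed seed, and all helper names are illustrative assumptions.

```python
import random
from typing import List, Tuple

Split = Tuple[List[int], List[int]]

def chronological_split(times: List[int], train_frac: float = 0.8) -> Split:
    """Train on the earliest events, test on the latest ones."""
    ordered = sorted(times)
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]

def shuffled_split(times: List[int], train_frac: float = 0.8, seed: int = 0) -> Split:
    """Mimic a random split (what shuffle=True does): temporal order is discarded."""
    shuffled = list(times)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def future_leaks(train: List[int], test: List[int]) -> int:
    """Count training events that happen after the first test event."""
    first_test = min(test)
    return sum(1 for t in train if t > first_test)

event_times = list(range(100))  # synthetic, strictly increasing timestamps

print("chronological:", future_leaks(*chronological_split(event_times)))
print("shuffled:     ", future_leaks(*shuffled_split(event_times)))
```

With distinct timestamps the chronological split always reports 0, while a random split typically places most of the training set after some test event, which is information a real-time forecaster never has.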

The Solution

seismic-linter provides:

  • 🔍 Static analysis - Scan your Python code for leakage patterns before running it
  • Runtime validation - Decorators (@verify_monotonicity) and integrity checks
  • 🧪 Pytest integration - Call validate_split_integrity(train_df, test_df) after splitting. See docs/api.md for the full API.
  • 📋 Pre-commit hooks - Block leaky code from entering your repository
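The runtime checks listed above can be pictured with a minimal standard-library sketch. require_monotonic and check_split_times are hypothetical names written for this illustration; they are not seismic-linter's @verify_monotonicity or validate_split_integrity, whose real signatures and behavior are documented in docs/api.md. The sketch also works on plain lists of timestamps rather than DataFrames.

```python
import functools
from typing import Callable, List, Sequence

def require_monotonic(func: Callable[..., Sequence[float]]) -> Callable[..., Sequence[float]]:
    """Hypothetical decorator: raise if the returned timestamps ever decrease."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        times = func(*args, **kwargs)
        for i in range(1, len(times)):
            if times[i] < times[i - 1]:
                raise ValueError(
                    f"timestamps decrease at index {i}: {times[i - 1]} -> {times[i]}"
                )
        return times
    return wrapper

def check_split_times(train_times: Sequence[float], test_times: Sequence[float]) -> None:
    """Hypothetical check: every training timestamp must precede every test timestamp."""
    if train_times and test_times and max(train_times) >= min(test_times):
        raise ValueError(
            f"temporal overlap: latest train time {max(train_times)} "
            f">= earliest test time {min(test_times)}"
        )

@require_monotonic
def load_event_times() -> List[float]:
    # Stand-in for a real catalog loader; equal neighbours are allowed
    return [1.0, 2.0, 2.0, 5.0, 9.0]

# Passes: the latest training time (2.0) precedes the earliest test time (5.0)
check_split_times(load_event_times()[:3], load_event_times()[3:])
```

Ties across the train/test boundary are treated as overlap (>=) here, the conservative choice when two events share a timestamp.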

The GitHub Action runs in a Linux container; Windows runners are not supported.

Detected Rules

Rule ID  Description                                                     Severity
T001     Global statistics (mean/std) computed without temporal context  ⚠️ Warning
T002     Model .fit() called on potentially leaky data (e.g., raw df)    ℹ️ Info
T003     train_test_split with shuffle=True (random split)               ❌ Error
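Patterns like T001 and T003 can be detected statically with Python's standard ast module. The sketch below is a deliberately crude approximation written for illustration, not seismic-linter's implementation: it flags any .mean() or .std() call whose receiver is not a .rolling(...) or .expanding(...) window, and any train_test_split call that does not pass shuffle=False. It is purely syntactic, so it cannot tell a pandas Series from any other object.

```python
import ast
from typing import List, Optional, Tuple

Finding = Tuple[int, str, str]  # (line number, rule ID, message)

WINDOW_METHODS = {"rolling", "expanding"}  # receivers treated as temporally scoped

def _called_name(call: ast.Call) -> Optional[str]:
    """Return 'f' for f(...) and obj.f(...); None for anything more exotic."""
    if isinstance(call.func, ast.Name):
        return call.func.id
    if isinstance(call.func, ast.Attribute):
        return call.func.attr
    return None

def toy_lint(source: str) -> List[Finding]:
    """Syntactic approximation of T001 and T003 (no type information)."""
    findings: List[Finding] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = _called_name(node)
        if name in ("mean", "std") and isinstance(node.func, ast.Attribute):
            receiver = node.func.value
            windowed = isinstance(receiver, ast.Call) and _called_name(receiver) in WINDOW_METHODS
            if not windowed:
                findings.append((node.lineno, "T001", f".{name}() over the full series"))
        elif name == "train_test_split":
            shuffle = next((kw.value for kw in node.keywords if kw.arg == "shuffle"), None)
            # An omitted shuffle counts too: scikit-learn defaults to shuffle=True
            explicit_true = isinstance(shuffle, ast.Constant) and shuffle.value is True
            if shuffle is None or explicit_true:
                findings.append((node.lineno, "T003", "train_test_split shuffles rows"))
    return sorted(findings)

example = """\
z = (df['mag'] - df['mag'].mean()) / df['mag'].std()
r = df['mag'].rolling(window=100).mean()
a, b = train_test_split(X, shuffle=True)
c, d = train_test_split(X, shuffle=False)
e, f = train_test_split(X)
"""

for line, rule, message in toy_lint(example):
    print(f"line {line}: {rule} {message}")
```

Treating an omitted shuffle argument as a violation is a choice of this sketch, motivated by scikit-learn's train_test_split defaulting to shuffle=True; the rules table does not say whether seismic-linter does the same.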

Configuration

Configuration is loaded from the pyproject.toml associated with the first path passed on the command line, or from the current directory if no path is given.
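One reading of that lookup rule, sketched with pathlib: only the first CLI path matters, and pyproject.toml is looked for alongside it. Two details are assumptions rather than documented behavior: a file argument is resolved to its parent directory, and there is no search upward through parent directories. find_pyproject is a hypothetical name.

```python
from pathlib import Path
from typing import Optional, Sequence

def find_pyproject(cli_paths: Sequence[str]) -> Optional[Path]:
    """Return the pyproject.toml that would configure a run, or None if absent."""
    # Only the first path is consulted; with no paths, use the working directory
    base = Path(cli_paths[0]) if cli_paths else Path.cwd()
    # Assumption: a file argument means "the directory containing that file"
    if base.is_file():
        base = base.parent
    candidate = base / "pyproject.toml"
    # Assumption: no upward search through parent directories
    return candidate if candidate.is_file() else None
```

Under these assumptions, passing two directories means only the first one's pyproject.toml is read, even if the second has one.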

Inline suppressions are supported with a # seismic-linter: ignore RULE_ID comment, which applies only to the line it is on:

df['norm'] = (df['mag'] - df['mag'].mean()) / df['mag'].std()  # seismic-linter: ignore T001
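A suppression comment like the one above can be recognized with the standard tokenize module, which separates real comments from look-alike text inside string literals. This is a minimal sketch under assumptions: one rule ID per comment, IDs shaped like T001, and hypothetical helper names rather than seismic-linter's own.

```python
import io
import re
import tokenize
from typing import Dict, Set

# Assumed syntax: one rule ID (letter + digits, e.g. T001) after "ignore"
SUPPRESS_RE = re.compile(r"#\s*seismic-linter:\s*ignore\s+([A-Za-z]\d+)")

def collect_suppressions(source: str) -> Dict[int, Set[str]]:
    """Map line numbers to the rule IDs suppressed by comments on that line."""
    found: Dict[int, Set[str]] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only genuine comments count; a string containing the marker is ignored
        if tok.type == tokenize.COMMENT:
            match = SUPPRESS_RE.search(tok.string)
            if match:
                found.setdefault(tok.start[0], set()).add(match.group(1))
    return found

def is_suppressed(suppressions: Dict[int, Set[str]], line: int, rule_id: str) -> bool:
    return rule_id in suppressions.get(line, set())

code = (
    "df['norm'] = (df['mag'] - df['mag'].mean()) / df['mag'].std()  # seismic-linter: ignore T001\n"
    "s = '# seismic-linter: ignore T003'\n"
)
supp = collect_suppressions(code)
```

A plain regex over raw lines would also match line 2, where the marker sits inside a string; tokenizing avoids that false suppression.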

Note: With the github output format, reported paths are relative to the current working directory where possible.
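For readers wiring results into GitHub Actions, the sketch below renders one finding as a GitHub workflow command (::warning file=...,line=...::message) with the path made relative to the working directory. The severity mapping and function names are assumptions, not seismic-linter's documented output. It also shows one case where a relative path cannot be computed: on Windows, os.path.relpath raises ValueError when the file and the working directory are on different drives.

```python
import os

# Assumed mapping from the rules table's severities to GitHub command levels
LEVELS = {"Error": "error", "Warning": "warning", "Info": "notice"}

def escape_data(text: str) -> str:
    """Escape %, CR and LF, which GitHub treats specially in a command message."""
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")

def escape_property(text: str) -> str:
    """Property values (file=, title=) must additionally escape ':' and ','."""
    return escape_data(text).replace(":", "%3A").replace(",", "%2C")

def github_annotation(path: str, line: int, rule_id: str, severity: str, message: str) -> str:
    try:
        shown = os.path.relpath(path)  # relative to the current working directory
    except ValueError:
        # e.g. on Windows when path and cwd are on different drives
        shown = path
    return (
        f"::{LEVELS[severity]} file={escape_property(shown)},line={line},"
        f"title={escape_property(rule_id)}::{escape_data(message)}"
    )

print(github_annotation(os.path.join(os.getcwd(), "src", "model.py"), 12, "T001", "Warning", "global mean"))
```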

Quick Example

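# ❌ This triggers a T001 warning: mean() and std() are computed over the whole
# dataset, so every row is scaled using statistics from future events
df['normalized'] = (df['magnitude'] - df['magnitude'].mean()) / df['magnitude'].std()

# ✅ This passes validation: each value is centered (not rescaled) on a trailing
# 100-event window at the same station. Assumes rows are already sorted by event
# time; the first 99 rows of each station are NaN until the window fills
df['normalized'] = df.groupby('station')['magnitude'].transform(
    lambda x: x - x.rolling(window=100).mean()
)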
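Why the first version leaks and the second does not can be checked empirically: perturb every value after some index t and see whether any output at or before t changes. The sketch below applies that probe to pure-Python stand-ins for the two pandas expressions (a global z-score versus centering on a trailing window). It illustrates the idea only; it is not a check that seismic-linter runs, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Callable, List, Optional, Sequence

def global_zscore(xs: Sequence[float]) -> List[float]:
    """Like (x - x.mean()) / x.std() in pandas: one mean/std over the whole series."""
    mu, sd = mean(xs), stdev(xs)  # stdev uses ddof=1, matching pandas' default std
    return [(x - mu) / sd for x in xs]

def trailing_center(xs: Sequence[float], window: int = 3) -> List[Optional[float]]:
    """Like x - x.rolling(window).mean(): each value vs. its trailing window only."""
    out: List[Optional[float]] = []
    for i, x in enumerate(xs):
        if i + 1 < window:
            out.append(None)  # pandas would give NaN until the window is full
        else:
            out.append(x - mean(xs[i + 1 - window : i + 1]))
    return out

def leaks_future(transform: Callable, xs: Sequence[float], t: int) -> bool:
    """True if changing values after index t alters any output at index <= t."""
    perturbed = list(xs[: t + 1]) + [x + 100.0 for x in xs[t + 1 :]]
    return transform(list(xs))[: t + 1] != transform(perturbed)[: t + 1]

magnitudes = [3.1, 4.2, 2.7, 5.0, 3.3, 6.1]
print("global z-score leaks:", leaks_future(global_zscore, magnitudes, t=3))
print("trailing window leaks:", leaks_future(trailing_center, magnitudes, t=3))
```

The global z-score reports True because the two future values shift the mean and standard deviation applied to every past row; the trailing window reports False because outputs up to index 3 depend only on indices 0 to 3.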



Download files


Source Distribution

seismic_linter-0.2.0.tar.gz (26.5 kB)

File details

Details for the file seismic_linter-0.2.0.tar.gz.

File metadata

  • Download URL: seismic_linter-0.2.0.tar.gz
  • Size: 26.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.11

File hashes

Hashes for seismic_linter-0.2.0.tar.gz
Algorithm Hash digest
SHA256 7fb4887efe75e14eebf8558a4ee8e2c9bf23907fdbdfa358492365daff0cfab7
MD5 ae067d27c5129b5bd98188c8310d2a39
BLAKE2b-256 0fcbd0d399e8dbc24c734127bd5ff28232aac540ac8e66099c7841dd48085133
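To check a downloaded copy of the sdist against the digests listed above, hashlib is enough. The sketch streams the file in chunks; the function names are illustrative, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib

EXPECTED_SHA256 = "7fb4887efe75e14eebf8558a4ee8e2c9bf23907fdbdfa358492365daff0cfab7"

def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hex digest of a file, read in chunks so large files need not fit in memory."""
    if algorithm.lower() == "blake2b-256":
        h = hashlib.blake2b(digest_size=32)  # BLAKE2b parameterized for a 32-byte digest
    else:
        h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_sdist(path: str) -> bool:
    """Compare a local seismic_linter-0.2.0.tar.gz against the published SHA256."""
    return file_digest(path, "sha256") == EXPECTED_SHA256
```

pip can enforce the same check at install time through its hash-checking mode, e.g. a requirements line `seismic-linter==0.2.0 --hash=sha256:7fb4887efe75e14eebf8558a4ee8e2c9bf23907fdbdfa358492365daff0cfab7`; once any requirement carries a hash, every requirement in the file must be pinned and hashed.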

