GPU-accelerated seismic attribute computation via fused CUDA kernels

SeismicFlowCUDA

GPU-accelerated seismic attribute computation for NVIDIA CUDA hardware.

SeismicFlowCUDA implements seismic attributes as fused CUDA kernels compiled at runtime via CuPy's RawModule API. In the benchmarks reported under Performance, this yields 46× to 142× speedups over the SeismicFlowCore CPU implementation, and the project's stated aim is to establish CUDA programming as the new standard for high-performance computing in exploration seismology.


Why SeismicFlowCUDA?

SeismicFlowCore demonstrated that compiled, parallelised CPU code outperforms pure-Python implementations by up to 190×. SeismicFlowCUDA goes further: by moving computation to the GPU, it lifts the memory-bandwidth and thread-count ceilings that constrain even the best CPU implementations.

  • Fused CUDA kernels: gradient computation and coherence accumulation in a single pass; shared-memory tiling eliminates redundant global memory round-trips
  • Shared memory tiling with bank-conflict elimination: 16×16 thread blocks, coalesced loads on the crossline (fastest) dimension, +1 crossline pad on the shared-memory tile
  • Constant memory: the Gaussian derivative kernel is uploaded once per compilation and served to every thread from the on-chip constant cache
  • Module caching: kernels compiled once per (device_id, kernel_size) pair and reused across every subsequent call in a session
  • Upper-triangle GLCM: stores only N*(N+1)/2 entries instead of N*N, roughly halving the shared-memory footprint, roughly doubling occupancy, and halving atomic contention
  • Warp-level reductions: __shfl_down_sync replaces shared-memory reduction trees for energy and entropy, eliminating synchronisation overhead
  • NumPy-native API: input and output are float32 NumPy arrays; GPU transfer is handled internally
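To make the warp-level reduction bullet concrete, here is a pure-Python simulation of the shuffle-down pattern. It is a sketch of the general technique, not the library's kernel code. It assumes a 32-lane warp and models `__shfl_down_sync` as returning the value held by the lane `offset` positions higher, or the caller's own value when that lane does not exist. The names `shfl_down` and `warp_reduce_sum` are illustrative.

```python
WARP_SIZE = 32  # lanes per warp on current NVIDIA hardware


def shfl_down(values, offset):
    """Model of a shuffle-down: lane i reads lane i + offset, or itself if out of range."""
    return [values[lane + offset] if lane + offset < WARP_SIZE else values[lane]
            for lane in range(WARP_SIZE)]


def warp_reduce_sum(values):
    """Tree reduction in log2(32) = 5 shuffle steps; lane 0 ends up with the total."""
    vals = list(values)
    if len(vals) != WARP_SIZE:
        raise ValueError("expected exactly one value per lane")
    offset = WARP_SIZE // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [own + other for own, other in zip(vals, shifted)]
        offset //= 2
    return vals[0]


print(warp_reduce_sum(range(32)))  # 496
```

Every step exchanges register values directly between lanes, which is why no shared-memory buffer or block-level barrier appears anywhere in the loop.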

Requirements

Hardware

An NVIDIA GPU with CUDA Compute Capability 3.5 or higher. SeismicFlowCUDA was developed and validated on an NVIDIA GeForce GTX 1080 Ti (Compute Capability 6.1).
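One way to check this requirement from Python is sketched below. It relies on CuPy's `cupy.cuda.Device(...).compute_capability`, which returns a string such as `'61'` for Compute Capability 6.1. The helper names are hypothetical and not part of SeismicFlowCUDA, and the sketch falls back to `None` when CuPy or a GPU is unavailable.

```python
def parse_compute_capability(cc):
    """Split a CuPy-style capability string such as '61' into (major, minor)."""
    return int(cc[:-1]), int(cc[-1])


def meets_minimum(cc, minimum=(3, 5)):
    """True when the capability is at least the documented minimum of 3.5."""
    return parse_compute_capability(cc) >= minimum


def detect_compute_capability(device_id=0):
    """Return the device's capability string, or None if CuPy or a GPU is unavailable."""
    try:
        import cupy
        return cupy.cuda.Device(device_id).compute_capability
    except Exception:
        return None


cc = detect_compute_capability()
if cc is None:
    print("No usable CUDA device found through CuPy")
else:
    print(f"Compute capability {cc}: supported = {meets_minimum(cc)}")
```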

Software

| Package | Tested version |
| --- | --- |
| Python | 3.12 (64-bit) |
| CuPy | 13.4.1 |
| NumPy | ≥ 1.26.4 |
| CUDA Toolkit | matching your CuPy build |

Installation

Step 1 — Install CUDA Toolkit

Download from: https://developer.nvidia.com/cuda-downloads

The CUDA Toolkit version must match the CuPy build you install in the next step. Check compatibility at: https://docs.cupy.dev/en/stable/install.html

Verify after install:

nvcc --version

Step 2 — Install CuPy

Install the CuPy wheel that matches your CUDA Toolkit version:

# CUDA 12.x
pip install cupy-cuda12x

# CUDA 11.x
pip install cupy-cuda11x

Verify:

import cupy
print(cupy.__version__)   # should print e.g. 13.4.1

If CuPy imports but CUDA kernels fail to compile, the most likely cause is a mismatch between your CUDA Toolkit and CuPy wheel versions.


Step 3 — Install SeismicFlowCUDA

git clone https://github.com/SeismicFlow/SeismicFlowCUDA
cd SeismicFlowCUDA
pip install -e .

No compilation step is required. Kernels are compiled at first call via CuPy's RawModule API and cached for the duration of the session.
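The compile-once behaviour can be pictured as a memoised build function keyed by `(device_id, kernel_size)`. The sketch below is an illustration under that assumption and does not reproduce the library's internals: `get_module` is a hypothetical stand-in that records each simulated compilation instead of constructing a real `cupy.RawModule`.

```python
from functools import lru_cache

compile_calls = []  # records every simulated compilation


@lru_cache(maxsize=None)
def get_module(device_id, kernel_size):
    """Stand-in for an expensive build step, memoised per (device_id, kernel_size).

    A real implementation would presumably construct a cupy.RawModule here;
    this sketch only records the call so the caching behaviour is visible.
    """
    compile_calls.append((device_id, kernel_size))
    return "module(device=%d, kernel_size=%d)" % (device_id, kernel_size)


get_module(0, 3)      # first call: "compiles"
get_module(0, 3)      # same key: served from the cache
get_module(0, 5)      # new kernel size: "compiles" again
print(compile_calls)  # [(0, 3), (0, 5)]
```

Because the cache lives in process memory, it is rebuilt whenever a new Python session starts, which matches the "duration of the session" wording above.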


Step 4 — Verify your install

import numpy as np
import seismicflowcuda as sfc

data = np.random.randn(100, 50, 50).astype(np.float32)

print("coherence:", sfc.coherence(data).shape)
print("glcm:     ", sfc.glcm(data, metric='homogeneity').shape)
print()
print("OK — SeismicFlowCUDA loaded successfully")

If both lines print a shape, your install is complete. On first call, CuPy will compile and cache the kernels — this takes a few seconds and does not repeat.


Quick Start

import numpy as np
import seismicflowcuda as sfc

# Seismic volume: (time, inline, crossline), float32
data = np.random.randn(500, 200, 200).astype(np.float32)

coh = sfc.coherence(data)
hom = sfc.glcm(data, metric='homogeneity')
ent = sfc.glcm(data, metric='entropy')

All functions accept (time, inline, crossline) float32 arrays and return float32 arrays of the same shape.
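Data loaded from disk is often float64 or non-contiguous, so a small conversion step before calling the library can be useful. The helper below is a hypothetical convenience function, not part of the SeismicFlowCUDA API. It uses only NumPy and assumes the axis order is already (time, inline, crossline).

```python
import numpy as np


def as_seismic_volume(array):
    """Return a C-contiguous float32 copy or view of a 3-D (time, inline, crossline) array."""
    volume = np.ascontiguousarray(array, dtype=np.float32)
    if volume.ndim != 3:
        raise ValueError(f"expected a 3-D volume, got {volume.ndim} dimension(s)")
    return volume


raw = np.random.randn(50, 20, 20)           # float64 by default
data = as_seismic_volume(raw)
print(data.dtype, data.shape, data.nbytes)  # float32 (50, 20, 20) 80000
```

The `nbytes` value is a quick way to estimate how much device memory a volume will need: at 4 bytes per sample, the 500×200×200 Quick Start volume occupies 80 MB before any intermediate buffers.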


Attributes

Discontinuity and Structural Continuity

| Function | Description |
| --- | --- |
| sfc.coherence(data) | Gradient structure tensor (GST) coherence: fused shared-memory kernel, Cardano eigenvalue decomposition |

Texture

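| Function | Description |
| --- | --- |
| sfc.glcm(data, metric='homogeneity') | Grey-level co-occurrence matrix (GLCM) texture attribute: direct accumulation for contrast, dissimilarity, and homogeneity; upper-triangle shared-memory GLCM with warp-level reduction for energy and entropy |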

API Reference

sfc.coherence

sfc.coherence(data, window=(3, 3, 3), sigma=1.0)

GPU GST coherence. Numerically equivalent to the SeismicFlowCore reference implementation.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data | float32 ndarray, shape (n_time, n_inline, n_crossline) | (required) | Input seismic volume |
| window | tuple of 3 odd ints | (3, 3, 3) | Centred analysis window |
| sigma | float | 1.0 | Gaussian derivative sigma |

Returns float32 ndarray, same shape as data, values in [0, 1].


sfc.glcm

sfc.glcm(data, metric='homogeneity', half_levels=16, lateral_radius=4,
         vertical_radius=1, clip_sigmas=1.5, co_offset=1)

GLCM-based seismic texture attribute.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| data | float32 ndarray, shape (nt, ni, nx) | (required) | Input seismic volume |
| metric | str | 'homogeneity' | One of 'contrast', 'dissimilarity', 'homogeneity', 'energy', 'entropy' |
| half_levels | int in [1, 16] | 16 | Quantisation half-range |
| lateral_radius | int | 4 | Spatial window half-width |
| vertical_radius | int | 1 | Temporal window half-width |
| clip_sigmas | float > 0 | 1.5 | RMS clipping scale |
| co_offset | int ≥ 1 | 1 | Co-occurrence offset distance |

Returns float32 ndarray, same shape as data.


CUDA Kernel Design

GST Coherence — gst_coherence_fused

The reference SeismicFlowCore implementation uses four separate kernel launches: three calls to compute_gradient (one per spatial axis) followed by compute_coherence. Each launch reads the full volume from global memory, producing six global memory round-trips in total.

SeismicFlowCUDA replaces this with a single fused kernel:

  • A shared-memory tile of shape (ts0, ts1, ts2_pad) is loaded cooperatively by all threads in the block, with reflect-padded boundary handling
  • All three directional gradients are computed from the shared tile; no additional global memory reads are required
  • The GST tensor is accumulated inline and eigenvalues are solved analytically via Cardano's method
  • __constant__ memory holds the Gaussian derivative kernel, uploaded once per (device_id, kernel_size) pair
  • The crossline tile dimension is padded by +1 to eliminate shared memory bank conflicts on time-axis taps
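The effect of the +1 pad can be seen with a small bank-index calculation. The sketch assumes the usual layout of 32 shared-memory banks, each one 4-byte word wide, and a warp whose threads read the same column of successive rows in a row-major tile. The row widths 32 and 33 are illustrative and are not the library's actual tile dimensions.

```python
from collections import Counter

NUM_BANKS = 32  # 4-byte-wide shared-memory banks


def max_bank_conflict(row_width_words, num_threads=32):
    """Worst-case number of threads hitting one bank when thread t reads tile[t][0]."""
    banks = Counter((t * row_width_words) % NUM_BANKS for t in range(num_threads))
    return max(banks.values())


print(max_bank_conflict(32), max_bank_conflict(33))  # 32 1
```

With a row width of 32 words every access lands in bank 0 and is serialised 32 ways. Padding the row to 33 words shifts each successive row by one bank, so the 32 accesses proceed in parallel at the cost of one unused word per row.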

Result: one kernel launch and one global memory read pass in place of four launches and six round-trips, with all intermediate gradient values consumed from on-chip shared memory.
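For readers unfamiliar with the analytic eigenvalue step, the following pure-Python sketch shows the trigonometric form of Cardano's solution for a symmetric 3×3 matrix. It illustrates the general method under the assumption of a real symmetric input. It is not taken from the kernel source, and it stops at the eigenvalues because the coherence formula itself is not specified here. The function name is illustrative.

```python
import math


def symmetric_3x3_eigenvalues(a):
    """Eigenvalues of a symmetric 3x3 matrix (nested lists), largest first."""
    p1 = a[0][1] ** 2 + a[0][2] ** 2 + a[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return tuple(sorted((a[0][0], a[1][1], a[2][2]), reverse=True))
    q = (a[0][0] + a[1][1] + a[2][2]) / 3.0  # mean eigenvalue
    p2 = sum((a[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    # B = (A - q*I) / p has eigenvalues 2*cos(angle); det(B) / 2 equals cos(3 * phi)
    b = [[(a[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det_b = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
             - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
             + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det_b / 2.0))  # clamp against rounding error
    phi = math.acos(r) / 3.0
    lam1 = q + 2.0 * p * math.cos(phi)
    lam3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    lam2 = 3.0 * q - lam1 - lam3  # the trace is preserved
    return lam1, lam2, lam3


# This matrix has exact eigenvalues 11, 2 and 1
print(symmetric_3x3_eigenvalues([[2.0, 0.0, 0.0], [0.0, 3.0, 4.0], [0.0, 4.0, 9.0]]))
```

The closed form needs no iteration and a fixed number of arithmetic operations, which suits a kernel where every thread must solve its own 3×3 problem.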

GLCM — glcm_direct and glcm_ee

Two kernels serve the five GLCM metrics:

glcm_direct (contrast, dissimilarity, homogeneity) — one thread per output pixel. Accumulates co-occurrence weighted metrics directly without storing a GLCM matrix. Separate passes over the amplitude and Hilbert transform volumes; results summed at the output.
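Why the matrix can be skipped for these metrics is easiest to see on a small case. The sketch below uses a 1-D trace of pre-quantised levels and a single offset, which is a simplification of the 3-D windows the kernel works on. It uses the common textbook definitions of contrast and homogeneity, and the library's exact normalisation may differ. Both function names are illustrative.

```python
import numpy as np


def metrics_via_matrix(levels, n_levels, offset=1):
    """Build a symmetric, normalised GLCM for a 1-D trace, then reduce it."""
    glcm = np.zeros((n_levels, n_levels), dtype=np.float64)
    for i, j in zip(levels[:-offset], levels[offset:]):
        glcm[i, j] += 1.0
        glcm[j, i] += 1.0  # count each pair in both directions
    p = glcm / glcm.sum()
    d = np.subtract.outer(np.arange(n_levels), np.arange(n_levels))
    return float(np.sum(p * d ** 2)), float(np.sum(p / (1.0 + d ** 2)))


def metrics_direct(levels, offset=1):
    """Accumulate the same weights pair by pair, with no matrix at all."""
    d = levels[:-offset].astype(np.int64) - levels[offset:].astype(np.int64)
    return float(np.mean(d ** 2)), float(np.mean(1.0 / (1.0 + d ** 2)))


rng = np.random.default_rng(0)
trace = rng.integers(0, 33, size=200)
print(metrics_via_matrix(trace, 33))
print(metrics_direct(trace))
```

Both routes give the same contrast and homogeneity up to rounding, because each weight depends only on the level difference of a pair and can therefore be summed as the pairs are visited. Energy and entropy depend on the accumulated counts themselves, which is why they need a stored matrix.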

glcm_ee (energy, entropy) — one warp per output pixel. The full N×N GLCM is symmetric, so only the upper triangle (N*(N+1)/2 entries) is stored in shared memory per warp slot:

  • For N = 33 levels: full matrix = 1,089 floats × 4 bytes = 4,356 bytes/warp; upper triangle = 561 floats × 4 bytes = 2,244 bytes/warp
  • At 4 warps per block: 8,976 bytes/block → 5 blocks per SM → 20 active warps/SM
  • Diagonal entries represent c[i,i]; off-diagonal entries store c[i,j] = c[j,i] once, with the factor-of-two applied at reduction time
  • Amplitude and Hilbert transform are processed in two sequential passes that reuse the same GLCM slot, avoiding a doubled shared-memory footprint
  • Warp-level reductions over the triangle use __shfl_down_sync with FULL_MASK
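The storage arithmetic and the factor-of-two rule in the list above can be checked with a few lines of plain Python. The row-major triangle indexing shown is one common layout and is assumed here for illustration. The 48 KiB per-SM shared-memory budget is also an assumption, chosen because it reproduces the 5 blocks per SM quoted above. The function names are hypothetical.

```python
def tri_index(i, j, n):
    """Flat index of entry (i, j) in a row-major upper triangle of an n x n symmetric matrix."""
    if i > j:
        i, j = j, i  # c[i, j] == c[j, i], so only i <= j is stored
    return i * n - i * (i - 1) // 2 + (j - i)


def blocks_per_sm(entries_per_warp, warps_per_block=4, bytes_per_entry=4,
                  smem_budget=48 * 1024):
    """Resident blocks per SM if shared memory were the only limit (budget is an assumption)."""
    return smem_budget // (entries_per_warp * warps_per_block * bytes_per_entry)


n = 33
tri_entries = n * (n + 1) // 2
print(tri_entries, tri_entries * 4)                      # 561 2244
print(blocks_per_sm(n * n), blocks_per_sm(tri_entries))  # 2 5

# Factor-of-two rule: the sum of squares over the full symmetric matrix equals
# the diagonal squares plus twice the off-diagonal squares held in the triangle.
counts = [[(i * j + i + j) % 7 for j in range(n)] for i in range(n)]  # symmetric by construction
tri = [0] * tri_entries
for i in range(n):
    for j in range(i, n):
        tri[tri_index(i, j, n)] = counts[i][j]
full_sum = sum(c * c for row in counts for c in row)
tri_sum = sum((1 if i == j else 2) * tri[tri_index(i, j, n)] ** 2
              for i in range(n) for j in range(i, n))
print(full_sum == tri_sum)                               # True
```

Under the assumed 48 KiB budget the full matrix would allow only 2 resident blocks per SM, against 5 for the triangle.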

Performance

Benchmarked on an NVIDIA GeForce GTX 1080 Ti against SeismicFlowCore (Intel Core i7-7700K, 8 threads, AVX2) on a 500×500×500 float32 volume. CPU times are wall-clock from SeismicFlowCore; GPU times include host-to-device transfer, kernel execution, and device-to-host transfer.

| Attribute | CPU (ms) | GPU (ms) | Speedup |
| --- | --- | --- | --- |
| GST Coherence | 31,225.5 | 681.4 | 45.8× |
| Contrast | 334,339.9 | 2,361.8 | 141.6× |
| Dissimilarity | 331,176.2 | 2,377.7 | 139.3× |
| Homogeneity | 341,142.8 | 2,398.7 | 142.2× |
| Energy | 306,925.9 | 6,084.5 | 50.4× |
| Entropy | 495,425.9 | 6,094.4 | 81.3× |
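The speedup column is the CPU time divided by the GPU time. The short script below recomputes it from the timings listed above so the figures can be checked or extended with new measurements. The dictionary name `timings_ms` is illustrative.

```python
# (CPU ms, GPU ms) pairs copied from the benchmark table
timings_ms = {
    "GST Coherence": (31225.5, 681.4),
    "Contrast": (334339.9, 2361.8),
    "Dissimilarity": (331176.2, 2377.7),
    "Homogeneity": (341142.8, 2398.7),
    "Energy": (306925.9, 6084.5),
    "Entropy": (495425.9, 6094.4),
}

speedups = {name: round(cpu / gpu, 1) for name, (cpu, gpu) in timings_ms.items()}
for name, factor in speedups.items():
    print(f"{name:14s} {factor:6.1f}x")
```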

The GLCM contrast, dissimilarity, and homogeneity attributes achieve the largest speedups (up to 142×) because the fused glcm_direct kernel never materialises the intermediate GLCM matrix, a data structure the CPU implementation must write and read back from memory for every output pixel. Energy and entropy (50–81×) build the upper-triangle GLCM in shared memory and reduce it at warp level, which takes roughly 2.5× as much GPU time as glcm_direct on the same volume but still yields large speedups. GST coherence (45.8×) benefits primarily from the fused single-pass kernel and shared-memory tiling, which reduce global memory traffic from six round-trips to one.

These results are reported in full in the accompanying manuscript submitted to Geophysics (SEG).


Relationship to SeismicFlowCore

SeismicFlowCUDA and SeismicFlowCore share the same public API contract: both accept (time, inline, crossline) float32 NumPy arrays and return float32 NumPy arrays. The GPU coherence implementation is byte-for-byte equivalent to the SeismicFlowCore reference on all tested volumes.
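An equivalence claim of this kind can be checked on your own data by comparing the two outputs directly. The helper below is a hypothetical NumPy-only sketch and not part of either package. It reports both bitwise identity and the largest absolute difference, since two float32 results can agree numerically without being bit-identical.

```python
import numpy as np


def compare_outputs(reference, candidate):
    """Report bitwise identity and the largest absolute difference between two volumes."""
    if reference.shape != candidate.shape or reference.dtype != candidate.dtype:
        raise ValueError("outputs must share shape and dtype")
    diff = np.abs(reference.astype(np.float64) - candidate.astype(np.float64))
    return {
        "bit_identical": reference.tobytes() == candidate.tobytes(),
        "max_abs_diff": float(diff.max()),
    }


a = np.linspace(0.0, 1.0, 24, dtype=np.float32).reshape(2, 3, 4)
b = a.copy()
print(compare_outputs(a, b))                   # {'bit_identical': True, 'max_abs_diff': 0.0}
b[0, 0, 0] += np.float32(1e-3)
print(compare_outputs(a, b)["bit_identical"])  # False
```

In practice `reference` would be the SeismicFlowCore output and `candidate` the SeismicFlowCUDA output for the same input volume and parameters.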

SeismicFlowCore remains the recommended choice for environments without NVIDIA hardware, or when a full 27-attribute pipeline is required. SeismicFlowCUDA is the recommended choice when a GPU is available and maximum throughput is the priority.


Citation

If you use SeismicFlowCUDA in your research, please cite:

Anonymous. (2026). SeismicFlowCUDA: Establishing CUDA Programming as the New Standard for High-Performance Computing in Exploration Seismology. Manuscript submitted to Geophysics (Society of Exploration Geophysicists).

If you also use SeismicFlowCore:

Anonymous. (2026). SeismicFlowCore: A High-Performance, Industry-Validated Open-Source Python Library for 3D Seismic Attribute Computation. Manuscript submitted for publication.


License

GNU General Public License v3.0 — see LICENSE for details.
