GPU-accelerated seismic attribute computation via fused CUDA kernels
Project description
SeismicFlowCUDA
GPU-accelerated seismic attribute computation for NVIDIA CUDA hardware.
SeismicFlowCUDA implements seismic attributes as fused CUDA kernels compiled at runtime via CuPy's RawModule API — delivering throughput structurally inaccessible to CPU implementations, and establishing CUDA programming as the new standard for high-performance computing in exploration seismology.
Why SeismicFlowCUDA?
SeismicFlowCore demonstrated that compiled, parallelised CPU code outperforms pure-Python implementations by up to 190×. SeismicFlowCUDA goes further: by moving computation to the GPU, it eliminates the memory bandwidth and thread-count ceilings that constrain even the best CPU implementations.
- Fused CUDA kernels — gradient computation and coherence accumulation in a single pass; shared-memory tiling eliminates redundant global memory round-trips
- Shared memory tiling with bank-conflict elimination — 16×16 thread blocks, coalesced loads on the crossline (fastest) dimension, +1 crossline pad on the shared-memory tile
- Constant memory — Gaussian derivative kernel uploaded once per compilation, resident in L1 for every thread
- Module caching — kernels compiled once per
(device_id, kernel_size)pair and reused across every subsequent call in a session - Upper-triangle GLCM — stores only
N*(N+1)/2entries instead ofN², halving shared memory, doubling occupancy, and halving atomic contention - Warp-level reductions —
__shfl_down_syncreplaces shared-memory reduction trees for energy and entropy, eliminating synchronisation overhead - NumPy-native API — input and output are float32 NumPy arrays; GPU transfer is handled internally
Requirements
Hardware
An NVIDIA GPU with CUDA Compute Capability 3.5 or higher. SeismicFlowCUDA was developed and validated on an NVIDIA GeForce GTX 1080 Ti (Compute Capability 6.1).
Software
| Package | Tested version |
|---|---|
| Python | 3.12 (64-bit) |
| CuPy | 13.4.1 |
| NumPy | ≥ 1.26.4 |
| CUDA Toolkit | matching your CuPy build |
Installation
Step 1 — Install CUDA Toolkit
Download from: https://developer.nvidia.com/cuda-downloads
The CUDA Toolkit version must match the CuPy build you install in the next step. Check compatibility at: https://docs.cupy.dev/en/stable/install.html
Verify after install:
nvcc --version
Step 2 — Install CuPy
Install the CuPy wheel that matches your CUDA Toolkit version:
# CUDA 12.x
pip install cupy-cuda12x
# CUDA 11.x
pip install cupy-cuda11x
Verify:
import cupy
print(cupy.__version__) # should print e.g. 13.4.1
If CuPy imports but CUDA kernels fail to compile, your CUDA Toolkit and CuPy wheel versions are mismatched.
Step 3 — Install SeismicFlowCUDA
git clone https://github.com/SeismicFlow/SeismicFlowCUDA
cd SeismicFlowCUDA
pip install -e .
No compilation step is required. Kernels are compiled at first call via CuPy's RawModule API and cached for the duration of the session.
Step 4 — Verify your install
import numpy as np
import seismicflowcuda as sfc
data = np.random.randn(100, 50, 50).astype(np.float32)
print("coherence:", sfc.coherence(data).shape)
print("glcm: ", sfc.glcm(data, metric='homogeneity').shape)
print()
print("OK — SeismicFlowCUDA loaded successfully")
If both lines print a shape, your install is complete. On first call, CuPy will compile and cache the kernels — this takes a few seconds and does not repeat.
Quick Start
import numpy as np
import seismicflowcuda as sfc
# Seismic volume: (time, inline, crossline), float32
data = np.random.randn(500, 200, 200).astype(np.float32)
coh = sfc.coherence(data)
hom = sfc.glcm(data, metric='homogeneity')
ent = sfc.glcm(data, metric='entropy')
All functions accept (time, inline, crossline) float32 arrays and return float32 arrays of the same shape.
Attributes
Discontinuity and Structural Continuity
| Function | Description |
|---|---|
sfc.coherence(data) |
GST coherence — fused shared-memory kernel, Cardano eigenvalue decomposition |
Texture
| Function | Description |
|---|---|
sfc.glcm(data, metric='homogeneity') |
GLCM-based texture attribute — upper-triangle shared-memory GLCM, warp-level reduction |
API Reference
sfc.coherence
sfc.coherence(data, window=(3, 3, 3), sigma=1.0)
GPU GST coherence. Numerically equivalent to the SeismicFlowCore reference implementation.
| Parameter | Type | Default | Description |
|---|---|---|---|
data |
float32 ndarray (n_time, n_inline, n_crossline) |
— | Input seismic volume |
window |
tuple of 3 odd ints | (3, 3, 3) |
Centred analysis window |
sigma |
float | 1.0 |
Gaussian derivative sigma |
Returns float32 ndarray, same shape as data, values in [0, 1].
sfc.glcm
sfc.glcm(data, metric='homogeneity', half_levels=16, lateral_radius=4,
vertical_radius=1, clip_sigmas=1.5, co_offset=1)
GLCM-based seismic texture attribute.
| Parameter | Type | Default | Description |
|---|---|---|---|
data |
float32 ndarray (nt, ni, nx) |
— | Input seismic volume |
metric |
str | 'homogeneity' |
'contrast' | 'dissimilarity' | 'homogeneity' | 'energy' | 'entropy' |
half_levels |
int in [1, 16] |
16 |
Quantisation half-range |
lateral_radius |
int | 4 |
Spatial window half-width |
vertical_radius |
int | 1 |
Temporal window half-width |
clip_sigmas |
float > 0 | 1.5 |
RMS clipping scale |
co_offset |
int ≥ 1 | 1 |
Co-occurrence offset distance |
Returns float32 ndarray, same shape as data.
CUDA Kernel Design
GST Coherence — gst_coherence_fused
The reference SeismicFlowCore implementation uses four separate kernel launches: three calls to compute_gradient (one per spatial axis) followed by compute_coherence. Each launch reads the full volume from global memory, producing six global memory round-trips in total.
SeismicFlowCUDA replaces this with a single fused kernel:
- A shared-memory tile of shape
(ts0, ts1, ts2_pad)is loaded cooperatively by all threads in the block, with reflect-padded boundary handling - All three directional gradients are computed from the shared tile; no additional global memory reads are required
- The GST tensor is accumulated inline and eigenvalues are solved analytically via Cardano's method
__constant__memory holds the Gaussian derivative kernel, uploaded once per(device_id, kernel_size)pair- The crossline tile dimension is padded by +1 to eliminate shared memory bank conflicts on time-axis taps
Result: one global memory read pass instead of four, with all intermediate gradient values consumed from L1-resident shared memory.
GLCM — glcm_direct and glcm_ee
Two kernels serve the five GLCM metrics:
glcm_direct (contrast, dissimilarity, homogeneity) — one thread per output pixel. Accumulates co-occurrence weighted metrics directly without storing a GLCM matrix. Separate passes over the amplitude and Hilbert transform volumes; results summed at the output.
glcm_ee (energy, entropy) — one warp per output pixel. The full N×N GLCM is symmetric, so only the upper triangle (N*(N+1)/2 entries) is stored in shared memory per warp slot:
- For
N = 33levels: full matrix = 1,089 floats × 4 bytes = 4,356 bytes/warp; upper triangle = 561 floats × 4 bytes = 2,244 bytes/warp - At 4 warps per block: 8,976 bytes/block → 5 blocks per SM → 20 active warps/SM
- Diagonal entries represent
c[i,i]; off-diagonal entries storec[i,j] = c[j,i]once, with the factor-of-two applied at reduction time - Amplitude and Hilbert transform are processed in two sequential passes that reuse the same GLCM slot, avoiding a doubled shared-memory footprint
- Warp-level reductions over the triangle use
__shfl_down_syncwithFULL_MASK
Performance
Benchmarked on an NVIDIA GeForce GTX 1080 Ti against SeismicFlowCore (Intel Core i7-7700K, 8 threads, AVX2) on a 500×500×500 float32 volume. CPU times are wall-clock from SeismicFlowCore; GPU times include host-to-device transfer, kernel execution, and device-to-host transfer.
| Attribute | CPU (ms) | GPU (ms) | Speedup |
|---|---|---|---|
| GST Coherence | 31,225.5 | 681.4 | 45.8× |
| Contrast | 334,339.9 | 2,361.8 | 141.6× |
| Dissimilarity | 331,176.2 | 2,377.7 | 139.3× |
| Homogeneity | 341,142.8 | 2,398.7 | 142.2× |
| Energy | 306,925.9 | 6,084.5 | 50.4× |
| Entropy | 495,425.9 | 6,094.4 | 81.3× |
The GLCM contrast, dissimilarity, and homogeneity attributes achieve the largest speedups (up to 142×) because the fused glcm_direct kernel eliminates the intermediate GLCM matrix entirely — a data structure the CPU implementation must write and read back from memory for every output pixel. Energy and entropy (50–81×) involve the upper-triangle shared-memory GLCM and warp-level reductions, which carry slightly more per-pixel work but remain heavily GPU-bound. GST coherence (45.8×) benefits primarily from the fused single-pass kernel and shared-memory tiling, which reduce global memory traffic from six round-trips to one.
These results are reported in full in the accompanying manuscript submitted to Geophysics (SEG).
Relationship to SeismicFlowCore
SeismicFlowCUDA and SeismicFlowCore share the same public API contract: both accept (time, inline, crossline) float32 NumPy arrays and return float32 NumPy arrays. The GPU coherence implementation is byte-for-byte equivalent to the SeismicFlowCore reference on all tested volumes.
SeismicFlowCore remains the recommended choice for environments without NVIDIA hardware, or when a full 27-attribute pipeline is required. SeismicFlowCUDA is the recommended choice when a GPU is available and maximum throughput is the priority.
Citation
If you use SeismicFlowCUDA in your research, please cite:
Anonymous. (2026). SeismicFlowCUDA: Establishing CUDA Programming as the New Standard for High-Performance Computing in Exploration Seismology. Manuscript submitted to Geophysics (Society of Exploration Geophysicists).
If you also use SeismicFlowCore:
Anonymous. (2026). SeismicFlowCore: A High-Performance, Industry-Validated Open-Source Python Library for 3D Seismic Attribute Computation. Manuscript submitted for publication.
License
GNU General Public License v3.0 — see LICENSE for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file seismicflowcuda-0.0.1.tar.gz.
File metadata
- Download URL: seismicflowcuda-0.0.1.tar.gz
- Upload date:
- Size: 6.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d6447d185b02668e8735c02f619cef24340da1bbb8ca48fe91714a36f155617d
|
|
| MD5 |
902728571c66cf2499923c67d5f47a51
|
|
| BLAKE2b-256 |
b4d065e8ddadea6689a36ecfa9912b42b78e2bb638188c9c8b73edeca6fdffbf
|
File details
Details for the file seismicflowcuda-0.0.1-py3-none-any.whl.
File metadata
- Download URL: seismicflowcuda-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b28cc7dfed882f19552fabf182b210eabc40a9426eb269f9f9a4664c07b1347f
|
|
| MD5 |
9a4a16c12fef270f874c90cfb023a033
|
|
| BLAKE2b-256 |
c71e2dc43cd1dbb454bb80e7cf9424e34c0d784fcf1b929b5caf0976aa719ed7
|