GPU-accelerated seismic attribute computation via fused CUDA kernels

These details have not been verified by PyPI

Project links

Homepage

Project description

SeismicFlowCUDA

GPU-accelerated seismic attribute computation for NVIDIA CUDA hardware.

SeismicFlowCUDA implements seismic attributes as fused CUDA kernels compiled at runtime via CuPy's RawModule API — delivering throughput structurally inaccessible to CPU implementations, and establishing CUDA programming as the new standard for high-performance computing in exploration seismology.

Why SeismicFlowCUDA?

SeismicFlowCore demonstrated that compiled, parallelised CPU code outperforms pure-Python implementations by up to 190×. SeismicFlowCUDA goes further: by moving computation to the GPU, it eliminates the memory bandwidth and thread-count ceilings that constrain even the best CPU implementations.

Fused CUDA kernels — gradient computation and coherence accumulation in a single pass; shared-memory tiling eliminates redundant global memory round-trips
Shared memory tiling with bank-conflict elimination — 16×16 thread blocks, coalesced loads on the crossline (fastest) dimension, +1 crossline pad on the shared-memory tile
Constant memory — Gaussian derivative kernel uploaded once per compilation, resident in L1 for every thread
Module caching — kernels compiled once per (device_id, kernel_size) pair and reused across every subsequent call in a session
Upper-triangle GLCM — stores only N*(N+1)/2 entries instead of N², halving shared memory, doubling occupancy, and halving atomic contention
Warp-level reductions — __shfl_down_sync replaces shared-memory reduction trees for energy and entropy, eliminating synchronisation overhead
NumPy-native API — input and output are float32 NumPy arrays; GPU transfer is handled internally

Requirements

Hardware

An NVIDIA GPU with CUDA Compute Capability 3.5 or higher. SeismicFlowCUDA was developed and validated on an NVIDIA GeForce GTX 1080 Ti (Compute Capability 6.1).

Software

Package	Tested version
Python	3.12 (64-bit)
CuPy	13.4.1
NumPy	≥ 1.26.4
CUDA Toolkit	matching your CuPy build

Installation

Step 1 — Install CUDA Toolkit

Download from: https://developer.nvidia.com/cuda-downloads

The CUDA Toolkit version must match the CuPy build you install in the next step. Check compatibility at: https://docs.cupy.dev/en/stable/install.html

Verify after install:

nvcc --version

Step 2 — Install CuPy

Install the CuPy wheel that matches your CUDA Toolkit version:

# CUDA 12.x
pip install cupy-cuda12x

# CUDA 11.x
pip install cupy-cuda11x

Verify:

import cupy
print(cupy.__version__)   # should print e.g. 13.4.1

If CuPy imports but CUDA kernels fail to compile, your CUDA Toolkit and CuPy wheel versions are mismatched.

Step 3 — Install SeismicFlowCUDA

git clone https://github.com/SeismicFlow/SeismicFlowCUDA
cd SeismicFlowCUDA
pip install -e .

No compilation step is required. Kernels are compiled at first call via CuPy's RawModule API and cached for the duration of the session.

Step 4 — Verify your install

import numpy as np
import seismicflowcuda as sfc

data = np.random.randn(100, 50, 50).astype(np.float32)

print("coherence:", sfc.coherence(data).shape)
print("glcm:     ", sfc.glcm(data, metric='homogeneity').shape)
print()
print("OK — SeismicFlowCUDA loaded successfully")

If both lines print a shape, your install is complete. On first call, CuPy will compile and cache the kernels — this takes a few seconds and does not repeat.

Quick Start

import numpy as np
import seismicflowcuda as sfc

# Seismic volume: (time, inline, crossline), float32
data = np.random.randn(500, 200, 200).astype(np.float32)

coh = sfc.coherence(data)
hom = sfc.glcm(data, metric='homogeneity')
ent = sfc.glcm(data, metric='entropy')

All functions accept (time, inline, crossline) float32 arrays and return float32 arrays of the same shape.

Attributes

Discontinuity and Structural Continuity

Function	Description
`sfc.coherence(data)`	GST coherence — fused shared-memory kernel, Cardano eigenvalue decomposition

Texture

Function	Description
`sfc.glcm(data, metric='homogeneity')`	GLCM-based texture attribute — upper-triangle shared-memory GLCM, warp-level reduction

API Reference

`sfc.coherence`

sfc.coherence(data, window=(3, 3, 3), sigma=1.0)

GPU GST coherence. Numerically equivalent to the SeismicFlowCore reference implementation.

Parameter	Type	Default	Description
`data`	float32 ndarray `(n_time, n_inline, n_crossline)`	—	Input seismic volume
`window`	tuple of 3 odd ints	`(3, 3, 3)`	Centred analysis window
`sigma`	float	`1.0`	Gaussian derivative sigma

Returns float32 ndarray, same shape as data, values in [0, 1].

`sfc.glcm`

sfc.glcm(data, metric='homogeneity', half_levels=16, lateral_radius=4,
         vertical_radius=1, clip_sigmas=1.5, co_offset=1)

GLCM-based seismic texture attribute.

Parameter	Type	Default	Description
`data`	float32 ndarray `(nt, ni, nx)`	—	Input seismic volume
`metric`	str	`'homogeneity'`	`'contrast'` \| `'dissimilarity'` \| `'homogeneity'` \| `'energy'` \| `'entropy'`
`half_levels`	int in `[1, 16]`	`16`	Quantisation half-range
`lateral_radius`	int	`4`	Spatial window half-width
`vertical_radius`	int	`1`	Temporal window half-width
`clip_sigmas`	float > 0	`1.5`	RMS clipping scale
`co_offset`	int ≥ 1	`1`	Co-occurrence offset distance

Returns float32 ndarray, same shape as data.

CUDA Kernel Design

GST Coherence — `gst_coherence_fused`

The reference SeismicFlowCore implementation uses four separate kernel launches: three calls to compute_gradient (one per spatial axis) followed by compute_coherence. Each launch reads the full volume from global memory, producing six global memory round-trips in total.

SeismicFlowCUDA replaces this with a single fused kernel:

A shared-memory tile of shape (ts0, ts1, ts2_pad) is loaded cooperatively by all threads in the block, with reflect-padded boundary handling
All three directional gradients are computed from the shared tile; no additional global memory reads are required
The GST tensor is accumulated inline and eigenvalues are solved analytically via Cardano's method
__constant__ memory holds the Gaussian derivative kernel, uploaded once per (device_id, kernel_size) pair
The crossline tile dimension is padded by +1 to eliminate shared memory bank conflicts on time-axis taps

Result: one global memory read pass instead of four, with all intermediate gradient values consumed from L1-resident shared memory.

GLCM — `glcm_direct` and `glcm_ee`

Two kernels serve the five GLCM metrics:

glcm_direct (contrast, dissimilarity, homogeneity) — one thread per output pixel. Accumulates co-occurrence weighted metrics directly without storing a GLCM matrix. Separate passes over the amplitude and Hilbert transform volumes; results summed at the output.

glcm_ee (energy, entropy) — one warp per output pixel. The full N×N GLCM is symmetric, so only the upper triangle (N*(N+1)/2 entries) is stored in shared memory per warp slot:

For N = 33 levels: full matrix = 1,089 floats × 4 bytes = 4,356 bytes/warp; upper triangle = 561 floats × 4 bytes = 2,244 bytes/warp
At 4 warps per block: 8,976 bytes/block → 5 blocks per SM → 20 active warps/SM
Diagonal entries represent c[i,i]; off-diagonal entries store c[i,j] = c[j,i] once, with the factor-of-two applied at reduction time
Amplitude and Hilbert transform are processed in two sequential passes that reuse the same GLCM slot, avoiding a doubled shared-memory footprint
Warp-level reductions over the triangle use __shfl_down_sync with FULL_MASK

Performance

Benchmarked on an NVIDIA GeForce GTX 1080 Ti against SeismicFlowCore (Intel Core i7-7700K, 8 threads, AVX2) on a 500×500×500 float32 volume. CPU times are wall-clock from SeismicFlowCore; GPU times include host-to-device transfer, kernel execution, and device-to-host transfer.

Attribute	CPU (ms)	GPU (ms)	Speedup
GST Coherence	31,225.5	681.4	45.8×
Contrast	334,339.9	2,361.8	141.6×
Dissimilarity	331,176.2	2,377.7	139.3×
Homogeneity	341,142.8	2,398.7	142.2×
Energy	306,925.9	6,084.5	50.4×
Entropy	495,425.9	6,094.4	81.3×

The GLCM contrast, dissimilarity, and homogeneity attributes achieve the largest speedups (up to 142×) because the fused glcm_direct kernel eliminates the intermediate GLCM matrix entirely — a data structure the CPU implementation must write and read back from memory for every output pixel. Energy and entropy (50–81×) involve the upper-triangle shared-memory GLCM and warp-level reductions, which carry slightly more per-pixel work but remain heavily GPU-bound. GST coherence (45.8×) benefits primarily from the fused single-pass kernel and shared-memory tiling, which reduce global memory traffic from six round-trips to one.

These results are reported in full in the accompanying manuscript submitted to Geophysics (SEG).

Relationship to SeismicFlowCore

SeismicFlowCUDA and SeismicFlowCore share the same public API contract: both accept (time, inline, crossline) float32 NumPy arrays and return float32 NumPy arrays. The GPU coherence implementation is byte-for-byte equivalent to the SeismicFlowCore reference on all tested volumes.

SeismicFlowCore remains the recommended choice for environments without NVIDIA hardware, or when a full 27-attribute pipeline is required. SeismicFlowCUDA is the recommended choice when a GPU is available and maximum throughput is the priority.

Citation

If you use SeismicFlowCUDA in your research, please cite:

Anonymous. (2026). SeismicFlowCUDA: Establishing CUDA Programming as the New Standard for High-Performance Computing in Exploration Seismology. Manuscript submitted to Geophysics (Society of Exploration Geophysicists).

If you also use SeismicFlowCore:

Anonymous. (2026). SeismicFlowCore: A High-Performance, Industry-Validated Open-Source Python Library for 3D Seismic Attribute Computation. Manuscript submitted for publication.

License

GNU General Public License v3.0 — see LICENSE for details.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.0.1

May 26, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

seismicflowcuda-0.0.1.tar.gz (6.5 kB view details)

Uploaded May 26, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

seismicflowcuda-0.0.1-py3-none-any.whl (5.6 kB view details)

Uploaded May 26, 2026 Python 3

File details

Details for the file seismicflowcuda-0.0.1.tar.gz.

File metadata

Download URL: seismicflowcuda-0.0.1.tar.gz
Upload date: May 26, 2026
Size: 6.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for seismicflowcuda-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`d6447d185b02668e8735c02f619cef24340da1bbb8ca48fe91714a36f155617d`
MD5	`902728571c66cf2499923c67d5f47a51`
BLAKE2b-256	`b4d065e8ddadea6689a36ecfa9912b42b78e2bb638188c9c8b73edeca6fdffbf`

See more details on using hashes here.

File details

Details for the file seismicflowcuda-0.0.1-py3-none-any.whl.

File metadata

Download URL: seismicflowcuda-0.0.1-py3-none-any.whl
Upload date: May 26, 2026
Size: 5.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for seismicflowcuda-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b28cc7dfed882f19552fabf182b210eabc40a9426eb269f9f9a4664c07b1347f`
MD5	`9a4a16c12fef270f874c90cfb023a033`
BLAKE2b-256	`c71e2dc43cd1dbb454bb80e7cf9424e34c0d784fcf1b929b5caf0976aa719ed7`

See more details on using hashes here.

SeismicFlowCUDA 0.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

SeismicFlowCUDA

Why SeismicFlowCUDA?

Requirements

Hardware

Software

Installation

Step 1 — Install CUDA Toolkit

Step 2 — Install CuPy

Step 3 — Install SeismicFlowCUDA

Step 4 — Verify your install

Quick Start

Attributes

Discontinuity and Structural Continuity

Texture

API Reference

sfc.coherence

sfc.glcm

CUDA Kernel Design

GST Coherence — gst_coherence_fused

GLCM — glcm_direct and glcm_ee

Performance

Relationship to SeismicFlowCore

Citation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`sfc.coherence`

`sfc.glcm`

GST Coherence — `gst_coherence_fused`

GLCM — `glcm_direct` and `glcm_ee`