SeismoAI Module 1 and Module 2: SGY loading and seismic visualization

SeismoAI: Modules 1 and 2


Module Assignment

This project implements:

  • Module 1 – seismoai_io: loading and preparing SEG-Y seismic data
  • Module 2 – seismoai_viz: visualizing seismic gathers, traces, and frequency spectra

These modules form the data ingestion and visualization layer of the SeismoAI pipeline and provide the foundation for later quality control, modeling, and explainability tasks.


Project Objective

The objective of this project is to design and implement a clean, reusable, and well-tested Python package for working with real seismic SEG-Y (.sgy) data.

Following the assignment, the workflow was:

  1. Understand the dataset
  2. Inspect headers and trace structure
  3. Analyze amplitude distribution and data characteristics
  4. Implement robust I/O functions
  5. Develop visualization tools
  6. Validate the implementation on the complete dataset

The provided dataset contains:

  • 166 SEG-Y files
  • 167 traces per file
  • 4001 samples per trace
  • 1 ms sampling interval
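
The figures above imply a few derived quantities that are useful when reading the plots. The arithmetic below is a small sketch, assuming the first sample of each trace sits at time zero; the variable names are illustrative and not part of the package.

```python
# Dataset figures quoted in this README.
n_files = 166
traces_per_file = 167
samples_per_trace = 4001
sample_interval_ms = 1.0

# N samples span N - 1 intervals, so 4001 samples at 1 ms cover 4000 ms.
record_length_ms = (samples_per_trace - 1) * sample_interval_ms

# Highest frequency that can be represented at this sampling interval.
nyquist_hz = 1000.0 / (2.0 * sample_interval_ms)

# Total number of traces across the whole dataset.
total_traces = n_files * traces_per_file

print(record_length_ms, nyquist_hz, total_traces)
```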

Architecture Overview

flowchart LR
    A[SEG-Y Files on Disk] --> B[Module 1: seismoai_io]
    B --> B1[load_sgy]
    B --> B2[load_folder]
    B --> B3[normalize_traces]
    B --> B4[validate_dataset]

    B --> C[Module 2: seismoai_viz]
    C --> C1[plot_gather]
    C --> C2[plot_trace]
    C --> C3[plot_spectrum]

    B --> D[Exploration Script]
    B --> E[Full Dataset Analysis]
    D --> F[Trace Statistics]
    D --> G[Header Inspection]
    E --> H[CSV Summaries]
    E --> I[Diagnostic Plots]

    C --> J[Readable Seismic Visuals]
    H --> K[analysis_outputs]
    I --> K

Processing View

SEG-Y files
   ↓
load_sgy / load_folder
   ↓
trace + header extraction
   ↓
dataset understanding and validation
   ↓
normalization (if needed)
   ↓
visualization:
  - gather image
  - waveform
  - frequency spectrum

Project Structure

seismoai_final/
│
├── README.md
├── pyproject.toml
├── requirements.txt
├── .gitignore
│
├── data/
│   └── (dataset kept locally, not uploaded to GitHub)
│
├── analysis_outputs/
│   ├── dataset_summary.csv
│   ├── header_comparison.csv
│   ├── max_abs_per_file.png
│   ├── std_per_file.png
│   └── weak_trace_count_per_file.png
│
├── scripts/
│   ├── seismoai_explore.py
│   └── analyze_all_sgy_files.py
│
├── src/
│   └── seismoai_m1m2/
│       ├── __init__.py
│       ├── seismoai_io.py
│       └── seismoai_viz.py
│
├── tests/
│   ├── test_seismoai_io.py
│   └── test_seismoai_viz.py
│
└── docs/
    └── reflection.txt

Implemented Functions

Module 1 – seismoai_io

load_sgy(file_path)

Loads a single SEG-Y file and returns:

  • seismic traces as a numpy.ndarray
  • extracted trace headers as a pandas.DataFrame

load_folder(folder_path)

Loads all .sgy files from a directory and returns a list of structured results containing:

  • file name
  • file path
  • trace data
  • header data
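
One way the folder loader could be organized is sketched below. This is not the package's actual implementation: load_folder_sketch, its loader argument, and the dictionary keys are hypothetical names chosen to mirror the four fields listed above. The single-file loader is passed in as a callable so the sketch stays independent of any SEG-Y library.

```python
from pathlib import Path
from typing import Any, Callable


def load_folder_sketch(
    folder_path: str,
    loader: Callable[[str], tuple[Any, Any]],
) -> list[dict]:
    """Return one record per .sgy file in folder_path, sorted by file name.

    `loader` is any function that takes a path and returns (traces, headers),
    for example the package's load_sgy.
    """
    results = []
    # Note: the glob pattern is case-sensitive on most Linux filesystems,
    # so files ending in ".SGY" would need a second pattern.
    for path in sorted(Path(folder_path).glob("*.sgy")):
        traces, headers = loader(str(path))
        results.append(
            {
                "file_name": path.name,
                "file_path": str(path),
                "traces": traces,
                "headers": headers,
            }
        )
    return results
```

Sorting the paths keeps the order of results reproducible between runs, which helps when per-file statistics are later written to a CSV.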

normalize_traces(traces, method="maxabs")

Normalizes seismic amplitudes using:

  • max-absolute normalization
  • z-score normalization
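
The two methods can be sketched with NumPy as follows. The sketch assumes traces are stored as rows of a (n_traces, n_samples) array and that normalization is applied per trace; the method string "zscore", the eps guard, and the function name are assumptions, since only method="maxabs" appears in this README.

```python
import numpy as np


def normalize_traces_sketch(traces, method="maxabs", eps=1e-12):
    """Normalize each trace (row) independently.

    maxabs: divide by the largest absolute amplitude of the trace.
    zscore: subtract the trace mean and divide by its standard deviation.
    `eps` avoids division by zero for all-zero (dead) traces.
    """
    traces = np.asarray(traces, dtype=np.float64)
    if method == "maxabs":
        scale = np.max(np.abs(traces), axis=1, keepdims=True)
        return traces / np.maximum(scale, eps)
    if method == "zscore":
        mean = traces.mean(axis=1, keepdims=True)
        std = traces.std(axis=1, keepdims=True)
        return (traces - mean) / np.maximum(std, eps)
    raise ValueError(f"unknown normalization method: {method!r}")


demo = np.array([[1.0, -2.0, 4.0], [0.0, 0.0, 0.0]])
maxabs_out = normalize_traces_sketch(demo, method="maxabs")
zscore_out = normalize_traces_sketch(np.array([[1.0, 2.0, 3.0]]), method="zscore")
```

Max-absolute scaling keeps the waveform shape and polarity while bounding amplitudes to [-1, 1]; z-scoring additionally removes any constant offset in the trace.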

validate_dataset(folder_path)

Checks the complete dataset and reports, for each file:

  • loading status
  • number of traces
  • number of samples
  • shape
  • header structure
  • summary statistics
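
The per-file portion of such a report might look like the sketch below, which covers only the array checks (shape, NaN/Inf detection, basic statistics) and not loading or header inspection. The function name and dictionary keys are hypothetical, and the statistics are computed over finite samples only so that a single bad value does not turn every figure into NaN.

```python
import numpy as np


def summarize_traces_sketch(traces):
    """Return shape information and basic statistics for one gather."""
    traces = np.asarray(traces, dtype=np.float64)
    finite = traces[np.isfinite(traces)]

    def stat(func):
        # Fall back to NaN when no finite samples are available.
        return float(func(finite)) if finite.size else float("nan")

    return {
        "n_traces": traces.shape[0],
        "n_samples": traces.shape[1],
        "shape": traces.shape,
        "has_nan": bool(np.isnan(traces).any()),
        "has_inf": bool(np.isinf(traces).any()),
        "min": stat(np.min),
        "max": stat(np.max),
        "mean": stat(np.mean),
        "std": stat(np.std),
    }


report = summarize_traces_sketch(np.zeros((167, 4001)))
```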

Additional utilities

The module also includes helper functions for:

  • trace summary statistics
  • header summary extraction
  • per-trace diagnostics
  • near-dead trace detection
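
Near-dead trace detection with a simple low-variance rule could be sketched as below. The threshold values and the function name are illustrative assumptions; the thresholds used by the package are not stated in this README.

```python
import numpy as np


def near_dead_mask_sketch(traces, std_threshold=1e-3):
    """Flag traces whose standard deviation falls below std_threshold."""
    traces = np.asarray(traces, dtype=np.float64)
    return traces.std(axis=1) < std_threshold


# Three synthetic traces: dead, weak, and normal.
wave = np.sin(np.linspace(0.0, 20.0, 100))
demo = np.vstack([np.zeros(100), 1e-2 * wave, wave])

# Threshold sensitivity: count flagged traces for several thresholds.
thresholds = (1e-6, 1e-3, 1e-1)
counts = [int(near_dead_mask_sketch(demo, t).sum()) for t in thresholds]
```

Because the count depends strongly on the chosen threshold, it is worth reporting the result for several thresholds rather than a single one.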

Module 2 – seismoai_viz

plot_gather(traces)

Displays the seismic gather as a 2D image using a seismic colormap.

plot_trace(traces, trace_index)

Plots the waveform of an individual trace.

plot_spectrum(trace, sample_interval_ms=1.0)

Computes and displays the frequency spectrum of a selected trace using FFT.
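
The computation behind the spectrum can be sketched with NumPy's real FFT, leaving out the plotting step. This is an illustration of the general technique rather than the package's code: amplitude_spectrum_sketch is a hypothetical name, and the only detail taken from this README is that the sample interval is given in milliseconds.

```python
import numpy as np


def amplitude_spectrum_sketch(trace, sample_interval_ms=1.0):
    """Return (frequencies in Hz, amplitude spectrum) for a single trace."""
    trace = np.asarray(trace, dtype=np.float64)
    dt_s = sample_interval_ms / 1000.0  # milliseconds to seconds
    freqs_hz = np.fft.rfftfreq(trace.size, d=dt_s)
    amplitude = np.abs(np.fft.rfft(trace))
    return freqs_hz, amplitude


# Synthetic check: a 50 Hz sine sampled every 1 ms for one second.
t_s = np.arange(1000) * 0.001
synthetic = np.sin(2.0 * np.pi * 50.0 * t_s)
freqs_hz, amplitude = amplitude_spectrum_sketch(synthetic, sample_interval_ms=1.0)
peak_hz = float(freqs_hz[np.argmax(amplitude)])
```

With a 1 ms sample interval the frequency axis ends at the Nyquist frequency of 500 Hz, so any energy plotted should lie between 0 and 500 Hz.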


Dataset Understanding

A full dataset-wide analysis was performed on all 166 SEG-Y files.

Validation Results

  • All 166 files loaded successfully
  • All files have shape (167, 4001)
  • All files contain 89 header columns
  • No NaN values were detected
  • No Inf values were detected
  • Statistical behavior is highly consistent across files

Key Observations

  • The amplitude distribution is highly skewed
  • Strong outliers (amplitudes around 758) are present
  • Many traces contain very low energy
  • A large number of traces are classified as near-dead under simple low-variance thresholds
  • Header structure is consistent across the full dataset

Interpretation

These findings suggest that:

  • the dataset is structurally reliable
  • normalization is necessary for robust downstream use
  • plotting the raw amplitude range is not ideal because large outliers dominate the display
  • percentile-based clipping significantly improves seismic gather readability
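
The effect of percentile-based clipping can be illustrated on synthetic data. The sketch below assumes a symmetric limit taken from the 99th percentile of absolute amplitudes; the percentile value, the function name, and the synthetic gather are assumptions, and the package's plot_gather may choose its limits differently.

```python
import numpy as np


def percentile_clip_limit_sketch(traces, percentile=99.0):
    """Return a symmetric display limit from the given percentile of |amplitude|."""
    return float(np.percentile(np.abs(traces), percentile))


# Synthetic gather: unit-variance noise plus one extreme outlier.
rng = np.random.default_rng(0)
gather = rng.normal(0.0, 1.0, size=(167, 4001))
gather[10, 2000] = 758.0

raw_limit = float(np.abs(gather).max())        # set entirely by the outlier
clip_limit = percentile_clip_limit_sketch(gather)
clipped = np.clip(gather, -clip_limit, clip_limit)
```

When plotting with matplotlib, the same limit can be passed as vmin=-clip_limit and vmax=clip_limit to imshow instead of modifying the data, so the color range is set by the bulk of the samples rather than by a single spike.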

Sample Output Visuals

The following plots are saved in analysis_outputs/ (if these files are kept in the repository, GitHub will display them directly):

  • Maximum Absolute Amplitude per File (max_abs_per_file.png)
  • Standard Deviation per File (std_per_file.png)
  • Weak Trace Count per File (weak_trace_count_per_file.png)


Installation

Install dependencies and register the package locally:

pip install -r requirements.txt
pip install -e .

Usage

Load a single SEG-Y file

from seismoai_m1m2.seismoai_io import load_sgy

traces, headers = load_sgy("path/to/file.sgy")
print(traces.shape)
print(headers.head())

Load a folder of SEG-Y files

from seismoai_m1m2.seismoai_io import load_folder

files = load_folder("path/to/data_folder")
print(len(files))

Normalize traces

from seismoai_m1m2.seismoai_io import normalize_traces

normalized_traces = normalize_traces(traces, method="maxabs")

Plot the seismic gather

from seismoai_m1m2.seismoai_viz import plot_gather

plot_gather(traces, clip_mode="percentile")

Plot one trace and its spectrum

from seismoai_m1m2.seismoai_viz import plot_trace, plot_spectrum

plot_trace(traces, trace_index=2)
plot_spectrum(traces[2], sample_interval_ms=1.0)

Running the Scripts

Explore one representative file

python scripts/seismoai_explore.py

This script:

  • loads one sample file
  • prints trace summary statistics
  • prints header summary
  • computes trace diagnostics
  • reports threshold sensitivity for near-dead traces
  • validates the full dataset
  • visualizes the gather, waveform, and spectrum

Analyze the complete dataset

python scripts/analyze_all_sgy_files.py

This script:

  • checks all 166 files
  • generates a CSV summary
  • compares header structure
  • creates dataset-level plots
  • saves all outputs in analysis_outputs/

Testing

Run all tests from the project root:

pytest -q

Current result

  • All 7 tests pass

The tests cover:

  • single-file loading
  • folder loading
  • normalization
  • full dataset validation
  • gather plotting
  • trace plotting
  • spectrum plotting

Why This Submission Meets the Assignment Requirements

This submission provides:

  • working functions for Module 1 and Module 2
  • support for real SGY data
  • clear, well-structured Python docstrings for all implemented functions
  • tests for all required functions
  • dataset understanding before model-related steps
  • validation across the full set of 166 SEG-Y files
  • a professional project structure suitable for GitHub and packaging

Release Files

Version 0.1.0 is published as two files, both uploaded via twine/6.2.0 (CPython/3.12.0) without Trusted Publishing.

seismoai_m1m2-0.1.0.tar.gz (source distribution, 12.9 kB)

  • SHA256: c1324642d8cc1cb1194ee33944da85d401d2acf1db564b69a00d6e836c019470
  • MD5: 13edece3f9dd5a56a9bf04915aea337c
  • BLAKE2b-256: a9f941b181281a5572fdac4d10494e7a6ff3aae1993b6662337a4f1ea9f5a8b9

seismoai_m1m2-0.1.0-py3-none-any.whl (built distribution, Python 3, 9.1 kB)

  • SHA256: 208f89c2464ad7fc3a3f6b5ed207a4725546ab26b6b5a3b9192049e29475a6a9
  • MD5: 14d9ae1c17d2a06c8d62e9ea4ce92a4c
  • BLAKE2b-256: 5f2310a988c4a5e2bed5839cd1cfa77693e0e330f8538d9c7e4416dac4c4c5f4