SeismoAI Module 1 and Module 2: SGY loading and seismic visualization
SeismoAI: Modules 1 and 2
Module Assignment
This project implements:
- Module 1 – seismoai_io: loading and preparing SEG-Y seismic data
- Module 2 – seismoai_viz: visualizing seismic gathers, traces, and frequency spectra
These modules form the data ingestion and visualization layer of the SeismoAI pipeline and provide the foundation for later quality control, modeling, and explainability tasks.
Project Objective
The objective of this project is to design and implement a clean, reusable, and well-tested Python package for working with real seismic SEG-Y (.sgy) data.
As required by the assignment, the workflow was:
- Understand the dataset
- Inspect headers and trace structure
- Analyze amplitude distribution and data characteristics
- Implement robust I/O functions
- Develop visualization tools
- Validate the implementation on the complete dataset
The provided dataset contains:
- 166 SEG-Y files
- 167 traces per file
- 4001 samples per trace
- 1 ms sampling interval
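These figures fix a few derived quantities that matter later, for example for the spectrum plots. A small worked calculation, assuming the record length is measured from the first to the last sample:

```python
# Worked arithmetic for the dataset figures listed above.
n_files = 166
traces_per_file = 167
samples_per_trace = 4001
dt_ms = 1.0  # sampling interval in milliseconds

total_traces = n_files * traces_per_file                       # 27,722 traces in total
values_per_file = traces_per_file * samples_per_trace          # 668,167 amplitudes per file
record_length_s = (samples_per_trace - 1) * dt_ms / 1000.0     # 4.0 s, first to last sample
nyquist_hz = 1.0 / (2.0 * dt_ms / 1000.0)                      # 500 Hz for 1 ms sampling

print(total_traces, values_per_file, record_length_s, nyquist_hz)
```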
Architecture Overview
flowchart LR
A[SEG-Y Files on Disk] --> B[Module 1: seismoai_io]
B --> B1[load_sgy]
B --> B2[load_folder]
B --> B3[normalize_traces]
B --> B4[validate_dataset]
B --> C[Module 2: seismoai_viz]
C --> C1[plot_gather]
C --> C2[plot_trace]
C --> C3[plot_spectrum]
B --> D[Exploration Script]
B --> E[Full Dataset Analysis]
D --> F[Trace Statistics]
D --> G[Header Inspection]
E --> H[CSV Summaries]
E --> I[Diagnostic Plots]
C --> J[Readable Seismic Visuals]
H --> K[analysis_outputs]
I --> K
Processing View
SEG-Y files
↓
load_sgy / load_folder
↓
trace + header extraction
↓
dataset understanding and validation
↓
normalization (if needed)
↓
visualization:
- gather image
- waveform
- frequency spectrum
Project Structure
seismoai_final/
│
├── README.md
├── pyproject.toml
├── requirements.txt
├── .gitignore
│
├── data/
│ └── (dataset kept locally, not uploaded to GitHub)
│
├── analysis_outputs/
│ ├── dataset_summary.csv
│ ├── header_comparison.csv
│ ├── max_abs_per_file.png
│ ├── std_per_file.png
│ └── weak_trace_count_per_file.png
│
├── scripts/
│ ├── seismoai_explore.py
│ └── analyze_all_sgy_files.py
│
├── src/
│ └── seismoai_m1m2/
│ ├── __init__.py
│ ├── seismoai_io.py
│ └── seismoai_viz.py
│
├── tests/
│ ├── test_seismoai_io.py
│ └── test_seismoai_viz.py
│
└── docs/
└── reflection.txt
Implemented Functions
Module 1 – seismoai_io
load_sgy(file_path)
Loads a single SEG-Y file and returns:
- seismic traces as a numpy.ndarray
- extracted trace headers as a pandas.DataFrame
load_folder(folder_path)
Loads all .sgy files from a directory and returns a list of structured results containing:
- file name
- file path
- trace data
- header data
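A minimal sketch of the folder-loading pattern described above. It is not the package's implementation: `load_folder_sketch` and its `loader` argument are hypothetical, and the loader is injected only so the sketch runs without real SEG-Y files (in the package, per-file loading is presumably handled by `load_sgy`). The record keys mirror the fields listed above.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


def load_folder_sketch(
    folder_path: str | Path,
    loader: Callable[[str], tuple[Any, Any]],  # hypothetical: returns (traces, headers)
) -> list[dict[str, Any]]:
    """Load every .sgy file in a folder into a list of records (illustrative only)."""
    folder = Path(folder_path)
    # Match the extension case-insensitively and sort for a deterministic order.
    sgy_paths = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".sgy"
    )
    results = []
    for path in sgy_paths:
        traces, headers = loader(str(path))
        results.append(
            {
                "file_name": path.name,
                "file_path": str(path),
                "traces": traces,
                "headers": headers,
            }
        )
    return results
```

Non-SEG-Y files in the folder are skipped, and files with an upper-case `.SGY` extension are still picked up.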
normalize_traces(traces, method="maxabs")
Normalizes seismic amplitudes using:
- max-absolute normalization
- z-score normalization
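The two normalization methods can be sketched with NumPy as follows. This assumes normalization is applied per trace (row) of a (n_traces, n_samples) array and that the z-score option is selected with `method="zscore"`; both are assumptions, since only `method="maxabs"` appears in this README. The `eps` floor keeps all-zero (dead) traces from causing a division by zero.

```python
import numpy as np


def normalize_traces_sketch(traces, method="maxabs", eps=1e-12):
    """Per-trace normalization of a (n_traces, n_samples) array (illustrative)."""
    traces = np.asarray(traces, dtype=np.float64)
    if method == "maxabs":
        # Scale each trace so its largest absolute amplitude becomes 1.
        scale = np.max(np.abs(traces), axis=1, keepdims=True)
        return traces / np.maximum(scale, eps)
    if method == "zscore":  # assumed method name
        # Shift each trace to zero mean and scale it to unit standard deviation.
        mean = traces.mean(axis=1, keepdims=True)
        std = traces.std(axis=1, keepdims=True)
        return (traces - mean) / np.maximum(std, eps)
    raise ValueError(f"Unknown normalization method: {method!r}")
```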
validate_dataset(folder_path)
Checks the complete dataset and reports, for each file:
- loading status
- number of traces
- number of samples
- shape
- header structure
- summary statistics
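The per-file checks listed above reduce to a few NumPy operations on the trace array. The hypothetical helper `summarize_traces_sketch` below illustrates the kind of row `validate_dataset` might report for each file; the exact fields in the package may differ.

```python
import numpy as np


def summarize_traces_sketch(traces):
    """Shape, finiteness, and amplitude statistics for one file (illustrative)."""
    traces = np.asarray(traces, dtype=np.float64)
    return {
        "n_traces": traces.shape[0],
        "n_samples": traces.shape[1],
        "n_nan": int(np.isnan(traces).sum()),   # missing values
        "n_inf": int(np.isinf(traces).sum()),   # overflowed or corrupt values
        # nan-aware statistics so a few NaNs do not hide the rest of the file
        "min": float(np.nanmin(traces)),
        "max": float(np.nanmax(traces)),
        "mean": float(np.nanmean(traces)),
        "std": float(np.nanstd(traces)),
        "max_abs": float(np.nanmax(np.abs(traces))),
    }
```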
Additional utilities
The module also includes helper functions for:
- trace summary statistics
- header summary extraction
- per-trace diagnostics
- near-dead trace detection
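One simple low-variance rule for near-dead traces, with a sweep over thresholds similar to the sensitivity report mentioned for the exploration script, could look like this. The criterion (trace standard deviation at or below a fraction of the largest trace standard deviation in the file) and the function names are assumptions for illustration, not the package's definition.

```python
import numpy as np


def near_dead_mask(traces, rel_threshold=0.01):
    """Flag traces whose std is <= rel_threshold * the file's largest trace std."""
    trace_std = np.asarray(traces, dtype=np.float64).std(axis=1)
    # Using the maximum as reference keeps the rule usable even when most
    # traces are weak; if every trace is flat, all of them are flagged.
    return trace_std <= rel_threshold * trace_std.max()


def threshold_sensitivity(traces, thresholds=(0.001, 0.01, 0.05, 0.1)):
    """Count flagged traces for each relative threshold."""
    return {t: int(near_dead_mask(traces, t).sum()) for t in thresholds}
```

Reporting counts over several thresholds makes it visible how strongly the "near-dead" label depends on the cutoff, which matters when many traces carry very low energy.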
Module 2 – seismoai_viz
plot_gather(traces)
Displays the seismic gather as a 2D image using a seismic colormap.
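Percentile-based clipping, which the dataset analysis finds necessary because of large outliers, can be sketched with Matplotlib's `imshow`. `plot_gather_sketch` and its `clip_percentile` parameter are hypothetical; the package's own `plot_gather` accepts a `clip_mode` option whose exact behavior is not documented here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def plot_gather_sketch(traces, sample_interval_ms=1.0, clip_percentile=99.0):
    """Gather image with symmetric percentile clipping (illustrative)."""
    traces = np.asarray(traces, dtype=np.float64)
    # Symmetric color limits so zero amplitude maps to the white centre of "seismic".
    clip = np.percentile(np.abs(traces), clip_percentile)
    n_traces, n_samples = traces.shape
    fig, ax = plt.subplots(figsize=(8, 6))
    im = ax.imshow(
        traces.T,  # time runs down the vertical axis, traces across
        aspect="auto",
        cmap="seismic",
        vmin=-clip,
        vmax=clip,
        extent=(0, n_traces, (n_samples - 1) * sample_interval_ms, 0),
    )
    ax.set_xlabel("Trace index")
    ax.set_ylabel("Time (ms)")
    fig.colorbar(im, ax=ax, label="Amplitude")
    return fig, ax, clip
```

With a single amplitude of 758 among otherwise unit amplitudes, the 99th-percentile limit stays at 1.0, so the outlier saturates instead of washing out the rest of the gather.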
plot_trace(traces, trace_index)
Plots the waveform of an individual trace.
plot_spectrum(trace, sample_interval_ms=1.0)
Computes and displays the frequency spectrum of a selected trace using FFT.
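The spectrum computation itself is a one-sided FFT. A minimal sketch using NumPy's `rfft` and `rfftfreq`; `amplitude_spectrum` is a hypothetical helper, and plotting `amps` against `freqs` would give a display similar to `plot_spectrum`.

```python
import numpy as np


def amplitude_spectrum(trace, sample_interval_ms=1.0):
    """One-sided FFT amplitude spectrum of a real-valued trace (illustrative)."""
    trace = np.asarray(trace, dtype=np.float64)
    dt_s = sample_interval_ms / 1000.0            # convert ms to s
    freqs = np.fft.rfftfreq(trace.size, d=dt_s)   # 0 Hz up to Nyquist, in Hz
    amps = np.abs(np.fft.rfft(trace))             # magnitude of each frequency bin
    return freqs, amps
```

With a 1 ms sampling interval the frequency axis ends just below the 500 Hz Nyquist frequency, and a 4001-sample trace gives a frequency resolution of about 0.25 Hz.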
Dataset Understanding
A full dataset-wide analysis was performed on all 166 SEG-Y files.
Validation Results
- All 166 files loaded successfully
- All files have shape (167, 4001)
- All files contain 89 header columns
- No NaN values were detected
- No Inf values were detected
- Statistical behavior is highly consistent across files
Key Observations
- The amplitude distribution is highly skewed
- Strong outliers (absolute amplitudes of around 758) are present
- Many traces contain very low energy
- A large number of traces are classified as near-dead under simple low-variance thresholds
- Header structure is consistent across the full dataset
Interpretation
These findings suggest that:
- the dataset is structurally reliable
- normalization is necessary for robust downstream use
- plotting with the raw amplitude range is not ideal because a few large outliers dominate the color scale
- percentile-based clipping significantly improves seismic gather readability
Sample Output Visuals
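The following diagnostic plots are saved in analysis_outputs/ and are displayed directly on GitHub when kept in the repository:
Maximum Absolute Amplitude per File (max_abs_per_file.png)
Standard Deviation per File (std_per_file.png)
Weak Trace Count per File (weak_trace_count_per_file.png)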
Installation
Install dependencies and register the package locally:
pip install -r requirements.txt
pip install -e .
Usage
Load a single SEG-Y file
from seismoai_m1m2.seismoai_io import load_sgy
traces, headers = load_sgy("path/to/file.sgy")
print(traces.shape)
print(headers.head())
Load a folder of SEG-Y files
from seismoai_m1m2.seismoai_io import load_folder
files = load_folder("path/to/data_folder")
print(len(files))
Normalize traces
from seismoai_m1m2.seismoai_io import normalize_traces
normalized_traces = normalize_traces(traces, method="maxabs")
Plot the seismic gather
from seismoai_m1m2.seismoai_viz import plot_gather
plot_gather(traces, clip_mode="percentile")
Plot one trace and its spectrum
from seismoai_m1m2.seismoai_viz import plot_trace, plot_spectrum
plot_trace(traces, trace_index=2)
plot_spectrum(traces[2], sample_interval_ms=1.0)
Running the Scripts
Explore one representative file
python scripts/seismoai_explore.py
This script:
- loads one sample file
- prints trace summary statistics
- prints header summary
- computes trace diagnostics
- reports threshold sensitivity for near-dead traces
- validates the full dataset
- visualizes the gather, waveform, and spectrum
Analyze the complete dataset
python scripts/analyze_all_sgy_files.py
This script:
- checks all 166 files
- generates a CSV summary
- compares header structure
- creates dataset-level plots
- saves all outputs in analysis_outputs/
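The header comparison step can be sketched with pandas. `compare_header_columns` is a hypothetical helper that assumes headers have already been loaded into one DataFrame per file, and it uses the first file's column layout as the reference.

```python
import pandas as pd


def compare_header_columns(header_frames):
    """Compare header column layouts across files (illustrative).

    header_frames: dict mapping file name -> pandas.DataFrame of trace headers.
    """
    reference = None
    rows = []
    for name, df in header_frames.items():
        cols = list(df.columns)
        if reference is None:
            reference = cols  # the first file defines the reference layout
        rows.append(
            {
                "file_name": name,
                "n_header_columns": len(cols),
                "matches_reference": cols == reference,
            }
        )
    return pd.DataFrame(rows)
```

The resulting table could then be written with `table.to_csv("analysis_outputs/header_comparison.csv", index=False)`.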
Testing
Run all tests from the project root:
pytest -q
Current result
- 7 tests passed successfully
The tests cover:
- single-file loading
- folder loading
- normalization
- full dataset validation
- gather plotting
- trace plotting
- spectrum plotting
Why This Submission Meets the Assignment Requirements
This submission provides:
- working functions for Module 1 and Module 2
- support for real SEG-Y data
- clear, well-structured docstrings for all implemented functions
- tests for all required functions
- dataset understanding before any model-related steps
- validation across the full set of 166 SEG-Y files
- a professional project structure suitable for GitHub and packaging
File details
Details for the file seismoai_m1m2-0.1.0.tar.gz.
File metadata
- Download URL: seismoai_m1m2-0.1.0.tar.gz
- Upload date:
- Size: 12.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c1324642d8cc1cb1194ee33944da85d401d2acf1db564b69a00d6e836c019470 |
| MD5 | 13edece3f9dd5a56a9bf04915aea337c |
| BLAKE2b-256 | a9f941b181281a5572fdac4d10494e7a6ff3aae1993b6662337a4f1ea9f5a8b9 |
File details
Details for the file seismoai_m1m2-0.1.0-py3-none-any.whl.
File metadata
- Download URL: seismoai_m1m2-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 208f89c2464ad7fc3a3f6b5ed207a4725546ab26b6b5a3b9192049e29475a6a9 |
| MD5 | 14d9ae1c17d2a06c8d62e9ea4ce92a4c |
| BLAKE2b-256 | 5f2310a988c4a5e2bed5839cd1cfa77693e0e330f8538d9c7e4416dac4c4c5f4 |