Tools for downloading and processing Ocean Networks Canada hydrophone data
ONC Data Download and Preparation
Complete guide for downloading Ocean Networks Canada spectrograms, FLAC audio files, and preparing ML-ready HDF5 datasets.
Table of Contents
- Quick Start
- Setup
- Downloading Spectrograms
- Downloading FLAC Audio Files
- Custom Spectrogram Generation
- Advanced Options
- Troubleshooting
Quick Start
The easiest way to get started is using the tutorial notebook, which runs through several real-world examples:
- Tutorial Notebook
CLI Quick Start
# 1. Interactive download (guided CLI) - uses sampling strategy
# Now includes option to download FLAC files!
python scripts/download_hydrophone_data.py
# 2. Direct download with custom batch size
python scripts/download_hydrophone_data.py --mode sampling --device ICLISTENHF6020 --start-date 2021 1 1 --threshold 500 --spectrograms-per-batch 12 --check-deployments
# 3. Download spectrograms WITH corresponding FLAC audio files
python scripts/download_hydrophone_data.py --mode sampling --device ICLISTENHF6020 --start-date 2021 1 1 --threshold 500 --spectrograms-per-batch 6 --download-flac
# 4. Generate custom spectrograms from FLAC files (NEW!)
python scripts/generate_spectrograms.py --input-dir data/ICLISTENHF6020/flac/ --win-dur 2.0
Key Features
- Smart Interactive Mode: Guided setup that uses the intelligent sampling strategy and includes FLAC audio option
- FLAC Audio Download: Download corresponding raw audio files alongside spectrograms
- Custom Spectrogram Generation: Create spectrograms with any duration/parameters from FLAC files
- Deployment Validation: Ensures hydrophones were deployed during requested periods
- Device Discovery: Browse available hydrophones with deployment information
- Date Validation: Checks dates fall within active deployment periods
- Efficient Caching: Minimizes API calls through intelligent caching
- Multiple Modes: Sampling, range, specific times, and deployment checking
- Universal Folder Support: Works with enhanced, flat, and nested folder structures
Setup
- Install dependencies:
  pip install -r requirements.txt
- Configure ONC API token: Create/edit .env file:
  ONC_TOKEN=your_actual_onc_token_here
  DATA_DIR=./data
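If you want to read the same settings from your own scripts, a few lines of standard-library Python are enough. The sketch below is not the project's loader: `load_env` and `get_settings` are hypothetical helpers, and the parser only understands plain KEY=VALUE lines.

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper for illustration. Blank lines and lines starting
    with '#' are ignored, and surrounding quotes are stripped from values.
    """
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_settings(path=".env"):
    """Return (token, data_dir); real environment variables take priority."""
    file_values = load_env(path)
    token = os.environ.get("ONC_TOKEN", file_values.get("ONC_TOKEN"))
    data_dir = os.environ.get("DATA_DIR", file_values.get("DATA_DIR", "./data"))
    if not token or token == "your_actual_onc_token_here":
        raise ValueError("ONC_TOKEN is missing or still set to the placeholder")
    return token, Path(data_dir)
```

Rejecting the placeholder value early gives a clearer error than a failed API call later on.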
Downloading Spectrograms
Usage Modes
| Mode | Description | Example |
|---|---|---|
| Interactive | Guided setup using sampling strategy (recommended) | python scripts/download_hydrophone_data.py |
| Sampling | Smart sampling from date range | --mode sampling --threshold 1000 |
| Range | All spectrograms in date range | --mode range --start-date 2021 1 1 --end-date 2021 1 7 |
| Specific | Exact timestamps from JSON | --mode specific --config times.json |
| Check | View deployment info | --mode check-deployments |
Note: Interactive mode is simply a guided way to set up the intelligent sampling strategy. It prompts you for device, dates, threshold, and spectrograms per batch, then uses the same smart sampling algorithm described below.
Intelligent Sampling Strategy
The sampling mode (including interactive mode) uses a smart algorithm to efficiently distribute downloads across your date range:
How it works:
- Data Availability Check: Queries ONC API to find which days have data available
- Request Calculation: Determines number of requests needed based on spectrograms_per_batch: total_requests = ceil(threshold_num / spectrograms_per_batch)
- Optimal Day Spacing: Distributes requests evenly across available days
- Random Time Distribution: Uses random hours (0-23) and minutes (0-59) for maximum temporal diversity
- Duplicate Prevention: Automatically skips dates where files already exist
- Adaptive Sampling: Handles both sparse sampling across many days and multiple requests per day
Benefits:
- Even temporal coverage across your entire date range
- Full 24-hour sampling with random start times for maximum diversity
- Efficient API usage by checking availability first
- Resume-friendly by skipping existing downloads
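The request calculation, day spacing, and random start times listed under "How it works" can be sketched roughly as follows. This is an illustration under assumptions, not the downloader's actual code: `plan_requests` is a hypothetical function, the list of available days is passed in rather than fetched from the ONC API, and the duplicate-skipping step is left out.

```python
import math
import random
from datetime import date, datetime, time, timedelta


def plan_requests(available_days, threshold_num, spectrograms_per_batch=6, seed=None):
    """Return sorted request start times spread over available_days.

    Illustrative only: the real downloader may space days differently.
    """
    if not available_days:
        return []
    rng = random.Random(seed)
    days = sorted(available_days)
    total_requests = math.ceil(threshold_num / spectrograms_per_batch)

    starts = []
    for i in range(total_requests):
        # Evenly spaced index into the day list. When there are more
        # requests than days, several requests land on the same day.
        day = days[(i * len(days)) // total_requests]
        # Random hour (0-23) and minute (0-59) for temporal diversity.
        start = datetime.combine(day, time(rng.randint(0, 23), rng.randint(0, 59)))
        starts.append(start)
    return sorted(starts)


january = [date(2021, 1, 1) + timedelta(days=n) for n in range(31)]
plan = plan_requests(january, threshold_num=500, spectrograms_per_batch=12, seed=0)
print(len(plan))  # 42, i.e. ceil(500 / 12)
```

Passing a `seed` makes the plan reproducible, which helps when comparing runs.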
Spectrograms Per Batch
Control how many 5-minute spectrograms are downloaded per request with --spectrograms-per-batch:
| Batch Size | Duration |
|---|---|
| 1 | 5 minutes |
| 6 | 30 minutes (default) |
| 12 | 1 hour |
| 36 | 3 hours |
# Custom batch size example
python scripts/download_hydrophone_data.py --mode sampling --spectrograms-per-batch 12
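For batch sizes not listed in the table, the covered time span is simply the batch size times five minutes. A small sketch, where `batch_duration` is a hypothetical helper name:

```python
from datetime import timedelta

SPECTROGRAM_MINUTES = 5  # each ONC spectrogram covers 5 minutes (per the text above)


def batch_duration(spectrograms_per_batch):
    """Time span covered by one request of the given batch size."""
    if spectrograms_per_batch < 1:
        raise ValueError("spectrograms_per_batch must be at least 1")
    return timedelta(minutes=SPECTROGRAM_MINUTES * spectrograms_per_batch)


# Reproduce the rows of the table above.
for size in (1, 6, 12, 36):
    print(size, batch_duration(size))
```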
Deployment Validation
Ensures hydrophones were active during requested periods. Add --check-deployments to verify:
- Deployment coverage for your dates
- Exact locations and coordinates
- Data availability verification
- Alternative suggestions if needed
python scripts/download_hydrophone_data.py --check-deployments
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| --mode | Download mode | Interactive prompt |
| --device | Hydrophone device code | Interactive selection |
| --spectrograms-per-batch | Number of 5-min spectrograms per request | 6 |
| --download-flac | Also download FLAC audio files | False |
| --check-deployments | Validate deployment periods | Recommended |
| --start-date | Start date (YYYY MM DD) | Prompted |
| --end-date | End date (YYYY MM DD) | Prompted |
| --threshold | Number of spectrograms | Prompted |
File Organization
Downloads are organized by device, method, and date range:
data/
└── DEVICE/
    └── sampling_YYYY-MM-DD_to_YYYY-MM-DD/
        ├── mat/
        │   ├── processed/    # Downloaded spectrograms
        │   └── rejects/      # Quality-filtered files
        └── flac/             # FLAC audio files (if --download-flac used)
Example: data/ICLISTENHF6020/sampling_2021-01-01_to_2021-01-31/
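When post-processing downloads in your own code, it can help to build these paths in one place. The helper below is a hypothetical sketch that follows the layout shown above. It is not part of the package's API.

```python
from datetime import date
from pathlib import Path


def download_dirs(data_dir, device, method, start, end, with_flac=False):
    """Return the folders described above for one download run.

    Hypothetical helper; folder names follow the layout in this README.
    """
    run_name = f"{method}_{start.isoformat()}_to_{end.isoformat()}"
    run_dir = Path(data_dir) / device / run_name
    dirs = {
        "run": run_dir,
        "processed": run_dir / "mat" / "processed",  # downloaded spectrograms
        "rejects": run_dir / "mat" / "rejects",      # quality-filtered files
    }
    if with_flac:
        dirs["flac"] = run_dir / "flac"
    return dirs


dirs = download_dirs("data", "ICLISTENHF6020", "sampling",
                     date(2021, 1, 1), date(2021, 1, 31), with_flac=True)
print(dirs["run"].as_posix())
```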
Specific Times Config
For exact timestamps, create a JSON file:
{
  "ICLISTENHF6020": [
    [2021, 1, 15, 12, 0, 0],
    [2021, 1, 15, 18, 30, 0]
  ]
}
Format: [Year, Month, Day, Hour, Minute, Second]
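Before passing a config file to --mode specific, you may want to check it yourself. The sketch below parses the format shown above into `datetime` objects. `load_specific_times` is a hypothetical helper, and the validation rules are an assumption about what a well-formed entry looks like.

```python
import json
from datetime import datetime


def load_specific_times(text):
    """Parse the JSON config into {device_code: [datetime, ...]}.

    Raises ValueError if an entry is not a list of six integers or does
    not form a valid calendar date and time.
    """
    config = json.loads(text)
    parsed = {}
    for device, stamps in config.items():
        times = []
        for stamp in stamps:
            if (not isinstance(stamp, list) or len(stamp) != 6
                    or not all(isinstance(v, int) for v in stamp)):
                raise ValueError(f"{device}: expected 6 integers, got {stamp!r}")
            times.append(datetime(*stamp))  # also rejects e.g. month 13
        parsed[device] = times
    return parsed


example = """
{
  "ICLISTENHF6020": [
    [2021, 1, 15, 12, 0, 0],
    [2021, 1, 15, 18, 30, 0]
  ]
}
"""
times = load_specific_times(example)
print(times["ICLISTENHF6020"][1].isoformat())  # 2021-01-15T18:30:00
```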
Downloading FLAC Audio Files
FLAC files contain raw hydrophone audio recordings. Add --download-flac to any command or use interactive mode (which now prompts for FLAC preference):
# Interactive mode (prompts for FLAC)
python scripts/download_hydrophone_data.py
# Any mode with FLAC
python scripts/download_hydrophone_data.py --mode sampling --download-flac
Use Cases: Audio analysis, custom spectrograms, ML training on raw audio
File Organization: FLAC files saved in flac/ subdirectory alongside spectrograms
Performance: 10-50x larger than spectrograms; start with small downloads (--threshold 5-10)
Custom Spectrogram Generation
Generate custom spectrograms from your downloaded FLAC/WAV audio files with configurable parameters. This functionality translates MATLAB spectrogram code to Python, allowing you to create spectrograms with different durations, frequency ranges, and analysis parameters.
Features
- Multiple audio formats: FLAC, WAV, MP3, M4A support
- Configurable parameters: Window duration, overlap, frequency limits
- MATLAB compatibility: Outputs .mat files with same structure as MATLAB
- High-quality plots: PNG visualizations with customizable colormaps
- Batch processing: Process entire directories efficiently
- Project integration: Works seamlessly with downloaded FLAC files
Quick Start
# Interactive mode (recommended)
python scripts/generate_spectrograms.py
# Process FLAC files from ONC downloads
python scripts/generate_spectrograms.py --input-dir data/ICLISTENHF6020/flac/
# Custom parameters for longer spectrograms
python scripts/generate_spectrograms.py \
    --input-dir data/DEVICE/flac/ \
    --win-dur 2.0 \
    --overlap 0.75 \
    --freq-min 5 \
    --freq-max 20000
Parameters
| Parameter | Description | Default |
|---|---|---|
| --win-dur | Window duration in seconds | 1.0 |
| --overlap | Overlap ratio (0-1) | 0.5 |
| --freq-min | Minimum frequency (Hz) | 10 |
| --freq-max | Maximum frequency (Hz) | 10000 |
| --colormap | Matplotlib colormap | turbo |
| --clim-min | Color scale minimum (dB) | -60 |
| --clim-max | Color scale maximum (dB) | 0 |
Output Structure
Custom spectrograms are saved parallel to FLAC directories:
data/
└── DEVICE/
    └── sampling_YYYY-MM-DD_to_YYYY-MM-DD/
        ├── flac/                  # Downloaded audio files
        └── custom_spectrograms/   # Generated spectrograms
            ├── audio1.mat         # MATLAB data files
            ├── audio1.png         # PNG visualizations
            ├── audio2.mat
            └── audio2.png
Configuration
Parameters can be configured in config/dataset_config.yaml:
custom_spectrograms:
  window_duration: 1.0       # Window duration in seconds
  overlap: 0.5               # Overlap ratio (0-1)
  frequency_limits:
    min: 10                  # Minimum frequency (Hz)
    max: 10000               # Maximum frequency (Hz)
  colormap: "turbo"          # Matplotlib colormap
  color_limits:
    min: -60                 # Color scale minimum (dB)
    max: 0                   # Color scale maximum (dB)
  log_frequency: true        # Use log frequency scale
Use Cases
Different Analysis Requirements:
- High time resolution: --win-dur 0.5 --overlap 0.75 for transient events
- High frequency resolution: --win-dur 4.0 --overlap 0.9 for tonal analysis
- Low-frequency focus: --freq-min 1 --freq-max 1000 for whale calls
- Wideband analysis: --freq-min 1 --freq-max 50000 for full spectrum
Custom Duration Spectrograms: Unlike ONC's fixed 5-minute spectrograms, you can create any duration by adjusting window parameters to analyze longer or shorter audio segments.
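The trade-off behind these presets follows from standard short-time Fourier transform arithmetic: frequency bin spacing is roughly 1 / window duration, and the time step between spectrogram columns is window duration × (1 − overlap). The sketch below works through the numbers. `stft_resolution` is a hypothetical helper, and the 64 kHz sample rate is only an illustrative assumption, so check the actual rate of your FLAC files.

```python
def stft_resolution(win_dur, overlap, sample_rate):
    """Return basic STFT figures for a window length and overlap ratio.

    sample_rate is in Hz. Which rate applies depends on the hydrophone,
    so treat any value used here as an assumption.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    samples_per_window = int(round(win_dur * sample_rate))
    hop_samples = max(1, int(round(samples_per_window * (1 - overlap))))
    return {
        "samples_per_window": samples_per_window,
        "hop_seconds": hop_samples / sample_rate,                # time step between columns
        "freq_resolution_hz": sample_rate / samples_per_window,  # frequency bin spacing
        "nyquist_hz": sample_rate / 2,                           # highest representable frequency
    }


ASSUMED_RATE = 64000  # illustrative value only

print(stft_resolution(0.5, 0.75, ASSUMED_RATE))  # short window: fine time steps, coarse bins
print(stft_resolution(4.0, 0.9, ASSUMED_RATE))   # long window: fine bins, coarse time steps
```

Note that --freq-max cannot usefully exceed the Nyquist frequency (half the sample rate), so a 50 kHz upper limit needs audio sampled at 100 kHz or more.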
Programmatic Usage
from src.audio import SpectrogramGenerator
# Create generator with custom parameters
generator = SpectrogramGenerator(
    win_dur=2.0,            # 2 second windows
    overlap=0.75,           # 75% overlap
    freq_lims=(5, 20000),   # 5 Hz to 20 kHz
    colormap='viridis'
)
# Process a directory
results = generator.process_directory(
    input_dir="data/DEVICE/flac/",
    save_dir="data/DEVICE/custom_spectrograms/",
    save_mat=True,
    save_plot=True
)
See examples/generate_custom_spectrograms_example.py for complete examples.
Advanced Options
# Download with custom settings
python scripts/download_hydrophone_data.py --mode sampling --device ICLISTENHF6020 --spectrograms-per-batch 12 --threshold 200 --check-deployments
# Download with a custom batch size
python scripts/download_hydrophone_data.py --mode sampling --spectrograms-per-batch 12 --check-deployments
Troubleshooting
| Issue | Solution |
|---|---|
| Invalid ONC Token | Check .env file |
| No Deployment Coverage | Use --check-deployments |
| No .mat files found | Verify folder structure |
| Labels not loading | Check JSON syntax |
| Memory errors | Reduce --batch-size |
| FLAC download fails | Check network connection and storage space |
| Large FLAC files | Monitor disk space, start with small downloads |
Pro Tip: Always use --check-deployments to ensure active deployment periods!
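For the disk-space rows in the troubleshooting table, a quick pre-flight check can be done with the standard library before starting a large FLAC download. This is a sketch: `enough_disk_space` is a hypothetical helper, and the size you pass in is your own estimate.

```python
import shutil


def enough_disk_space(path, required_bytes, safety_factor=1.2):
    """Return True if the filesystem holding `path` has room to spare.

    `path` must already exist. `safety_factor` adds headroom on top of
    the estimated download size.
    """
    free = shutil.disk_usage(path).free
    return free >= required_bytes * safety_factor


# Example: check for roughly 5 GB of free space in the current directory.
print(enough_disk_space(".", 5 * 1024**3))
```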