Tools for downloading and processing Ocean Networks Canada hydrophone data

🌊 ONC Data Download and Preparation

Complete guide for downloading Ocean Networks Canada spectrograms, FLAC audio files, and preparing ML-ready HDF5 datasets.

🚀 Quick Start

The easiest way to get started is the tutorial notebook, which runs through several real-world examples.

CLI Quick Start

# 1. Interactive download (guided CLI) - uses sampling strategy
#    Now includes option to download FLAC files!
python scripts/download_hydrophone_data.py

# 2. Direct download with custom batch size
python scripts/download_hydrophone_data.py --mode sampling --device ICLISTENHF6020 --start-date 2021 1 1 --threshold 500 --spectrograms-per-batch 12 --check-deployments

# 3. Download spectrograms WITH corresponding FLAC audio files
python scripts/download_hydrophone_data.py --mode sampling --device ICLISTENHF6020 --start-date 2021 1 1 --threshold 500 --spectrograms-per-batch 6 --download-flac

# 4. Generate custom spectrograms from FLAC files (NEW!)
python scripts/generate_spectrograms.py --input-dir data/ICLISTENHF6020/flac/ --win-dur 2.0

✨ Key Features

  • 🤖 Smart Interactive Mode: Guided setup that uses the intelligent sampling strategy and includes FLAC audio option
  • 🎵 FLAC Audio Download: Download corresponding raw audio files alongside spectrograms
  • 🎶 Custom Spectrogram Generation: Create spectrograms with any duration/parameters from FLAC files
  • 🚀 Deployment Validation: Ensures hydrophones were deployed during requested periods
  • 📊 Device Discovery: Browse available hydrophones with deployment information
  • ⏰ Date Validation: Checks dates fall within active deployment periods
  • 💾 Efficient Caching: Minimizes API calls through intelligent caching
  • 🔧 Multiple Modes: Sampling, range, specific times, and deployment checking
  • 📁 Universal Folder Support: Works with enhanced, flat, and nested folder structures

⚙️ Setup

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Configure ONC API token: Create/edit .env file:

    ONC_TOKEN=your_actual_onc_token_here
    DATA_DIR=./data
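
Before the first download it can help to confirm that the .env file is readable and the token is no longer the placeholder. The sketch below is a minimal standalone check, not part of the package: load_env and check_settings are hypothetical helper names, and the parser only handles plain KEY=VALUE lines.

```python
from pathlib import Path


def load_env(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_settings(values):
    """Return a list of problems; an empty list means the settings look usable."""
    problems = []
    token = values.get("ONC_TOKEN", "")
    if not token or token == "your_actual_onc_token_here":
        problems.append("ONC_TOKEN is missing or still the placeholder")
    if not values.get("DATA_DIR"):
        problems.append("DATA_DIR is not set")
    return problems


if __name__ == "__main__":
    for problem in check_settings(load_env()):
        print("warning:", problem)
```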
    

📥 Downloading Spectrograms

🎯 Usage Modes

| Mode | Description | Example |
| --- | --- | --- |
| Interactive | Guided setup using sampling strategy (recommended) | python scripts/download_hydrophone_data.py |
| Sampling | Smart sampling from date range | --mode sampling --threshold 1000 |
| Range | All spectrograms in date range | --mode range --start-date 2021 1 1 --end-date 2021 1 7 |
| Specific | Exact timestamps from JSON | --mode specific --config times.json |
| Check | View deployment info | --mode check-deployments |

📌 Note: Interactive mode is simply a guided way to set up the intelligent sampling strategy. It prompts you for device, dates, threshold, and spectrograms per batch, then uses the same smart sampling algorithm described below.

🧠 Intelligent Sampling Strategy

The sampling mode (including interactive mode) uses a smart algorithm to efficiently distribute downloads across your date range:

How it works:

  1. Data Availability Check: Queries ONC API to find which days have data available
  2. Request Calculation: Determines number of requests needed based on spectrograms_per_batch:
    total_requests = ceil(threshold_num / spectrograms_per_batch)
    
  3. Optimal Day Spacing: Distributes requests evenly across available days
  4. Random Time Distribution: Uses random hours (0-23) and minutes (0-59) for maximum temporal diversity
  5. Duplicate Prevention: Automatically skips dates where files already exist
  6. Adaptive Sampling: Handles both sparse sampling across many days and multiple requests per day
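
The steps above can be sketched roughly as follows. This is an illustrative reconstruction, not the package's actual code: plan_requests and its arguments are hypothetical names, the list of available days is assumed to come from the availability query in step 1, and duplicate skipping (step 5) is left out.

```python
import math
import random
from datetime import date, datetime, timedelta


def plan_requests(available_days, threshold_num, spectrograms_per_batch=6, seed=None):
    """Return sorted request start times spread evenly over available_days."""
    if not available_days or threshold_num <= 0:
        return []
    rng = random.Random(seed)
    days = sorted(available_days)
    # Step 2: number of requests needed to reach the threshold.
    total_requests = math.ceil(threshold_num / spectrograms_per_batch)
    starts = []
    for i in range(total_requests):
        # Step 3: evenly spaced index into the available days. With more
        # requests than days, the same day is picked several times (step 6).
        day = days[(i * len(days)) // total_requests]
        # Step 4: random hour (0-23) and minute (0-59) within that day.
        starts.append(datetime(day.year, day.month, day.day,
                               rng.randint(0, 23), rng.randint(0, 59)))
    return sorted(starts)


if __name__ == "__main__":
    january = [date(2021, 1, 1) + timedelta(days=n) for n in range(31)]
    for start in plan_requests(january, threshold_num=36, spectrograms_per_batch=12, seed=0):
        print(start.isoformat())
```

With 31 available days, a threshold of 500 and 12 spectrograms per batch, this plan contains ceil(500 / 12) = 42 requests, so some days receive more than one request.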

Benefits:

  • Even temporal coverage across your entire date range
  • Full 24-hour sampling with random start times for maximum diversity
  • Efficient API usage by checking availability first
  • Resume-friendly by skipping existing downloads

📊 Spectrograms Per Batch

Control how many 5-minute spectrograms are downloaded per request with --spectrograms-per-batch:

| Batch Size | Duration |
| --- | --- |
| 1 | 5 minutes |
| 6 | 30 minutes (default) |
| 12 | 1 hour |
| 36 | 3 hours |

# Custom batch size example
python scripts/download_hydrophone_data.py --mode sampling --spectrograms-per-batch 12
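
The arithmetic behind the table is simple enough to check by hand. The sketch below is only an illustration: batch_summary is a hypothetical helper, and max_spectrograms is an upper bound that assumes every request returns a full batch.

```python
import math

SPECTROGRAM_MINUTES = 5  # each ONC spectrogram covers 5 minutes


def batch_summary(threshold, spectrograms_per_batch=6):
    """Summarize how a threshold maps onto requests of a given batch size."""
    requests = math.ceil(threshold / spectrograms_per_batch)
    return {
        "requests": requests,
        "minutes_per_request": spectrograms_per_batch * SPECTROGRAM_MINUTES,
        # Upper bound: assumes every request yields a full batch.
        "max_spectrograms": requests * spectrograms_per_batch,
    }


if __name__ == "__main__":
    print(batch_summary(500, spectrograms_per_batch=12))
```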

🚀 Deployment Validation

Ensures hydrophones were active during requested periods. Add --check-deployments to verify:

  • ✅ Deployment coverage for your dates
  • 📍 Exact locations and coordinates
  • 🔍 Data availability verification
  • 💡 Alternative suggestions if needed

python scripts/download_hydrophone_data.py --check-deployments

🎛️ Key Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --mode | Download mode | Interactive prompt |
| --device | Hydrophone device code | Interactive selection |
| --spectrograms-per-batch | Number of 5-min spectrograms per request | 6 |
| --download-flac | Also download FLAC audio files | False |
| --check-deployments | Validate deployment periods | Recommended |
| --start-date | Start date (YYYY MM DD) | Prompted |
| --end-date | End date (YYYY MM DD) | Prompted |
| --threshold | Number of spectrograms | Prompted |

📁 File Organization

Downloads are organized by device, method, and date range:

data/
└── DEVICE/
    └── sampling_YYYY-MM-DD_to_YYYY-MM-DD/
        ├── mat/
        │   ├── processed/     # Downloaded spectrograms
        │   └── rejects/       # Quality-filtered files
        └── flac/              # FLAC audio files (if --download-flac used)

Example: data/ICLISTENHF6020/sampling_2021-01-01_to_2021-01-31/
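
When post-processing downloads in your own scripts, the layout above can be rebuilt with pathlib. This is a sketch under assumptions: download_dirs is a hypothetical helper, and the method prefix is only documented here for sampling, so other prefixes are a guess.

```python
from datetime import date
from pathlib import Path


def download_dirs(data_dir, device, start, end, method="sampling"):
    """Build the directory paths used for one device and date range."""
    root = Path(data_dir) / device / f"{method}_{start.isoformat()}_to_{end.isoformat()}"
    return {
        "root": root,
        "processed": root / "mat" / "processed",  # downloaded spectrograms
        "rejects": root / "mat" / "rejects",      # quality-filtered files
        "flac": root / "flac",                    # only present with --download-flac
    }


if __name__ == "__main__":
    dirs = download_dirs("data", "ICLISTENHF6020", date(2021, 1, 1), date(2021, 1, 31))
    for name, path in dirs.items():
        print(f"{name}: {path}")
```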

📝 Specific Times Config

For exact timestamps, create a JSON file:

{
  "ICLISTENHF6020": [
    [2021, 1, 15, 12, 0, 0],
    [2021, 1, 15, 18, 30, 0]
  ]
}

Format: [Year, Month, Day, Hour, Minute, Second]
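
If the timestamps already exist as Python datetime objects, the config can be generated rather than typed by hand. The helpers below are hypothetical and only mirror the JSON format shown above; the validation step catches malformed entries such as a month of 13 before they reach the downloader.

```python
import json
from datetime import datetime


def build_times_config(device, timestamps):
    """Convert datetimes into the [Y, M, D, h, m, s] lists the config expects."""
    return {
        device: [[t.year, t.month, t.day, t.hour, t.minute, t.second] for t in timestamps]
    }


def validate_times_config(config):
    """Raise ValueError if any entry is not a valid 6-field timestamp."""
    for device, entries in config.items():
        for entry in entries:
            if len(entry) != 6:
                raise ValueError(f"{device}: expected 6 fields, got {entry!r}")
            datetime(*entry)  # raises ValueError for out-of-range fields


if __name__ == "__main__":
    config = build_times_config(
        "ICLISTENHF6020",
        [datetime(2021, 1, 15, 12, 0, 0), datetime(2021, 1, 15, 18, 30, 0)],
    )
    validate_times_config(config)
    with open("times.json", "w") as handle:
        json.dump(config, handle, indent=2)
```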

🎵 Downloading FLAC Audio Files

FLAC files contain raw hydrophone audio recordings. Add --download-flac to any command or use interactive mode (which now prompts for FLAC preference):

# Interactive mode (prompts for FLAC)
python scripts/download_hydrophone_data.py

# Any mode with FLAC
python scripts/download_hydrophone_data.py --mode sampling --download-flac

  • Use Cases: Audio analysis, custom spectrograms, ML training on raw audio
  • File Organization: FLAC files saved in flac/ subdirectory alongside spectrograms
  • Performance: 10-50x larger than spectrograms; start with small downloads (--threshold 5-10)

🎶 Custom Spectrogram Generation

Generate custom spectrograms from your downloaded FLAC/WAV audio files with configurable parameters. This functionality translates MATLAB spectrogram code to Python, allowing you to create spectrograms with different durations, frequency ranges, and analysis parameters.

✨ Features

  • Multiple audio formats: FLAC, WAV, MP3, M4A support
  • Configurable parameters: Window duration, overlap, frequency limits
  • MATLAB compatibility: Outputs .mat files with same structure as MATLAB
  • High-quality plots: PNG visualizations with customizable colormaps
  • Batch processing: Process entire directories efficiently
  • Project integration: Works seamlessly with downloaded FLAC files

🚀 Quick Start

# Interactive mode (recommended)
python scripts/generate_spectrograms.py

# Process FLAC files from ONC downloads
python scripts/generate_spectrograms.py --input-dir data/ICLISTENHF6020/flac/

# Custom parameters for longer spectrograms  
python scripts/generate_spectrograms.py \
  --input-dir data/DEVICE/flac/ \
  --win-dur 2.0 \
  --overlap 0.75 \
  --freq-min 5 \
  --freq-max 20000

🎛️ Parameters

| Parameter | Description | Default |
| --- | --- | --- |
| --win-dur | Window duration in seconds | 1.0 |
| --overlap | Overlap ratio (0-1) | 0.5 |
| --freq-min | Minimum frequency (Hz) | 10 |
| --freq-max | Maximum frequency (Hz) | 10000 |
| --colormap | Matplotlib colormap | turbo |
| --clim-min | Color scale minimum (dB) | -60 |
| --clim-max | Color scale maximum (dB) | 0 |

📁 Output Structure

Custom spectrograms are saved parallel to FLAC directories:

data/
└── DEVICE/
    └── sampling_YYYY-MM-DD_to_YYYY-MM-DD/
        ├── flac/                    # Downloaded audio files
        └── custom_spectrograms/     # Generated spectrograms
            ├── audio1.mat           # MATLAB data files
            ├── audio1.png           # PNG visualizations
            ├── audio2.mat
            └── audio2.png

⚙️ Configuration

Parameters can be configured in config/dataset_config.yaml:

custom_spectrograms:
  window_duration: 1.0     # Window duration in seconds
  overlap: 0.5             # Overlap ratio (0-1)
  frequency_limits:
    min: 10                # Minimum frequency (Hz)
    max: 10000             # Maximum frequency (Hz)
  colormap: "turbo"        # Matplotlib colormap
  color_limits:
    min: -60               # Color scale minimum (dB)
    max: 0                 # Color scale maximum (dB)
  log_frequency: true      # Use log frequency scale

🔬 Use Cases

Different Analysis Requirements:

  • High time resolution: --win-dur 0.5 --overlap 0.75 for transient events
  • High frequency resolution: --win-dur 4.0 --overlap 0.9 for tonal analysis
  • Low-frequency focus: --freq-min 1 --freq-max 1000 for whale calls
  • Wideband analysis: --freq-min 1 --freq-max 50000 for full spectrum

Custom Duration Spectrograms: Unlike ONC's fixed 5-minute spectrograms, you can create any duration by adjusting window parameters to analyze longer or shorter audio segments.
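
The trade-off between the settings above follows from standard short-time Fourier transform relations: a longer window gives finer frequency bins, while more overlap gives a smaller time step. The sketch below works through that arithmetic. stft_parameters is a hypothetical helper, the 64 kHz sample rate is assumed purely for illustration, and the generator script may round or pad windows differently.

```python
def stft_parameters(sample_rate, win_dur=1.0, overlap=0.5):
    """Derive window size, time step and frequency resolution for an STFT."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    samples_per_window = int(round(win_dur * sample_rate))
    overlap_samples = int(round(overlap * samples_per_window))
    hop_samples = max(1, samples_per_window - overlap_samples)
    return {
        "samples_per_window": samples_per_window,
        "hop_seconds": hop_samples / sample_rate,               # time step between columns
        "freq_resolution_hz": sample_rate / samples_per_window,  # spacing between bins
        "nyquist_hz": sample_rate / 2,                           # highest representable frequency
    }


if __name__ == "__main__":
    assumed_rate = 64000  # Hz, an assumption; check your files' actual rate
    for win_dur, overlap in [(0.5, 0.75), (4.0, 0.9)]:
        print(win_dur, overlap, stft_parameters(assumed_rate, win_dur, overlap))
```

Note that --freq-max cannot usefully exceed the Nyquist frequency, so a 50000 Hz upper limit only makes sense for recordings sampled at 100 kHz or more.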

💻 Programmatic Usage

from src.audio import SpectrogramGenerator

# Create generator with custom parameters
generator = SpectrogramGenerator(
    win_dur=2.0,           # 2 second windows
    overlap=0.75,          # 75% overlap
    freq_lims=(5, 20000),  # 5 Hz to 20 kHz
    colormap='viridis'
)

# Process a directory
results = generator.process_directory(
    input_dir="data/DEVICE/flac/",
    save_dir="data/DEVICE/custom_spectrograms/",
    save_mat=True,
    save_plot=True
)

See examples/generate_custom_spectrograms_example.py for complete examples.

🔧 Advanced Options

# Download with custom settings
python scripts/download_hydrophone_data.py --mode sampling --device ICLISTENHF6020 --spectrograms-per-batch 12 --threshold 200 --check-deployments

# Download with custom batch size
python scripts/download_hydrophone_data.py --mode sampling --spectrograms-per-batch 12 --check-deployments

🛠️ Troubleshooting

| Issue | Solution |
| --- | --- |
| Invalid ONC Token | Check .env file |
| No Deployment Coverage | Use --check-deployments |
| No .mat files found | Verify folder structure |
| Labels not loading | Check JSON syntax |
| Memory errors | Reduce --batch-size |
| FLAC download fails | Check network connection and storage space |
| Large FLAC files | Monitor disk space, start with small downloads |
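
For the "No .mat files found" case, a quick recursive count shows whether files landed where you expect. This is a standalone sketch rather than a package utility, and count_downloads is a hypothetical name.

```python
import sys
from pathlib import Path


def count_downloads(root):
    """Count .mat and .flac files anywhere under root (case-insensitive)."""
    counts = {"mat": 0, "flac": 0}
    for path in Path(root).rglob("*"):
        if path.is_file():
            suffix = path.suffix.lower().lstrip(".")
            if suffix in counts:
                counts[suffix] += 1
    return counts


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "data"
    print(target, count_downloads(target))
```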

💡 Pro Tip: Always use --check-deployments to ensure active deployment periods!
