
Marine bioacoustics detection pipeline using Perch v2 and Gemma 4

Project description


Marine bioacoustics detection pipeline. Slides a 5-second window over continuous hydrophone recordings, runs the Google Perch multispecies whale model, and stores detections as Parquet files for analysis.
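The windowing step can be pictured with a minimal sketch. It assumes mono audio already decoded into a list of floats. The function name `sliding_windows` and the `hop_s` parameter are hypothetical and not part of whalu. This README does not say whether successive windows overlap, so the hop is left configurable and defaults to non-overlapping 5-second steps.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(
    samples: Sequence[float],
    sample_rate: int,
    window_s: float = 5.0,
    hop_s: float = 5.0,  # hypothetical: overlap is not specified in the README
) -> Iterator[Tuple[float, Sequence[float]]]:
    """Yield (start_time_seconds, window) pairs over mono audio samples.

    A trailing remainder shorter than one full window is dropped.
    """
    win = int(round(window_s * sample_rate))  # samples per window
    hop = int(round(hop_s * sample_rate))     # samples between window starts
    if win <= 0 or hop <= 0:
        raise ValueError("window_s and hop_s must be positive")
    for start in range(0, len(samples) - win + 1, hop):
        yield start / sample_rate, samples[start:start + win]


if __name__ == "__main__":
    # 12 seconds of silence at 16 kHz yields two full, non-overlapping 5 s windows
    audio: List[float] = [0.0] * (12 * 16_000)
    for start_s, window in sliding_windows(audio, 16_000):
        print(f"{start_s:.1f}s: {len(window)} samples")
```

Dropping the partial tail keeps every window the same length, which a fixed-input classifier needs; a real pipeline might instead zero-pad the last window.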

Install

pip install whalu

Or with uv:

uv add whalu

CLI

whalu [-v] <command> ...
Command               Description
whalu scan mbari      Run detection over MBARI Pacific Sound (S3, no auth required)
whalu scan orcasound  Run detection over Orcasound labeled samples (S3, no auth required)
whalu analyze         Summarize and visualize stored detections
whalu info [source]   Show sensor/dataset metadata for a source

whalu scan mbari

Flag               Default                Description
--start YYYY-MM    (required)             First year-month to process
--end YYYY-MM      same as --start        Last year-month to process (inclusive)
--max-files N      all files              Stop after N files
--limit-hours N    full file              Only process the first N hours of each file
--output-dir PATH  data/detections/mbari  Where to write Parquet files
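The inclusive --start/--end semantics can be made concrete with a small sketch. `month_range` is a hypothetical helper, not part of whalu's documented interface; it only illustrates how the two flags expand into the months a scan covers, including the rule that --end defaults to --start.

```python
from typing import List, Optional, Tuple


def _parse_month(value: str) -> Tuple[int, int]:
    """Parse 'YYYY-MM' into (year, month), rejecting impossible months."""
    year_str, month_str = value.split("-")
    year, month = int(year_str), int(month_str)
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in {value!r}")
    return year, month


def month_range(start: str, end: Optional[str] = None) -> List[str]:
    """Expand inclusive YYYY-MM bounds into every month they cover.

    When end is omitted it defaults to start, mirroring the --end default.
    """
    year, month = _parse_month(start)
    last = _parse_month(end if end is not None else start)
    if last < (year, month):
        raise ValueError("end must not be earlier than start")
    months = []
    while (year, month) <= last:
        months.append(f"{year:04d}-{month:02d}")
        # step one month forward, rolling over into January of the next year
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


if __name__ == "__main__":
    print(month_range("2023-07", "2023-10"))
```

Comparing (year, month) tuples keeps the loop correct across year boundaries, for example a winter range from 2023-11 to 2024-02.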

whalu scan orcasound

Flag               Default                      Description
--key S3_KEY       labeled killer whale sample  Specific S3 key to process
--output-dir PATH  data/detections/orcasound    Where to write Parquet files

whalu analyze

Flag              Default     Description
--input-dir PATH  (required)  Directory of detection Parquet files
--top-n N         5           Number of top species shown in the heatmap
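One plausible way to pick the species for the heatmap is to rank labels by detection count. The sketch below assumes detection rows carry a `label` field; that column name and the count-based ranking are assumptions, since this README does not document the Parquet schema or how whalu orders species. The rows in the example are toy input, not real detections.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def top_species(detections: Iterable[Mapping[str, object]], n: int = 5) -> List[str]:
    """Return the n labels with the most detections, most frequent first.

    Ties keep the order in which labels were first seen (Counter.most_common).
    """
    counts = Counter(str(row["label"]) for row in detections)
    return [label for label, _ in counts.most_common(n)]


if __name__ == "__main__":
    # toy rows standing in for records loaded from detection Parquet files
    rows = [{"label": "Mn"}, {"label": "Bm"}, {"label": "Mn"},
            {"label": "Oo"}, {"label": "Bm"}, {"label": "Mn"}]
    print(top_species(rows, n=2))  # ['Mn', 'Bm']
```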

whalu info

Argument  Default        Description
source    (all sources)  Source ID to inspect (mbari or orcasound)

Examples

# Single file, first hour only (quick test)
whalu scan mbari --start 2026-03 --max-files 1 --limit-hours 1

# Full month
whalu scan mbari --start 2026-03 --output-dir data/detections/mbari

# Multi-month date range (blue whale season)
whalu scan mbari --start 2023-07 --end 2023-10

# Orcasound validation sample
whalu scan orcasound

# Analyze stored detections
whalu analyze --input-dir data/detections/mbari

# Show source metadata
whalu info mbari
whalu info

Supported data sources

Source               Location             Coverage      Format
MBARI Pacific Sound  Monterey Canyon, CA  2015-present  16 kHz 24-bit WAV, 1 file/day (4.1 GB)
Orcasound            Puget Sound, WA      2017-present  20 kHz 16-bit WAV
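The MBARI figures are internally consistent: at 16 kHz with 3 bytes per 24-bit sample, one hour of audio is 172.8 MB and one day about 4.1 GB. The sketch below computes those sizes and the matching HTTP `Range` header values for hourly chunks. It assumes single-channel audio and a plain 44-byte WAV header; real files can carry extra header chunks, so a robust reader would parse the header to find where sample data starts. `hour_byte_ranges` is hypothetical, not part of whalu.

```python
from typing import List

SAMPLE_RATE_HZ = 16_000  # MBARI Pacific Sound rate from the table above
BYTES_PER_SAMPLE = 3     # 24-bit PCM
CHANNELS = 1             # assumption: single-channel recordings
HEADER_BYTES = 44        # assumption: canonical 44-byte WAV header

BYTES_PER_HOUR = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS * 3600
BYTES_PER_DAY = BYTES_PER_HOUR * 24


def hour_byte_ranges(n_hours: int, header_bytes: int = HEADER_BYTES) -> List[str]:
    """Build HTTP Range header values covering n_hours of sample data."""
    ranges = []
    for hour in range(n_hours):
        start = header_bytes + hour * BYTES_PER_HOUR
        end = start + BYTES_PER_HOUR - 1  # Range end offsets are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges


if __name__ == "__main__":
    print(f"{BYTES_PER_HOUR / 1e6:.1f} MB per hour")  # 172.8 MB per hour
    print(f"{BYTES_PER_DAY / 1e9:.1f} GB per day")    # 4.1 GB per day
    print(hour_byte_ranges(2))
```

Each string could be passed as the `Range` argument of boto3's `S3.Client.get_object` to fetch one hour at a time; whether whalu itself uses boto3 is not stated here. Because every chunk is a whole number of 3-byte frames, no sample is split across two requests.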

How it works

  • Streams each ~4.1 GB daily file via S3 range requests in 1-hour chunks (~172 MB each), so memory use stays bounded
  • Applies a sigmoid activation (not softmax) to the model outputs, since the whale model is multi-label and one window can contain several classes at once
  • Emits a detection only where a class's confidence is >= 0.5
  • Stores one Parquet file per audio source; runs are resumable
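A minimal sketch of the scoring step, assuming the model returns one raw logit per class for each 5-second window. Each class goes through its own sigmoid instead of a softmax across classes, and any class at or above 0.5 becomes a detection row. The function and field names (`detections_for_window`, `start_s`, `label`, `confidence`) are hypothetical and are not whalu's actual Parquet schema; in practice the logits would come from the model loaded via perch-hoplite.

```python
import math
from typing import Dict, List, Sequence

THRESHOLD = 0.5  # confidence cut-off stated in this README


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so this cannot overflow
    return z / (1.0 + z)


def detections_for_window(
    logits: Sequence[float],
    class_names: Sequence[str],
    start_s: float,
    threshold: float = THRESHOLD,
) -> List[Dict[str, object]]:
    """Score each class independently and keep those at or above threshold."""
    if len(logits) != len(class_names):
        raise ValueError("expected one logit per class")
    rows: List[Dict[str, object]] = []
    for name, logit in zip(class_names, logits):
        confidence = sigmoid(logit)  # per-class probability, no softmax
        if confidence >= threshold:
            rows.append({"start_s": start_s, "label": name, "confidence": confidence})
    return rows


if __name__ == "__main__":
    for row in detections_for_window([2.0, 0.0, -3.0], ["Bm", "Mn", "Oo"], start_s=10.0):
        print(row["label"], round(row["confidence"], 3))
```

The choice matters when calls overlap: three classes with equal logits of 2.0 each score about 0.88 under the sigmoid and all pass the threshold, whereas a softmax over the same three would give each 1/3, below 0.5, even if all three are present.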

Detection model

Google multispecies_whale via perch-hoplite. 12 classes: blue whale (Bm), fin whale (Bp), humpback (Mn), minke (Ba), Bryde's (Be), sei (Bs), right whale (Eg), orca (Oo), and call types (Upcall, Gunshot, Call, Echolocation, Whistle).



Download files


Source Distribution

whalu-0.1.1.tar.gz (305.1 kB)

Built Distribution

whalu-0.1.1-py3-none-any.whl (26.9 kB)

File details

Details for the file whalu-0.1.1.tar.gz.

File metadata

  • Download URL: whalu-0.1.1.tar.gz
  • Size: 305.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for whalu-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       dddd52d5a48da295fb6116a2489504a5b2d69769303dca706e4d7cbfb4c417aa
MD5          20c65a3cedc1616a0abb2920d1d18052
BLAKE2b-256  4dd867eef33f492169e3d9587ab4284891e85efd7631e169a53087fe1bf84056


File details

Details for the file whalu-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: whalu-0.1.1-py3-none-any.whl
  • Size: 26.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for whalu-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       0a5248efc06759c6e06541a23cf8b9273ebe0fe9a98cbd0a35e6a4bb214511ef
MD5          e903bab0b03da7017d0243afacdf78b5
BLAKE2b-256  b64ffdccaf751ea6a4eab1c3418cc2f82febf881048bed4a77ac7cffa4e678ef

