# Oceanstream

Process oceanographic measurements from USVs and write partitioned GeoParquet datasets with geospatial metadata.

This project processes oceanographic measurements collected from Unmanned Surface Vehicles (USVs). The pipeline reads CSV files from a specified directory, consolidates the data, and stores it in GeoParquet format partitioned by latitude and longitude bins. Optionally, it can upload the resulting GeoParquet files to Azure Blob Storage.
## Project Structure

```
oceanstream
├── src
│   ├── app.py                    # Main entry point for the application
│   ├── cli.py                    # Command-line interface for running the pipeline
│   ├── config                    # Configuration settings
│   │   ├── __init__.py
│   │   └── settings.py
│   ├── pipeline                  # Data processing pipeline
│   │   ├── __init__.py
│   │   ├── csv_reader.py         # Functions for reading CSV files
│   │   ├── binning.py            # Functions for data partitioning
│   │   └── geoparquet_writer.py  # Functions for writing GeoParquet files
│   ├── storage                   # Storage handling
│   │   ├── __init__.py
│   │   ├── local.py              # Local storage functions
│   │   └── azure_blob.py         # Azure Blob Storage functions
│   └── types                     # Data models and types
│       ├── __init__.py
│       └── models.py
├── data
│   └── raw_data                  # Directory for raw CSV data
│       └── .gitkeep
├── tests                         # Unit tests for the application
│   ├── __init__.py
│   └── test_pipeline.py
├── .env.example                  # Template for environment variables
├── .gitignore                    # Git ignore file
├── .python-version               # Python version specification
├── pyproject.toml                # Project dependencies and configuration
└── README.md                     # Project documentation
```
## Setup Instructions

1. **Clone the repository:**

   ```bash
   git clone <repository-url>
   cd oceanstream
   ```

2. **Create a virtual environment:**

   ```bash
   python3.12 -m venv venv
   source venv/bin/activate  # On Windows use `venv\Scripts\activate`
   ```

3. **Install dependencies:**

   Oceanstream is organized into optional processing modules. Install only what you need:

   ```bash
   # Install core + geotrack processing (GPS/navigation data → GeoParquet)
   pip install -e ".[geotrack]"

   # Install core + echodata processing (echosounder data → Zarr)
   pip install -e ".[echodata]"

   # Install all processing modules
   pip install -e ".[all]"

   # Install for development (includes all modules + dev tools)
   pip install -e ".[all]" -r requirements-dev.txt
   ```

   Available extras:

   - `geotrack` - GPS/navigation track processing (pandas, geopandas, shapely)
   - `echodata` - Echosounder data processing (echopype, xarray, zarr, netcdf4)
   - `multibeam` - Multibeam sonar processing (planned)
   - `adcp` - ADCP current profiler processing (planned)
   - `all` - All processing modules
   - `geo` - Legacy alias for `geotrack`

4. **Configure environment variables:**

   Copy `.env.example` to `.env` and fill in the necessary configuration values.

5. **Run processing commands:**

   Oceanstream provides separate commands for each data type:

   ```bash
   # Process geotrack data (CSV → GeoParquet)
   oceanstream process geotrack --input-dir raw_data --output-dir out/geoparquet -v

   # Process echosounder data (planned - requires echodata extra)
   oceanstream process echodata --input-dir raw_echodata --output-dir out/echodata -v

   # Process multibeam data (planned - requires multibeam extra)
   oceanstream process multibeam --input-dir raw_multibeam --output-dir out/multibeam -v

   # Process ADCP data (planned - requires adcp extra)
   oceanstream process adcp --input-dir raw_adcp --output-dir out/adcp -v

   # List available data providers
   oceanstream providers
   ```

   All processing commands support a `--provider` flag to specify the data source:

   ```bash
   oceanstream process --provider saildrone geotrack --input-dir data -v
   ```
## Usage

### Geotrack Processing (GPS/Navigation Data)

CLI usage examples:

```bash
# Process sample fixture data bundled with tests
oceanstream process geotrack --input-dir oceanstream/tests/data/raw_data --output-dir out/geoparquet -v

# Process your raw_data directory at repo root (default input-dir is ./raw_data)
oceanstream process geotrack --output-dir out/geoparquet -v

# Dry run to see what would be processed
oceanstream process geotrack --input-dir raw_data --dry-run -v

# List available columns in the data
oceanstream process geotrack --input-dir raw_data --list-columns
```

The CLI reads CSVs, auto-derives coarse 5° latitude/longitude bins, and writes a partitioned GeoParquet dataset with metadata. Use `-v` for progress logs.
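The coarse 5° binning can be expressed as a simple floor to the lower bin edge. This is an assumed sketch of the logic (the actual implementation lives in `src/pipeline/binning.py`):

```python
import math

BIN_SIZE = 5  # degrees; matches the coarse bins described above


def to_bin(value: float, bin_size: int = BIN_SIZE) -> int:
    """Floor a coordinate to the lower edge of its 5-degree bin."""
    return int(math.floor(value / bin_size) * bin_size)


# A point at (32.7, -117.2) lands in the lat_bin=30 / lon_bin=-120 partition.
print(to_bin(32.7), to_bin(-117.2))  # → 30 -120
```

Note that flooring (rather than truncating toward zero) keeps negative coordinates in the correct bin: -117.2 maps to -120, not -115.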
## Processing Modules

Oceanstream is organized into separate processing modules:

- `oceanstream.geotrack` - Process GPS/navigation track data into GeoParquet format
- `oceanstream.echodata` - Process echosounder data (EK60/EK80) into Zarr (coming soon)
- `oceanstream.multibeam` - Process multibeam sonar data (coming soon)
- `oceanstream.adcp` - Process ADCP current profiler data (coming soon)

Each module can be installed independently using pip extras (see Setup Instructions).
## Using OceanStream Data in GIS Tools

OceanStream generates cloud-optimized GeoParquet files designed to work seamlessly with modern GIS tools and data analysis frameworks. The output includes:

- **GeoParquet**: Columnar format with embedded geometry and spatial partitioning
- **STAC metadata**: Standard catalog format for discovery and integration
- **PMTiles (optional)**: Vector tiles for web-based visualization
### Comprehensive GIS Integration Guides

We provide detailed integration guides for popular GIS tools and frameworks:

**Desktop GIS:**

- QGIS - Open-source desktop GIS
- ArcGIS Pro - Professional ESRI platform

**Data Analysis:**

**Web GIS (coming soon):**

- Leaflet + PMTiles
- Mapbox GL JS
- STAC Browser

See the GIS Integration Documentation for complete guides with:

- Installation instructions
- Step-by-step usage examples
- Code samples and workflows
- Performance optimization tips
- Troubleshooting guides
### Quick Start Examples

**Load in QGIS:**

```bash
# Generate data
oceanstream process geotrack --input-source ./data/sample.csv --output-dir ./output

# Open QGIS and drag-and-drop .parquet files from:
# output/campaign_id/lat_bin=X/lon_bin=Y/*.parquet
```

**Query with DuckDB:**

```sql
INSTALL spatial;
LOAD spatial;

SELECT time, latitude, longitude, temperature_sea_water
FROM read_parquet('output/campaign_id/**/*.parquet')
WHERE lat_bin = 30 AND lon_bin = -120
LIMIT 10;
```

**Analyze with GeoPandas:**

```python
import geopandas as gpd

# Read all spatial partitions
gdf = gpd.read_parquet('output/campaign_id/')

# Filter and analyze
warm_water = gdf[gdf['temperature_sea_water'] > 25]
print(f"Found {len(warm_water)} warm water measurements")
```
## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any enhancements or bug fixes.

## License

This project is licensed under the MIT License. See the LICENSE file for details.