Open AIS data platform for Python

🔱 Neptune AIS

Download, normalize, fuse, and analyze vessel tracking data from multiple open-source AIS archives.
One interface. Many sources. Clean output.

Installation • Quick Start • Data Sources • Features • CLI • Docs


What is Neptune?

Neptune is a Python library that gives you a single, unified interface to download, normalize, and analyze AIS (Automatic Identification System) vessel tracking data from multiple open-source archives.

AIS data powers maritime domain awareness: vessel tracking, trade analytics, environmental monitoring, fishing surveillance, and port operations. But working with it is painful: every provider uses a different format, schema, delivery mechanism, and quality profile. Neptune handles all of that so you can focus on analysis.

from neptune_ais import Neptune

n = Neptune("2024-06-15", sources=["noaa"])
n.download()

positions = n.positions()          # Polars LazyFrame: normalized, QC'd
result = n.sql("SELECT mmsi, count(*) as n FROM positions GROUP BY mmsi ORDER BY n DESC LIMIT 5")

Think of Neptune as the maritime counterpart to Herbie, the Python package for downloading numerical weather prediction data: a clean data-access layer that handles the messy plumbing of fetching, normalizing, and cataloging data from heterogeneous archives, so you get reproducible, analysis-ready output every time.

Key Features

  • Multi-source ingestion: Download from NOAA, DMA, Global Fishing Watch, Finland, AISHub, and AISStream through one API
  • Automatic normalization: Every source is normalized to a canonical schema with QC scoring and provenance tracking
  • Multi-source fusion: Merge overlapping sources with configurable dedup strategies (best, union, prefer:<source>)
  • Polars-native: Query positions, vessels, tracks, and events as lazy DataFrames with full predicate pushdown
  • SQL via DuckDB: Run SQL queries directly over your cataloged data
  • Event detection: Derive port calls, EEZ crossings, vessel encounters, and loitering from raw positions
  • Real-time streaming: Connect to live AIS feeds with backpressure, checkpointing, and durable sinks
  • Interactive maps: Visualize positions, tracks, and events with lonboard
  • Plugin system: Add custom source adapters via Python entry points
  • CLI included: neptune download, neptune inventory, neptune sql, and more

Installation

Neptune's core is lightweight: only Polars, Pydantic, and httpx are required. Everything else is opt-in.

# Core (Polars + Pydantic + httpx)
pip install neptune-ais

# Extras are quoted so shells such as zsh do not try to expand the brackets
# With SQL support (DuckDB)
pip install "neptune-ais[sql]"

# With spatial & visualization (GeoDataFrames, lonboard, H3)
pip install "neptune-ais[geo]"

# With real-time streaming (WebSocket feeds)
pip install "neptune-ais[stream]"

# With the CLI (Click + Rich)
pip install "neptune-ais[cli]"

# Everything
pip install "neptune-ais[all]"

Requirements: Python 3.10+

Optional dependency groups explained

Extra      Adds                                            Used by
sql        duckdb                                          Neptune.sql(), Neptune.duckdb(), DuckDBSink
parquet    pyarrow                                         Full Parquet write options (compression, statistics)
geo        shapely, geopandas, movingpandas, lonboard, h3  Boundary lookups, GeoDataFrame bridges, maps
stream     websockets                                      NeptuneStream, live AIS feeds
cli        click, rich                                     neptune console commands
notebooks  jupyter, ipykernel                              Interactive notebook examples
dev        pytest, mypy, ruff, coverage, nbstripout        Development and testing
all        All of the above (except dev)                   Full-featured install
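
To see which optional groups are usable in your current environment, one option is to probe for the modules behind each extra. This is a minimal sketch using only the standard library. The mapping is copied from the table above, it assumes each package's import name matches its distribution name, and `available_extras` is a hypothetical helper, not part of Neptune:

```python
from importlib.util import find_spec

# Extra name -> top-level modules it is expected to provide (from the table above).
# Assumption: each distribution's import name matches its package name.
EXTRA_MODULES = {
    "sql": ["duckdb"],
    "parquet": ["pyarrow"],
    "geo": ["shapely", "geopandas", "movingpandas", "lonboard", "h3"],
    "stream": ["websockets"],
    "cli": ["click", "rich"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: bool}, True when every module of the extra is importable."""
    return {
        extra: all(find_spec(mod) is not None for mod in mods)
        for extra, mods in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:<8} {'installed' if ok else 'missing'}")
```

The check only looks up module specs, so it does not import any of the heavier optional packages.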

Quick Start

Download and query AIS data

from neptune_ais import Neptune

# Download a day of NOAA AIS data
# Neptune handles: fetch → normalize → QC → partition → catalog
n = Neptune("2024-06-15", sources=["noaa"])
n.download()

# Query as a Polars LazyFrame
positions = n.positions()
df = positions.collect()
print(f"{len(df):,} position reports from {df['mmsi'].n_unique():,} vessels")

# SQL queries via DuckDB
top_vessels = n.sql("""
    SELECT mmsi, count(*) as n
    FROM positions
    GROUP BY mmsi
    ORDER BY n DESC
    LIMIT 10
""")

Common operations with helpers

from neptune_ais.helpers import latest_positions, snapshot, vessel_history

# Most recent position per vessel
latest = latest_positions(positions)

# Point-in-time snapshot: where was every vessel at noon?
noon = snapshot(positions, when="2024-06-15T12:00:00")

# Full history for a single vessel
history = vessel_history(367000001, positions=positions)
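
Conceptually, these helpers reduce a set of position reports to one row per vessel. The pure-Python sketch below illustrates that idea only. It is not Neptune's implementation (which works on Polars LazyFrames), it assumes a snapshot means "the last report at or before the given time", and the sample rows and the `latest_per_vessel` name are made up:

```python
from datetime import datetime

# Illustrative rows only; field names follow the columns mentioned in this README.
reports = [
    {"mmsi": 367000001, "timestamp": datetime(2024, 6, 15, 11, 50), "lat": 40.60, "lon": -74.05},
    {"mmsi": 367000001, "timestamp": datetime(2024, 6, 15, 12, 10), "lat": 40.62, "lon": -74.04},
    {"mmsi": 367000002, "timestamp": datetime(2024, 6, 15, 11, 58), "lat": 40.70, "lon": -74.01},
]


def latest_per_vessel(rows, when=None):
    """Keep the newest row per MMSI, optionally ignoring rows after `when`."""
    latest = {}
    for row in rows:
        if when is not None and row["timestamp"] > when:
            continue  # report is later than the snapshot time
        current = latest.get(row["mmsi"])
        if current is None or row["timestamp"] > current["timestamp"]:
            latest[row["mmsi"]] = row
    return latest


latest = latest_per_vessel(reports)
noon = latest_per_vessel(reports, when=datetime(2024, 6, 15, 12, 0))
print(latest[367000001]["timestamp"])  # 2024-06-15 12:10:00
print(noon[367000001]["timestamp"])    # 2024-06-15 11:50:00
```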

Multi-source fusion

# Combine NOAA and DMA with automatic deduplication
n = Neptune(
    ("2024-06-15", "2024-06-16"),
    sources=["noaa", "dma"],
    merge="best",       # "best" | "union" | "prefer:noaa"
)
n.download()
fused = n.positions()   # Deduplicated across sources

Event detection

# Derive maritime events from position data
events = n.events(kind="port_call", min_confidence=0.7)

# Event types: port_call, eez_crossing, encounter, loitering
# Each event includes confidence scores and full provenance

Real-time streaming

import asyncio
from neptune_ais.stream import NeptuneStream, StreamConfig
from neptune_ais.sinks import ParquetSink, promote_landing

config = StreamConfig(
    source="aisstream",
    api_key="YOUR_KEY",
    bbox=(-74.5, 40.0, -73.5, 41.0),  # New York harbor
)

async def ingest():
    sink = ParquetSink("/tmp/neptune_landing", source="aisstream")
    async with NeptuneStream(config=config) as stream:
        await stream.run_sink(sink, max_messages=10_000)
    # Promote to canonical storage
    promote_landing("/tmp/neptune_landing", store_root="~/.neptune", source="aisstream")

asyncio.run(ingest())
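
The `bbox` tuple above appears to follow the common (min_lon, min_lat, max_lon, max_lat) ordering, since New York harbor sits near longitude -74 and latitude 40.7. That ordering is inferred from the example values, not stated in this README. Under that assumption, a client-side check that a point falls inside the box could look like this (`in_bbox` is a hypothetical helper):

```python
def in_bbox(lon, lat, bbox):
    """Return True if (lon, lat) lies inside bbox = (min_lon, min_lat, max_lon, max_lat).

    Assumes the box does not cross the antimeridian.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


NY_HARBOR = (-74.5, 40.0, -73.5, 41.0)

print(in_bbox(-74.04, 40.69, NY_HARBOR))  # True: near the Statue of Liberty
print(in_bbox(-0.12, 51.50, NY_HARBOR))   # False: London
```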

Data Sources

Neptune includes adapters for six open AIS data providers, with a plugin system for adding more.

Source     Provider                   Coverage                   Delivery        Auth     Backfill
noaa       NOAA AIS Archive           US waters, global ATON     Daily files     None     Yes
dma        Danish Maritime Authority  European waters            Daily files     None     Yes
gfw        Global Fishing Watch       Global (satellite AIS)     Daily files     API key  Yes
finland    Digitraffic Finland        Finnish waters             Epoch-based     None     Yes
aishub     AISHub                     Global (variable quality)  Multiple feeds  API key  Yes
aisstream  AISStream                  Global (real-time)         WebSocket       API key  No (live only)

Discover sources programmatically

from neptune_ais import sources

sources.load_all_adapters()

# List all sources
for s in sources.catalog():
    print(f"{s.source_id:<12} {s.provider:<30} auth={s.auth_scheme or 'none'}")

# Find open-data sources with backfill
for s in sources.find_sources(backfill=True, auth=False):
    print(s.source_id)

Add a custom source via plugin

External packages register adapters through Python entry points:

# In your plugin's pyproject.toml
[project.entry-points."neptune_ais.adapters"]
my_source = "my_package.adapter:MyAdapter"

Features

Architecture

Neptune is organized around a canonical dataset family and a three-layer local store:

                                ┌────────────────┐
                                │   Your Code    │
                                │  Polars / SQL  │
                                └───────┬────────┘
                                        │
                              ┌─────────▼──────────┐
                              │     Neptune API    │
                              │ .positions()       │
                              │ .tracks()          │
                              │ .events()          │
                              │ .sql()             │
                              └──┬──────────────┬──┘
                   ┌─────────────▼──┐     ┌─────▼──────────────┐
                   │  Archival Path │     │  Streaming Path    │
                   │  fetch → norm  │     │  NeptuneStream     │
                   │  → QC → store  │     │  → sink → promote  │
                   └──┬─────────────┘     └─────┬──────────────┘
       ┌──────────────▼─────────────────────────▼────────────────┐
       │                    Three-Layer Store                    │
       │  raw/ (source payloads)  →  canonical/ (normalized)     │
       │                          →  derived/ (cached products)  │
       └─────────────────────────────────────────────────────────┘
       ┌─────────────────────────────────────────────────────────┐
       │                   Catalog & Manifests                   │
       │  partition tracking · schema versions · QC summaries    │
       │  staleness detection · atomic writes (stage → commit)   │
       └─────────────────────────────────────────────────────────┘
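
The "stage → commit" pattern in the diagram is commonly implemented by writing to a temporary file in the target directory and then renaming it into place. The sketch below shows that general technique. It is not taken from Neptune's source, and `atomic_write_bytes` and the example path are hypothetical:

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: Path, payload: bytes) -> None:
    """Write payload to target so readers never observe a partial file."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # Stage: write into a temp file in the same directory (same filesystem).
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".staging")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before committing
        # Commit: rename over the target; atomic on POSIX within one filesystem.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the staged file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        out = Path(root) / "canonical" / "positions" / "part-0.bin"
        atomic_write_bytes(out, b"example payload")
        print(out.read_bytes())
```

Staging in the same directory matters because a rename across filesystems is not atomic.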

Canonical Datasets

Dataset    Description                                                         Schema
positions  Timestamped AIS point observations (mmsi, lat, lon, sog, cog, ...)  positions/v1
vessels    Vessel identity and reference data (slowly changing dimensions)     vessels/v1
tracks     Derived trip/trajectory segments                                    tracks/v1
events     Maritime events (port calls, EEZ crossings, encounters, loitering)  events/v1

Quality Control

Every ingested record passes through row-level and partition-level QC checks:

  • Data type validation, range checks, sentinel detection
  • Confidence scoring in three tiers: HIGH (>= 0.7), MEDIUM (>= 0.3 and < 0.7), LOW (< 0.3)
  • Per-adapter QC rule injection for source-specific quirks
  • Full provenance tracking from source through fusion
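
As an illustration of what row-level checks of this kind can look like, the sketch below combines range checks, sentinel detection, and the tier thresholds listed above. The sentinel values are the standard AIS "not available" encodings (latitude 91, longitude 181, speed over ground 102.3 knots, course over ground 360). The function names and issue labels are hypothetical, and how Neptune actually computes a confidence score is not described here:

```python
def confidence_tier(score: float) -> str:
    """Map a confidence score in [0, 1] to the tiers listed above."""
    if score >= 0.7:
        return "HIGH"
    if score >= 0.3:  # 0.3 up to, but not including, 0.7
        return "MEDIUM"
    return "LOW"


def position_issues(lat: float, lon: float, sog: float, cog: float) -> list[str]:
    """Return QC issue labels for one position report (empty list if clean)."""
    issues = []
    # Sentinel detection first, then plain range checks.
    if lat == 91.0 or lon == 181.0:
        issues.append("position_unavailable")
    elif not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        issues.append("position_out_of_range")
    if sog >= 102.3:
        issues.append("sog_unavailable")
    elif sog < 0:
        issues.append("sog_out_of_range")
    if cog == 360.0:
        issues.append("cog_unavailable")
    elif not (0.0 <= cog < 360.0):
        issues.append("cog_out_of_range")
    return issues


print(confidence_tier(0.85), confidence_tier(0.5), confidence_tier(0.1))  # HIGH MEDIUM LOW
print(position_issues(40.69, -74.04, 12.3, 87.0))  # []
print(position_issues(91.0, 181.0, 102.3, 360.0))
# ['position_unavailable', 'sog_unavailable', 'cog_unavailable']
```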

Fusion Modes

When querying across multiple sources, Neptune supports three merge strategies:

Mode             Behavior
best             Deduplicate with configurable field-level precedence
union            Keep all records from all sources, tag provenance
prefer:<source>  Deterministic source preference (e.g., prefer:noaa)
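
To make the difference between the modes concrete, here is a simplified, record-level sketch. It assumes duplicates are identified by an identical (mmsi, timestamp) pair and that "best" means "the higher-confidence record wins". Neptune's actual matching rules and field-level precedence are not specified in this README and will differ. `fuse` and the sample rows are hypothetical:

```python
def fuse(rows, merge="best"):
    """Merge position rows from several sources according to a merge mode."""
    if merge == "union":
        # Keep everything; each row already carries its source for provenance.
        return list(rows)

    preferred = merge.split(":", 1)[1] if merge.startswith("prefer:") else None
    if merge != "best" and preferred is None:
        raise ValueError(f"unknown merge mode: {merge!r}")

    winners = {}
    for row in rows:
        key = (row["mmsi"], row["timestamp"])  # assumed duplicate key
        current = winners.get(key)
        if current is None:
            winners[key] = row
        elif preferred is not None:
            # prefer:<source>: the named source wins whenever it is present.
            if row["source"] == preferred and current["source"] != preferred:
                winners[key] = row
        elif row["confidence"] > current["confidence"]:
            # best: assumed here to mean the higher-confidence record wins.
            winners[key] = row
    return list(winners.values())


rows = [
    {"mmsi": 219000001, "timestamp": "2024-06-15T12:00:00", "source": "noaa", "confidence": 0.6},
    {"mmsi": 219000001, "timestamp": "2024-06-15T12:00:00", "source": "dma", "confidence": 0.9},
    {"mmsi": 367000001, "timestamp": "2024-06-15T12:00:00", "source": "noaa", "confidence": 0.8},
]

print(len(fuse(rows, "union")))                          # 3
print([r["source"] for r in fuse(rows, "best")])         # ['dma', 'noaa']
print([r["source"] for r in fuse(rows, "prefer:noaa")])  # ['noaa', 'noaa']
```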

Event Detection

Neptune derives four maritime event families from position data using heuristic detectors:

Event          Description
Port calls     Sustained low-speed presence within a port boundary
EEZ crossings  Transitions between exclusive economic zones
Encounters     Two vessels within 500 m for a sustained duration
Loitering      Sustained low-speed movement in a small area

Each event includes a deterministic event_id, confidence score, timestamps, and full provenance linking back to source positions. See HEURISTICS.md for detection assumptions and known limitations.
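
As a rough illustration of the encounter rule, the sketch below computes great-circle distance with the haversine formula and flags a pair of positions closer than 500 m. It covers only the distance test. The "sustained duration" requirement, time alignment of reports, and confidence scoring are out of scope, and `within_encounter_range` is a hypothetical name:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_encounter_range(pos_a, pos_b, threshold_m=500.0):
    """True if two (lat, lon) positions are closer than threshold_m."""
    return haversine_m(*pos_a, *pos_b) < threshold_m


# 0.003 degrees of latitude apart, roughly 334 m.
print(within_encounter_range((40.000, -74.000), (40.003, -74.000)))  # True
# 0.01 degrees of latitude apart, roughly 1.1 km.
print(within_encounter_range((40.000, -74.000), (40.010, -74.000)))  # False
```

A full detector would also need to pair reports from the two vessels that are close in time before applying the distance test.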

CLI

Neptune includes a full command-line interface (requires pip install "neptune-ais[cli]"):

# Download data
neptune download --source noaa --date 2024-06-15
neptune download --source noaa --source dma --start 2024-06-01 --end 2024-06-07

# Inspect what you have
neptune inventory
neptune inventory --dataset positions

# Quality reports
neptune qc --source noaa --date 2024-06-15

# SQL queries from the terminal
neptune sql "SELECT count(*) FROM positions WHERE source = 'noaa'"

# Source catalog
neptune sources
neptune sources --compare noaa dma gfw

# Event queries
neptune events --kind port_call --date 2024-06-15

# Health and provenance
neptune health
neptune provenance --date 2024-06-15

# Promote streaming data to canonical store
neptune promote --landing-dir /tmp/neptune_landing --source aisstream

Documentation

Full Sphinx documentation is planned. In the meantime:

Resource         Description
examples/        Six narrative examples covering the full workflow
HEURISTICS.md    Event detection assumptions, confidence limits, non-goals
RELEASING.md     Release procedures and checklist
RC_CHECKLIST.md  Release-candidate validation results

Examples

#  Example                    Topics
1  Source Discovery (.py)     Inspect sources, capabilities, filters
2  Archival Ingest (.py)      Download, Polars queries, SQL, helpers
3  Multi-Source Fusion (.py)  Merge strategies, fusion config
4  Event Detection (.py)      Port calls, EEZ crossings, encounters
5  Streaming Pipeline (.py)   Live feeds, sinks, promotion
6  External Plugin            Custom adapter via entry point

Tip: Install notebook support with pip install "neptune-ais[notebooks]" to run the interactive examples.

Contributing

Contributions are welcome. To get started:

git clone https://github.com/yourorg/neptune-ais.git
cd neptune-ais
pip install -e ".[all,dev]"
pytest

The test suite includes 768 tests covering adapter certification, schema reproducibility, streaming soak tests, and packaging validation.

License

MIT
