
A Python package for hydro anomaly detection with simple USGS data retrieval

HydroAnomaly

A Python package for anomaly detection in water bodies. It retrieves USGS water data and Sentinel-2 bands for use in ML models that check the quality of data collected by USGS gages.

Installation

From a terminal:

pip install hydroanomaly

In a Jupyter notebook:

!pip install hydroanomaly

To upgrade to the latest version (in a notebook; drop the leading "!" in a terminal):

!pip install hydroanomaly --upgrade

USGS Data Retrieval

Retrieve real-time and historical turbidity data from USGS Water Services:

import ee
import geemap
from hydroanomaly import get_turbidity  # get_turbidity is called by name below

# ------------------------
# User-defined settings: Example USGS site and date range (change to a site with turbidity data)
# ------------------------
site_number = "294643095035200"  # USGS site number
start_date = "2020-01-01"
end_date = "2024-12-30"

# ------------------------
# Data Extraction from USGS
# ------------------------
USGSdata, (lat, lon) = get_turbidity(site_number, start_date, end_date)
print("=" * 70)
print("Latitude:", lat)
print("Longitude:", lon)
print("=" * 70)
print(USGSdata.head())
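
Once the turbidity record is loaded, a common next step is to aggregate it to a coarser interval. The sketch below assumes the returned object is a pandas DataFrame indexed by timestamp (later examples on this page use `USGSdata.index`). The column name `turbidity` and the synthetic values are placeholders for illustration, not the package's actual output.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for USGSdata: 15-minute readings covering two full days.
# The column name "turbidity" is an assumption made for this example.
idx = pd.date_range("2020-01-01", periods=192, freq="15min")
rng = np.random.default_rng(0)
usgs_demo = pd.DataFrame({"turbidity": rng.gamma(2.0, 5.0, size=len(idx))}, index=idx)

# Aggregate to daily mean and daily maximum turbidity.
daily = usgs_demo["turbidity"].resample("D").agg(["mean", "max"])
print(daily)
```

Daily aggregation also brings the gage record closer to the revisit interval of the satellite data, which can make later comparisons easier to read.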

Sentinel-2 Data Retrieval

Retrieve Sentinel-2 data from Google Earth Engine.

Authenticate and initialize the Google Earth Engine API:

ee.Authenticate()
ee.Initialize(project='XXXXXX-XXXX-XXXXXX-XX')  # Replace with your own Google Cloud project ID

Define the location, bands, buffer, cloud threshold, and masks for the retrieval:

latitude = 29.7785861
longitude = -95.0644278
bands = ['B2','B3','B4','B5','B6','B7','B8','B8A','B9','B11','B12', 'SCL']
buffer_meters = 20
cloudy_pixel_percentage = 20
masks_to_apply = [
    "water",
    "no_cloud_shadow",
    "no_clouds",
    "no_snow_ice",
    "no_saturated"
]
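
To make the mask names above more concrete, here is a minimal sketch of how such masks could be derived from the Sentinel-2 Scene Classification (`SCL`) band. The SCL class values are the standard Level-2A ones; the mapping from each mask name to a class test, and the helper `combine_masks`, are assumptions for illustration and not the package's implementation.

```python
import numpy as np

# Sentinel-2 L2A Scene Classification (SCL) class values.
SCL_SATURATED = 1          # saturated or defective
SCL_CLOUD_SHADOW = 3
SCL_WATER = 6
SCL_CLOUDS = (8, 9, 10)    # medium probability, high probability, thin cirrus
SCL_SNOW_ICE = 11

# Hypothetical mapping from mask name to a test on the SCL array.
# This is an assumption about what each name means, not the package's code.
MASK_RULES = {
    "water": lambda scl: scl == SCL_WATER,
    "no_cloud_shadow": lambda scl: scl != SCL_CLOUD_SHADOW,
    "no_clouds": lambda scl: ~np.isin(scl, SCL_CLOUDS),
    "no_snow_ice": lambda scl: scl != SCL_SNOW_ICE,
    "no_saturated": lambda scl: scl != SCL_SATURATED,
}

def combine_masks(scl, mask_names):
    """Return a boolean array that is True where every named mask passes."""
    keep = np.ones(scl.shape, dtype=bool)
    for name in mask_names:
        keep &= MASK_RULES[name](scl)
    return keep

# A tiny 3x3 SCL patch: water, cloud, shadow, snow, vegetation, saturated.
scl_demo = np.array([[6, 6, 9], [3, 6, 11], [4, 1, 6]])
keep = combine_masks(scl_demo, list(MASK_RULES))
print(keep)
```

With all five masks applied, only the water pixels survive; applying `"no_clouds"` alone would drop just the single cloud pixel.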

Sentinel-2 data retrieval:

from hydroanomaly import get_sentinel_bands

df = get_sentinel_bands(
    latitude=latitude,
    longitude=longitude,
    start_date=start_date,
    end_date=end_date,
    bands=bands,
    buffer_meters=buffer_meters,
    cloudy_pixel_percentage=cloudy_pixel_percentage,
    masks_to_apply=masks_to_apply
)

print(df.head())
print("=" * 70)
print(f"Retrieved {len(df)} rows")

Visualizing the map:

from hydroanomaly import show_sentinel_ndwi_map

Map = show_sentinel_ndwi_map(
    latitude, longitude, start_date, end_date,
    buffer_meters=buffer_meters, cloudy_pixel_percentage=cloudy_pixel_percentage, zoom=15)
Map
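
The map function's name refers to NDWI. As a reference point, the widely used McFeeters form of NDWI is (green - NIR) / (green + NIR), which for Sentinel-2 corresponds to bands B3 and B8. The sketch below computes it on a small synthetic table; whether the package uses exactly this formulation is an assumption, and the reflectance values are made up.

```python
import pandas as pd

# Synthetic reflectance values standing in for rows of the Sentinel-2 table.
bands_demo = pd.DataFrame(
    {"B3": [0.08, 0.10, 0.05], "B8": [0.02, 0.30, 0.05]},
    index=pd.to_datetime(["2020-01-05", "2020-01-10", "2020-01-15"]),
)

# McFeeters NDWI: (green - NIR) / (green + NIR).
# Positive values generally indicate open water.
ndwi = (bands_demo["B3"] - bands_demo["B8"]) / (bands_demo["B3"] + bands_demo["B8"])
print(ndwi.round(3))
```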

Time Series Plotting

Create visualizations of your water data:

from hydroanomaly.visualize import plot_timeseries
# For USGS data
plot_timeseries(USGSdata)
# For Sentinel data
plot_timeseries(df)

from hydroanomaly.visualize import plot_turbidity
# For USGS data
plot_turbidity(USGSdata)

from hydroanomaly.visualize import plot_sentinel
# For Sentinel data
plot_sentinel(df)

from hydroanomaly.visualize import plot_comparison
# USGS turbidity against a single Sentinel-2 band
plot_comparison(USGSdata, df[['B6']], label1="Turbidity", label2="Sentinel-2 B6", title="Comparison: Turbidity vs Band 6")

from hydroanomaly import plot
# For Sentinel data
plot(df)

from hydroanomaly import visualize
# For USGS data
visualize(USGSdata)

NDVI:

import matplotlib.pyplot as plt
# Check available columns
print(df.columns)
print("=" * 70)
# Calculate NDVI if bands are available
if {'B4', 'B8'}.issubset(df.columns):
    df['NDVI'] = (df['B8'] - df['B4']) / (df['B8'] + df['B4'])
    df['NDVI'].plot(marker='o')
    plt.title("NDVI Time Series")
    plt.xlabel("Date")
    plt.ylabel("NDVI")
    plt.grid()
    plt.show()
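else:
    print("NDVI bands (B4, B8) not found. Plotting the bands that are available:")
    # Select only columns that exist, so a missing band does not raise a KeyError
    available = [b for b in ['B2', 'B3', 'B4', 'B8'] if b in df.columns]
    if available:
        df[available].plot()
        plt.title("Sentinel-2 Reflectance (selected bands)")
        plt.ylabel("Reflectance")
        plt.xlabel("Date")
        plt.grid()
        plt.show()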

Machine Learning for Anomaly Detection of USGS Data

print(df.columns)
print("=" * 70)
display(df.head(2))
print("=" * 70)
print(USGSdata.columns)
print("=" * 70)
display(USGSdata.head(2))
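# Drop duplicate timestamps from the USGS record, keeping the first reading
USGSdata = USGSdata[~USGSdata.index.duplicated(keep='first')]
# Count duplicate timestamps in each table
print(df.index.duplicated().sum())        # Sentinel-2 table (not de-duplicated above)
print(USGSdata.index.duplicated().sum())  # USGS table; 0 after the step above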
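
Before a model can compare the two sources, each satellite observation needs a matching gage reading. One way to do this in plain pandas is `merge_asof`, sketched below on synthetic data. The column names, the one-hour tolerance, and the choice of nearest-time matching are assumptions for illustration; the package may align the tables differently.

```python
import pandas as pd

# Synthetic stand-ins: hourly gage readings and two satellite overpasses.
gage = pd.DataFrame(
    {"turbidity": [5.0, 6.0, 7.0, 8.0]},
    index=pd.date_range("2020-01-01 15:00", periods=4, freq="60min"),
)
sat = pd.DataFrame(
    {"B6": [0.11, 0.14]},
    index=pd.date_range("2020-01-01 16:50", periods=2, freq="D"),
)

# Remove duplicate timestamps, then sort; merge_asof requires sorted keys.
gage = gage[~gage.index.duplicated(keep="first")].sort_index()
sat = sat[~sat.index.duplicated(keep="first")].sort_index()

# For each overpass, take the nearest gage reading within one hour.
aligned = pd.merge_asof(
    sat, gage,
    left_index=True, right_index=True,
    direction="nearest", tolerance=pd.Timedelta("1h"),
)
print(aligned)
```

The first overpass picks up the 17:00 reading; the second has no gage reading within the tolerance, so its turbidity is left as NaN rather than filled from a distant timestamp.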

OneClassSVM

from hydroanomaly.ml import run_oneclass_svm
df_out, params, f1 = run_oneclass_svm(df, USGSdata)
# F1 Score
print(f"F1: {f1:.3f}")
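
For readers unfamiliar with the technique, the following is a minimal scikit-learn sketch of One-Class SVM anomaly detection scored with F1. It runs on synthetic features with known outliers and is not the internals of `run_oneclass_svm`; the feature layout, the `nu` value, and the labels are all assumptions.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(42)
# Synthetic "band" features: 200 normal samples plus 10 obvious outliers.
normal = rng.normal(loc=0.1, scale=0.01, size=(200, 3))
outliers = rng.normal(loc=0.3, scale=0.01, size=(10, 3))
X = np.vstack([normal, outliers])
y_true = np.r_[np.zeros(200, dtype=int), np.ones(10, dtype=int)]  # 1 = anomaly

# Scale features so the RBF kernel treats each band comparably.
X_scaled = StandardScaler().fit_transform(X)
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)

# predict() returns +1 for inliers and -1 for outliers; map -1 to 1 (anomaly).
y_pred = (model.fit(X_scaled).predict(X_scaled) == -1).astype(int)
f1_demo = f1_score(y_true, y_pred)
print(f"F1: {f1_demo:.3f}")
```

The `nu` parameter is an upper bound on the fraction of training points treated as outliers, so it is usually set near the expected anomaly rate.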

Isolation Forest

from hydroanomaly.ml import run_isolation_forest
df_out_if, params_if, f1_if = run_isolation_forest(df, USGSdata)
print(f"F1: {f1_if:.3f}")
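
As with the One-Class SVM, the general idea behind Isolation Forest can be sketched with scikit-learn on synthetic data. This is an illustration of the technique only, not the implementation of `run_isolation_forest`; the feature layout and the `contamination` value are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Synthetic features: 200 normal samples followed by 10 obvious outliers.
X = np.vstack([
    rng.normal(0.1, 0.01, size=(200, 3)),
    rng.normal(0.3, 0.01, size=(10, 3)),
])

forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = forest.fit_predict(X)         # +1 = inlier, -1 = outlier
scores = forest.decision_function(X)   # lower score = more anomalous

print("flagged:", int((labels == -1).sum()))
```

Isolation Forest scores points by how few random splits are needed to isolate them, so it needs no feature scaling and tends to be quick on tabular band data.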

Features

  • USGS & Sentinel-2 Data Retrieval

    • Download real-time and historical water data from USGS Water Services (any site, any parameter)
    • Retrieve Sentinel-2 satellite bands using Google Earth Engine for any location and time range
    • Automatic data cleaning, validation, and alignment between ground (USGS) and satellite (Sentinel) data
    • Synthetic data generation fallback for testing
    • Convenient CSV export functionality
  • Time Series & Satellite Visualization

    • Quick plotting for single or multiple water quality parameters
    • Multi-parameter and multi-site comparison plots
    • Satellite band and index visualization (NDVI, NDWI, etc.)
    • Statistical analysis plots (histograms, box plots, trendlines)
    • High-quality plot export (PNG, PDF, SVG) with auto legends and formatting
  • Machine Learning & Anomaly Detection

    • Built-in anomaly detection using One-Class SVM and Isolation Forest models
    • Visual comparison of predicted vs. true anomalies in time series data
    • Feature engineering for satellite and in-situ sensor data
    • Easy integration with Pandas workflows
  • Powerful Data Analysis Tools

    • Mathematical operations and filtering for hydrologic data
    • Statistical summaries, validation, and automated quality checks
    • Utilities for matching, joining, and synchronizing time series
  • Easy to Use

    • Simple, Pythonic API for rapid data exploration and analysis
    • One-liner data retrieval and plotting functions
    • Comprehensive error handling
    • Well-documented with step-by-step examples and tutorials

Find USGS site numbers at: https://waterdata.usgs.gov/nwis


Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


HydroAnomaly - Making water data analysis simple and beautiful!
