A Python package for hydro anomaly detection with simple USGS data retrieval

HydroAnomaly

A Python package for anomaly detection in water bodies. It retrieves USGS water data and Sentinel-2 bands and uses them in ML models to check the quality of the data collected by USGS gages.

Installation

From a terminal:

pip install hydroanomaly

For Jupyter

!pip install hydroanomaly

To update the package:

!pip install hydroanomaly --upgrade
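
To confirm that an install or upgrade took effect, the installed version can be read from the package metadata. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name and is not part of HydroAnomaly.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if hydroanomaly is not installed
print(installed_version("hydroanomaly"))
```

In Jupyter, restart the kernel after upgrading so that the new version is the one imported.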

USGS Data Retrieval

Retrieve real-time and historical water turbidity data from USGS Water Services:

import ee
import geemap
from hydroanomaly import get_turbidity  # assumed top-level export, like get_sentinel_bands

# ------------------------
# User-defined settings: Example USGS site and date range (change to a site with turbidity data)
# ------------------------
site_number = "294643095035200"  # USGS site number
start_date = "2020-01-01"
end_date = "2024-12-30"

# ------------------------
# Data Extraction from USGS
# ------------------------
USGSdata, (lat, lon) = get_turbidity(site_number, start_date, end_date)
print("=" * 70)
print("Latitude:", lat)
print("Longitude:", lon)
print("=" * 70)
print(USGSdata.head())
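
For readers who want to see what a request like the one above could look like at the HTTP level, here is a minimal sketch that builds a query URL for the USGS Instantaneous Values service. It assumes the turbidity parameter code `63680` (turbidity in FNU) and the `iv` endpoint; whether `get_turbidity` uses this endpoint or this parameter code internally is not stated in this README. `build_turbidity_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

USGS_IV_ENDPOINT = "https://waterservices.usgs.gov/nwis/iv/"
TURBIDITY_FNU = "63680"  # USGS parameter code for turbidity (FNU); assumed here


def build_turbidity_url(site_number, start_date, end_date):
    """Build a USGS Instantaneous Values query URL for one site and date range."""
    params = {
        "format": "json",
        "sites": site_number,
        "parameterCd": TURBIDITY_FNU,
        "startDT": start_date,  # ISO date, e.g. "2020-01-01"
        "endDT": end_date,
    }
    return f"{USGS_IV_ENDPOINT}?{urlencode(params)}"


url = build_turbidity_url("294643095035200", "2020-01-01", "2024-12-30")
print(url)
```

The sketch only constructs the URL and makes no network call, so it can be used to check a site number and date range before requesting data.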

Sentinel-2 Data Retrieval

Retrieve Sentinel-2 data from Google Earth Engine.

Authenticate and initialize the Google Earth Engine API:

ee.Authenticate()
ee.Initialize(project='XXXXXX-XXXX-XXXXXX-XX') # Replace with your own project ID number

Define the location, bands, buffer, cloud threshold, and masks for the area you want to retrieve:

latitude = 29.7785861
longitude = -95.0644278
bands = ['B2','B3','B4','B5','B6','B7','B8','B8A','B9','B11','B12', 'SCL']
buffer_meters = 20
cloudy_pixel_percentage = 20
masks_to_apply = [
    "water",
    "no_cloud_shadow",
    "no_clouds",
    "no_snow_ice",
    "no_saturated"
]
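
The mask names above line up naturally with classes of the Sentinel-2 Scene Classification Layer (the `SCL` band requested in `bands`). The sketch below shows one plausible mapping from mask names to SCL class values. The class values are the standard SCL codes, but the mapping itself is an assumption, as are the names `MASK_RULES` and `keep_pixel`; the package may implement its masks differently.

```python
# Standard Sentinel-2 SCL class values used below:
# 1 = saturated or defective, 3 = cloud shadow, 6 = water,
# 8 = cloud (medium probability), 9 = cloud (high probability),
# 10 = thin cirrus, 11 = snow or ice

# Assumed mapping from each mask name to a per-pixel test on the SCL value
MASK_RULES = {
    "water": lambda scl: scl == 6,
    "no_cloud_shadow": lambda scl: scl != 3,
    "no_clouds": lambda scl: scl not in (8, 9, 10),
    "no_snow_ice": lambda scl: scl != 11,
    "no_saturated": lambda scl: scl != 1,
}


def keep_pixel(scl_value, masks):
    """Return True if a pixel with this SCL class passes every requested mask."""
    return all(MASK_RULES[name](scl_value) for name in masks)


masks = ["water", "no_cloud_shadow", "no_clouds", "no_snow_ice", "no_saturated"]
print(keep_pixel(6, masks))  # water pixel
print(keep_pixel(9, masks))  # high-probability cloud pixel
```

Under this mapping, `water` alone already excludes every other class, because each pixel carries exactly one SCL value; the remaining masks only change the result when `water` is omitted.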

Sentinel-2 data retrieval:

from hydroanomaly import get_sentinel_bands

df = get_sentinel_bands(
    latitude=latitude,
    longitude=longitude,
    start_date=start_date,
    end_date=end_date,
    bands=bands,
    buffer_meters=buffer_meters,
    cloudy_pixel_percentage=cloudy_pixel_percentage,
    masks_to_apply=masks_to_apply
)

print(df.head())
print("=" * 70)
print(f"Retrieved {len(df)} rows")

Visualizing the map:

from hydroanomaly import show_sentinel_ndwi_map

Map = show_sentinel_ndwi_map(
    latitude, longitude, start_date, end_date,
    buffer_meters=buffer_meters, cloudy_pixel_percentage=cloudy_pixel_percentage, zoom=15)
Map
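
As background for the map, NDWI in its common (McFeeters) form is the normalized difference of the green and near-infrared reflectances, which for Sentinel-2 correspond to bands B3 and B8. The sketch below computes it for a single pair of values; whether `show_sentinel_ndwi_map` uses exactly these two bands is an assumption, and `ndwi` is a hypothetical helper name.

```python
def ndwi(green, nir):
    """NDWI = (green - nir) / (green + nir); returns None when undefined."""
    denominator = green + nir
    if denominator == 0:
        return None  # avoid division by zero for empty or masked pixels
    return (green - nir) / denominator


# Water reflects more green than near-infrared light, so NDWI is positive
water_like = ndwi(green=0.3, nir=0.1)
# Vegetation and soil reflect more near-infrared light, so NDWI is negative
land_like = ndwi(green=0.1, nir=0.3)
print(water_like, land_like)
```

Positive values generally indicate open water, which is why the index is useful for checking that the buffer around the gage actually covers the water surface.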

Time Series Plotting

Create visualizations of your water data:

from hydroanomaly.visualize import plot_timeseries
# For USGS data
plot_timeseries(USGSdata)
# For Sentinel data
plot_timeseries(df)

from hydroanomaly.visualize import plot_turbidity
# For USGS data
plot_turbidity(USGSdata)

from hydroanomaly.visualize import plot_sentinel
# For Sentinel data
plot_sentinel(df)

from hydroanomaly.visualize import plot_comparison
plot_comparison(USGSdata, df[['B6']], label1="Turbidity", label2="Sentinel-2 B6", title="Comparison: Turbidity vs Band 6")

from hydroanomaly import plot
# For Sentinel data
plot(df)

from hydroanomaly import visualize
# For USGS data
visualize(USGSdata)

NDVI:

import matplotlib.pyplot as plt
# Check available columns
print(df.columns)
print("=" * 70)
# Calculate NDVI if bands are available
if {'B4', 'B8'}.issubset(df.columns):
    df['NDVI'] = (df['B8'] - df['B4']) / (df['B8'] + df['B4'])
    df['NDVI'].plot(marker='o')
    plt.title("NDVI Time Series")
    plt.xlabel("Date")
    plt.ylabel("NDVI")
    plt.grid()
    plt.show()
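else:
    # B4 and/or B8 are missing, so plot only the selected bands that are present
    available = [b for b in ['B2', 'B3', 'B4', 'B8'] if b in df.columns]
    print("NDVI bands (B4, B8) not found. Plotting available bands:", available)
    if available:
        df[available].plot()
        plt.title("Sentinel-2 Reflectance (selected bands)")
        plt.ylabel("Reflectance")
        plt.xlabel("Date")
        plt.grid()
        plt.show()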

Machine Learning for Anomaly Detection of the USGS data

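# Inspect the columns and first rows of both datasets (display() requires Jupyter)
print(df.columns)
print("=" * 70)
display(df.head(2))
print("=" * 70)
print(USGSdata.columns)
print("=" * 70)
display(USGSdata.head(2))

# Drop repeated timestamps from the USGS record, keeping the first reading
USGSdata = USGSdata[~USGSdata.index.duplicated(keep='first')]

# Count the duplicate datetimes remaining in each dataset
print(df.index.duplicated().sum())                     # Number of duplicate datetimes in df
print(USGSdata.index.duplicated().sum())               # Number of duplicate datetimes in USGSdata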
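
Duplicate timestamps matter because the satellite and gage records have to be matched in time before a model can compare them. The sketch below shows one way to do that matching with plain pandas, pairing each satellite row with the nearest gage reading within a tolerance. It runs on toy data; `align_nearest`, the 30-minute tolerance, and the `turbidity` column name are assumptions for illustration, not the package's own alignment logic.

```python
import pandas as pd


def align_nearest(satellite, gage, tolerance="30min"):
    """Attach the nearest-in-time gage reading to each satellite row."""
    # Remove repeated timestamps, then sort: merge_asof needs sorted keys
    satellite = satellite[~satellite.index.duplicated(keep="first")].sort_index()
    gage = gage[~gage.index.duplicated(keep="first")].sort_index()
    return pd.merge_asof(
        satellite,
        gage,
        left_index=True,
        right_index=True,
        direction="nearest",
        tolerance=pd.Timedelta(tolerance),
    )


# Toy data standing in for the Sentinel-2 and USGS frames
satellite = pd.DataFrame(
    {"B6": [0.10, 0.20]},
    index=pd.to_datetime(["2024-01-01 17:05", "2024-01-06 17:05"]),
)
gage = pd.DataFrame(
    {"turbidity": [5.0, 7.0, 7.5, 9.0]},
    index=pd.to_datetime(
        ["2024-01-01 17:00", "2024-01-01 17:00", "2024-01-01 17:15", "2024-01-06 19:00"]
    ),
)

merged = align_nearest(satellite, gage)
print(merged)
```

Satellite rows with no gage reading inside the tolerance get a missing value rather than a distant match. Both indexes need the same time zone handling (both naive, or both aware) for the merge to work.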

OneClassSVM

from hydroanomaly.ml import run_oneclass_svm
df_out, params, f1 = run_oneclass_svm(df, USGSdata)
# F1 Score
print(f"F1: {f1:.3f}")

Isolation Forest

from hydroanomaly.ml import run_isolation_forest
df_out_if, params_if, f1_if = run_isolation_forest(df, USGSdata)
print(f"F1: {f1_if:.3f}")
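
Both functions report an F1 score. As a reminder of what that number measures, the sketch below computes F1 for binary anomaly flags from first principles. scikit-learn's One-Class SVM and Isolation Forest estimators predict -1 for outliers and 1 for inliers, so the sketch first converts that convention to 1 = anomaly. How HydroAnomaly derives its reference ("true") anomaly labels is not described in this README, so the labels here are toy values, and `to_anomaly_flags` and `f1_score_binary` are hypothetical helper names.

```python
def to_anomaly_flags(predictions):
    """Convert scikit-learn style outputs (-1 outlier, 1 inlier) to 1/0 flags."""
    return [1 if p == -1 else 0 for p in predictions]


def f1_score_binary(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), treating 1 as the anomaly class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    # With no true or predicted anomalies, F1 is undefined; report 0.0
    return 0.0 if denominator == 0 else 2 * tp / denominator


# Toy labels: 1 marks an anomalous reading
y_true = [0, 0, 1, 1, 0, 1]
y_pred = to_anomaly_flags([1, -1, -1, 1, 1, -1])

score = f1_score_binary(y_true, y_pred)
print(f"F1: {score:.3f}")
```

Because anomalies are usually rare, F1 is more informative than plain accuracy here: a model that never flags anything can be highly accurate and still have an F1 of zero.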

Features

  • USGS & Sentinel-2 Data Retrieval

    • Download real-time and historical water data from USGS Water Services (any site, any parameter)
    • Retrieve Sentinel-2 satellite bands using Google Earth Engine for any location and time range
    • Automatic data cleaning, validation, and alignment between ground (USGS) and satellite (Sentinel) data
    • Synthetic data generation fallback for testing
    • Convenient CSV export functionality
  • Time Series & Satellite Visualization

    • Quick plotting for single or multiple water quality parameters
    • Multi-parameter and multi-site comparison plots
    • Satellite band and index visualization (NDVI, NDWI, etc.)
    • Statistical analysis plots (histograms, box plots, trendlines)
    • High-quality plot export (PNG, PDF, SVG) with auto legends and formatting
  • Machine Learning & Anomaly Detection

    • Built-in anomaly detection using One-Class SVM and Isolation Forest models
    • Visual comparison of predicted vs. true anomalies in time series data
    • Feature engineering for satellite and in-situ sensor data
    • Easy integration with Pandas workflows
  • Powerful Data Analysis Tools

    • Mathematical operations and filtering for hydrologic data
    • Statistical summaries, validation, and automated quality checks
    • Utilities for matching, joining, and synchronizing time series
  • Easy to Use

    • Simple, Pythonic API for rapid data exploration and analysis
    • One-liner data retrieval and plotting functions
    • Comprehensive error handling
    • Well-documented with step-by-step examples and tutorials

Find USGS site numbers at: https://waterdata.usgs.gov/nwis


Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


HydroAnomaly - Making water data analysis simple and beautiful!
