
Non-parametric trend analysis for unequally spaced time series with censored data

MannKS

(Mann-Kendall Sen)

Robust Trend Analysis in Python


📦 Installation

# Run from the root of a source checkout
pip install -r requirements.txt
pip install -e .

Requirements: Python 3.7+, NumPy, Pandas, SciPy, Matplotlib


✨ What is MannKS?

MannKS (Mann-Kendall Sen) is a Python package for detecting trends in time series data using non-parametric methods. It's specifically designed for environmental monitoring, water quality analysis, and other fields where data is messy, irregular, or contains detection limits.

When to Use MannKS

Use this package when your data has:

  • Irregular sampling intervals (daily → monthly → quarterly)
  • Censored values (measurements like <5 or >100)
  • Seasonal patterns you need to account for
  • Non-normal distributions (non-parametric methods don't assume normality)
  • Small to moderate sample sizes (n < 5,000 recommended)

Avoid MannKS for highly autocorrelated data (test for autocorrelation first) and for series longer than 46,340 observations, which is the package's hard limit.


🚀 Quick Start

import pandas as pd
from MannKS import prepare_censored_data, trend_test

# 1. Prepare data with censored values
# Converts strings like '<5' into a structured format
values = [10, 12, '<5', 14, 15, 18, 20, '<5', 25, 30]
# The 'ME' (month-end) alias needs pandas >= 2.2; use freq='M' on older versions
dates = pd.date_range(start='2020-01-01', periods=len(values), freq='ME')
data = prepare_censored_data(values)

# 2. Run trend test
# slope_scaling converts slope from "per second" to "per year"
result = trend_test(
    x=data,
    t=dates,
    slope_scaling='year',
    x_unit='mg/L',
    plot_path='trend.png'
)

# 3. Interpret results
print(f"Trend: {result.classification}")
print(f"Slope: {result.slope:.2f} {result.slope_units}")
print(f"Confidence: {result.C:.2%}")

Output:

Trend: Highly Likely Increasing
Slope: 24.57 mg/L per year
Confidence: 98.47%

[Figure: Trend Analysis Plot]


🎯 Key Features

Core Functionality

  • Mann-Kendall Trend Test: Detect monotonic trends with statistical significance
  • Sen's Slope Estimator: Calculate trend magnitude with confidence intervals
  • Seasonal Analysis: Separate seasonal signals from long-term trends
  • Regional Aggregation: Combine results across multiple monitoring sites
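
For readers new to these statistics, the following is a minimal textbook sketch of the Mann-Kendall S statistic and Sen's slope for uncensored data. It is not MannKS's implementation: the function names `mann_kendall_s` and `sen_slope` are hypothetical, and censoring and tie handling are left out for brevity.

```python
from itertools import combinations
from statistics import median


def mann_kendall_s(x):
    """Mann-Kendall S: count of increasing pairs minus decreasing pairs."""
    # x must be in time order; combinations() yields (x[i], x[j]) with i < j
    return sum((xj > xi) - (xj < xi) for xi, xj in combinations(x, 2))


def sen_slope(t, x):
    """Sen's slope: median of all pairwise slopes (x_j - x_i) / (t_j - t_i)."""
    slopes = [
        (xj - xi) / (tj - ti)
        for (ti, xi), (tj, xj) in combinations(zip(t, x), 2)
        if tj != ti  # skip pairs sampled at the same time
    ]
    return median(slopes)


# Unequally spaced sample times (e.g. days) and the observed values
t = [0, 1, 3, 7, 8, 15]
x = [2.0, 2.5, 3.5, 5.0, 6.0, 9.5]

s = mann_kendall_s(x)
slope = sen_slope(t, x)
print(s, slope)
```

Because each slope is divided by the actual time difference, irregular sampling changes the slope estimate but not S, which depends only on the order of the values.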

Data Handling

  • Censored Data Support: Native handling of detection limits (<5, >100)
    • Three methods: Standard, LWP-compatible, Akritas-Theil-Sen (ATS)
    • Handles left-censored, right-censored, and mixed censoring
  • Unequal Spacing: Uses actual time differences (not just rank order)
  • Missing Data: Automatically handles NaN values and missing seasons
  • Temporal Aggregation: Multiple strategies for high-frequency data
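
To illustrate what "structured format" can mean for censored values, here is a small sketch that splits raw entries such as `'<5'` into a numeric value and a censoring type. The function `parse_censored` and its `(value, cen_type)` layout are hypothetical, chosen only for illustration; the structure actually returned by `prepare_censored_data` may differ.

```python
def parse_censored(raw):
    """Split one raw observation into (value, cen_type).

    cen_type is 'lt' for left-censored ('<5'), 'gt' for right-censored
    ('>100'), and 'not' for an ordinary number. Hypothetical layout.
    """
    if isinstance(raw, str):
        text = raw.strip()
        if text.startswith("<"):
            return float(text[1:]), "lt"
        if text.startswith(">"):
            return float(text[1:]), "gt"
        return float(text), "not"
    return float(raw), "not"


values = [10, 12, "<5", 14, ">100"]
parsed = [parse_censored(v) for v in values]
print(parsed)
```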

Statistical Features

  • Continuous Confidence: Reports the likelihood of a trend direction ("Highly Likely Increasing") rather than only a p-value
  • Data Quality Checks: Automatic warnings for tied values, long runs, insufficient data
  • Robust Methods: ATS estimator for heavily censored data
  • Flexible Testing: Kendall's Tau-a or Tau-b, custom significance levels
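
As a rough sketch of how a continuous confidence value can be derived, the example below computes the normal-approximation Z score for S (with continuity correction, no ties), the two-sided p-value, and a confidence in direction C = 1 - p/2. This follows our reading of the approach in Fraser & Whitehead (2022) and is an assumption about the general method, not a copy of MannKS's code; `confidence_in_direction` is a hypothetical name, and the thresholds that map C to labels such as "Highly Likely" are not shown.

```python
import math


def confidence_in_direction(s, n):
    """Return (z, p, c) for a Mann-Kendall S from n untied observations."""
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S without ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    c = 1.0 - 0.5 * p  # confidence that the trend has the sign of S
    return z, p, c


z, p, c = confidence_in_direction(s=15, n=6)
print(f"Z = {z:.3f}, p = {p:.4f}, C = {c:.2%}")
```

With S = 0 the function returns C = 0.5, meaning the data give no indication of direction either way.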

📊 Example Use Cases

Seasonal Water Quality Trend

from MannKS import seasonal_trend_test, check_seasonality

# Check if seasonality exists
seasonality = check_seasonality(x=data, t=dates, period=12, season_type='month')
print(f"Seasonal pattern detected: {seasonality.is_seasonal}")

# Run seasonal trend test
result = seasonal_trend_test(
    x=data,
    t=dates,
    period=12,
    season_type='month',
    agg_method='robust_median',  # Aggregates multiple samples per month
    slope_scaling='year'
)
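
The idea behind a seasonal test can be sketched as follows: S is computed separately within each season and the per-season values are summed, so that differences between seasons do not mask the long-term trend (Hirsch et al., 1982). This is a simplified illustration without censoring, ties, or variance calculation; `seasonal_kendall_s` is a hypothetical helper, not part of MannKS.

```python
from collections import defaultdict
from itertools import combinations


def seasonal_kendall_s(seasons, x):
    """Sum of Mann-Kendall S statistics computed within each season."""
    by_season = defaultdict(list)
    for season, value in zip(seasons, x):  # x must be in time order
        by_season[season].append(value)
    total = 0
    for vals in by_season.values():
        total += sum((b > a) - (b < a) for a, b in combinations(vals, 2))
    return total


# Three years of January and July samples: a strong seasonal offset
# plus a steady upward trend within each month
months = [1, 7, 1, 7, 1, 7]
x = [10.0, 2.0, 11.0, 3.0, 12.0, 4.0]

s_seasonal = seasonal_kendall_s(months, x)
print(s_seasonal)
```

Here every within-season pair is increasing, so the seasonal S reaches its maximum even though the raw series alternates between high and low values.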

Regional Analysis Across Sites

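import pandas as pd
from MannKS import regional_test, trend_test

# Assumed to exist already: site_data (mapping of site name to values),
# all_site_data (combined DataFrame with a 'site' column), and dates.

# Run trend tests for each site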
site_results = []
for site in ['Site_A', 'Site_B', 'Site_C']:
    result = trend_test(x=site_data[site], t=dates)
    site_results.append({
        'site': site,
        's': result.s,
        'C': result.C
    })

# Aggregate regional trend
regional = regional_test(
    trend_results=pd.DataFrame(site_results),
    time_series_data=all_site_data,
    site_col='site'
)
print(f"Regional trend: {regional.DT}, confidence: {regional.CT:.2%}")

⚠️ Important Limitations

Sample Size

  • Recommended maximum: n = 5,000 (triggers memory warning)
  • Hard limit: n = 46,340 (prevents integer overflow)
  • For larger datasets, use regional_test() to aggregate multiple smaller sites
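
The arithmetic behind these limits can be checked directly. The hard limit coincides with the largest n whose square still fits in a signed 32-bit integer; that this is the exact reason for the package's check is an assumption on our part. The memory figure likewise assumes one 64-bit float per pairwise slope.

```python
import math

INT32_MAX = 2**31 - 1

# Largest n with n * n <= INT32_MAX
limit = math.isqrt(INT32_MAX)
print(limit)

# Number of pairwise comparisons at the recommended maximum
n = 5000
pairs_5000 = n * (n - 1) // 2
megabytes = pairs_5000 * 8 / 1e6  # assuming 8 bytes per stored slope
print(pairs_5000, round(megabytes))
```

The number of pairs grows quadratically, so doubling n roughly quadruples both run time and memory.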

Statistical Assumptions

  • Independence: Data points must be serially independent
    • Autocorrelation violates this and causes spurious significance
    • Check for autocorrelation first, for example with the autocorrelation function (ACF), and use block bootstrap methods if it is present
  • Monotonic trend: Cannot detect U-shaped or cyclical patterns
  • Homogeneous variance: Most powerful when variance is constant over time
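
A simple pre-test for serial dependence is the lag-1 autocorrelation coefficient, compared against the approximate 95% bound of ±1.96/√n for white noise. The sketch below uses only the standard library; `lag1_autocorr` is a hypothetical helper, not a MannKS function. It should be applied to detrended residuals, because a trend by itself inflates the autocorrelation.

```python
import math


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


# Example residuals that alternate in sign (strong negative dependence)
residuals = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

r1 = lag1_autocorr(residuals)
bound = 1.96 / math.sqrt(len(residuals))
print(f"r1 = {r1:.3f}, bound = ±{bound:.3f}, autocorrelated: {abs(r1) > bound}")
```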

📚 Documentation

Detailed Guides

Examples

The Examples folder contains step-by-step tutorials from basic to advanced usage.


🔬 Validation

Extensively validated against:

  • LWP-TRENDS R script (34 test cases, 99%+ agreement)
  • NADA2 R package (censored data methods)
  • Edge cases: missing data, tied values, all-censored data, insufficient samples

See validation/ for detailed comparison reports.


🙏 Acknowledgments

This package is heavily inspired by the excellent work of LandWaterPeople (LWP). The robust censored data handling and regional aggregation methods are based on their R scripts and methodologies.


📖 References

  1. Helsel, D.R. (2012). Statistics for Censored Environmental Data Using Minitab and R (2nd ed.). Wiley.
  2. Gilbert, R.O. (1987). Statistical Methods for Environmental Pollution Monitoring. Wiley.
  3. Hirsch, R.M., Slack, J.R., & Smith, R.A. (1982). Techniques of trend analysis for monthly water quality data. Water Resources Research, 18(1), 107-121.
  4. Mann, H.B. (1945). Nonparametric tests against trend. Econometrica, 13(3), 245-259.
  5. Sen, P.K. (1968). Estimates of the regression coefficient based on a particular kind of rank correlation. Journal of the American Statistical Association, 63(324), 1379-1389.
  6. Fraser, C., & Whitehead, A. L. (2022). Continuous measures of confidence in direction of environmental trends at site and other spatial scales. Environmental Challenges, 9, 100601.
  7. Fraser, C., Snelder, T., & Matthews, A. (2018). State and trends of river water quality in the Manawatu-Whanganui region. Report for Horizons Regional Council.
