
downsampler


A Python package for time series DataFrame downsampling with LTTB, M4, multiple aggregation methods, gap handling, and fidelity testing.

Features

  • Multiple downsampling methods:
    • LTTB (visual fidelity)
    • M4 (guaranteed extrema preservation)
    • Traditional aggregations (mean, median, min, max)
  • Gap-aware processing: Automatically detects and handles gaps in time series
  • Edge handling: Flag, discard, or keep edge points
  • Multi-aggregate output: Generate min/mean/max columns in a single call
  • Range-based downsampling: Fetch data from external sources with automatic edge buffering
  • Multi-resolution pyramid: Generate downsampled versions at multiple cadences in one call
  • Fidelity testing: Compare methods and measure visual accuracy

Installation

pip install downsampler

Quick Start

Basic Downsampling

import pandas as pd
from downsampler import downsample_dataframe

# Create sample data
df = pd.DataFrame(
    {'temperature': range(1000)},
    index=pd.date_range('2024-01-01', periods=1000, freq='1s')
)

# Downsample to 1-minute cadence (default: mean)
result = downsample_dataframe(df, target_cadence='PT1M')
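With the default mean aggregation, the output should resemble a plain pandas resample; a quick reference point with pandas alone (the library additionally applies gap and edge handling, so results can differ on irregular data):

```python
import pandas as pd

# The same sample data as above, aggregated with plain pandas for comparison.
df = pd.DataFrame(
    {"temperature": range(1000)},
    index=pd.date_range("2024-01-01", periods=1000, freq="1s"),
)
reference = df.resample("1min").mean()
print(len(reference))  # 17 one-minute buckets cover 1000 seconds
```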

Using Different Methods

from downsampler import downsample_dataframe, DownsampleConfig, AggregationMethod

# Mean (default)
result = downsample_dataframe(df, '10min')

# Maximum
result = downsample_dataframe(df, '10min', method='max')

# LTTB for visual fidelity
config = DownsampleConfig(
    method=AggregationMethod.LTTB,
    lttb_target_column='temperature'
)
result = downsample_dataframe(df, '10min', config=config)

# M4 for guaranteed extrema preservation
result = downsample_dataframe(df, '10min', method='m4')

# M4 with collinearity filtering (reduces output size)
result = downsample_dataframe(df, '10min', method='m4', m4_collinearity_threshold=0.01)
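For intuition about LTTB: in each bucket it keeps the point that forms the largest triangle with the previously kept point and the average of the next bucket. A compact, self-contained sketch of the algorithm (illustrative only, not the library's implementation):

```python
import numpy as np

def lttb(x: np.ndarray, y: np.ndarray, n_out: int) -> np.ndarray:
    """Return the indices LTTB keeps when reducing (x, y) to n_out points."""
    n = len(x)
    if n_out >= n or n_out < 3:
        return np.arange(n)
    # First and last points are always kept; the interior is split into buckets.
    edges = np.linspace(1, n - 1, n_out - 1).astype(int)
    keep = [0]
    for i in range(n_out - 2):
        lo, hi = edges[i], edges[i + 1]
        # Average of the next bucket (the last point for the final bucket).
        nlo, nhi = (edges[i + 1], edges[i + 2]) if i + 2 < len(edges) else (n - 1, n)
        ax, ay = x[nlo:nhi].mean(), y[nlo:nhi].mean()
        px, py = x[keep[-1]], y[keep[-1]]
        # Keep the bucket point forming the largest triangle with the
        # previously kept point and that average.
        areas = np.abs((px - ax) * (y[lo:hi] - py) - (px - x[lo:hi]) * (ay - py))
        keep.append(lo + int(areas.argmax()))
    keep.append(n - 1)
    return np.array(keep)

# A flat signal with one spike: LTTB keeps the spike.
x = np.arange(100.0)
y = np.zeros(100)
y[50] = 100.0
kept = lttb(x, y, 10)
print(len(kept), 50 in kept)  # 10 True
```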

Multi-Aggregate Downsampling

Create min/mean/max columns for visualization with error bands:

from downsampler import downsample_dataframe_multi_aggregate

result = downsample_dataframe_multi_aggregate(
    df,
    target_cadence='1min',
    variables=['temperature', 'pressure'],
    aggregations=['min', 'mean', 'max']
)
# Result has columns: temperature_min, temperature_mean, temperature_max, etc.
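The shape of the multi-aggregate output can be sketched with plain pandas (the library adds gap and edge handling on top of this):

```python
import numpy as np
import pandas as pd

# Roughly what the multi-aggregate result looks like, built by hand.
df = pd.DataFrame(
    {"temperature": np.sin(np.linspace(0, 10, 600))},
    index=pd.date_range("2024-01-01", periods=600, freq="1s"),
)
agg = df.resample("1min").agg(["min", "mean", "max"])
agg.columns = [f"{col}_{stat}" for col, stat in agg.columns]
print(agg.columns.tolist())
# ['temperature_min', 'temperature_mean', 'temperature_max']
```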

Multi-Resolution Pyramid

Generate downsampled versions at multiple cadences for storage:

from downsampler import downsample_dataframe_resolutions

results = downsample_dataframe_resolutions(
    df,
    cadences=['1min', '5min', '15min', '1h'],
)
# Returns {Timedelta('0 days 00:01:00'): DataFrame, ...}

for cadence, result_df in results.items():
    print(f"{cadence}: {len(result_df)} points")
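To get a feel for the bucket counts a pyramid produces, here is one hour of 1-second data at the same four cadences, sketched with plain pandas:

```python
import pandas as pd

# One hour of 1-second data reduced at four cadences.
df = pd.DataFrame(
    {"v": range(3600)},
    index=pd.date_range("2024-01-01", periods=3600, freq="1s"),
)
sizes = {c: len(df.resample(c).mean()) for c in ["1min", "5min", "15min", "1h"]}
print(sizes)  # {'1min': 60, '5min': 12, '15min': 4, '1h': 1}
```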

M4 Downsampling (Extrema Preservation)

M4 guarantees exact preservation of minimum and maximum values, making it ideal for monitoring dashboards and alerting systems:

from downsampler import downsample_dataframe

# Basic M4 - preserves exact min/max
result = downsample_dataframe(df, '1min', method='m4')

# Verify extrema preservation
assert df['temperature'].min() == result['temperature'].min()
assert df['temperature'].max() == result['temperature'].max()

# M4 with deduplication (default, removes consecutive duplicates)
result = downsample_dataframe(df, '1min', method='m4', m4_deduplicate=True)

# M4 with collinearity filtering (reduces size on smooth data)
result = downsample_dataframe(df, '1min', method='m4', m4_collinearity_threshold=0.01)

M4 Features:

  • Selects up to 4 points per bucket: first, last, min, max
  • Guaranteed exact extrema preservation (no approximation)
  • Variable output size (typically 2-4x reduction vs 10x for traditional methods)
  • Deduplication: removes consecutive duplicates (20-50% reduction)
  • Collinearity filtering: removes min/max points near first-last line (0-75% reduction)
  • Superior peak detection compared to LTTB

When to use M4:

  • Monitoring dashboards where missing a spike could be critical
  • Alerting systems that need exact threshold crossings
  • Pre-computing multiple cadences with controllable size/fidelity trade-offs
  • Multi-variable sensor data where each variable's extrema matter
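The per-bucket selection M4 performs can be sketched in a few lines of pandas (illustrative only; the library also deduplicates and applies collinearity filtering on top of this):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"signal": rng.normal(size=600)},
    index=pd.date_range("2024-01-01", periods=600, freq="1s"),
)

def m4_bucket(bucket: pd.DataFrame) -> pd.DataFrame:
    # Keep first, last, min, and max of the bucket, in time order.
    s = bucket["signal"]
    stamps = {bucket.index[0], bucket.index[-1], s.idxmin(), s.idxmax()}
    return bucket.loc[sorted(stamps)]

result = pd.concat(m4_bucket(g) for _, g in df.resample("1min") if len(g))
print(len(result))  # at most 4 points per one-minute bucket
```

Because the bucket minimum and maximum are kept verbatim, the global extrema of `result` match those of `df` exactly.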

Handling Gaps

from downsampler import DownsampleConfig

config = DownsampleConfig(
    gap_threshold='5min'  # Gaps > 5 min trigger segmentation
)
result = downsample_dataframe(df, '1min', config=config)
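The segmentation idea behind gap handling can be sketched with plain pandas (this is not the library's internal code): any step between consecutive timestamps larger than the threshold starts a new segment, and each segment is downsampled independently so no bucket straddles a gap.

```python
import pandas as pd

# 1-second data with a hole of roughly half an hour in the middle.
idx = pd.date_range("2024-01-01 00:00", periods=100, freq="1s").append(
    pd.date_range("2024-01-01 00:30", periods=100, freq="1s")
)
s = pd.Series(range(200), index=idx)

# Steps above the threshold mark segment boundaries.
gap_threshold = pd.Timedelta("5min")
new_segment = s.index.to_series().diff() > gap_threshold
segments = [seg for _, seg in s.groupby(new_segment.cumsum())]
print([len(seg) for seg in segments])  # [100, 100]
```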

Range-Based Downsampling

For data that needs to be fetched from an external source:

import pandas as pd

from downsampler import downsample_range

def fetch_from_api(start, end):
    # Your data fetching logic here
    return pd.DataFrame(...)

# Single fetch with automatic edge buffering
result = downsample_range(
    fetcher=fetch_from_api,
    output_start=pd.Timestamp('2024-01-01'),
    output_end=pd.Timestamp('2024-01-02'),
    target_cadence='1H'
)

# Batched mode for large ranges
result = downsample_range(
    fetcher=fetch_from_api,
    output_start=pd.Timestamp('2024-01-01'),
    output_end=pd.Timestamp('2024-02-01'),
    target_cadence='1H',
    batch_size='P1D'  # Process one day at a time
)

Fidelity Comparison

Compare different methods to find the best one for your data:

from downsampler.fidelity import FidelityComparison, summary_table

comp = FidelityComparison(original_df, 'signal')
results = comp.compare('10s', store_downsampled=True)

print(summary_table(results))
# See examples/fidelity_comparison.py (marimo notebook) for interactive visualization
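One simple fidelity metric can be sketched by hand (illustrative; not necessarily what FidelityComparison computes): downsample, re-interpolate onto the original timestamps, and take the RMS error of the reconstruction.

```python
import numpy as np
import pandas as pd

# A smooth test signal sampled at 1 s.
idx = pd.date_range("2024-01-01", periods=600, freq="1s")
original = pd.Series(np.sin(np.linspace(0, 20, 600)), index=idx)

# Downsample to 30 s, then time-interpolate back onto the original index.
down = original.resample("30s").mean()
recon = (
    down.reindex(idx.union(down.index))
    .interpolate("time")
    .reindex(idx)
)
err = (original - recon).dropna()
rmse = float(np.sqrt((err ** 2).mean()))
print(round(rmse, 4))
```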

Configuration Options

DownsampleConfig

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| method | AggregationMethod | MEAN | Downsampling method |
| lttb_target_column | str | None | Column LTTB optimizes for |
| m4_deduplicate | bool | True | For M4: remove consecutive duplicates |
| m4_collinearity_threshold | float | None | For M4: filter collinear points (0.0-1.0) |
| include_columns | list[str] | [] | Columns to include (empty = all) |
| exclude_columns | list[str] | [] | Columns to exclude |
| gap_threshold | str/Timedelta | "auto" | Minimum step between points that counts as a gap |
| edge_handling | EdgeHandling | KEEP | How to handle edge points |
| edge_window | int | 2 | Number of points at each edge |
| min_points_per_segment | int | 3 | Minimum points for a segment to be processed |
| min_completeness | float | 0.9 | Minimum fraction of expected points per bucket |
| source_cadence | str/Timedelta | None | Source data cadence (estimated if None) |

Aggregation Methods

  • MEAN: Arithmetic mean (best for general use)
  • MEDIAN: Median (robust to outliers)
  • MIN: Minimum value (preserves lows)
  • MAX: Maximum value (preserves highs)
  • LTTB: Largest Triangle Three Buckets (best visual fidelity)
  • M4: Min-Max-First-Last (guaranteed extrema preservation, best for monitoring/alerting)

Edge Handling

  • KEEP: Keep edge points as-is (default)
  • FLAG: Add _is_edge column
  • DISCARD: Remove edge points
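What FLAG-style output looks like can be sketched by hand with edge_window = 2: the first and last two rows get an _is_edge marker (illustrative only; the library does this for you when edge_handling is FLAG):

```python
import pandas as pd

# Mark the first and last `edge_window` rows as edges.
df = pd.DataFrame(
    {"temperature": range(10)},
    index=pd.date_range("2024-01-01", periods=10, freq="1min"),
)
edge_window = 2
df["_is_edge"] = False
df.iloc[:edge_window, df.columns.get_loc("_is_edge")] = True
df.iloc[-edge_window:, df.columns.get_loc("_is_edge")] = True
print(int(df["_is_edge"].sum()))  # 4
```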

Examples

See the examples/ directory for complete examples:

  • basic_downsampling.py: Core downsampling features
  • multi_aggregate.py: Creating min/mean/max columns
  • range_downsample.py: Range-based downsampling with automatic edge buffering
  • fidelity_comparison.py: Interactive fidelity comparison (marimo notebook)

Running the fidelity comparison notebook

Option 1 — Project install via uv (best for development):

uv run --extra dev marimo edit examples/fidelity_comparison.py

Option 2 — Marimo sandbox (self-contained, uses inline PEP 723 metadata):

marimo edit --sandbox examples/fidelity_comparison.py

API Reference

DataFrame-Mode Functions

downsample_dataframe(df, target_cadence, config=None, **kwargs) -> DataFrame
downsample_dataframe_multi_aggregate(df, target_cadence, variables, aggregations, ...) -> DataFrame
downsample_dataframe_resolutions(df, cadences, config=None, **kwargs) -> dict[Timedelta, DataFrame]

Range-Mode Functions

downsample_range(fetcher, output_start, output_end, target_cadence, config=None, batch_size=None, ...) -> DataFrame
downsample_range_multi_aggregate(fetcher, output_start, output_end, target_cadence, variables, ...) -> DataFrame
downsample_range_resolutions(fetcher, output_start, output_end, cadences, config=None, ...) -> dict[Timedelta, DataFrame]

Low-Level Functions

downsample_lttb(df, target_column, target_cadence, ...) -> DataFrame
downsample_m4(df, target_cadence, deduplicate=True, collinearity_threshold=None, ...) -> DataFrame
downsample_mean(df, target_cadence, ...) -> DataFrame
downsample_median(df, target_cadence, ...) -> DataFrame
downsample_min(df, target_cadence, ...) -> DataFrame
downsample_max(df, target_cadence, ...) -> DataFrame

Gap Functions

find_gap_indices(df, timedelta_max_gap) -> Series
groupby_gaps(df, timedelta_max_gap) -> DataFrameGroupBy
split_at_gaps(df, timedelta_max_gap) -> list[DataFrame]
mark_gaps_in_dataframe(df, nominal_timedelta, ...) -> DataFrame

Fidelity Functions

compute_metrics(original, downsampled, column) -> FidelityMetrics
FidelityComparison(original_df, column).compare(cadences, methods, ...) -> list[ComparisonResult]
summary_table(results) -> DataFrame

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.
