downsampler
A Python package for time series DataFrame downsampling with LTTB, M4, multiple aggregation methods, gap handling, and fidelity testing.
Features
- Multiple downsampling methods:
  - LTTB (visual fidelity)
  - M4 (guaranteed extrema preservation)
  - Traditional aggregations (mean, median, min, max)
- Gap-aware processing: Automatically detects and handles gaps in time series
- Edge handling: Flag, discard, or keep edge points
- Multi-aggregate output: Generate min/mean/max columns in a single call
- Range-based downsampling: Fetch data from external sources with automatic edge buffering
- Multi-resolution pyramid: Generate downsampled versions at multiple cadences in one call
- Fidelity testing: Compare methods and measure visual accuracy
Installation
```bash
pip install downsampler
```
Quick Start
Basic Downsampling
```python
import pandas as pd

from downsampler import downsample_dataframe

# Create sample data
df = pd.DataFrame(
    {'temperature': range(1000)},
    index=pd.date_range('2024-01-01', periods=1000, freq='1s')
)

# Downsample to 1-minute cadence (default: mean)
result = downsample_dataframe(df, target_cadence='PT1M')
```
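The examples in this README spell cadences both as ISO 8601 durations ('PT1M', 'P1D') and as pandas offset strings ('10min', '1h'). The snippet below simply shows the two spellings side by side, assuming both are accepted wherever a cadence is expected:

```python
# ISO 8601 duration and pandas offset-string spellings of the same cadence
result_iso = downsample_dataframe(df, target_cadence='PT1M')
result_pandas = downsample_dataframe(df, target_cadence='1min')
print(len(result_iso), len(result_pandas))  # same number of 1-minute buckets
```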
Using Different Methods
```python
from downsampler import downsample_dataframe, DownsampleConfig, AggregationMethod

# Mean (default)
result = downsample_dataframe(df, '10min')

# Maximum
result = downsample_dataframe(df, '10min', method='max')

# LTTB for visual fidelity
config = DownsampleConfig(
    method=AggregationMethod.LTTB,
    lttb_target_column='temperature'
)
result = downsample_dataframe(df, '10min', config=config)

# M4 for guaranteed extrema preservation
result = downsample_dataframe(df, '10min', method='m4')

# M4 with collinearity filtering (reduces output size)
result = downsample_dataframe(df, '10min', method='m4', m4_collinearity_threshold=0.01)
```
Multi-Aggregate Downsampling
Create min/mean/max columns for visualization with error bands:
```python
from downsampler import downsample_dataframe_multi_aggregate

result = downsample_dataframe_multi_aggregate(
    df,
    target_cadence='1min',
    variables=['temperature', 'pressure'],
    aggregations=['min', 'mean', 'max']
)
# Result has columns: temperature_min, temperature_mean, temperature_max, etc.
```
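One way to use these columns, sketched here with matplotlib (not part of the package), is to draw the mean as a line and shade the min/max band around it. The column names follow the `<variable>_<aggregation>` pattern shown above:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Shade the min/max envelope, then draw the mean on top
ax.fill_between(
    result.index,
    result['temperature_min'],
    result['temperature_max'],
    alpha=0.3,
    label='min/max band',
)
ax.plot(result.index, result['temperature_mean'], label='mean')
ax.set_ylabel('temperature')
ax.legend()
plt.show()
```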
Multi-Resolution Pyramid
Generate downsampled versions at multiple cadences for storage:
```python
from downsampler import downsample_dataframe_resolutions

results = downsample_dataframe_resolutions(
    df,
    cadences=['1min', '5min', '15min', '1h'],
)
# Returns {Timedelta('0 days 00:01:00'): DataFrame, ...}
for cadence, result_df in results.items():
    print(f"{cadence}: {len(result_df)} points")
```
M4 Downsampling (Extrema Preservation)
M4 guarantees exact preservation of minimum and maximum values, making it ideal for monitoring dashboards and alerting systems:
```python
from downsampler import downsample_dataframe

# Basic M4 - preserves exact min/max
result = downsample_dataframe(df, '1min', method='m4')

# Verify extrema preservation
assert df['temperature'].min() == result['temperature'].min()
assert df['temperature'].max() == result['temperature'].max()

# M4 with deduplication (default, removes consecutive duplicates)
result = downsample_dataframe(df, '1min', method='m4', m4_deduplicate=True)

# M4 with collinearity filtering (reduces size on smooth data)
result = downsample_dataframe(df, '1min', method='m4', m4_collinearity_threshold=0.01)
```
M4 Features:
- Selects up to 4 points per bucket: first, last, min, max
- Guaranteed exact extrema preservation (no approximation)
- Variable output size (typically 2-4x reduction vs 10x for traditional methods)
- Deduplication: removes consecutive duplicates (20-50% reduction)
- Collinearity filtering: removes min/max points near first-last line (0-75% reduction)
- Superior peak detection compared to LTTB
When to use M4:
- Monitoring dashboards where missing a spike could be critical
- Alerting systems that need exact threshold crossings
- Pre-computing multiple cadences with controllable size/fidelity trade-offs
- Multi-variable sensor data where each variable's extrema matter
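To make the spike-preservation point concrete, here is a small illustrative comparison on synthetic data (not taken from the package docs): a single one-sample spike survives M4 exactly, while the default mean aggregation smears it across its bucket.

```python
import pandas as pd

from downsampler import downsample_dataframe

# Flat signal with one sharp spike
idx = pd.date_range('2024-01-01', periods=600, freq='1s')
signal = pd.Series(0.0, index=idx)
signal.iloc[300] = 100.0
spiky = pd.DataFrame({'value': signal})

mean_ds = downsample_dataframe(spiky, '1min')              # default: mean
m4_ds = downsample_dataframe(spiky, '1min', method='m4')   # extrema-preserving

print(mean_ds['value'].max())  # spike averaged within its 60-point bucket (~1.7)
print(m4_ds['value'].max())    # exact spike value preserved: 100.0
```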
Handling Gaps
```python
from downsampler import DownsampleConfig

config = DownsampleConfig(
    gap_threshold='5min'  # Gaps > 5 min trigger segmentation
)
result = downsample_dataframe(df, '1min', config=config)
```
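A small illustration of what gap awareness means in practice, on synthetic data with a deliberate hole. The low-level helper `split_at_gaps` is listed under Gap Functions in the API reference below; its top-level import path is assumed here.

```python
import pandas as pd

from downsampler import DownsampleConfig, downsample_dataframe
from downsampler import split_at_gaps  # import path assumed

# Two 10-minute blocks of 1-second data separated by a ~2-hour hole
left = pd.date_range('2024-01-01 00:00', periods=600, freq='1s')
right = pd.date_range('2024-01-01 02:10', periods=600, freq='1s')
gappy = pd.DataFrame({'temperature': range(1200)}, index=left.append(right))

# Low-level view: the gap splits the data into two segments
segments = split_at_gaps(gappy, pd.Timedelta('5min'))
print(len(segments))  # expected: 2

# High-level view: a gap threshold makes the downsampler segment at the hole
# instead of averaging across it
config = DownsampleConfig(gap_threshold='5min')
result = downsample_dataframe(gappy, '1min', config=config)
```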
Range-Based Downsampling
For data that needs to be fetched from an external source:
```python
from downsampler import downsample_range

def fetch_from_api(start, end):
    # Your data fetching logic here
    return pd.DataFrame(...)

# Single fetch with automatic edge buffering
result = downsample_range(
    fetcher=fetch_from_api,
    output_start=pd.Timestamp('2024-01-01'),
    output_end=pd.Timestamp('2024-01-02'),
    target_cadence='1H'
)

# Batched mode for large ranges
result = downsample_range(
    fetcher=fetch_from_api,
    output_start=pd.Timestamp('2024-01-01'),
    output_end=pd.Timestamp('2024-02-01'),
    target_cadence='1H',
    batch_size='P1D'  # Process one day at a time
)
```
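For a concrete (hypothetical) fetcher, anything that accepts a start/end pair and returns a time-indexed DataFrame should do. Here is a sketch that slices a Parquet file, assuming the file is already indexed by timestamp; the filename is illustrative.

```python
import pandas as pd

from downsampler import downsample_range

def fetch_from_parquet(start, end):
    # Hypothetical source: a Parquet file with a DatetimeIndex
    data = pd.read_parquet('sensor_data.parquet')
    return data.loc[start:end]

result = downsample_range(
    fetcher=fetch_from_parquet,
    output_start=pd.Timestamp('2024-01-01'),
    output_end=pd.Timestamp('2024-01-02'),
    target_cadence='1H',
)
```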
Fidelity Comparison
Compare different methods to find the best one for your data:
```python
from downsampler.fidelity import FidelityComparison, summary_table

comp = FidelityComparison(original_df, 'signal')
results = comp.compare('10s', store_downsampled=True)
print(summary_table(results))

# See examples/fidelity_comparison.py (marimo notebook) for interactive visualization
```
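Beyond the summary table, the lower-level `compute_metrics` (listed in the API reference below) can score a single downsampled frame against the original. Its import path and the fields of `FidelityMetrics` are assumptions here, so this sketch only prints the returned object:

```python
from downsampler import downsample_dataframe
from downsampler.fidelity import compute_metrics  # import path assumed

downsampled = downsample_dataframe(original_df, '10s', method='m4')
metrics = compute_metrics(original_df, downsampled, 'signal')
print(metrics)  # FidelityMetrics(...); field names depend on the package version
```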
Configuration Options
DownsampleConfig
| Parameter | Type | Default | Description |
|---|---|---|---|
| `method` | AggregationMethod | MEAN | Downsampling method |
| `lttb_target_column` | str | None | Column to optimize for LTTB |
| `m4_deduplicate` | bool | True | For M4: remove consecutive duplicates |
| `m4_collinearity_threshold` | float | None | For M4: filter collinear points (0.0-1.0) |
| `include_columns` | list[str] | [] | Columns to include (empty = all) |
| `exclude_columns` | list[str] | [] | Columns to exclude |
| `gap_threshold` | str/Timedelta | "auto" | Min duration for gaps |
| `edge_handling` | EdgeHandling | KEEP | How to handle edges |
| `edge_window` | int | 2 | Points at each edge |
| `min_points_per_segment` | int | 3 | Min points for processing |
| `min_completeness` | float | 0.9 | Min fraction of expected points per bucket |
| `source_cadence` | str/Timedelta | None | Source data cadence (estimated if None) |
Aggregation Methods
- MEAN: Arithmetic mean (best for general use)
- MEDIAN: Median (robust to outliers)
- MIN: Minimum value (preserves lows)
- MAX: Maximum value (preserves highs)
- LTTB: Largest Triangle Three Buckets (best visual fidelity)
- M4: Min-Max-First-Last (guaranteed extrema preservation, best for monitoring/alerting)
Edge Handling
- KEEP: Keep edge points as-is (default)
- FLAG: Add an `_is_edge` column
- DISCARD: Remove edge points
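Putting several of these options together in one configuration (the `EdgeHandling` import path is assumed here, mirroring how `AggregationMethod` is imported earlier):

```python
from downsampler import (
    AggregationMethod,
    DownsampleConfig,
    EdgeHandling,  # import path assumed
    downsample_dataframe,
)

config = DownsampleConfig(
    method=AggregationMethod.MEAN,
    include_columns=['temperature'],    # restrict processing to one column
    gap_threshold='5min',
    edge_handling=EdgeHandling.FLAG,    # expected to add an _is_edge column
    edge_window=2,
    min_points_per_segment=3,
)
result = downsample_dataframe(df, '1min', config=config)
```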
Examples
See the examples/ directory for complete examples:
- basic_downsampling.py: Core downsampling features
- multi_aggregate.py: Creating min/mean/max columns
- range_downsample.py: Range-based downsampling with automatic edge buffering
- fidelity_comparison.py: Interactive fidelity comparison (marimo notebook)
Running the fidelity comparison notebook
Option 1 — Project install via uv (best for development):
```bash
uv run --extra dev marimo edit examples/fidelity_comparison.py
```
Option 2 — Marimo sandbox (self-contained, uses inline PEP 723 metadata):
```bash
marimo edit --sandbox examples/fidelity_comparison.py
```
API Reference
DataFrame-Mode Functions
```
downsample_dataframe(df, target_cadence, config=None, **kwargs) -> DataFrame
downsample_dataframe_multi_aggregate(df, target_cadence, variables, aggregations, ...) -> DataFrame
downsample_dataframe_resolutions(df, cadences, config=None, **kwargs) -> dict[Timedelta, DataFrame]
```
Range-Mode Functions
```
downsample_range(fetcher, output_start, output_end, target_cadence, config=None, batch_size=None, ...) -> DataFrame
downsample_range_multi_aggregate(fetcher, output_start, output_end, target_cadence, variables, ...) -> DataFrame
downsample_range_resolutions(fetcher, output_start, output_end, cadences, config=None, ...) -> dict[Timedelta, DataFrame]
```
Low-Level Functions
```
downsample_lttb(df, target_column, target_cadence, ...) -> DataFrame
downsample_m4(df, target_cadence, deduplicate=True, collinearity_threshold=None, ...) -> DataFrame
downsample_mean(df, target_cadence, ...) -> DataFrame
downsample_median(df, target_cadence, ...) -> DataFrame
downsample_min(df, target_cadence, ...) -> DataFrame
downsample_max(df, target_cadence, ...) -> DataFrame
```
Gap Functions
```
find_gap_indices(df, timedelta_max_gap) -> Series
groupby_gaps(df, timedelta_max_gap) -> DataFrameGroupBy
split_at_gaps(df, timedelta_max_gap) -> list[DataFrame]
mark_gaps_in_dataframe(df, nominal_timedelta, ...) -> DataFrame
```
Fidelity Functions
```
compute_metrics(original, downsampled, column) -> FidelityMetrics
FidelityComparison(original_df, column).compare(cadences, methods, ...) -> list[ComparisonResult]
summary_table(results) -> DataFrame
```
License
MIT License - see LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit issues and pull requests.