Fast implementation of tsfresh time series feature extraction using Polars
polars-tsfresh
A high-performance Polars-based reimplementation of tsfresh for time series feature extraction.
polars-tsfresh extracts statistical features from time series data stored in Polars DataFrames. It's designed as a faster, more type-safe alternative to tsfresh, leveraging Polars' efficient columnar operations for better performance on grouped time series data.
Installation
With uv (Recommended)
uv add polars-tsfresh
With pip
pip install polars-tsfresh
Requirements
- Python 3.12+
- Polars >= 1.36.1
Quick Start
import polars as pl
from polars_tsfresh import extract_features
# Load your time series data
df = pl.read_csv("data.csv")
print(df.head())
# shape: (5, 5)
# ┌────────────┬─────────┬─────────┬─────────┬─────────┐
# │ date ┆ open ┆ high ┆ low ┆ close │
# │ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
# ╞════════════╪═════════╪═════════╪═════════╪═════════╡
# │ 2025/12/03 ┆ 6815.29 ┆ 6862.42 ┆ 6810.43 ┆ 6849.72 │
# │ 2025/12/04 ┆ 6866.47 ┆ 6866.47 ┆ 6827.12 ┆ 6857.12 │
# │ 2025/12/05 ┆ 6866.32 ┆ 6895.78 ┆ 6858.29 ┆ 6870.40 │
# │ ... ┆ ... ┆ ... ┆ ... ┆ ... │
# └────────────┴─────────┴─────────┴─────────┴─────────┘
Add an ID column to group by (here we're treating all data as one time series):
df = df.with_columns(pl.lit("sp500").alias("kind"))
Extract features:
features = extract_features(df, column_id="kind", column_sort="date")
print(features)
# shape: (1, 11)
# ┌────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐
# │ kind ┆ close__sum ┆ close__med ┆ close__mea ┆ close__len ┆ close__std ┆ close__var ┆ close__rms ┆ close__max ┆ close__abs ┆ close__min │
# │ --- ┆ _values ┆ ian ┆ n ┆ gth ┆ ┆ ┆ ┆ ┆ _maximum ┆ │
# │ str ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
# │ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
# ╞════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╪════════════╡
# │ sp500 ┆ 137124.56 ┆ 6853.42 ┆ 6856.228 ┆ 20.0 ┆ 51.503 ┆ 2652.59 ┆ 6856.421 ┆ 6932.05 ┆ 6932.05 ┆ 6721.43 │
# └────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘
The output has one row per group (here, the single group "sp500"). Each feature column is named with the column__feature convention from tsfresh.
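Because every feature column follows the column__feature pattern, downstream code can recover the source column and the feature name by splitting on the double underscore. The sketch below is illustrative only: `split_feature_name` is a hypothetical helper, not part of polars-tsfresh, and it assumes source column names do not themselves contain "__".

```python
def split_feature_name(name: str) -> tuple[str, str]:
    """Split 'close__sum_values' into ('close', 'sum_values')."""
    # partition() splits on the first "__"; this assumes the source
    # column name itself contains no double underscore.
    column, sep, feature = name.partition("__")
    if not sep or not column or not feature:
        raise ValueError(f"not a column__feature name: {name!r}")
    return column, feature


# Illustrative feature column names (not taken from a real run).
names = ["close__sum_values", "close__median", "open__mean"]

# Group the feature names by the column they were computed from.
by_column: dict[str, list[str]] = {}
for n in names:
    col, feat = split_feature_name(n)
    by_column.setdefault(col, []).append(feat)

print(by_column)
# {'close': ['sum_values', 'median'], 'open': ['mean']}
```

The ID column (here "kind") has no "__" separator, so the helper rejects it; filter it out before splitting.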
Features
Current Minimal Feature Set
polars-tsfresh currently implements the "minimal" feature set from tsfresh, providing essential statistical measures:
| Feature | Description | tsfresh Equivalent |
|---|---|---|
| mean | Arithmetic mean of values | tsfresh.feature_extraction.feature_calculators.mean |
| median | Median value | tsfresh.feature_extraction.feature_calculators.median |
| variance | Variance (ddof=0) | tsfresh.feature_extraction.feature_calculators.variance |
| standard_deviation | Standard deviation (ddof=0) | tsfresh.feature_extraction.feature_calculators.standard_deviation |
| length | Number of data points | tsfresh.feature_extraction.feature_calculators.length |
| maximum | Maximum value | tsfresh.feature_extraction.feature_calculators.maximum |
| minimum | Minimum value | tsfresh.feature_extraction.feature_calculators.minimum |
| absolute_maximum | Maximum absolute value | tsfresh.feature_extraction.feature_calculators.absolute_maximum |
| root_mean_square | Root mean square (RMS) | tsfresh.feature_extraction.feature_calculators.root_mean_square |
| sum_values | Sum of all values | tsfresh.feature_extraction.feature_calculators.sum_values |
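To make the definitions in the table concrete, here is a plain-Python reference sketch using only the standard library. It is not how polars-tsfresh computes the features; `minimal_features` is a hypothetical name, and the function assumes a non-empty list of floats. It can serve as a small oracle when checking results by hand, in particular the ddof=0 (population) variance and standard deviation.

```python
import math
import statistics


def minimal_features(values: list[float]) -> dict[str, float]:
    """Reference definitions for the ten features in the table above."""
    n = len(values)
    return {
        "sum_values": math.fsum(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "length": float(n),
        # Population statistics (ddof=0), as stated in the table.
        "standard_deviation": statistics.pstdev(values),
        "variance": statistics.pvariance(values),
        # Square root of the mean of squared values.
        "root_mean_square": math.sqrt(math.fsum(v * v for v in values) / n),
        "maximum": max(values),
        "absolute_maximum": max(abs(v) for v in values),
        "minimum": min(values),
    }


feats = minimal_features([1.0, -4.0, 2.0, 3.0])
for name, value in feats.items():
    print(f"{name}: {value:.4f}")
```

For this input the population variance is 7.25, whereas the sample variance (ddof=1) would be about 9.67, so the choice of ddof matters when comparing against other tools.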
Performance Benefits
- Polars Expressions: Computes features with Polars' vectorized expressions
- Grouped Operations: Efficiently handles multiple time series in a single DataFrame
- Type Safety: Full type hints and static analysis support
- Memory Efficient: Columnar operations reduce memory overhead
Roadmap
The project plans to implement comprehensive feature sets including:
- Distribution Features: skewness, kurtosis, quantiles, entropy
- Change & Rate Features: derivatives, trend analysis
- Position & Extrema Features: peak detection, location analysis
- Frequency Features: FFT coefficients, spectral analysis
- Autocorrelation Features: time series modeling
- Complexity Features: entropy measures, complexity estimates
See plan/tsfresh.yaml for the complete feature roadmap.
API Reference
extract_features(df, column_id, column_sort)
Extract features from a time series DataFrame.
Parameters:
- df (pl.DataFrame): Input DataFrame containing time series data
- column_id (str): Name of the column containing group IDs
- column_sort (str): Name of the column to sort by (typically time/date)
Returns:
pl.DataFrame: DataFrame with extracted features, one row per group
Example:
features = extract_features(
df=my_dataframe,
column_id="stock_symbol",
column_sort="timestamp"
)
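A note on column_sort: in the Quick Start the date column is a string such as "2025/12/03". Zero-padded YYYY/MM/DD strings happen to sort chronologically as plain text, but unpadded ones do not. The README does not say whether polars-tsfresh parses dates itself, so the sketch below only demonstrates the pitfall with the standard library; `parse_ymd` is a hypothetical helper.

```python
from datetime import date, datetime


def parse_ymd(text: str) -> date:
    """Parse 'YYYY/MM/DD' text (zero-padded or not) into a date."""
    return datetime.strptime(text, "%Y/%m/%d").date()


padded = ["2025/12/03", "2025/02/10", "2025/12/04"]
unpadded = ["2025/12/3", "2025/2/10", "2025/12/4"]

# Zero-padded strings: text order equals chronological order.
print(sorted(padded) == sorted(padded, key=parse_ymd))
# True

# Unpadded strings: text order puts December before February.
print(sorted(unpadded))
# ['2025/12/3', '2025/12/4', '2025/2/10']
print(sorted(unpadded, key=parse_ymd))
# ['2025/2/10', '2025/12/3', '2025/12/4']
```

The ten minimal features do not depend on row order, but order-dependent features such as trends or autocorrelation would, so converting the sort column to a proper date type before extraction is a reasonable precaution.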
Development
Setup
# Clone the repository
git clone https://github.com/lucidfrontier45/polars-tsfresh.git
cd polars-tsfresh
# Install with uv (includes dev dependencies)
uv sync
Testing
# Run all tests
uv run poe test
# Run specific test
uv run pytest tests/data/test_minimal.py::test_minimal
# Run with coverage
uv run pytest tests/ --cov=polars_tsfresh --cov-report=html
Code Quality
# Full quality check (linting + type checking)
uv run poe check
# Format code
uv run poe format
# Individual checks
uv run poe ruff_check # Linting
uv run poe pyrefly_check # Type checking
Project Structure
polars-tsfresh/
├── src/polars_tsfresh/
│ ├── __init__.py # Main API
│ └── features.py # Feature implementations
├── tests/
│ └── data/
│ ├── test_minimal.py # Test suite
│ ├── sp500_raw.csv # Test data
│ └── sp500_tsfresh_features.csv # Expected results
├── plan/
│ ├── tsfresh.yaml # Feature definitions
│ └── dev_plan.md # Development roadmap
└── pyproject.toml # Project configuration
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all checks pass: uv run poe check && uv run poe test
- Submit a pull request
See AGENTS.md for detailed development guidelines and coding standards.
Related Projects
- tsfresh: The original Python package for time series feature extraction
- Polars: Fast DataFrame library powering this implementation
License
Licensed under the Apache License 2.0. See LICENSE for details.
Authors
- 杜世橋 Du Shiqiao - Initial work - lucidfrontier.45@gmail.com