Reusable feature engineering utilities
featurely
Reusable feature engineering utilities for tabular machine learning with pandas and scikit-learn.
featurely provides function-based helpers for the screen-then-commit feature engineering loop: build candidate features, test whether they explain variance your current model misses, and keep only the winners.
- Pipeline evaluation: cross-validated stage-over-stage comparison with persisted, rerun-safe results and progressive box plots.
- Candidate screening: residual correlation scans with Benjamini-Hochberg false discovery rate correction, for individual features and grouped feature sets.
- Feature builders: outlier handling, monotonic transforms, geographic encodings (haversine distances, geohash cells, rotated coordinates), quantile bin aggregates, k-means cluster memberships, Gaussian kernel spatial smoothing, and polynomial expansion with PCA component selection.
- Diagnostics and EDA: distribution plots, pairwise correlation analysis, and variance inflation factors.
All helpers accept a pandas DataFrame, take explicit column names, and return transformed copies without mutating input.
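Of the geographic encodings listed above, haversine distance is the easiest to illustrate in isolation. The sketch below is not featurely's implementation: `haversine_km` and the Earth radius constant are names assumed here for illustration, and it works on plain floats rather than DataFrame columns.

```python
import math

# Mean Earth radius in kilometres (an assumed constant for this sketch)
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    # Clamp to 1.0 so rounding error near antipodal points cannot break asin
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Example: distance from San Francisco to Los Angeles
sf_to_la = haversine_km(37.7749, -122.4194, 34.0522, -118.2437)
```

A distance like this, measured from each row to a fixed reference point, can serve as a single numeric feature in place of raw latitude and longitude.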
Install
pip install featurely
Requires Python 3.10 or newer.
Quick start
import pandas as pd
import featurely as fl
df = pd.read_csv("my_data.csv")
target = "price"
features = [c for c in df.columns if c != target]
# Establish a baseline
results = fl.add_pipeline_step(None, "raw", df[features], df[target])
# Clean outliers and measure the effect
df_clean = fl.clip_outliers(df, features, threshold=2.25)
results = fl.add_pipeline_step(
    results, "+ cleaned", df_clean[features], df_clean[target]
)
fl.plot_pipeline_steps(results, title="Effect of outlier clipping")
# Build candidates and screen them against baseline residuals
candidates = fl.compute_bin_aggregates(df_clean, "latitude", ["income"], n_bins=10)
scan = fl.run_candidate_scan(df_clean, candidates, target=target)
significant = fl.plot_candidate_scan(scan, title="Candidate scan")
keep = [name for name, is_sig in significant.items() if is_sig]
df_clean = pd.concat([df_clean, candidates[keep]], axis=1)
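The screening step above relies on Benjamini-Hochberg false discovery rate correction. As a rough, standard-library-only sketch of that procedure (not featurely's own code; `benjamini_hochberg`, the `alpha` default, and the feature names and p-values are all made up for illustration):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the test is rejected at FDR level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * alpha
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from a residual correlation scan
scan_p_values = {
    "lat_bin_income_mean": 0.001,
    "lat_bin_income_std": 0.03,
    "noise": 0.6,
    "lat_bin_income_median": 0.012,
}
flags = benjamini_hochberg(list(scan_p_values.values()), alpha=0.05)
kept = [name for name, flag in zip(scan_p_values, flags) if flag]
```

Because the rule is step-up, a candidate whose p-value misses its own rank threshold can still be kept when a larger p-value further down the sorted list passes its threshold.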
Documentation
Full API reference, getting-started guide, and a complete worked example on the California housing dataset:
- Documentation: gperdrizet.github.io/featurely
- Source and example notebooks: github.com/gperdrizet/featurely
License
MIT
File details
Details for the file featurely-0.1.1.tar.gz.
File metadata
- Download URL: featurely-0.1.1.tar.gz
- Upload date:
- Size: 27.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cf0a9c32213a883f5735dfd324a19cf6d1e9425abb7af2950caf01fe72001e78 |
| MD5 | 9b30b5d5847c400d5496b965f0b1a8e6 |
| BLAKE2b-256 | b6a1cb5659b68b37aa0365709559ed18218fd42f99ef70952783e4d2550e0171 |
Provenance
The following attestation bundles were made for featurely-0.1.1.tar.gz:
Publisher: publish.yml on gperdrizet/featurely
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: featurely-0.1.1.tar.gz
- Subject digest: cf0a9c32213a883f5735dfd324a19cf6d1e9425abb7af2950caf01fe72001e78
- Sigstore transparency entry: 2051829876
- Sigstore integration time:
- Permalink: gperdrizet/featurely@efc9f8b0d5e85766bc682cfe4abff1ce7693eb9b
- Branch / Tag: refs/heads/main
- Owner: https://github.com/gperdrizet
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@efc9f8b0d5e85766bc682cfe4abff1ce7693eb9b
- Trigger Event: workflow_dispatch
File details
Details for the file featurely-0.1.1-py3-none-any.whl.
File metadata
- Download URL: featurely-0.1.1-py3-none-any.whl
- Upload date:
- Size: 26.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 857cbb1dc1176b5c857a78b9b7a3988be71a864dc403c1f4f24ce9be21da14e4 |
| MD5 | aca2687c7e3b4ab14f201a6b4cbc8a65 |
| BLAKE2b-256 | 57568b257b1ba149c753b61f2e1aa3d28eeee1b82100ef9bc6d652fec62dc089 |
Provenance
The following attestation bundles were made for featurely-0.1.1-py3-none-any.whl:
Publisher: publish.yml on gperdrizet/featurely
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: featurely-0.1.1-py3-none-any.whl
- Subject digest: 857cbb1dc1176b5c857a78b9b7a3988be71a864dc403c1f4f24ce9be21da14e4
- Sigstore transparency entry: 2051830013
- Sigstore integration time:
- Permalink: gperdrizet/featurely@efc9f8b0d5e85766bc682cfe4abff1ce7693eb9b
- Branch / Tag: refs/heads/main
- Owner: https://github.com/gperdrizet
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@efc9f8b0d5e85766bc682cfe4abff1ce7693eb9b
- Trigger Event: workflow_dispatch