featurely

Reusable feature engineering utilities for tabular machine learning with pandas and scikit-learn.

featurely provides function-based helpers for the screen-then-commit feature engineering loop: build candidate features, test whether they explain variance your current model misses, and keep only the winners.

  • Pipeline evaluation: cross-validated stage-over-stage comparison with persisted, rerun-safe results and progressive box plots.
  • Candidate screening: residual correlation scans with Benjamini-Hochberg false discovery rate correction, for individual features and grouped feature sets.
  • Feature builders: outlier handling, monotonic transforms, geographic encodings (haversine distances, geohash cells, rotated coordinates), quantile bin aggregates, k-means cluster memberships, Gaussian kernel spatial smoothing, and polynomial expansion with PCA component selection.
  • Diagnostics and EDA: distribution plots, pairwise correlation analysis, and variance inflation factors.
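
To make the false discovery rate correction mentioned in the list above concrete, here is a minimal pure-Python sketch of the Benjamini-Hochberg step-up procedure. The function name `benjamini_hochberg` and its signature are illustrative assumptions and are not part of the featurely API; the package's own implementation may differ.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha.

    Hypothetical helper for illustration only, not featurely's implementation.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= (k / m) * alpha
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Reject every hypothesis ranked at or below that k (the step-up rule)
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            reject[idx] = True
    return reject


p_values = [0.001, 0.008, 0.035, 0.038, 0.6]
rejected = benjamini_hochberg(p_values, alpha=0.05)
print(rejected)
```

Note that the third p-value (0.035) exceeds its own threshold of 0.03 but is still rejected, because the fourth (0.038) falls under its threshold of 0.04. That is the step-up behaviour that distinguishes this procedure from a per-test cutoff.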

All helpers accept a pandas DataFrame, take explicit column names, and return transformed copies without mutating the input.
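
The calling convention described above can be sketched as follows. The helper `clip_to_quantiles` is a hypothetical stand-in written for this illustration, assuming quantile-based clipping; it is not a featurely function and does not reproduce the behaviour of `fl.clip_outliers`.

```python
import pandas as pd


def clip_to_quantiles(df, columns, lower=0.01, upper=0.99):
    """Clip the named columns to their [lower, upper] quantile range.

    Hypothetical helper: takes a DataFrame and explicit column names,
    and returns a transformed copy so the caller's frame is untouched.
    """
    out = df.copy()  # work on a copy, never on the input
    for col in columns:
        lo, hi = out[col].quantile([lower, upper])
        out[col] = out[col].clip(lo, hi)
    return out


df = pd.DataFrame(
    {"income": [1.0, 2.0, 3.0, 4.0, 1000.0], "rooms": [2, 3, 3, 4, 5]}
)
clipped = clip_to_quantiles(df, ["income"], lower=0.0, upper=0.75)

print(df["income"].max())       # original frame is unchanged
print(clipped["income"].max())  # extreme value has been clipped
```

Returning a copy keeps each pipeline stage comparable with the one before it, since the earlier frame is still available for evaluation.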

Install

pip install featurely

Requires Python 3.10 or newer.

Quick start

import pandas as pd
import featurely as fl

df = pd.read_csv("my_data.csv")
target = "price"
features = [c for c in df.columns if c != target]

# Establish a baseline
results = fl.add_pipeline_step(None, "raw", df[features], df[target])

# Clean outliers and measure the effect
df_clean = fl.clip_outliers(df, features, threshold=2.25)

results = fl.add_pipeline_step(
    results, "+ cleaned", df_clean[features], df_clean[target]
)

fl.plot_pipeline_steps(results, title="Effect of outlier clipping")

# Build candidates and screen them against baseline residuals
candidates = fl.compute_bin_aggregates(df_clean, "latitude", ["income"], n_bins=10)
scan = fl.run_candidate_scan(df_clean, candidates, target=target)
significant = fl.plot_candidate_scan(scan, title="Candidate scan")

keep = [name for name, is_sig in significant.items() if is_sig]
df_clean = pd.concat([df_clean, candidates[keep]], axis=1)
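
For readers unfamiliar with screening against baseline residuals, the idea can be sketched with NumPy alone. This is a minimal illustration that assumes an ordinary least squares baseline and a Pearson correlation; `residual_correlation` is a hypothetical name, and the model and statistic used inside `fl.run_candidate_scan` may differ.

```python
import numpy as np


def residual_correlation(X, y, candidate):
    """Pearson correlation between a candidate and baseline residuals.

    Hypothetical helper: the baseline here is OLS with an intercept.
    """
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef  # variance the baseline fails to explain
    return float(np.corrcoef(candidate, residuals)[0, 1])


# Synthetic data: y depends on x and on a signal the baseline never sees
rng = np.random.default_rng(0)
x = rng.normal(size=500)
hidden = rng.normal(size=500)
noise_feature = rng.normal(size=500)
y = 2.0 * x + 1.5 * hidden + rng.normal(scale=0.1, size=500)

r_hidden = residual_correlation(x.reshape(-1, 1), y, hidden)
r_noise = residual_correlation(x.reshape(-1, 1), y, noise_feature)
print(f"hidden signal: {r_hidden:.3f}, pure noise: {r_noise:.3f}")
```

A candidate that correlates strongly with the residuals carries information the current features miss, while an uninformative candidate stays near zero.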

Documentation

Full API reference, getting-started guide, and a complete worked example on the California housing dataset:

License

MIT

Download files

Source Distribution

featurely-0.1.1.tar.gz (27.8 kB)

Uploaded Source

Built Distribution

featurely-0.1.1-py3-none-any.whl (26.3 kB)

Uploaded Python 3

File details

Details for the file featurely-0.1.1.tar.gz.

File metadata

  • Download URL: featurely-0.1.1.tar.gz
  • Size: 27.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for featurely-0.1.1.tar.gz
Algorithm Hash digest
SHA256 cf0a9c32213a883f5735dfd324a19cf6d1e9425abb7af2950caf01fe72001e78
MD5 9b30b5d5847c400d5496b965f0b1a8e6
BLAKE2b-256 b6a1cb5659b68b37aa0365709559ed18218fd42f99ef70952783e4d2550e0171
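
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below is one way to do it; the helper names `sha256_of_file` and `matches_published_hash` are illustrative and not part of any packaging tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution sits in the working directory:
# matches_published_hash(
#     "featurely-0.1.1.tar.gz",
#     "cf0a9c32213a883f5735dfd324a19cf6d1e9425abb7af2950caf01fe72001e78",
# )
```

pip can also enforce digests at install time through hash-checking mode with a requirements file, which avoids the manual step.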

Provenance

The following attestation bundles were made for featurely-0.1.1.tar.gz:

Publisher: publish.yml on gperdrizet/featurely

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file featurely-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: featurely-0.1.1-py3-none-any.whl
  • Size: 26.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for featurely-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 857cbb1dc1176b5c857a78b9b7a3988be71a864dc403c1f4f24ce9be21da14e4
MD5 aca2687c7e3b4ab14f201a6b4cbc8a65
BLAKE2b-256 57568b257b1ba149c753b61f2e1aa3d28eeee1b82100ef9bc6d652fec62dc089

Provenance

The following attestation bundles were made for featurely-0.1.1-py3-none-any.whl:

Publisher: publish.yml on gperdrizet/featurely
