featsynergy
Automatic detection of second-order feature synergies for machine learning pipelines.
featsynergy helps you discover which pairs of features interact meaningfully — and automatically generates the derived features (products, ratios, squares) that capture those interactions.
Installation
pip install featsynergy
Quick Start
import pandas as pd
from featsynergy import SynergyDetector
# X_train and X_test are pandas DataFrames and y_train is the target, loaded beforehand
detector = SynergyDetector(top_n=10, gain_thresh=0.002)
detector.fit(X_train, y_train)
# See which pairs have meaningful synergies
print(detector.top_pairs_)
# Add derived features to your DataFrame
X_train_enriched = detector.transform(X_train)
X_test_enriched = detector.transform(X_test)
How It Works
- Feature selection — selects the top-N most relevant features using a combined score of Pearson correlation and Ridge regression importance.
- Pair evaluation — for each pair of top features, evaluates a Ridge model with and without derived features (product, squares, safe division) using cross-validation.
- Synergy detection — pairs where the derived features meaningfully improve the score (gain > threshold) are flagged as synergistic.
- Transform — adds only the derived features from synergistic pairs to your DataFrame.
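The steps above can be sketched with plain scikit-learn. This is an illustration of the idea, not featsynergy's actual implementation: the helper names `rank_features` and `pair_gain`, the equal-weight blend of the two relevance signals, the R² scoring, and the choice to fit the baseline on the pair alone are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def rank_features(X, y, top_n):
    """Rank columns by a blend of |Pearson r| and |Ridge coefficient| (hypothetical helper)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so coefficients are comparable
    corr = np.abs([np.corrcoef(Xs[:, k], y)[0, 1] for k in range(Xs.shape[1])])
    coef = np.abs(Ridge(alpha=1.0).fit(Xs, y).coef_)
    # Assumption: scale each signal to [0, 1] and average them with equal weight.
    score = 0.5 * corr / corr.max() + 0.5 * coef / coef.max()
    return np.argsort(score)[::-1][:top_n]


def pair_gain(X, y, i, j, cv=3):
    """Cross-validated R^2 gain from adding derived features for columns i and j."""
    a, b = X[:, i], X[:, j]
    base = np.column_stack([a, b])
    # Derived features: product and both squares (ratios left out for brevity).
    enriched = np.column_stack([a, b, a * b, a ** 2, b ** 2])
    base_score = cross_val_score(Ridge(alpha=1.0), base, y, cv=cv, scoring="r2").mean()
    rich_score = cross_val_score(Ridge(alpha=1.0), enriched, y, cv=cv, scoring="r2").mean()
    return rich_score - base_score


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
# Columns 0 and 1 carry linear signal plus an interaction; column 2 is irrelevant.
y = X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=300)

top = rank_features(X, y, top_n=2)   # indices of the two most relevant columns
gain_01 = pair_gain(X, y, 0, 1)      # pair with a real interaction
gain_02 = pair_gain(X, y, 0, 2)      # pair without one
is_synergistic = gain_01 > 0.002     # same comparison as gain > threshold
```

On this synthetic data the pair (0, 1) should show a large gain because the product term is part of the target, while (0, 2) should show essentially none.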
Parameters
| Parameter | Default | Description |
|---|---|---|
| top_n | 10 | Number of candidate features to evaluate |
| gain_thresh | 0.002 | Minimum gain to consider a pair synergistic |
| cv | 3 | Cross-validation folds for pair evaluation |
| task | 'regression' | 'regression' or 'classification' |
| verbose | True | Print progress and results |
Attributes after fit()
| Attribute | Description |
|---|---|
| top_features_ | Selected top-N features |
| results_ | Full DataFrame with all evaluated pairs, sorted by gain |
| top_pairs_ | Pairs that exceed gain_thresh |
| synergy_features_ | Names of derived features added by transform() |
Safe Division
When a feature contains zeros, the ratio that would use it as the denominator is skipped, so no infinite values are produced. The reverse ratio, whose denominator has no zeros, is still computed normally.
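A minimal sketch of this rule in pandas is shown here. The helper `safe_ratios` and the `_div_` column naming are hypothetical and chosen only for illustration; the library's own names and internals may differ.

```python
import pandas as pd


def safe_ratios(df, a, b):
    """Return a/b and b/a as columns, skipping any ratio whose denominator contains a zero."""
    out = {}
    for num, den in ((a, b), (b, a)):
        if (df[den] == 0).any():
            continue  # this direction would produce inf (or NaN for 0/0), so skip it
        out[f"{num}_div_{den}"] = df[num] / df[den]
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "discount": [0.0, 5.0, 10.0]})
ratios = safe_ratios(df, "price", "discount")
# "discount" contains a zero, so price/discount is skipped;
# "price" has no zeros, so discount/price is kept.
```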
sklearn Compatible
SynergyDetector implements the sklearn fit / transform / fit_transform interface and can be used in pipelines:
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingRegressor
from featsynergy import SynergyDetector
pipe = Pipeline([
('synergy', SynergyDetector(top_n=10)),
('model', GradientBoostingRegressor()),
])
pipe.fit(X_train, y_train)
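For readers unfamiliar with the transformer contract, here is a minimal sketch of what a fit / transform class needs in order to work inside a Pipeline. `ProductAdder` is a hypothetical stand-in written for this example and is not part of featsynergy; it always adds one fixed product column, whereas a real detector would choose its pairs during fit().

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class ProductAdder(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in: appends the product of two named columns."""

    def __init__(self, col_a="a", col_b="b"):
        self.col_a = col_a
        self.col_b = col_b

    def fit(self, X, y=None):
        # Nothing is learned from the data here; only the output column name is stored.
        self.new_feature_ = f"{self.col_a}_x_{self.col_b}"
        return self

    def transform(self, X):
        X = X.copy()  # leave the caller's DataFrame untouched
        X[self.new_feature_] = X[self.col_a] * X[self.col_b]
        return X


rng = np.random.default_rng(0)
X_demo = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})
y_demo = X_demo["a"] * X_demo["b"]  # target is exactly the interaction term

pipe_demo = Pipeline([
    ("products", ProductAdder()),
    ("model", LinearRegression()),
])
pipe_demo.fit(X_demo, y_demo)
r2 = pipe_demo.score(X_demo, y_demo)  # training R^2 of the linear model on enriched features
```

TransformerMixin supplies fit_transform from fit and transform, which is what Pipeline calls on every step except the last.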
License
MIT