
A feature selection toolkit for ML practitioners: visualize relationships, rank feature importance, and understand feature contributions. Choose wisely.


megumi



megumi is a Python package for feature selection in machine learning workflows.

Inspired by Jujutsu Kaisen's Megumi Fushiguro, this package helps you work strategically: feature selection is not about using every feature available, but about choosing the right ones.


What it does

Feature selection is one of the most impactful steps in building a machine learning model, and also one of the easiest to rush. megumi gives you the tools to do it properly, across three areas:

1. Visual exploration: Understand your features before modelling. See how each feature relates to the target, spot distributions that separate classes, catch missing-value patterns, and identify multicollinearity, each with a single function call.

2. Importance scoring: Go beyond intuition. Use machine learning-based methods to quantify which features actually carry predictive power.

3. Contribution analysis: Quantify how much a set of new features improves your model. Test candidate features against your current ones across any metric, including custom business metrics, and get statistical significance out of the box.


Modules

gyokuken - Visual feature analysis

Named after Megumi's shikigami Gyokuken (玉犬, the Divine Dogs), used for tracking and sensing. This module helps you track and sense the true nature of your features.

| Function | Description |
| --- | --- |
| `plot_bivariate` | Mean target rate per feature bucket vs. observation counts. Adapts automatically to continuous and categorical features. Missing values get their own bucket. |
| `plot_distribution` | Feature distribution grouped by target. Supports histogram, KDE, violin, and box plot. |
| `plot_correlation` | Lower-triangle correlation heatmap. Supports Pearson, Spearman, and Kendall. |
| `plot_missing` | Horizontal bar chart of missing-value percentages, sorted by severity. |

Usage example:

```python
from megumi.gyokuken import plot_bivariate, plot_distribution, plot_correlation, plot_missing

plot_bivariate(df, feature="age", target="default")
plot_distribution(df, feature="income", target="default", kind="violin")
plot_correlation(df)
plot_missing(df)
```

All visualisations adapt to the target type automatically: binary classification, multiclass classification, or regression.
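For intuition, the statistic behind a bivariate plot of this kind boils down to a groupby aggregation. A rough pandas equivalent on synthetic data (a sketch of the idea, not megumi's actual implementation):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=500).astype(float),
    "default": rng.integers(0, 2, size=500),
})

# Bucket the continuous feature into quantile bins
buckets = pd.qcut(df["age"], q=5, duplicates="drop")

# Mean target rate and observation count per bucket -- the two series
# a bivariate plot draws against each other
stats = df.groupby(buckets, observed=True)["default"].agg(
    mean_target="mean",
    count="size",
)
print(stats)
```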

bansho - SHAP-based feature importance scoring

Named after Megumi's shikigami Banshō (万象, Max Elephant), a heavy, water-releasing shikigami. This module uses machine learning and SHAP values to reveal which features carry real predictive power.

Two synthetic random features (RANDOM_1, RANDOM_2) are introduced as baselines before fitting a vanilla machine learning model. Every input feature is ranked by its mean absolute SHAP value and labelled in relation to those baselines:

| Label | Meaning |
| --- | --- |
| `predictive` | Mean \|SHAP\| beats both random features: a genuinely informative feature. |
| `marginal` | Mean \|SHAP\| beats one random feature: weak signal, use with caution. |
| `noise` | Mean \|SHAP\| beats neither random feature: no detectable predictive power. |

| Function | Description |
| --- | --- |
| `score_features` | Fit a vanilla model, compute SHAP values, and return a ranked DataFrame of features labelled by predictive power. |
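The random-baseline idea can be sketched in a few lines of sklearn. This toy version uses impurity-based `feature_importances_` as a stand-in for mean |SHAP| (to avoid the shap dependency), and all data and column names are synthetic:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 800
df = pd.DataFrame({
    "income": rng.normal(size=n),   # informative feature
    "zip": rng.normal(size=n),      # pure noise
})
df["default"] = (df["income"] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Inject two synthetic random baselines, mirroring bansho's RANDOM_1 / RANDOM_2
df["RANDOM_1"] = rng.normal(size=n)
df["RANDOM_2"] = rng.normal(size=n)

features = ["income", "zip", "RANDOM_1", "RANDOM_2"]
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(df[features], df["default"])

importances = dict(zip(features, model.feature_importances_))
baselines = [importances["RANDOM_1"], importances["RANDOM_2"]]

def label(score):
    # Rank each feature against the two random baselines
    beats = sum(score > b for b in baselines)
    return {2: "predictive", 1: "marginal", 0: "noise"}[beats]

print({f: label(importances[f]) for f in ["income", "zip"]})
```

A feature that cannot outscore a column of pure noise has, by construction, no usable signal for the model.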

Usage example:

```python
from sklearn.model_selection import train_test_split
from megumi.bansho import score_features

df_train, df_val = train_test_split(df, test_size=0.2, random_state=42)

result = score_features(df_train, features=["age", "income", "zip"], target="default",
                        df_val=df_val, random_state=42)
# returns:
#      feature predictive_power
# 0     income       predictive
# 1        age         marginal
# 2        zip            noise
```

Passing df_val is recommended: the forest is fitted on the training set and SHAP values are computed on the held-out set, producing more conservative importance estimates. If omitted, SHAP is computed on the training set directly.
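Because the result is a plain DataFrame, the ranking slots directly into ordinary pandas filtering. A minimal sketch, mocking the output columns shown above:

```python
import pandas as pd

# Hypothetical score_features output, matching the columns shown above
result = pd.DataFrame({
    "feature": ["income", "age", "zip"],
    "predictive_power": ["predictive", "marginal", "noise"],
})

# Keep only features that beat both random baselines
keep = result.loc[result["predictive_power"] == "predictive", "feature"].tolist()
print(keep)  # ['income']
```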

Supports binary classification and regression targets. The strategy parameter is reserved for future model types (e.g. "linear"); currently only "tree" (random forest) is available.


nue - Feature contribution analysis

Named after Megumi's shikigami Nue (鵺), a chimeric creature used to survey and strike from above. This module answers the question: "If I add these features, how much improvement do I get?"

Three random forests are fitted per cross-validation fold: one on the base features alone, one on base + candidate features (real values), and one on base + candidate features with the new columns row-permuted (the null model). Significance is measured by a paired t-test between the real augmented model and the null model. This isolates genuine signal from the random-forest diversification effect, which can inflate performance under a fixed max_features budget even when the added columns are pure noise. Both sklearn metrics and user-defined callables are supported, including business metrics that depend on extra columns in the dataset.
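The permutation-null comparison can be sketched with sklearn and scipy. This toy version uses synthetic data, a single metric (ROC AUC), and a globally permuted candidate column; all names are illustrative, not megumi's internals:

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_rel
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({"base": rng.normal(size=n), "cand": rng.normal(size=n)})
df["y"] = (0.5 * df["base"] + 2.0 * df["cand"] + rng.normal(size=n) > 0).astype(int)

# Null copy: the candidate column row-permuted, destroying its signal
df_null = df.copy()
df_null["cand"] = rng.permutation(df["cand"].to_numpy())

def fold_auc(data, tr, te):
    # Fit on the training rows of this fold, score AUC on the held-out rows
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(data.iloc[tr][["base", "cand"]], data.iloc[tr]["y"])
    proba = model.predict_proba(data.iloc[te][["base", "cand"]])[:, 1]
    return roc_auc_score(data.iloc[te]["y"], proba)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aug, null = [], []
for tr, te in cv.split(df, df["y"]):
    aug.append(fold_auc(df, tr, te))        # base + real candidate
    null.append(fold_auc(df_null, tr, te))  # base + permuted candidate

# Paired t-test: same folds, so score differences pair up naturally
t_stat, p_value = ttest_rel(aug, null)
```

Comparing against the permuted model rather than the base model is the key design choice: both augmented models have the same number of columns, so any width-related inflation cancels out.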

| Function | Description |
| --- | --- |
| `evaluate_contribution` | Compare model metrics before and after adding candidate features and report whether the improvement is statistically significant. |

Built-in metrics: "roc_auc", "recall", "precision", "f1", "accuracy" (classification); "rmse", "mae", "r2" (regression).

Usage example:

```python
from megumi.nue import evaluate_contribution

result = evaluate_contribution(
    df,
    base_features=["age", "income", "credit_score"],
    new_features=["vendor_A", "vendor_B"],
    target="default",
    metrics=["roc_auc", "recall"],
    random_state=42,
)
# returns:
#     metric  base_score  augmented_score   delta  pct_change  p_value  significant
# 0  roc_auc      0.7421           0.8103  0.0682        9.19     0.003         True
# 1   recall      0.6830           0.7512  0.0682        9.98     0.021         True
```

Custom metric callables that accept a third argument receive the full test-fold DataFrame, enabling business metrics such as loss avoided (the loan amounts of true defaulters the model flags in time):

```python
def loss_avoided(y_true, y_pred_proba, df_fold, threshold=0.5):
    # Total loan amount of actual defaulters that the model flagged
    flagged = y_pred_proba >= threshold
    return df_fold.loc[y_true.astype(bool) & flagged, "loan_amount"].sum()
```

```python
result = evaluate_contribution(
    df,
    base_features=["age", "income"],
    new_features=["vendor_score"],
    target="default",
    metrics=[loss_avoided],
    random_state=42,
)
```

Supports binary classification and regression targets.
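Acting on the result is again plain pandas. One possible decision rule, using a mock of the output columns shown above (hypothetical values, not real scores):

```python
import pandas as pd

# Hypothetical evaluate_contribution output, matching the columns shown above
result = pd.DataFrame({
    "metric": ["roc_auc", "recall"],
    "delta": [0.0682, 0.0682],
    "p_value": [0.003, 0.021],
    "significant": [True, True],
})

# Adopt the candidate features only if every metric improves significantly
adopt = bool(result["significant"].all() and (result["delta"] > 0).all())
print(adopt)  # True
```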


Installation

```bash
pip install megumi
```

Or, to set up a development environment using conda:

```bash
conda env create -f environment.yml
conda activate megumi-dev
```

Status

megumi is under active development. All three modules are available: gyokuken for visual feature exploration, bansho for SHAP-based importance scoring, and nue for feature contribution analysis. Contributions and feedback are welcome.


"With this treasure, I summon..."

amazing feature selection?
