
Empirical Dependence Plot (EDP): a model-free EDA plot of a target's conditional mean/rate across a feature.


edp-tool — Empirical Dependence Plot (EDP)

A small exploratory-data-analysis tool that shows how a target variable behaves across grouped values of a feature. It is designed with categorical / binary targets in mind (it plots the observed class rate), but it also works for continuous targets (it plots the mean).


EDP vs. PDP — why the name changed

This tool was originally called a Partial Dependence Plot (PDP), but that name is technically inaccurate. A true PDP requires a fitted model and marginalizes its predictions over the other features.

The EDP uses no model and does no marginalization — it plots the observed conditional mean of the target given a single feature, straight from the data. That makes it closely related to the M-plot (marginal plot) from the ALE literature (Apley & Zhu, 2020). Calling it Empirical Dependence Plot reflects exactly what it computes.
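As a rough illustration of "observed conditional mean given a single feature", the sketch below bins a feature by quantiles and averages the target in each bin using plain pandas. It is not the tool's own code: the function name `empirical_dependence`, the toy column names, and the choice of `pd.qcut` are assumptions made for this example.

```python
import pandas as pd

def empirical_dependence(df, feature, target, n_bins=4):
    """Observed mean of `target` within quantile bins of `feature` (no model involved)."""
    # duplicates="drop" merges repeated quantile edges, so n_bins acts as an upper bound.
    bins = pd.qcut(df[feature], q=n_bins, duplicates="drop")
    grouped = df.groupby(bins, observed=True)[target]
    # The per-bin count shows how much data supports each mean.
    return grouped.agg(["mean", "count"])

# Hypothetical toy data: a binary target, so the per-bin mean is the positive-class rate.
toy = pd.DataFrame({
    "age": [20, 22, 25, 30, 35, 40, 50, 60],
    "converted": [0, 0, 0, 1, 0, 1, 1, 1],
})
table = empirical_dependence(toy, "age", "converted", n_bins=2)
print(table)
```

Nothing here is marginalized over other features, which is the difference from a PDP: each value is simply the average of the rows that fall in the bin.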

The old pdp function still works as a deprecated alias (see "What's new in this version" below).

Install

Copy edp_tool.py next to your notebook, or download it from within Colab:

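import os

# Raw file URL on GitHub; fetched only if edp_tool.py is not already present.
url = 'https://raw.githubusercontent.com/attilalr/edp-tool/main/edp_tool.py'
if not os.path.isfile('edp_tool.py'):
    # `!wget` is notebook shell syntax (Colab/Jupyter), not plain Python.
    !wget -q {url}
from edp_tool import edp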

Requirements: numpy, pandas, matplotlib (see requirements.txt).

Usage

from edp_tool import edp

# Binary target -> positive-class rate per bin, with a Wilson CI band
edp(df, ['age', 'income'], 'converted')

# Multiclass target -> one line per class
edp(df, ['petal length (cm)'], 'species')

# Continuous target -> mean per bin, with a standard-error band
edp(df, ['age'], 'price')

# Save figures instead of showing them
edp(df, features, 'target', writefolder='figs')

edp() returns a list of dicts (feature, fig, ax, and path when saved), so you can post-process or embed the figures.

Key parameters

| Parameter | Default | Meaning |
|---|---|---|
| n | 4 | Number of bins for continuous features (upper bound) |
| kind | "auto" | "auto" / "continuous" / "categorical" treatment per feature |
| binning | "quantile" | "quantile" or "uniform" bin edges |
| ci | "auto" | "wilson", "sem", "auto", or None (Wilson for classes, SEM for regression) |
| max_categories | 10 | Numeric columns with ≤ this many distinct values are treated as categorical |
| show_bincount | True | Draw per-bin sample count on a secondary axis |
| show_baseline | True | Draw the global target mean/rate as a reference line |
| ylim_origin | True | Start the y-axis at 0 |
| even_spaced_ticks | False | Place continuous bins at real midpoints |
| writefolder | None | Save PNGs to this folder instead of showing |
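For readers unfamiliar with the two band types named by the ci parameter, here is a minimal standard-library sketch of the textbook formulas: the Wilson score interval for a rate and the standard error of the mean (SEM) for a continuous target. The function names and the 95% level (`z=1.96`) are assumptions for illustration; the source does not state which confidence level or implementation the tool uses.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

def sem(values):
    """Standard error of the mean, based on the sample standard deviation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)

low, high = wilson_interval(5, 10)   # a bin with 5 positives out of 10 rows
spread = sem([1, 2, 3, 4, 5])        # a bin of continuous target values
print(low, high, spread)
```

Unlike the naive normal approximation, the Wilson interval stays inside [0, 1] and does not collapse to zero width when a bin has a rate of exactly 0 or 1, which matters for small bins.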

Multiclass targets are handled natively — no manual one-hot encoding needed.
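To make "one line per class" concrete, the sketch below computes the observed share of each class within each feature group using `pd.crosstab`. This only illustrates the quantity being plotted, under the assumption that each class's line is its per-group rate; the column names and data are hypothetical and this is not the tool's internal code.

```python
import pandas as pd

# Hypothetical toy data: a categorical feature and a three-class target.
toy = pd.DataFrame({
    "size": ["small", "small", "small", "large", "large", "large"],
    "species": ["a", "a", "b", "b", "c", "c"],
})

# normalize="index" divides each row by its total, giving per-group class rates.
rates = pd.crosstab(toy["size"], toy["species"], normalize="index")
print(rates)
```

Each row sums to 1, and each column would correspond to one plotted line, so no one-hot encoding of the target is needed to obtain the rates.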

What's new in this version

  • Renamed to EDP (edp_tool.edp); pdp_tool.pdp kept as a deprecated alias.
  • Fixed the dead categorical branch (feature type is now detected correctly).
  • Fixed n leaking across features and the maximum value being dropped from the last bin.
  • Native multiclass support, Wilson confidence intervals for rates, optional baseline line and uniform binning.
  • Figures are returned and properly closed; validation raises real exceptions.
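The binning items above (quantile versus uniform edges, and keeping the maximum value in the last bin) can be sketched with NumPy as follows. The helper names `bin_edges` and `assign_bins` are hypothetical, and the approach shown (digitizing against interior edges only) is one common way to keep the maximum in the last bin, not necessarily how the tool does it.

```python
import numpy as np

def bin_edges(x, n=4, binning="quantile"):
    """Return sorted, de-duplicated bin edges; n is an upper bound on the bin count."""
    x = np.asarray(x, dtype=float)
    if binning == "quantile":
        edges = np.quantile(x, np.linspace(0, 1, n + 1))
    elif binning == "uniform":
        edges = np.linspace(x.min(), x.max(), n + 1)
    else:
        raise ValueError(f"unknown binning: {binning!r}")
    return np.unique(edges)  # repeated edges collapse into one

def assign_bins(x, edges):
    """Map each value to a bin index in 0..len(edges)-2."""
    x = np.asarray(x, dtype=float)
    # Only interior edges are used, so a value equal to the maximum
    # falls in the last bin instead of spilling past it.
    return np.digitize(x, edges[1:-1])

x = [1, 2, 3, 4, 100]
uniform = bin_edges(x, n=2, binning="uniform")
quantile = bin_edges(x, n=2, binning="quantile")
print(uniform, assign_bins(x, uniform))
print(quantile, assign_bins(x, quantile))
```

With a skewed feature like this one, uniform edges put almost every row in the first bin, while quantile edges balance the counts, which is a reasonable motivation for quantile binning being the default.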

Development

pip install -r requirements.txt pytest
pytest

License

MIT — see LICENSE.
