vtreat is a pandas.DataFrame processor/conditioner that prepares
real-world data for supervised machine learning or predictive modeling
in a statistically sound manner.
vtreat takes an input DataFrame
that has a specified column called "the outcome variable" (or "y"),
which is the quantity to be predicted and must not have missing
values. The other input columns are possible explanatory variables
(typically numeric or categorical/string-valued; these columns may
have missing values) that the user later wants to use to predict "y".
In practice such an input DataFrame may not be immediately suitable
for machine learning procedures that often expect only numeric
explanatory variables, and may not tolerate missing values.
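
As a small illustration of the kind of input described above, the following sketch builds a toy DataFrame and lists the columns a numeric-only learner could not consume directly. The column names (`x_num`, `x_cat`, `y`) and the values are hypothetical, chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "y" is the outcome, the other columns are explanatory.
d = pd.DataFrame({
    "x_num": [1.0, np.nan, 3.0, 4.0],   # numeric, with a missing value
    "x_cat": ["a", "b", None, "a"],     # string-valued, with a missing value
    "y": [1.2, 0.7, 3.1, 2.9],          # outcome, fully observed
})

outcome_name = "y"
explanatory = [c for c in d.columns if c != outcome_name]

# The outcome column must not have missing values.
assert d[outcome_name].notna().all()

# Explanatory columns that are not numeric, and those with missing values.
non_numeric = [c for c in explanatory if not pd.api.types.is_numeric_dtype(d[c])]
has_missing = [c for c in explanatory if d[c].isna().any()]
```

Here `x_cat` is non-numeric and both explanatory columns contain missing values, so this frame would need preparation before being passed to many modeling procedures.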
To solve this, vtreat builds a transformed DataFrame where all
explanatory variable columns have been transformed into a number of
numeric explanatory variable columns, without missing values. The
vtreat implementation produces derived numeric columns that capture
most of the information relating the explanatory columns to the
specified "y" or dependent/outcome column through a number of numeric
transforms (indicator variables, impact codes, prevalence codes, and
more). This transformed DataFrame is suitable for a wide range of
supervised learning methods, from linear regression through gradient
boosted machines.
The idea is: you can take a DataFrame of messy real-world data and
easily, faithfully, reliably, and repeatably prepare it for machine
learning using documented methods. Incorporating
vtreat into your machine learning workflow lets you quickly work
with very diverse structured data.
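
To make the named transforms more concrete, here is a deliberately naive pandas sketch of indicator variables, prevalence codes, impact codes, and missing-value handling for a numeric outcome. It is not vtreat's implementation or API: the function `naive_prepare`, the derived column names, and the toy data are all assumptions made for illustration. In particular, this sketch computes impact codes on the same rows it then encodes, which can overfit and is not the statistically careful treatment the description above refers to.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "y" is the outcome column.
d = pd.DataFrame({
    "x_num": [1.0, np.nan, 3.0, 4.0],
    "x_cat": ["a", "b", None, "a"],
    "y": [1.0, 2.0, 3.0, 4.0],
})


def naive_prepare(d, outcome_name):
    """Return an all-numeric, missing-value-free frame of derived columns.

    Illustrative only: no smoothing, no cross-validation, no variable pruning.
    """
    y = d[outcome_name]
    out = pd.DataFrame(index=d.index)
    for col in d.columns:
        if col == outcome_name:
            continue
        x = d[col]
        if pd.api.types.is_numeric_dtype(x):
            # Flag missing entries, then fill them with the column mean.
            out[col + "_is_bad"] = x.isna().astype(float)
            out[col] = x.fillna(x.mean())
        else:
            # Treat "missing" as its own categorical level.
            levels = x.fillna("_NA_").astype(str)
            # Indicator variables: one 0/1 column per level.
            for lev in sorted(levels.unique()):
                out[f"{col}_lev_{lev}"] = (levels == lev).astype(float)
            # Prevalence code: how often each row's level occurs.
            prevalence = levels.value_counts(normalize=True)
            out[col + "_prevalence_code"] = levels.map(prevalence)
            # Impact code: per-level mean of y, minus the overall mean of y.
            level_means = y.groupby(levels).mean()
            out[col + "_impact_code"] = levels.map(level_means) - y.mean()
    return out


prepared = naive_prepare(d, "y")
```

Every column of `prepared` is numeric and free of missing values, which is the property the transformed DataFrame is meant to have. A real workflow would also need to store the fitted statistics (means, level frequencies, per-level outcome means) so the same transform can be applied to new data.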
Worked examples can be found here.