vtreat is a pandas.DataFrame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner.
Project description
This is the Python version of the vtreat data preparation system (also available as an R package).
vtreat is a DataFrame processor/conditioner that prepares real-world data for supervised machine learning or predictive modeling in a statistically sound manner.
vtreat takes an input DataFrame that has a specified column called "the outcome variable" (or "y"): the quantity to be predicted, which must not have missing values. The other input columns are possible explanatory variables that the user later wants to use to predict "y"; they are typically numeric or categorical/string-valued, and may have missing values. In practice such an input DataFrame may not be immediately suitable for machine learning procedures, which often expect only numeric explanatory variables and may not tolerate missing values.
To solve this, vtreat builds a transformed DataFrame in which every explanatory variable column has been transformed into one or more numeric explanatory variable columns without missing values. The vtreat implementation produces derived numeric columns that capture most of the information relating the explanatory columns to the specified "y" (dependent/outcome) column through a number of numeric transforms (indicator variables, impact codes, prevalence codes, and more). This transformed DataFrame is suitable for a wide range of supervised learning methods, from linear regression through gradient boosted machines.
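As a rough illustration of what the transforms named above mean, here is a toy sketch for a numeric outcome that uses only the Python standard library. The function names are invented for this example and are not vtreat's API. The calculations are deliberately naive (plain per-level means and frequencies), so treat it as a picture of what each derived column represents rather than as the vtreat implementation.

```python
from collections import Counter
from statistics import mean


def clean_numeric(values):
    """Fill missing (None) entries with the mean of the observed values
    and return a companion is-missing indicator column."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if observed else 0.0
    cleaned = [fill if v is None else float(v) for v in values]
    is_missing = [1.0 if v is None else 0.0 for v in values]
    return cleaned, is_missing


def indicator_codes(values):
    """One 0/1 column per observed (non-missing) level."""
    levels = sorted({v for v in values if v is not None})
    return {lev: [1.0 if v == lev else 0.0 for v in values] for lev in levels}


def impact_code(values, y):
    """Naive impact code: mean of y within a level minus the overall mean of y.
    Missing (None) is treated as just another level."""
    grand_mean = mean(y)
    groups = {}
    for v, yi in zip(values, y):
        groups.setdefault(v, []).append(yi)
    effect = {lev: mean(ys) - grand_mean for lev, ys in groups.items()}
    return [effect[v] for v in values]


def prevalence_code(values):
    """Fraction of rows that share each row's level."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]


# Tiny made-up data: one numeric column, one categorical column, and y.
x_num = [1.0, None, 3.0, 5.0]
x_cat = ["a", "b", None, "a"]
y = [1.0, 2.0, 3.0, 6.0]

cleaned, is_missing = clean_numeric(x_num)
indicators = indicator_codes(x_cat)
impact = impact_code(x_cat, y)
prevalence = prevalence_code(x_cat)
```

Every derived column here is numeric and free of missing values, which is the property the transformed DataFrame is meant to have.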
The idea is: you can take a DataFrame of messy real-world data and use vtreat to prepare it for machine learning easily, faithfully, reliably, and repeatably, using documented methods. Incorporating vtreat into your machine learning workflow lets you quickly work with very diverse structured data.
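One way to read "repeatably" is that a treatment is learned once from training data and then applied unchanged to any later data. The sketch below shows that pattern with a hypothetical ToyPlan class written for this example (it is not vtreat's API, and the derived column names are made up). Rows are plain dictionaries so the example runs with only the standard library.

```python
from statistics import mean


class ToyPlan:
    """Hypothetical stand-in for a fitted treatment plan (not vtreat's API)."""

    def fit(self, rows, numeric_cols, categorical_cols):
        # Remember training-time statistics so later data is treated identically.
        self.fill_ = {}
        for c in numeric_cols:
            seen = [r[c] for r in rows if r.get(c) is not None]
            self.fill_[c] = mean(seen) if seen else 0.0
        self.levels_ = {
            c: sorted({r[c] for r in rows if r.get(c) is not None})
            for c in categorical_cols
        }
        return self

    def transform(self, rows):
        out = []
        for r in rows:
            new = {}
            for c, fill in self.fill_.items():
                v = r.get(c)
                # Missing values are filled with the stored training mean.
                new[c] = fill if v is None else float(v)
                new[c + "_is_missing"] = 1.0 if v is None else 0.0
            for c, levels in self.levels_.items():
                for lev in levels:
                    # Unseen or missing levels get all-zero indicators.
                    new[f"{c}_lev_{lev}"] = 1.0 if r.get(c) == lev else 0.0
            out.append(new)
        return out


train = [
    {"x": 1.0, "c": "a"},
    {"x": None, "c": "b"},
    {"x": 5.0, "c": None},
]
plan = ToyPlan().fit(train, numeric_cols=["x"], categorical_cols=["c"])

# New data with a missing number and a level never seen in training.
new_rows = [{"x": None, "c": "zzz"}, {"x": 2.0, "c": "a"}]
first = plan.transform(new_rows)
second = plan.transform(new_rows)
```

Because the fill value and the level list are fixed at fit time, applying the plan twice to the same rows gives identical output, and new rows with missing values or novel levels still yield a fully numeric result.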
Worked examples can be found here.
For more detail please see here: arXiv:1611.09477 stat.AP (the documentation describes the R version, however all of the examples can be found worked in Python here).
vtreat is available as a Python/Pandas package, and also as an R package.
Download files
Source Distribution
Built Distribution
File details
Details for the file vtreat-1.3.1.tar.gz.
File metadata
- Download URL: vtreat-1.3.1.tar.gz
- Upload date:
- Size: 56.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 00731fd27e8bd64a9c0d36762eed1c3826658e09d5ddab069bd5230ef6763d8d
MD5 | 15d09e79f3282395aedbc7d6d8f1b519
BLAKE2b-256 | f65a29a8992f73d042df2b4982e79929d750f50ef822183898fbc78199b995e7
File details
Details for the file vtreat-1.3.1-py3-none-any.whl.
File metadata
- Download URL: vtreat-1.3.1-py3-none-any.whl
- Upload date:
- Size: 33.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | e4fae622b1de94df6ff147404ce3df7b22656eed2922e117e817176f8ba821ee
MD5 | 7cc5bfd3d505b2b9883afa3234a2f8f1
BLAKE2b-256 | 3bbf4a3c854a5990808d1f100ada7bd5b6753be82b0cc5ccbe7fa30a4fa247fa