
vtreat is a pandas.DataFrame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner.

Project description

This is the Python version of the vtreat data preparation system (also available as an R package).

vtreat is a DataFrame processor/conditioner that prepares real-world data for supervised machine learning or predictive modeling in a statistically sound manner.

vtreat takes an input DataFrame with a designated column, called "the outcome variable" (or "y"), that holds the quantity to be predicted; this column must not have missing values. The other input columns are candidate explanatory variables that the user later wants to use to predict "y". They are typically numeric or categorical (string-valued), and they may contain missing values. In practice such a DataFrame is often not immediately suitable for machine learning procedures, which frequently expect only numeric explanatory variables and may not tolerate missing values.
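As a rough illustration of the missing-value problem described above, here is a minimal plain-Python sketch. It is not vtreat's implementation or API. Lists stand in for DataFrame columns, and the helpers `is_missing`, `check_outcome`, and `clean_numeric_column`, as well as the `_is_bad` column suffix, are hypothetical names chosen for this example. The sketch refuses an outcome column with missing values and turns a numeric column with gaps into a complete column plus a 0/1 missingness indicator.

```python
import math
from typing import Dict, List, Optional


def is_missing(value: Optional[float]) -> bool:
    """Treat both None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_outcome(y: List[Optional[float]]) -> None:
    """Hypothetical guard: the outcome column must be complete."""
    if any(is_missing(v) for v in y):
        raise ValueError("outcome column must not contain missing values")


def clean_numeric_column(values: List[Optional[float]], name: str) -> Dict[str, List[float]]:
    """Fill missing entries with the mean of the observed values and add a 0/1 indicator."""
    observed = [float(v) for v in values if not is_missing(v)]
    # If every entry is missing, fall back to 0.0 as the fill value.
    fill = sum(observed) / len(observed) if observed else 0.0
    return {
        name: [fill if is_missing(v) else float(v) for v in values],
        name + "_is_bad": [1.0 if is_missing(v) else 0.0 for v in values],
    }


check_outcome([2.5, 3.0, 4.1, 0.7])  # passes silently
print(clean_numeric_column([1.0, None, 3.0, float("nan")], "x"))
# {'x': [1.0, 2.0, 3.0, 2.0], 'x_is_bad': [0.0, 1.0, 0.0, 1.0]}
```

Keeping the indicator column alongside the filled values lets a downstream model learn whether "was missing" is itself predictive, instead of silently treating filled rows as ordinary observations.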

To solve this, vtreat builds a transformed DataFrame in which every explanatory column has been replaced by one or more numeric columns with no missing values. These derived columns capture most of the information relating the explanatory columns to the specified "y" (dependent/outcome) column, through a number of numeric transforms (indicator variables, impact codes, prevalence codes, and more). The transformed DataFrame is suitable for a wide range of supervised learning methods, from linear regression through gradient boosted machines.
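To show what the three named transforms mean for a single categorical column, here is a simplified plain-Python sketch, assuming a numeric outcome. The function `encode_categorical` and the column names (`_lev_`, `_prevalence_code`, `_impact_code`) are illustrative assumptions, not a guaranteed match for vtreat's output. Indicator columns are one-hot flags per level, the prevalence code is the fraction of rows sharing a row's level, and the impact code is the level's mean outcome minus the overall mean outcome.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def encode_categorical(levels: List[str], y: List[float], name: str) -> Dict[str, List[float]]:
    """Derive indicator, prevalence, and impact columns from one categorical column."""
    n = len(levels)
    counts = Counter(levels)
    grand_mean = sum(y) / n

    # Sum the outcome per level so we can take per-level means.
    totals: Dict[str, float] = defaultdict(float)
    for level, target in zip(levels, y):
        totals[level] += target
    level_mean = {lev: totals[lev] / counts[lev] for lev in counts}

    out: Dict[str, List[float]] = {}
    # Indicator (one-hot) columns, one per observed level.
    for lev in sorted(counts):
        out[f"{name}_lev_{lev}"] = [1.0 if v == lev else 0.0 for v in levels]
    # Prevalence code: fraction of all rows that share this row's level.
    out[f"{name}_prevalence_code"] = [counts[v] / n for v in levels]
    # Impact code: this level's mean outcome minus the overall mean outcome.
    out[f"{name}_impact_code"] = [level_mean[v] - grand_mean for v in levels]
    return out


coded = encode_categorical(["a", "a", "b", "c"], [1.0, 3.0, 5.0, 7.0], "x")
for column, values in coded.items():
    print(column, values)
# x_lev_a [1.0, 1.0, 0.0, 0.0]
# x_lev_b [0.0, 0.0, 1.0, 0.0]
# x_lev_c [0.0, 0.0, 0.0, 1.0]
# x_prevalence_code [0.5, 0.5, 0.25, 0.25]
# x_impact_code [-2.0, -2.0, 1.0, 3.0]
```

This sketch computes impact codes on the same rows it encodes. If those rows are also used to train a model, outcome information leaks into the features and can cause overfitting. The arXiv paper cited in this description discusses this issue.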

The idea is that, with vtreat, you can take a DataFrame of messy real-world data and easily, faithfully, reliably, and repeatably prepare it for machine learning using documented methods. Incorporating vtreat into your machine learning workflow lets you quickly work with very diverse structured data.
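To make the "repeatable" point concrete, the following is a hypothetical fit/transform sketch in plain Python. The class name `ImpactCodePlan` and its methods are illustrative, not vtreat's API, although the fit-then-transform split follows the familiar scikit-learn convention. The encoding is learned once from training rows and then applied unchanged to new rows. Levels never seen during fitting get an impact code of 0.0, meaning "no evidence this level differs from the average." In this sketch a missing level (`None`) is simply treated as one more category.

```python
from collections import Counter
from typing import Dict, List, Optional


class ImpactCodePlan:
    """Hypothetical sketch: learn an impact encoding once, reuse it on any later data."""

    def fit(self, levels: List[Optional[str]], y: List[float]) -> "ImpactCodePlan":
        self.grand_mean_ = sum(y) / len(y)
        counts = Counter(levels)
        totals: Dict[Optional[str], float] = {}
        for level, target in zip(levels, y):
            totals[level] = totals.get(level, 0.0) + target
        # Store each level's mean outcome relative to the grand mean.
        # None is kept as its own level, so missing values seen in training get a code.
        self.impact_ = {lev: totals[lev] / counts[lev] - self.grand_mean_ for lev in counts}
        return self

    def transform(self, levels: List[Optional[str]]) -> List[float]:
        # Levels not seen during fit map to 0.0 (the neutral, average-level code).
        return [self.impact_.get(lev, 0.0) for lev in levels]


plan = ImpactCodePlan().fit(["a", "a", "b", None], [1.0, 3.0, 5.0, 7.0])
print(plan.transform(["a", "b", None, "z"]))
# [-2.0, 1.0, 3.0, 0.0]
```

Because the learned mapping is stored on the plan object, applying it to tomorrow's data yields the same columns and the same codes for the same levels, which is what makes the preparation step reproducible.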

Worked examples can be found here.

For more detail, please see arXiv:1611.09477 [stat.AP]. That documentation describes the R version; however, all of its examples can be found worked in Python here.

vtreat is available as a Python/Pandas package, and also as an R package.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vtreat-1.3.1.tar.gz (56.8 kB)

Uploaded Source

Built Distribution

vtreat-1.3.1-py3-none-any.whl (33.5 kB)

Uploaded Python 3

File details

Details for the file vtreat-1.3.1.tar.gz.

File metadata

  • Download URL: vtreat-1.3.1.tar.gz
  • Upload date:
  • Size: 56.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for vtreat-1.3.1.tar.gz

  Algorithm     Hash digest
  SHA256        00731fd27e8bd64a9c0d36762eed1c3826658e09d5ddab069bd5230ef6763d8d
  MD5           15d09e79f3282395aedbc7d6d8f1b519
  BLAKE2b-256   f65a29a8992f73d042df2b4982e79929d750f50ef822183898fbc78199b995e7

See more details on using hashes here.
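A downloaded file can be checked against the SHA256 digest listed for it using only Python's standard library. The sketch below assumes the source distribution was saved as `vtreat-1.3.1.tar.gz` in the working directory; the helper names `file_sha256` and `verify` are hypothetical. The file is read in fixed-size chunks so that large downloads do not have to fit in memory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for vtreat-1.3.1.tar.gz on this page.
EXPECTED_SDIST_SHA256 = "00731fd27e8bd64a9c0d36762eed1c3826658e09d5ddab069bd5230ef6763d8d"


def file_sha256(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's digest to an expected hex string, ignoring case and whitespace."""
    return file_sha256(path) == expected.strip().lower()


# Usage (assumes the file has already been downloaded):
# verify(Path("vtreat-1.3.1.tar.gz"), EXPECTED_SDIST_SHA256)
```

For installs, pip's hash-checking mode does the same comparison automatically when a requirements entry carries a `--hash=sha256:...` option.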

File details

Details for the file vtreat-1.3.1-py3-none-any.whl.

File metadata

  • Download URL: vtreat-1.3.1-py3-none-any.whl
  • Upload date:
  • Size: 33.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.9

File hashes

Hashes for vtreat-1.3.1-py3-none-any.whl

  Algorithm     Hash digest
  SHA256        e4fae622b1de94df6ff147404ce3df7b22656eed2922e117e817176f8ba821ee
  MD5           7cc5bfd3d505b2b9883afa3234a2f8f1
  BLAKE2b-256   3bbf4a3c854a5990808d1f100ada7bd5b6753be82b0cc5ccbe7fa30a4fa247fa

