
Data frame support and feature traceability for `scikit-learn`.

Project description

sklearndf is an open source library designed to address a common issue with scikit-learn: the outputs of transformers are numpy arrays, even when the input is a data frame. Arrays carry no column names, yet keeping track of feature names is essential for inspecting a model.

To this end, sklearndf enhances scikit-learn’s estimators as follows:

  • Preserve data frame structure:

    Return data frames as results of transformations, preserving feature names as the column index.

  • Feature name tracing:

    Add estimator properties that trace each feature name back to its original input feature; this is especially useful for transformers that create new features (e.g., one-hot encoding), and for pipelines that include such transformers.

  • Easy use:

    Simply append DF at the end of your usual scikit-learn class names to get enhanced data frame support!
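The idea behind feature name tracing can be illustrated with a minimal, library-free sketch. It is not sklearndf's implementation or API: the function `one_hot_with_origin` and the `origin` mapping are hypothetical names chosen for illustration, and the table is a plain dictionary rather than a data frame. The sketch one-hot encodes each column and records, for every derived column, which input feature it came from.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a data frame: column name -> column values.
Table = Dict[str, List[str]]


def one_hot_with_origin(table: Table) -> Tuple[Dict[str, List[int]], Dict[str, str]]:
    """One-hot encode every column and record where each new column came from."""
    encoded: Dict[str, List[int]] = {}
    origin: Dict[str, str] = {}  # output feature name -> original feature name
    for column, values in table.items():
        # One output column per distinct category, in sorted order.
        for category in sorted(set(values)):
            name = f"{column}_{category}"
            encoded[name] = [int(value == category) for value in values]
            origin[name] = column
    return encoded, origin


table = {"color": ["red", "blue", "red"], "size": ["S", "M", "S"]}
encoded, origin = one_hot_with_origin(table)

print(list(encoded))        # names of the derived features
print(origin["color_red"])  # original feature behind a derived column
```

Keeping the mapping next to the transformed data is what lets a derived column such as `color_red` be attributed to the input feature `color` when inspecting a model later.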


License

sklearndf is licensed under Apache 2.0 as described in the LICENSE file.


