DataFrame support and feature traceability for scikit-learn.
Project description
sklearndf is an open-source library designed to address a common need with scikit-learn: the outputs of transformers are NumPy arrays, even when the input is a data frame. However, to inspect a model it is essential to keep track of the feature names.
To this end, sklearndf enhances scikit-learn’s estimators as follows:
- Preserve data frame structure:
Return data frames as results of transformations, preserving feature names as the column index.
- Feature name tracing:
Add estimator properties that trace a feature name back to its original input feature; this is especially useful for transformers that create new features (e.g., one-hot encoding), and for pipelines that include such transformers.
- Easy use:
Simply append DF to the end of your usual scikit-learn class names (for example, OneHotEncoderDF instead of OneHotEncoder) to get enhanced data frame support!
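To make the idea of feature name tracing concrete, here is a minimal pure-Python sketch of what such a trace looks like for one-hot encoding. It is not sklearndf's implementation and does not use its API: the helper `one_hot_with_trace`, the `column_category` naming scheme, and the sample data are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


def one_hot_with_trace(
    rows: List[Dict[str, object]], column: str
) -> Tuple[List[Dict[str, object]], Dict[str, str]]:
    """One-hot encode `column` and record which input feature each output came from.

    Hypothetical helper for illustration; assumes `rows` is non-empty and that
    every row has the same keys.
    """
    categories = sorted({str(row[column]) for row in rows})

    # Map every output feature name to the original input feature it derives from.
    trace: Dict[str, str] = {}
    for name in rows[0]:
        if name != column:
            trace[name] = name  # passed through unchanged
    for category in categories:
        trace[f"{column}_{category}"] = column  # newly created feature

    encoded: List[Dict[str, object]] = []
    for row in rows:
        out = {k: v for k, v in row.items() if k != column}
        for category in categories:
            out[f"{column}_{category}"] = int(str(row[column]) == category)
        encoded.append(out)
    return encoded, trace


rows = [
    {"age": 31, "city": "Oslo"},
    {"age": 45, "city": "Rome"},
]
encoded, trace = one_hot_with_trace(rows, "city")
print(list(encoded[0]))  # output feature names
print(trace)             # output feature name -> original feature name
```

With a mapping like `trace`, any statistic reported per output feature (for example, an importance score for each one-hot column) can be aggregated back to the original input feature that produced it.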
License
sklearndf is licensed under Apache 2.0 as described in the LICENSE file.
Download files
Built Distribution
Hashes for sklearndf-1.0.2rc0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 33358d094dfd9420c73e5cfb127db8053dd8fe2d1ea568e4ed7f04e6ab5d5f41 |
MD5 | c24e9d7f0e5a3fc2bb82accc66a0ec60 |
BLAKE2b-256 | c0a213b8684ce4c183fbf6a712d1cf6c7fd3a111a9171b1b9c26716606a34de3 |