More transformers for scikit-learn pipelines
more-transformers
My own list of "extra" transformers in scikit-learn pipelines.
Intro
When building scikit-learn pipelines, I often feel that I have to do a lot of my data preparation work outside the pipeline. Moreover, many scikit-learn transformers could be more beginner-friendly if they returned pandas DataFrames instead of numpy arrays.
With that in mind, this library includes a few additional transformers that are mostly thin wrappers around scikit-learn.
For example:
from more_transformers.preprocessing import StandardScaler
behaves identically to sklearn.preprocessing.StandardScaler but returns a pandas DataFrame with the same column names and index values as the original.
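To make the idea concrete, here is a minimal sketch of how such a thin wrapper could be written. It is not the library's actual code: the class name DataFrameStandardScaler and the sample data are made up for illustration, and the wrapper assumes its input is always a pandas DataFrame.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import StandardScaler as SkStandardScaler


class DataFrameStandardScaler(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: scales like sklearn, returns a DataFrame."""

    def fit(self, X, y=None):
        # Delegate the actual fitting to scikit-learn's scaler.
        self.scaler_ = SkStandardScaler().fit(X)
        return self

    def transform(self, X):
        scaled = self.scaler_.transform(X)  # plain numpy array
        # Re-attach the original column names and index values.
        return pd.DataFrame(scaled, columns=X.columns, index=X.index)


# Illustrative data only.
df = pd.DataFrame(
    {"age": [20.0, 30.0, 40.0], "weight": [60.0, 70.0, 80.0]},
    index=["a", "b", "c"],
)
scaled = DataFrameStandardScaler().fit_transform(df)
```

Because the result keeps the input's labels, later pipeline steps can still refer to columns by name.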
As another example
from more_transformers.decomposition import PCA
is the same as from sklearn.decomposition import PCA but retains the index and uses the column names pca_0, pca_1, ..., pca_n.
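The same pattern could apply to PCA. The sketch below is an assumption about how the naming might work rather than the library's implementation: DataFramePCA is a hypothetical name, and the sample values are invented.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA as SkPCA


class DataFramePCA(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: PCA that returns a labelled DataFrame."""

    def __init__(self, n_components=None):
        self.n_components = n_components

    def fit(self, X, y=None):
        self.pca_ = SkPCA(n_components=self.n_components).fit(X)
        return self

    def transform(self, X):
        components = self.pca_.transform(X)
        # One column per component: pca_0, pca_1, ...
        names = [f"pca_{i}" for i in range(components.shape[1])]
        return pd.DataFrame(components, columns=names, index=X.index)


# Illustrative data only.
df = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 1.0, 4.0, 3.0],
        "z": [0.5, 0.1, 0.9, 0.3],
    },
    index=[10, 11, 12, 13],
)
reduced = DataFramePCA(n_components=2).fit_transform(df)
```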
I've also added a few helpers of my own, mostly under more_transformers.common. For example,
from more_transformers.preprocessing import GetDummies
is a transformer version of pd.get_dummies. One advantage is that the test data is transformed to have the same columns that pd.get_dummies produced on the training data.
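One way to get that column alignment is to remember the dummy columns seen during fit and reindex to them during transform. The sketch below illustrates the technique under that assumption. GetDummiesSketch is a hypothetical name, and the real GetDummies may work differently.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class GetDummiesSketch(BaseEstimator, TransformerMixin):
    """Hypothetical transformer version of pd.get_dummies."""

    def fit(self, X, y=None):
        # Remember which dummy columns the training data produces.
        self.columns_ = pd.get_dummies(X).columns
        return self

    def transform(self, X):
        dummies = pd.get_dummies(X)
        # Add training columns missing here (filled with 0) and drop
        # columns for categories never seen during training.
        return dummies.reindex(columns=self.columns_, fill_value=0)


# Illustrative data only: "purple" never appears in training.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["red", "purple"]})

encoder = GetDummiesSketch().fit(train)
encoded_test = encoder.transform(test)
```

Calling pd.get_dummies on the test frame directly would yield only color_purple and color_red, which would not match what a model trained on the three training columns expects.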
Also note
from more_transformers.common import ColumnSelector
allows for very flexible selection of columns in your pipeline. For example:
ColumnSelector() # Selects all columns
ColumnSelector(['Age','Weight','Height']) # Selects these columns
ColumnSelector('number') # Selects all integer or float columns
ColumnSelector(lambda x: str(x).startswith('x_')) # Selects columns starting with 'x_'
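The four calling styles above could be dispatched roughly as in the following sketch. It is an illustration, not the library's code: ColumnSelectorSketch and its columns parameter are assumed names, and the string case assumes the value is passed through to DataFrame.select_dtypes.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelectorSketch(BaseEstimator, TransformerMixin):
    """Hypothetical selector supporting None, list, dtype string, or callable."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        if self.columns is None:
            return X  # select all columns
        if callable(self.columns):
            # Keep the columns whose name satisfies the predicate.
            return X[[c for c in X.columns if self.columns(c)]]
        if isinstance(self.columns, str):
            # Treat a string as a dtype specifier such as 'number'.
            return X.select_dtypes(include=self.columns)
        return X[list(self.columns)]  # explicit list of names


# Illustrative data only.
df = pd.DataFrame({"Age": [30, 40], "Name": ["Ann", "Bob"], "x_1": [0.1, 0.2]})

all_cols = ColumnSelectorSketch().fit_transform(df)
by_name = ColumnSelectorSketch(["Age", "Name"]).fit_transform(df)
numeric = ColumnSelectorSketch("number").fit_transform(df)
prefixed = ColumnSelectorSketch(lambda c: str(c).startswith("x_")).fit_transform(df)
```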
Source Distribution
File details
Details for the file more-transformers-0.0.11.tar.gz.
File metadata
- Download URL: more-transformers-0.0.11.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.6.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 146707db18833d0f191df5329219dd159a9c9e1a3dc829bc726a67aee78c947e
MD5 | d33c4140f41a0358941b8958e1f48cd1
BLAKE2b-256 | 3bee302a74ebbffd479e341e9a8a6f29b110ce16dcfcd559fc9d7bf22cfaa59c