
More transformers for scikit-learn pipelines


more-transformers

My own list of "extra" transformers for scikit-learn pipelines.

Intro

When building scikit-learn pipelines, I often find I have to do much of my data preparation outside the pipeline. Moreover, many scikit-learn transformers would be more beginner-friendly if they returned pandas DataFrames instead of NumPy arrays.

With that in mind, this library includes a few additional transformers that are mostly thin wrappers around scikit-learn.

For example:

from more_transformers.preprocessing import StandardScaler

behaves identically to sklearn.preprocessing.StandardScaler but returns a pandas DataFrame with the same column names and index values as the original.
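To illustrate the idea, here is a minimal sketch of how such a wrapper could be written. The class name `DataFrameStandardScaler` is hypothetical and this is not necessarily how more_transformers implements it; it assumes the input is a pandas DataFrame.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


class DataFrameStandardScaler(StandardScaler):
    """Hypothetical sketch: scale like StandardScaler, return a DataFrame."""

    def transform(self, X, copy=None):
        # The parent returns a NumPy array by default.
        scaled = super().transform(X, copy=copy)
        # Reattach the column names and index of the input frame.
        return pd.DataFrame(scaled, columns=X.columns, index=X.index)


df = pd.DataFrame(
    {"Age": [20.0, 30.0, 40.0], "Weight": [60.0, 70.0, 80.0]},
    index=["a", "b", "c"],
)
scaled_df = DataFrameStandardScaler().fit_transform(df)
```

Because `fit_transform` is inherited and calls `transform` internally, overriding `transform` alone is enough for both entry points in this sketch.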

As another example,

from more_transformers.decomposition import PCA

is the same as sklearn.decomposition.PCA but retains the index and uses the column names pca_0, pca_1, ..., pca_n.
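A rough sketch of the same pattern for PCA follows. The class name `DataFramePCA` and the helper `_to_frame` are hypothetical, and the library's real implementation may differ. Since scikit-learn's PCA defines its own `fit_transform`, the sketch overrides both methods.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


class DataFramePCA(PCA):
    """Hypothetical sketch: PCA that keeps the index and names its outputs."""

    def _to_frame(self, components, X):
        # One column per retained component: pca_0, pca_1, ...
        values = np.asarray(components)
        names = [f"pca_{i}" for i in range(values.shape[1])]
        return pd.DataFrame(values, columns=names, index=X.index)

    def fit_transform(self, X, y=None):
        return self._to_frame(super().fit_transform(X, y), X)

    def transform(self, X):
        return self._to_frame(super().transform(X), X)


rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(5, 3)),
    columns=["x_0", "x_1", "x_2"],
    index=list("abcde"),
)
pca_df = DataFramePCA(n_components=2).fit_transform(df)
```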

I've also added a few helpers of my own, mostly under more_transformers.common. For example:

from more_transformers.preprocessing import GetDummies

is a transformer version of pd.get_dummies. One advantage is that the test data is transformed to have the same columns that pd.get_dummies produced on the training data.
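The column-alignment behaviour can be sketched as follows. This is an illustration under assumptions, not the library's code: the class name `GetDummiesSketch` is hypothetical, and it assumes that categories unseen during fitting are dropped and missing ones are filled with zeros.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class GetDummiesSketch(BaseEstimator, TransformerMixin):
    """Hypothetical sketch of a pd.get_dummies transformer."""

    def fit(self, X, y=None):
        # Remember the dummy columns produced on the training data.
        self.columns_ = pd.get_dummies(X).columns
        return self

    def transform(self, X):
        # Encode, then align to the training columns: unseen categories
        # are dropped, missing ones are added and filled with 0.
        return pd.get_dummies(X).reindex(columns=self.columns_, fill_value=0)


train = pd.DataFrame({"color": ["red", "blue", "green"]})
test = pd.DataFrame({"color": ["blue", "yellow"]})

encoder = GetDummiesSketch().fit(train)
test_dummies = encoder.transform(test)
```

Calling pd.get_dummies directly on `test` would instead yield only `color_blue` and `color_yellow`, which would not match what a model fitted on the training columns expects.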

Also note

from more_transformers.common import ColumnSelector

allows for very flexible selection of columns in your pipeline. For example:

ColumnSelector() # Selects all columns
ColumnSelector(['Age','Weight','Height']) # Selects these columns
ColumnSelector('number') # Selects all integer or float columns
ColumnSelector(lambda x: str(x).startswith('x_'))  # Selects columns starting with 'x_'
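One way such a selector could dispatch on its argument is sketched below. The class name `ColumnSelectorSketch` is hypothetical, and treating any string argument as a dtype specification passed to `DataFrame.select_dtypes` is an assumption about the behaviour, not something confirmed by the library.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnSelectorSketch(BaseEstimator, TransformerMixin):
    """Hypothetical sketch of a flexible column selector."""

    def __init__(self, columns=None):
        self.columns = columns

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        if self.columns is None:
            return X  # no argument: keep every column
        if callable(self.columns):
            # Predicate applied to each column name.
            return X[[c for c in X.columns if self.columns(c)]]
        if isinstance(self.columns, str):
            # Assumption: a string is a dtype spec such as 'number'.
            return X.select_dtypes(include=self.columns)
        return X[list(self.columns)]  # explicit list of names


df = pd.DataFrame({"Age": [30, 40], "Name": ["Ann", "Bob"], "x_1": [0.5, 1.5]})

all_cols = ColumnSelectorSketch().fit_transform(df)
by_name = ColumnSelectorSketch(["Age", "x_1"]).fit_transform(df)
numeric = ColumnSelectorSketch("number").fit_transform(df)
prefixed = ColumnSelectorSketch(lambda c: str(c).startswith("x_")).fit_transform(df)
```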


