
imperio

Imperio is a Python package, inspired by scikit-learn, for feature engineering. It contains a set of feature transformers that make your data easier for machine learning algorithms to learn from.

This version of imperio provides the following feature transformation methods:

  1. Box-Cox (BoxCoxTransformer).
  2. Clusterize (ClusterizeTransformer).
  3. Combinator (CombinatorTransformer).
  4. Frequency Imputation Transformer (FrequencyImputationTransformer).
  5. Log Transformer (LogTransformer).
  6. Smoothing (SmoothingTransformer).
  7. Spatial-Sign Transformer (SpatialSignTransformer).
  8. Target Imputation Transformer (TargetImputationTransformer).
  9. Whitening (WhiteningTransformer).
  10. Yeo-Johnson Transformer (YeoJohnsonTransformer).
  11. ZCA (ZCATransformer).
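For reference, these are the standard definitions of the two power transforms in the list, Box-Cox and Yeo-Johnson. This README does not say how imperio chooses λ; estimating it per feature by maximum likelihood is common practice, but treat that as an assumption. Box-Cox is only defined for strictly positive inputs, which is why Yeo-Johnson exists for data containing zeros or negative values.

```latex
% Box-Cox, defined for x > 0
x^{(\lambda)} =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[4pt]
\ln x, & \lambda = 0
\end{cases}

% Yeo-Johnson, defined for any real x
x^{(\lambda)} =
\begin{cases}
\dfrac{(x + 1)^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\ x \ge 0 \\[4pt]
\ln(x + 1), & \lambda = 0,\ x \ge 0 \\[4pt]
-\dfrac{(1 - x)^{2 - \lambda} - 1}{2 - \lambda}, & \lambda \neq 2,\ x < 0 \\[4pt]
-\ln(1 - x), & \lambda = 2,\ x < 0
\end{cases}
```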

All of these transformers work like regular scikit-learn transformers: each one implements the fit, transform and fit_transform methods.
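To make the fit / transform / fit_transform contract concrete, here is a toy transformer in plain Python. FrequencyEncoderSketch is a hypothetical name, and its behaviour (replacing each category with its relative frequency in the training data, and unseen categories with 0.0) is an assumption about what frequency imputation means. It is not imperio's FrequencyImputationTransformer, and it works on a single list of values rather than a feature matrix.

```python
from collections import Counter


class FrequencyEncoderSketch:
    """Hypothetical toy transformer illustrating the fit/transform/fit_transform contract.

    Not imperio's implementation: it maps each category of a single column to
    its relative frequency in the data seen by fit().
    """

    def fit(self, X, y=None):
        # y is accepted for API compatibility but ignored, as in most transformers.
        counts = Counter(X)
        total = len(X)
        self.frequencies_ = {value: count / total for value, count in counts.items()}
        return self  # returning self allows fit(...).transform(...) chaining

    def transform(self, X):
        # Categories never seen during fit are mapped to 0.0 (an assumption).
        return [self.frequencies_.get(value, 0.0) for value in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


encoder = FrequencyEncoderSketch()
train = ["red", "blue", "red", "green"]
print(encoder.fit_transform(train))           # [0.5, 0.25, 0.5, 0.25]
print(encoder.transform(["blue", "purple"]))  # [0.25, 0.0]
```

Returning self from fit is what lets fit_transform chain the two calls, the same convention scikit-learn uses.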

Additionally, every imperio transformer has an apply method that applies the transformation directly to a pandas DataFrame.

How to use imperio

To use a transformer, import it from imperio as follows:

from imperio import BoxCoxTransformer

The class names are given in parentheses in the list above.

Next, create an instance of the transformer (Box-Cox is used as an example):

method = BoxCoxTransformer()

First, fit the transformer by passing it a feature matrix (X) and a target array (y). NOTE: the y argument is actually used only by TargetImputationTransformer.
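Target imputation is the one method that needs y, because the replacement value for a category is learned from the target. A common form is mean target encoding: each category is replaced by the mean of y over the training rows that have it. The sketch below assumes that definition; TargetMeanEncoderSketch is a hypothetical name, the global-mean fallback for unseen categories is an assumption, and imperio's TargetImputationTransformer may differ (for example by adding smoothing).

```python
from collections import defaultdict


class TargetMeanEncoderSketch:
    """Hypothetical illustration of target imputation (mean target encoding).

    Assumption: each category is replaced by the mean of y over the training
    rows with that category. imperio's TargetImputationTransformer may differ.
    """

    def fit(self, X, y):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for value, target in zip(X, y):
            sums[value] += target
            counts[value] += 1
        self.means_ = {value: sums[value] / counts[value] for value in counts}
        # Fallback for categories unseen during fit: the global mean of y (an assumption).
        self.global_mean_ = sum(y) / len(y)
        return self

    def transform(self, X):
        return [self.means_.get(value, self.global_mean_) for value in X]

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)


encoder = TargetMeanEncoderSketch()
city = ["A", "B", "A", "B", "C"]
y = [1, 0, 0, 0, 1]
print(encoder.fit_transform(city, y))  # [0.5, 0.0, 0.5, 0.0, 1.0]
print(encoder.transform(["A", "D"]))   # [0.5, 0.4]
```

In practice, fitting a target encoder on the same rows it then transforms can leak target information into the features; evaluating it on held-out data is the usual safeguard.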

method.fit(X, y)

Once the transformer is fitted, use the transform function to transform new data. Pass only the feature matrix (X) to transform:

X_transformed = method.transform(X)

You can also fit and transform the data in a single step with the fit_transform function:

X_transformed = method.fit_transform(X)

You can also apply a transformation directly to a pandas DataFrame, passing the name of the target column and the list of columns you want to change:

new_df = method.apply(df, 'target', ['col1', 'col2'])

Some advice:

  1. Use FrequencyImputationTransformer or TargetImputationTransformer for categorical features.
  2. Use BoxCoxTransformer or YeoJohnsonTransformer on numerical features to make their distributions closer to normal (Box-Cox requires strictly positive values; Yeo-Johnson also handles zero and negative values).
  3. Use SpatialSignTransformer on centered and scaled data to pull outliers in toward the rest of the samples.
  4. Use CombinatorTransformer to apply different transformers to categorical and numerical columns separately.
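The spatial sign transform behind tip 3 is commonly defined as dividing each sample by its Euclidean norm, so every sample lands on the unit sphere and an extreme point ends up no farther from the origin than any other. The pure-Python function below is a hypothetical sketch of that definition, not imperio's SpatialSignTransformer; it assumes the rows are already centered and scaled, and it leaves all-zero rows unchanged to avoid division by zero (also an assumption).

```python
import math


def spatial_sign(rows):
    """Project each sample (row) onto the unit sphere: x / ||x||.

    Assumes the rows are already centered and scaled, as tip 3 suggests.
    An all-zero row is returned unchanged to avoid division by zero.
    """
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        result.append([v / norm for v in row] if norm > 0 else list(row))
    return result


# The third sample is an outlier ten times farther out than the first,
# but after the transform both sit at the same point on the unit circle.
samples = [[3.0, 4.0], [-6.0, 8.0], [30.0, 40.0]]
print(spatial_sign(samples))  # [[0.6, 0.8], [-0.6, 0.8], [0.6, 0.8]]
```

Because only the direction of each sample survives, this is why centering and scaling first matters: otherwise the direction is dominated by the feature with the largest raw scale.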

With <3 from Sigmoid!

We are open to feedback. Please send your impressions to vladimir.stojoc@gmail.com



Download files


Source Distribution

imperio-0.1.5.tar.gz (13.0 kB)


Built Distribution

imperio-0.1.5-py3-none-any.whl (24.2 kB)


File details

Details for the file imperio-0.1.5.tar.gz.

File metadata

  • Download URL: imperio-0.1.5.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for imperio-0.1.5.tar.gz
  Algorithm     Hash digest
  SHA256        93a0686de50e1a1ff3fe25be66c18c40fbac56f41e17cc044e7b7398ba67e683
  MD5           7d6c5d92476e32610121f53a8ce5c25d
  BLAKE2b-256   44c7c6222d0ed638e9ca0a111fe1d23fc2325c922dc8cd8b93acc98ab99ee12a


File details

Details for the file imperio-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: imperio-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 24.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.2

File hashes

Hashes for imperio-0.1.5-py3-none-any.whl
  Algorithm     Hash digest
  SHA256        3377a04d1e826fd0356564c3a6afc2fe79e0655e4e269ea29ca61e77cea7ac55
  MD5           53c003ab9289e4b4d5a003d6fbc0248b
  BLAKE2b-256   e6527ead831c549633ab45574bde68d3da489aaf8ad253f53d462281b7c5e50d

