An ecosystem for machine learning projects

Project description

This is a machine learning toolkit built around pipelines. You can use it for binary classification or regression tasks. Here are some examples of running a machine learning project:

from ngocbienml import MyPipeline

pipeline = MyPipeline()
pipeline.fit(data, target)
pipeline.score(new_data, new_target)

Or

from ngocbienml import PipelineKfold

pipeline = PipelineKfold()
pipeline.fit(data, target)
pipeline.score(new_data, new_target)

Some parameters for MyPipeline and PipelineKfold:

  • objective: either binary or regression.
  • model_name: either lgb or logistic, to use an LGBM model or a linear sklearn model, respectively.
  • model: if None, model_name is used to pick the model; otherwise the given model is used directly and model_name is ignored.
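The precedence between model and model_name can be sketched in a few lines. This is only an illustration of the rule described above, not the ngocbienml source: resolve_model and SUPPORTED_MODEL_NAMES are hypothetical names, and a string stands in for the estimator that a real pipeline would build.

```python
# Hypothetical sketch of the documented precedence rule (not the library's code).
SUPPORTED_MODEL_NAMES = {"lgb", "logistic"}


def resolve_model(model_name="lgb", model=None):
    """Return the estimator a pipeline would use under the documented rule."""
    if model is not None:
        # An explicitly passed model always wins; model_name is ignored.
        return model
    if model_name is None:
        # The document also uses model_name=None for a transform-only pipeline.
        return None
    if model_name not in SUPPORTED_MODEL_NAMES:
        raise ValueError(f"model_name must be one of {sorted(SUPPORTED_MODEL_NAMES)}")
    # A real pipeline would construct the estimator here; a label stands in for it.
    return f"default estimator for {model_name!r}"


custom = object()  # stand-in for any user-supplied estimator
print(resolve_model(model_name="logistic"))
print(resolve_model(model_name="lgb", model=custom) is custom)
```

The second call shows that model_name has no effect once model is given.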

Here data is a pandas DataFrame and target is a Series. With the default settings above, the main steps of the pipeline are:

  • Fillna by mean
  • LabelEncoder
  • Feature Selection: uses two methods, variance and correlation
  • MinMaxScale
  • LGBClassifier: the default parameters work well with datasets of 100K rows or more and at least 20 features, and they cope well with unbalanced datasets. With the default settings, 10% of the dataset is held out as a test set when k-fold is not used; otherwise 5 folds are used.
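To make the preprocessing steps above concrete, here is a minimal sketch in plain Python of three of them: filling missing values with the mean, dropping a feature by variance, and min-max scaling. It is not how ngocbienml implements them. The function names, the toy data, and the zero-variance threshold are all assumptions, and label encoding and correlation-based selection are left out.

```python
def fill_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def variance(column):
    """Population variance of a fully numeric column."""
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)


def min_max_scale(column):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Toy data: one column with a missing value, one constant column.
raw = {"age": [20, None, 40, 60], "constant": [1, 1, 1, 1]}

filled = {name: fill_mean(col) for name, col in raw.items()}
# Assumed threshold: drop features whose variance is exactly zero.
kept = {name: col for name, col in filled.items() if variance(col) > 0.0}
scaled = {name: min_max_scale(col) for name, col in kept.items()}

print(scaled)  # {'age': [0.0, 0.5, 0.5, 1.0]}
```

The constant column carries no information, so the variance filter removes it before scaling.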

You can save and reload a pipeline for long-term use.

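from joblib import dump, load
from ngocbienml import MyPipeline

path = "pipeline.joblib"  # placeholder file name; choose your own

pipeline = MyPipeline()
pipeline.fit(data, target)  # fit before saving so the reloaded pipeline can score
dump(pipeline, path)
pipeline = load(path)
pipeline.score(data, target)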

You can include preprocessing classes such as Fillna, MinMaxScale, or LabelEncoder in your own customized pipeline. Note that sklearn's LabelEncoder cannot currently be used as a full replacement here.

from ngocbienml import MinMaxScale, Fillna, LabelEncoder, ModelWithPipeline
from sklearn.pipeline import Pipeline

pipeline = Pipeline([('label_encoder', LabelEncoder()),
                     ('fillna', Fillna()),
                     ('scale', MinMaxScale()),
                     ('model', ModelWithPipeline())])

pipeline.fit(data, target)
pipeline.score(test, y_test)

Or

from ngocbienml import PipelineKfold

pipeline = PipelineKfold()
pipeline.fit(data, target)
pipeline.score(test, y_test)
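For readers new to k-fold validation, the sketch below shows how rows can be partitioned into 5 folds, the default mentioned earlier. It is a generic illustration, not PipelineKfold's implementation: kfold_indices is a hypothetical helper, and whether the library shuffles or stratifies the rows is not stated in this document, so the sketch does neither.

```python
def kfold_indices(n_rows, n_folds=5):
    """Yield (train_idx, valid_idx) pairs; every row is validated exactly once."""
    # Spread the remainder so that fold sizes differ by at most one row.
    sizes = [n_rows // n_folds + (1 if i < n_rows % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_rows))
        yield train, valid
        start += size


folds = list(kfold_indices(n_rows=12, n_folds=5))
for train, valid in folds:
    print(len(train), valid)
```

Each fold trains on the rows outside its validation slice, so a model is fitted 5 times and every row is scored once.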

You can also use the pipeline only to transform data, and then use the result for another task:

from ngocbienml import AssertGoodHeader, Fillna, LabelEncoder, FeatureSelection, FillnaAndDropCatFeat, MinMaxScale
from sklearn.pipeline import Pipeline

steps = [('assertGoodheader', AssertGoodHeader()),
         ('Fillna', Fillna()),
         ('LabelEncoder', LabelEncoder()),
         ('FillnaAndDropCatFeat', FillnaAndDropCatFeat()),
         ('MinMaxScale', MinMaxScale()),
         ('FeatureSelection', FeatureSelection())]

pipeline = Pipeline(steps=steps)
data = "numpy array or pd.DataFrame"  # placeholder: pass your real data here
df_transformed = pipeline.fit_transform(data)

Or, most simply, use the default parameters:

from ngocbienml import MyPipeline

pipeline = MyPipeline(model_name=None)  # do not use a model in this case
data = "numpy array or pd.DataFrame"  # placeholder: pass your real data here
df_transformed = pipeline.fit_transform(data)

In the code above, df_transformed is a numeric data frame with the same header as the input data. It is ready for training with any model.

from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression()
classifier.fit(df_transformed, label)

You can also run a deep learning model, for example:

from ngocbienml import DeepLearningModel

classifier = DeepLearningModel(model_name='dl',
                               objective='regression',
                               epochs=3000,
                               hidden_layers=[64, 32],
                               activation=['relu', 'sigmoid'],
                               dropout=.1)

classifier.fit(data, target)

Note that all of the models above can run binary classification or regression. To set up regression, use the objective parameter, for example:

from ngocbienml import PipelineKfold

model = PipelineKfold(objective="regression", model_name="lgb")  # this creates a regression model
model.fit(data, target)

You can pass a specific model to the pipeline:

from ngocbienml import MyPipeline

objective = "binary"  # or "regression"
model = "some model with correct objective"  # placeholder: an estimator matching the objective
pipeline = MyPipeline(objective=objective, model=model)
pipeline.fit(data, target)

Use SearchCv for hyperparameter tuning:

from ngocbienml import SearchCv

SearchCv(n_iter=100).fit(data, target)

This tool breaks n_iter down into small steps and saves at the end of each step, so that you do not lose everything if you shut down your PC before the run finishes. You can re-run it to refit and find better parameters.
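The idea of splitting a search into small steps and saving after each one can be sketched as below. This is an assumption-laden illustration, not SearchCv's code: toy_score, checkpointed_search, the step argument, the JSON file, and the single learning_rate parameter are all hypothetical, and the checkpoint format SearchCv actually uses is not described in this document.

```python
import json
import os
import random
import tempfile


def toy_score(params):
    """Stand-in objective: higher is better, best at learning_rate = 0.1."""
    return -abs(params["learning_rate"] - 0.1)


def checkpointed_search(n_iter, step, checkpoint_path, seed=0):
    """Run random trials in chunks of `step`, saving the best result after each chunk."""
    state = {"done": 0, "best_score": None, "best_params": None}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)  # resume from an earlier, possibly interrupted, run
    # Offset the seed so a resumed run does not replay the same trials.
    rng = random.Random(seed + state["done"])
    while state["done"] < n_iter:
        for _ in range(min(step, n_iter - state["done"])):
            params = {"learning_rate": rng.uniform(0.001, 0.5)}
            score = toy_score(params)
            if state["best_score"] is None or score > state["best_score"]:
                state["best_score"], state["best_params"] = score, params
            state["done"] += 1
        with open(checkpoint_path, "w") as fh:
            json.dump(state, fh)  # progress up to here survives a shutdown
    return state


path = os.path.join(tempfile.mkdtemp(), "search_state.json")
first = checkpointed_search(n_iter=20, step=5, checkpoint_path=path)
resumed = checkpointed_search(n_iter=40, step=5, checkpoint_path=path)
print(first["done"], resumed["done"])  # 20 40
```

The second call picks up the saved state, runs only the remaining 20 trials, and can only keep or improve the best score found so far.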

What's next:

  • More settings for feature extraction and modelling.
  • More metrics and visualizations.

