A small package for all useful ML things

Kowalsky, analysis!

A simple package of handy ML utilities and more.

What's new? [v0.0.39]

  • add feature package with two types of analysis + support for the other functions
    • Recursive Feature Elimination
    • Sequential Feature Selection
  • improve optimize:
    • EarlyStopping mechanism
    • optimization graph
    • parallel execution with n_jobs=-1
  • add logs package
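
For readers unfamiliar with the two selection strategies named above, here is a minimal sketch of both using scikit-learn directly. It does not show kowalsky's own `rfe_analysis` or `sfs_analysis` signatures, which are not documented here; the synthetic dataset, the `LinearRegression` estimator, and the choice of 3 features are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data (assumption): 8 features, only 3 of which carry signal.
X, y = make_regression(
    n_samples=200, n_features=8, n_informative=3, noise=0.1, random_state=0
)

estimator = LinearRegression()

# Recursive Feature Elimination: fit, drop the weakest feature, repeat
# until only the requested number of features remains.
rfe = RFE(estimator, n_features_to_select=3).fit(X, y)
rfe_selected = rfe.get_support(indices=True)

# Sequential Feature Selection (forward): start from no features and
# greedily add the one that most improves the cross-validated score.
sfs = SequentialFeatureSelector(
    estimator, n_features_to_select=3, direction="forward", cv=3
).fit(X, y)
sfs_selected = sfs.get_support(indices=True)

print("RFE kept columns:", rfe_selected)
print("SFS kept columns:", sfs_selected)
```

RFE relies on the estimator's coefficients or feature importances, while SFS only needs a score, so SFS works with any estimator but costs many more fits.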

What's inside?

  1. analysis - evaluates a specified model on a given dataframe. With export_test_set=True it also exports predictions ready for submission.

  2. df - module for working with dataframe:

    • corr - sort all correlated features.
    • handle_outliers - fill or drop columns with outliers.
    • log_transform - transform columns with log function.
    • group_by_mean - make additional columns with aggregated mean
    • group_by_max - make additional columns with aggregated max
    • group_by_min - make additional columns with aggregated min
    • apply_with_progress - apply a heavy function to each row of the dataset.
    • scale - scale columns with Standard or MinMax scalers
  3. kaggle:

    • submit - create a submission file for Kaggle based on the sample submission
  4. logs:

    • profile_memory - logs all heavy variables
    • make_pretty_pyplot - makes pyplot look better :)
  5. optuna - handy methods for working with optuna:

    • optimize - optimize model with given dataframe
    • optimize_super_learner - optimize super learner configuration with given set of models and set of heads (meta_model)
  6. colab:

    • csv - read a CSV file stored on Google Drive by its file id
    • path - get path to Google Drive file
  7. feature:

    • rfe_analysis - Recursive Feature Elimination analysis
    • sfs_analysis - Sequential Feature Selection analysis
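
To make the dataframe helpers in the df module more concrete, the following is a rough pandas sketch of what a log transform, a grouped-mean column, and min-max scaling typically look like. The function names ending in `_sketch`, their parameters, and the generated column name are hypothetical; they are not kowalsky's actual signatures.

```python
import numpy as np
import pandas as pd


def log_transform_sketch(df, columns):
    """Return a copy with log(1 + x) applied to the given columns."""
    out = df.copy()
    out[columns] = np.log1p(out[columns])
    return out


def group_by_mean_sketch(df, by, column):
    """Return a copy with an extra column holding the per-group mean."""
    out = df.copy()
    out[f"{column}_mean_by_{by}"] = out.groupby(by)[column].transform("mean")
    return out


def min_max_scale_sketch(df, columns):
    """Return a copy with the given columns rescaled to the [0, 1] range.

    A constant column has zero range and would become NaN here.
    """
    out = df.copy()
    col_min = out[columns].min()
    col_range = out[columns].max() - col_min
    out[columns] = (out[columns] - col_min) / col_range
    return out


# Toy data (assumption) to exercise the three helpers.
df = pd.DataFrame(
    {"city": ["a", "a", "b", "b"], "price": [10.0, 30.0, 100.0, 300.0]}
)

logged = log_transform_sketch(df, ["price"])
grouped = group_by_mean_sketch(df, "city", "price")
scaled = min_max_scale_sketch(df, ["price"])

print(grouped)
```

Each helper returns a copy rather than mutating its input, which keeps notebook cells safe to re-run.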

What's next?

  • Use optuna for searching the best feature amount
  • Add file logger to track the progress in JupyterLab

Example:

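# Notebook cell: install or upgrade the package
!pip install kowalsky --upgrade

from kowalsky.optuna import optimize

optimize('RFR',                             # model key
         path='../input/project/feed.csv',  # CSV file with the dataset
         scorer='acc',                      # scoring metric
         y_label='y_label',                 # name of the target column
         trials=3000)                       # number of optimization trials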

Available models:

Gradient Boosts

    'xgbR': XGBRegressor
    'xgbC': XGBClassifier
    'lgbR': LGBMRegressor
    'lgbC': LGBMClassifier

Trees

    'rfR': RandomForestRegressor
    'rfC': RandomForestClassifier
    'dtR': DecisionTreeRegressor
    'dtC': DecisionTreeClassifier
    'etR': ExtraTreeRegressor
    'etC': ExtraTreeClassifier

Ensemble

    'baggC': BaggingClassifier
    'baggR': BaggingRegressor
    'adaR': AdaBoostRegressor
    'adaC': AdaBoostClassifier
    'cbR': CatBoostRegressor
    'cbC': CatBoostClassifier

KNeighbors

    'knC': KNeighborsClassifier
    'knR': KNeighborsRegressor

SVM

    'svR': SVR
    'svC': SVC
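
A lookup table like the one above is usually backed by a plain dictionary from key to estimator class. The sketch below illustrates that pattern with a scikit-learn-only subset of the keys; `MODEL_REGISTRY` and `build_model` are hypothetical names, and how kowalsky resolves keys internally is an assumption.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.svm import SVC, SVR
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Subset of the keys listed above, limited to scikit-learn estimators.
MODEL_REGISTRY = {
    "rfR": RandomForestRegressor,
    "rfC": RandomForestClassifier,
    "dtR": DecisionTreeRegressor,
    "dtC": DecisionTreeClassifier,
    "knR": KNeighborsRegressor,
    "knC": KNeighborsClassifier,
    "svR": SVR,
    "svC": SVC,
}


def build_model(key, **params):
    """Instantiate the estimator registered under `key` with the given params."""
    try:
        model_cls = MODEL_REGISTRY[key]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model key {key!r}; expected one of: {known}") from None
    return model_cls(**params)


model = build_model("rfR", n_estimators=10)
print(type(model).__name__)
```

Raising a `ValueError` that lists the known keys makes a mistyped key easy to spot.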
