Competition-oriented framework for interactive feature engineering and building reproducible pipelines

kts

kts was created to simplify and unify the process of solving machine learning competitions.

Key Principles

  • Feature engineering is modular:
    • Each feature engineering function represents a block of features: it takes an input dataframe and produces a new one containing only the new features
    • Such functions, called feature constructors, are assembled into feature sets, which are then used to train models
    • Features are computed once and then loaded from cache
  • A pair of a model and a feature set is validated using a given metric and cross-validation splitter
  • Once an experiment is conducted, it is placed on your local leaderboard, and the trained models and sources are saved
  • Each experiment is given an ID, which is used to access it
  • Each experiment can produce predictions for any dataframe that has the same columns as the training one: feature engineering is done automatically, then the features are fed to the trained models
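
The modular idea behind these principles can be illustrated with a minimal, dependency-free sketch. It is not the kts API: the `feature_constructor` decorator, the `build_feature_set` helper, the in-memory `_cache`, and the column names are all hypothetical, and plain dictionaries of columns stand in for dataframes.

```python
from functools import wraps
from typing import Callable, Dict, List

Frame = Dict[str, List[float]]  # column name -> values; stands in for a dataframe

_cache: Dict[str, Frame] = {}   # hypothetical in-memory cache keyed by constructor name
calls: Dict[str, int] = {}      # counts real computations, to show the cache at work


def feature_constructor(func: Callable[[Frame], Frame]) -> Callable[[Frame], Frame]:
    """Wrap a function so that its block of features is computed only once."""
    @wraps(func)
    def wrapper(frame: Frame) -> Frame:
        # Simplification: the key is the function name only; a real cache
        # would also have to account for the input data and the source code.
        if func.__name__ not in _cache:
            calls[func.__name__] = calls.get(func.__name__, 0) + 1
            _cache[func.__name__] = func(frame)
        return _cache[func.__name__]
    return wrapper


@feature_constructor
def family_size(frame: Frame) -> Frame:
    # Returns only the new column, not the input columns.
    return {"family_size": [s + p + 1 for s, p in zip(frame["sibsp"], frame["parch"])]}


@feature_constructor
def fare_per_person(frame: Frame) -> Frame:
    sizes = [s + p + 1 for s, p in zip(frame["sibsp"], frame["parch"])]
    return {"fare_per_person": [f / n for f, n in zip(frame["fare"], sizes)]}


def build_feature_set(frame: Frame, constructors: List[Callable[[Frame], Frame]]) -> Frame:
    """Merge the blocks produced by each constructor into one feature table."""
    features: Frame = {}
    for constructor in constructors:
        features.update(constructor(frame))
    return features


train = {"sibsp": [1, 0, 3], "parch": [0, 0, 2], "fare": [14.0, 8.0, 30.0]}
features = build_feature_set(train, [family_size, fare_per_person])
features_again = build_feature_set(train, [family_size, fare_per_person])  # served from cache
```

After both calls, `calls` records a single computation per constructor, which is the behaviour the "computed once, then loaded from cache" principle describes.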

Features

  • Features that must be computed differently for the training and validation sets are supported: they are implemented as simply as ordinary ones, using special syntax (the df.train and df.encoders attributes of the dataframe passed to the function)
  • User cache: you can store any objects with kts.save and kts.load to access them from other notebooks of your project
  • Standard library: some common feature generation techniques, such as target encoding and one-hot encoding, are preimplemented; you can study their sources to borrow best practices for writing custom feature constructors in kts style
  • Designed for multi-notebook environments: the cache is synchronized between notebooks, e.g. you can change the source of a feature constructor in one notebook and have it automatically updated in another; the same applies to objects
  • Stacking is as simple as kts.stack(IDs): it creates a standard feature constructor that can be used to create a feature set
  • Feature selection: select the best features from an experiment using the built-in feature importance calculator (sklearn-style) or permutation importance. You can also implement your own feature importance calculator using our base class
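
To show why some features need separate handling for training and validation data, here is a minimal sketch of mean-target encoding in plain Python. It does not use kts: the `is_train` flag and the `encoders` dictionary are hypothetical stand-ins that only loosely mirror the roles of df.train and df.encoders, and the function signature is an assumption made for illustration.

```python
from typing import Dict, List


def target_encode(categories: List[str], targets: List[float],
                  is_train: bool, encoders: Dict[str, dict]) -> List[float]:
    """Mean-target encoding: fit on the training fold, reuse on validation."""
    if is_train:
        # Fit step: per-category target means are learned from training rows only.
        sums: Dict[str, float] = {}
        counts: Dict[str, int] = {}
        for category, target in zip(categories, targets):
            sums[category] = sums.get(category, 0.0) + target
            counts[category] = counts.get(category, 0) + 1
        encoders["target_mean"] = {
            "means": {c: sums[c] / counts[c] for c in sums},
            "default": sum(targets) / len(targets),  # fallback for unseen categories
        }
    # Apply step: both folds are transformed with the state saved during fitting.
    state = encoders["target_mean"]
    return [state["means"].get(c, state["default"]) for c in categories]


encoders: Dict[str, dict] = {}
train_encoded = target_encode(["a", "b", "a", "b"], [1.0, 0.0, 0.0, 0.0], True, encoders)
valid_encoded = target_encode(["a", "c"], [], False, encoders)  # no targets needed here
```

Keeping the fitted state in a separate container means the validation rows never influence the encoding, which avoids leaking target information into the validation score.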

Getting started

Use $ pip3 install kts to install the latest version from PyPI.
Check the kts-examples repo to learn the basics.

Command line interface

Use it to create a new project:

$ mkdir project
$ cd project
$ kts init

or download an example from the kts-examples repo:

$ kts example titanic

Contribution

Contact me on Telegram or ODS Slack to share any thoughts about the framework or examples. You're always welcome to propose new features or even implement them.

Acknowledgements

The core of the project was designed and implemented by the team of Mikhail Andronov, Roman Gorb and Nikita Konodyuk under the mentorship of Alexander Avdyushenko during a project practice held by Yandex and the Higher School of Economics on 1-14 February 2019 at the Educational Center «Sirius».


