Competition-oriented framework for interactive feature engineering and building reproducible pipelines
kts was created to simplify and unify the process of solving machine learning competitions.
- Feature engineering is modular:
- Each feature engineering function represents a block of features: it takes an input dataframe and produces a new one containing only the new features
- Such functions, called feature constructors, are then assembled into feature sets, which are used for training models
- Features are computed once; after that they are loaded from cache
- A pair of a model and a feature set is validated using a given metric and cross-validation splitter
- Once an experiment is conducted, it is placed on your local leaderboard, and the trained models and sources are saved
- Each experiment is given an ID, which is used to access it
- Each experiment can produce predictions for any dataframe that has the same columns as the training one: feature engineering is done automatically, then the features are fed to the trained models
- We support features which should be computed differently for the training set and the validation set: they are implemented as simply as ordinary ones, using special syntax (df.encoders attributes of the dataframe passed to the function)
- User cache: you can store any objects to access them from other notebooks of your project with
- Standard library: some common feature generation techniques are pre-implemented, such as target encoding and one-hot encoding; you can read their sources to borrow best practices for writing custom feature constructors in kts style
- Designed for a multi-notebook environment: the cache is synchronized between notebooks, e.g. you can change the source of a feature constructor in one notebook and have it automatically updated in another; the same applies to objects
- Stacking is as simple as kts.stack(IDs): it creates a standard feature constructor which can be used for feature set creation
- Feature selection: select the best features from an experiment using the built-in feature importance calculator (sklearn-style) or permutation importance. You can also implement your own feature importance calculator using our base class
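The modular design described above (feature constructors that return only new columns, feature sets assembled from them, and compute-once caching) can be sketched in plain Python. This is only an illustration of the idea, not the kts API: the names cached, build_feature_set and call_counts are hypothetical, rows are plain dictionaries rather than dataframes, the column names are made up for a Titanic-like table, and the cache is keyed by function name only, whereas a real cache would also have to account for the input data and the function source.

```python
# Illustrative sketch only: plain Python, NOT the kts API.
from functools import wraps
from typing import Callable, Dict, List

Row = Dict[str, float]
FeatureConstructor = Callable[[List[Row]], List[Row]]

_cache: Dict[str, List[Row]] = {}  # hypothetical in-memory cache, keyed by function name
call_counts: Dict[str, int] = {}   # counts real computations, to make cache hits visible


def cached(func: FeatureConstructor) -> FeatureConstructor:
    """Compute a block of features once, then serve it from the cache."""
    @wraps(func)
    def wrapper(rows: List[Row]) -> List[Row]:
        name = func.__name__
        if name not in _cache:
            call_counts[name] = call_counts.get(name, 0) + 1
            _cache[name] = func(rows)
        return _cache[name]
    return wrapper


@cached
def family_size(rows: List[Row]) -> List[Row]:
    # Returns only the new column, not the input columns.
    return [{"family_size": r["sibsp"] + r["parch"] + 1} for r in rows]


@cached
def fare_per_person(rows: List[Row]) -> List[Row]:
    return [
        {"fare_per_person": r["fare"] / (r["sibsp"] + r["parch"] + 1)}
        for r in rows
    ]


def build_feature_set(
    constructors: List[FeatureConstructor], rows: List[Row]
) -> List[Row]:
    """Join the blocks produced by each constructor column-wise."""
    blocks = [constructor(rows) for constructor in constructors]
    return [
        {key: value for block in blocks for key, value in block[i].items()}
        for i in range(len(rows))
    ]


train = [
    {"sibsp": 1, "parch": 0, "fare": 7.25},
    {"sibsp": 0, "parch": 2, "fare": 30.0},
]
features = build_feature_set([family_size, fare_per_person], train)
# The second call reuses the cached blocks instead of recomputing them.
features_again = build_feature_set([family_size, fare_per_person], train)
```

Because each constructor returns only its own columns, blocks can be combined freely into different feature sets, and a block shared by several feature sets is computed only once.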
Run $ pip3 install kts to install the latest version from PyPI.
Check the kts-examples repo to learn the basics.
Command line interface
Use it to create a new project:
$ mkdir project
$ cd project
$ kts init
or download an example from the kts-examples repo:
$ kts example titanic
The core of the project was designed and implemented by the team of Mikhail Andronov, Roman Gorb and Nikita Konodyuk under the mentorship of Alexander Avdyushenko during a project practice held by Yandex and the Higher School of Economics on 1-14 February 2019 at the Educational Center «Sirius».