HISEL

Feature selection tool based on Hilbert-Schmidt Independence Criterion

Feature selection is the machine learning task of selecting, from a data set, the features that are relevant for predicting a given target. The hisel package provides feature selection methods based on the Hilbert-Schmidt Independence Criterion (HSIC). In particular, it provides an implementation of the HSIC Lasso algorithm of Yamada, M. et al. (2012).
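To make the idea behind HSIC Lasso concrete, here is a minimal NumPy/scikit-learn sketch. It is not hisel's implementation or API. Each feature and the target get a centered Gaussian Gram matrix (scaled to unit Frobenius norm, a common variant). The matrices are flattened, and a non-negative Lasso regresses the target matrix on the feature matrices. The function names, the kernel width sigma=1 on standardised data, the Gaussian kernel for a regression target, and the value of lam are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso


def centered_gaussian_gram(v: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian Gram matrix of a 1-D sample, centered and scaled to unit Frobenius norm."""
    v = (v - v.mean()) / (v.std() + 1e-12)  # standardise so one sigma suits every feature
    gram = np.exp(-((v[:, None] - v[None, :]) ** 2) / (2.0 * sigma**2))
    n = v.shape[0]
    gamma = np.eye(n) - np.ones((n, n)) / n  # centering matrix I - 11^T / n
    centered = gamma @ gram @ gamma
    return centered / (np.linalg.norm(centered) + 1e-12)


def hsic_lasso_sketch(x: np.ndarray, y: np.ndarray, lam: float = 0.01) -> np.ndarray:
    """Non-negative HSIC Lasso weights, one per column of x (regression target y).

    Minimises 1/2 * ||L - sum_k a_k K_k||_F^2 + lam * sum_k a_k over a_k >= 0,
    where K_k and L are the normalised centered Gram matrices of feature k and y.
    """
    target = centered_gaussian_gram(y).ravel()
    design = np.column_stack(
        [centered_gaussian_gram(x[:, k]).ravel() for k in range(x.shape[1])]
    )
    # scikit-learn's Lasso divides the squared error by the number of rows,
    # so alpha is rescaled to keep lam on the scale of the objective above.
    model = Lasso(alpha=lam / target.size, positive=True, fit_intercept=False, max_iter=10_000)
    model.fit(design, target)
    return model.coef_
```

On synthetic data where only features 0 and 3 drive the target, for example y = x0**2 + x3**2 plus a little noise, the two largest weights should belong to those features, even though x0**2 is uncorrelated with x0 in the linear sense. Features with non-zero weight are the selected ones.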

Why is hisel cool?

hisel is accurate

HSIC Lasso is an excellent algorithm for feature selection, which makes hisel an accurate tool for your machine learning modelling. Moreover, hisel implements clever routines that address common causes of poor accuracy in other feature selection methods.

Examples of where hisel outperforms the methods in sklearn.feature_selection are given in the notebooks ensemble-example.ipynb and nonlinear-transform.ipynb.

hisel is fast

A crucial step in the HSIC Lasso algorithm is the computation of certain Gram matrices. hisel implements these computations in a highly vectorised and performant way. Moreover, hisel allows you to accelerate them on a GPU. The image below shows the average run time of the Gram-matrix computations via hisel on CPU, via hisel on GPU, and via pyHSICLasso. Performance was measured on the Gram matrices that HSIC Lasso requires to select from a dataset of 300 features, with the number of samples shown on the x-axis.

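[Figure: average run time of Gram-matrix computations for hisel (CPU), hisel (GPU) and pyHSICLasso, against the number of samples]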
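As an illustration of what "vectorised" can mean here, the sketch below computes centered Gaussian Gram matrices for all features at once with NumPy broadcasting instead of a Python loop. It replaces the two n-by-n matrix products of Gamma @ K @ Gamma with two mean subtractions. This is an assumption about the general technique, not hisel's code; the function name and the kernel width are illustrative.

```python
import numpy as np


def batched_centered_grams(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Centered Gaussian Gram matrices for every column of x, shape (d, n, n)."""
    z = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-12)  # standardise each feature
    cols = z.T  # shape (d, n): one row per feature
    sq_dists = (cols[:, :, None] - cols[:, None, :]) ** 2  # shape (d, n, n)
    grams = np.exp(-sq_dists / (2.0 * sigma**2))
    # Double centering, equivalent to Gamma @ K @ Gamma with Gamma = I - 11^T / n:
    # subtract column means, then row means, for all d matrices at once.
    grams = grams - grams.mean(axis=1, keepdims=True)
    grams = grams - grams.mean(axis=2, keepdims=True)
    return grams
```

The trade-off is memory. The (d, n, n) array grows quickly: 300 features and 2,000 samples already need about 9.6 GB in float64. A production implementation would therefore likely process features in chunks. CuPy mirrors much of the NumPy API, so code in this style can often be moved to a GPU by swapping the array module.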

hisel has a friendly user interface

Getting started with hisel is as straightforward as the following code snippet:

    >>> import pandas as pd
    >>> import hisel
    >>> df = pd.read_csv('mydata.csv')
    >>> xdf = df.iloc[:, :-1]
    >>> yser = df.iloc[:, -1]
    >>> hisel.feature_selection.select_features(xdf, yser)
    ['d2', 'd7', 'c3', 'c10', 'c12', 'c24', 'c22', 'c21', 'c5']

If you only need the basics, you can stop here. To learn how to tune the hyper-parameters used by hisel, or how to exercise more advanced control over its selection, browse the examples in examples/ and notebooks/.

Installation

Install via pip

The package hisel is available from arti (the internal Artifactory), and you can install it with pip. While connected to the Wise-VPN, in the environment where you intend to use hisel, run

pip install hisel --index-url=https://arti.tw.ee/artifactory/api/pypi/pypi-virtual/simple

Install from source

Basic installation:

Check out the repository and navigate to its root directory. Then run

poetry install

Installation with GPU support

You need to have the CUDA toolkit installed, and you need to know its version. To find out which version you have, run

nvidia-smi

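and read the CUDA version from the top-right corner of the printed table. Note that nvidia-smi reports the highest CUDA version supported by your driver, while nvcc --version reports the version of the installed toolkit itself. Once you know your CUDA version, run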

poetry install -E cudaXXX

where cudaXXX is one of the following: cuda102 for version 10.2; cuda110 for version 11.0; cuda111 for version 11.1; cuda11x for versions 11.2 to 11.8; cuda12x for version 12.x. These names follow the CuPy installation guide.
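The version-to-extra mapping above can be encoded in a small helper, for example in a setup script. cuda_extra is a hypothetical helper, not part of hisel, and it only reproduces the list in the previous paragraph.

```python
def cuda_extra(version: str) -> str:
    """Map a CUDA version string such as '11.4' to the poetry extra listed above."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if (major, minor) == (10, 2):
        return "cuda102"
    if (major, minor) == (11, 0):
        return "cuda110"
    if (major, minor) == (11, 1):
        return "cuda111"
    if major == 11 and 2 <= minor <= 8:
        return "cuda11x"
    if major == 12:
        return "cuda12x"
    raise ValueError(f"No hisel GPU extra is listed for CUDA {version}")


if __name__ == "__main__":
    # Prints: poetry install -E cuda11x
    print(f"poetry install -E {cuda_extra('11.4')}")
```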


Download files


Source Distribution

hisel-1.0.0.tar.gz (19.1 kB)

Uploaded Source

Built Distribution

hisel-1.0.0-py3-none-any.whl (22.9 kB)

Uploaded Python 3

File details

Details for the file hisel-1.0.0.tar.gz.

File metadata

  • Download URL: hisel-1.0.0.tar.gz
  • Size: 19.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for hisel-1.0.0.tar.gz
Algorithm Hash digest
SHA256 8acd594368719198a326ae8eda2cb7d47eaa84b5a890002c4963c561a7fbb90a
MD5 aa6a14ffa1109edb46594b6bb2432ef0
BLAKE2b-256 b54ddf6a8449ce5267ed8061a037d1bab8488ab1d1c916923aadd3a1bc5ed6a6


File details

Details for the file hisel-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: hisel-1.0.0-py3-none-any.whl
  • Size: 22.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for hisel-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5e510a57d34213238a1b9bd800d9e4d4d04c8e986be431a5f5a39d2b14262147
MD5 1df5e55c332d08ab67811f88c5926dec
BLAKE2b-256 5fc6fbf817b9867ea0f5440c8059b4f71c6357582bba1f5a1aa1dce194126fdd
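To check a downloaded file against the SHA256 digests listed above, Python's standard hashlib module is enough. The sketch below hashes the file in chunks and compares the result with the published digest. sha256_of, matches_published_digest and EXPECTED_SHA256 are illustrative names, not part of hisel.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "hisel-1.0.0.tar.gz": "8acd594368719198a326ae8eda2cb7d47eaa84b5a890002c4963c561a7fbb90a",
    "hisel-1.0.0-py3-none-any.whl": "5e510a57d34213238a1b9bd800d9e4d4d04c8e986be431a5f5a39d2b14262147",
}


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Union[str, Path]) -> bool:
    """Compare a downloaded hisel file with the digest listed for its file name."""
    return sha256_of(path) == EXPECTED_SHA256[Path(path).name]
```

Alternatively, pip's hash-checking mode accepts a line such as hisel==1.0.0 --hash=sha256:<digest> in a requirements file installed with --require-hashes. In that mode pip expects pinned hashes for every dependency as well.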

