HISEL
Feature selection tool based on Hilbert-Schmidt Independence Criterion
Feature selection is the machine learning task of selecting from a data set the features that are relevant for the prediction of a given target.
The hisel package provides feature selection methods based on the Hilbert-Schmidt Independence Criterion. In particular, it provides an implementation of the HSIC Lasso algorithm of Yamada, M. et al. (2012).
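For readers unfamiliar with the criterion, the following is a minimal NumPy sketch of the standard biased empirical HSIC estimator, trace(K H L H) / (n - 1)^2, with Gaussian kernels. It is an illustration only and is not taken from hisel: the function names gaussian_gram and hsic, the fixed kernel widths, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix of the rows of x, shape (n, n). Hypothetical helper."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    sq_norms = np.sum(x * x, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)  # clip tiny negatives caused by round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC between samples x and y. Hypothetical helper."""
    n = len(x)
    gram_x = gaussian_gram(x, sigma_x)
    gram_y = gaussian_gram(y, sigma_y)
    centring = np.eye(n) - np.ones((n, n)) / n  # H = I - 11^T / n
    return float(np.trace(gram_x @ centring @ gram_y @ centring)) / (n - 1) ** 2


rng = np.random.default_rng(0)
x = rng.normal(size=500)
noise = rng.normal(size=500)

dependent = hsic(x, x ** 2 + 0.1 * noise)  # target depends nonlinearly on x
independent = hsic(x, noise)               # target is unrelated to x
print(dependent, independent)
```

The first value should come out clearly larger than the second, even though the linear correlation between x and x ** 2 is close to zero. This sensitivity to nonlinear dependence is what makes HSIC attractive as a relevance score for feature selection.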
Why is hisel cool?
hisel is accurate
HSIC Lasso is an excellent algorithm for feature selection. This makes hisel an accurate tool for your machine learning modelling. Moreover, hisel implements routines that address common causes of poor accuracy in other feature selection methods. Examples where hisel outperforms the methods in sklearn.feature_selection are given in the notebooks ensemble-example.ipynb and nonlinear-transform.ipynb.
hisel is fast
A crucial step in the HSIC Lasso algorithm is the computation of certain Gram matrices. hisel implements these computations in a highly vectorised and performant way. Moreover, hisel allows you to accelerate these computations using a GPU. The image below shows the average run time of the computation of Gram matrices via hisel on CPU, via hisel on GPU, and via pyHSICLasso. The performance was measured on the computation of the Gram matrices required by HSIC Lasso for the selection from a dataset of 300 features, with as many samples as reported on the x-axis.
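To give a sense of what a vectorised Gram matrix computation can look like, here is a minimal NumPy sketch that builds one centred Gaussian Gram matrix per feature using broadcasting instead of Python loops. It is not hisel's implementation: the function name feature_gram_matrices, the Gaussian kernel, and the fixed width sigma are assumptions made for this example.

```python
import numpy as np


def feature_gram_matrices(x, sigma=1.0):
    """Centred Gaussian Gram matrices, one per feature. Hypothetical helper.

    x has shape (n_samples, n_features); the result has shape
    (n_features, n_samples, n_samples).
    """
    xt = np.asarray(x, dtype=float).T                 # (d, n)
    diffs = xt[:, :, None] - xt[:, None, :]           # (d, n, n) pairwise differences
    grams = np.exp(-diffs ** 2 / (2.0 * sigma ** 2))  # (d, n, n) kernel values
    # Double centring, equivalent to H @ K @ H with H = I - 11^T / n.
    grams -= grams.mean(axis=1, keepdims=True)
    grams -= grams.mean(axis=2, keepdims=True)
    return grams


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))     # 100 samples, 5 features
grams = feature_gram_matrices(x)
print(grams.shape)                # (5, 100, 100)
```

The intermediate arrays hold n_features * n_samples^2 floats, so for large data sets the work is usually split into batches of samples or features. CuPy exposes a largely NumPy-compatible array API, so array code written in this style is the kind that can typically be moved to a GPU with few changes.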
hisel has a friendly user interface
Getting started with hisel is as straightforward as the following code snippet:
>>> import pandas as pd
>>> import hisel
>>> df = pd.read_csv('mydata.csv')
>>> xdf = df.iloc[:, :-1]
>>> yser = df.iloc[:, -1]
>>> hisel.feature_selection.select_features(xdf, yser)
['d2', 'd7', 'c3', 'c10', 'c12', 'c24', 'c22', 'c21', 'c5']
If you are not interested in more details, please read no further. If you would like to learn how to tune the hyper-parameters used by hisel, or how to gain finer control over hisel's selection, please browse the examples in examples/ and in notebooks.
Installation
Install via pip
The package hisel is available from arti. You can install it via pip.
While on the Wise-VPN, in the environment where you intend to use hisel, just do
pip install hisel --index-url=https://arti.tw.ee/artifactory/api/pypi/pypi-virtual/simple
Install from source
Basic installation:
Check out the repo and navigate to the root directory. Then run
poetry install
Installation with GPU support
You need to have cuda-toolkit installed, and you need to know its version. To find it, run
nvidia-smi
and read the CUDA version from the top right corner of the table that is printed out.
Once you know your version of CUDA, do
poetry install -E cudaXXX
where cudaXXX is one of the following:
- cuda102 if you have version 10.2;
- cuda110 if you have version 11.0;
- cuda111 if you have version 11.1;
- cuda11x if you have version 11.2 - 11.8;
- cuda12x if you have version 12.x.
This aligns with the installation guide of CuPy.
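The version-to-extra mapping listed above can be written as a small lookup. The sketch below is only a convenience illustration: the helper cuda_extra is hypothetical and not part of hisel, and it expects a version string in major.minor form.

```python
def cuda_extra(version):
    """Map a CUDA version string such as '11.8' to the extra name listed above.

    Hypothetical helper, not part of hisel.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) == (10, 2):
        return "cuda102"
    if (major, minor) == (11, 0):
        return "cuda110"
    if (major, minor) == (11, 1):
        return "cuda111"
    if major == 11 and 2 <= minor <= 8:
        return "cuda11x"
    if major == 12:
        return "cuda12x"
    raise ValueError(f"no extra listed for CUDA {version}")


print(f"poetry install -E {cuda_extra('11.8')}")  # prints: poetry install -E cuda11x
```

Versions outside the list raise a ValueError rather than guessing at an extra.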