HISEL
Feature selection tool based on Hilbert-Schmidt Independence Criterion
Feature selection is the machine learning task of selecting from a data set the features that are relevant for the prediction of a given target.
The hisel package provides feature selection methods based on the Hilbert-Schmidt Independence Criterion (HSIC).
In particular, it provides an implementation of the HSIC Lasso algorithm of Yamada, M. et al. (2012).
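To make the criterion concrete, here is a minimal NumPy sketch of a biased empirical HSIC estimate, trace(K H L H) / n^2 (normalisation conventions vary). It is an illustration only, not hisel's implementation: the Gaussian kernel, the fixed bandwidth `sigma`, and the function names `gaussian_gram` and `hsic` are assumptions made for this example.

```python
import numpy as np


def gaussian_gram(x, sigma=1.0):
    """Gram matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2)) of a 1-D sample."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    sq_dists = (x - x.T) ** 2
    return np.exp(-sq_dists / (2.0 * sigma**2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / n^2."""
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    k_x = gaussian_gram(x, sigma)
    k_y = gaussian_gram(y, sigma)
    return np.trace(k_x @ h @ k_y @ h) / n**2


rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_dependent = x**2                  # depends on x, yet uncorrelated with it in expectation
y_independent = rng.normal(size=200)

hsic_dep = hsic(x, y_dependent)
hsic_ind = hsic(x, y_independent)
print(hsic_dep > hsic_ind)
```

The dependent pair scores higher even though its linear correlation is close to zero, which is the kind of nonlinear relationship that correlation-based selection can miss.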
Why is hisel cool?
hisel is accurate
HSIC Lasso is an excellent algorithm for feature selection, because it can capture nonlinear dependence between the features and the target.
This makes hisel an accurate tool in your machine learning modelling.
Moreover, hisel implements routines that address common causes of poor accuracy in other feature selection methods.
Examples of where hisel outperforms the methods in sklearn.feature_selection are given in the notebooks ensemble-example.ipynb and nonlinear-transform.ipynb.
hisel is fast
A crucial step in the HSIC Lasso algorithm is the computation of certain Gram matrices.
hisel implements these computations in a highly vectorised and performant way.
Moreover, hisel allows you to accelerate these computations using a GPU.
The image below shows the average run time of the computations of Gram matrices via hisel on CPU, via hisel on GPU, and via pyHSICLasso.
The performance has been measured on the computation of Gram matrices required by HSIC Lasso for the selection from a dataset of 300 features with as many samples as reported on the x-axis.
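As a rough illustration of what a vectorised Gram matrix computation can look like, the sketch below builds one centred Gaussian Gram matrix per feature with a single broadcasted expression instead of a Python loop. This is not hisel's code: the function name `feature_gram_matrices`, the Gaussian kernel, and the bandwidth `sigma` are assumptions for the example.

```python
import numpy as np


def feature_gram_matrices(x, sigma=1.0):
    """Return an array of shape (d, n, n): one centred Gaussian Gram matrix per feature.

    x has shape (n, d): n samples, d features.
    """
    x = np.asarray(x, dtype=float)
    n, _ = x.shape
    cols = x.T                                    # shape (d, n)
    diffs = cols[:, :, None] - cols[:, None, :]   # shape (d, n, n), via broadcasting
    grams = np.exp(-diffs**2 / (2.0 * sigma**2))
    h = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    return h @ grams @ h                          # matmul broadcasts over the feature axis


rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
grams = feature_gram_matrices(x)
print(grams.shape)  # (4, 50, 50)
```

The intermediate array holds d * n * n floats, so memory grows quickly with the number of samples; a real implementation would likely process features in batches. Because CuPy exposes a largely NumPy-compatible API, an expression of this shape is also the kind of code that can be moved to a GPU with few changes.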
hisel has a friendly user interface
Getting started with hisel is as straightforward as the following code snippet:
>>> import pandas as pd
>>> import hisel
>>> df = pd.read_csv('mydata.csv')
>>> xdf = df.iloc[:, :-1]
>>> yser = df.iloc[:, -1]
>>> hisel.feature_selection.select_features(xdf, yser)
['d2', 'd7', 'c3', 'c10', 'c12', 'c24', 'c22', 'c21', 'c5']
If you are not interested in more details, please read no further.
If you would like to learn how to tune the hyper-parameters used by hisel, or how to gain finer control over hisel's selection, please browse the examples in examples/ and in notebooks.
Installation
Install via pip
The package hisel is available from arti. You can install it via pip.
While on the Wise-VPN, in the environment where you intend to use hisel, just do
pip install hisel --index-url=https://arti.tw.ee/artifactory/api/pypi/pypi-virtual/simple
Install from source
Basic installation:
Check out the repo and navigate to the root directory. Then run
poetry install
Installation with GPU support
You need to have cuda-toolkit installed and you need to know its version. To find the version, you can run
nvidia-smi
and read the CUDA version from the top right corner of the table that is printed out.
Once you know your version of CUDA, run
poetry install -E cudaXXX
where cudaXXX is one of the following:
cuda102 if you have version 10.2;
cuda110 if you have version 11.0;
cuda111 if you have version 11.1;
cuda11x if you have version 11.2 - 11.8;
cuda12x if you have version 12.x.
This follows the installation guide of CuPy.
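The mapping in the list above can be written down as a small helper. The function `cuda_extra` is hypothetical and is not part of hisel; it only encodes the table above and prints the corresponding install command.

```python
def cuda_extra(version: str) -> str:
    """Map a CUDA version string such as '11.4' to the extra name listed above."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if (major, minor) == (10, 2):
        return "cuda102"
    if (major, minor) == (11, 0):
        return "cuda110"
    if (major, minor) == (11, 1):
        return "cuda111"
    if major == 11 and 2 <= minor <= 8:
        return "cuda11x"
    if major == 12:
        return "cuda12x"
    raise ValueError(f"No extra is listed for CUDA {version}")


print(f"poetry install -E {cuda_extra('11.4')}")  # poetry install -E cuda11x
```

Versions that are not in the list raise a `ValueError` rather than falling back to a guess.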
File details
Details for the file hisel-1.0.0.tar.gz.
File metadata
- Download URL: hisel-1.0.0.tar.gz
- Upload date:
- Size: 19.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8acd594368719198a326ae8eda2cb7d47eaa84b5a890002c4963c561a7fbb90a |
| MD5 | aa6a14ffa1109edb46594b6bb2432ef0 |
| BLAKE2b-256 | b54ddf6a8449ce5267ed8061a037d1bab8488ab1d1c916923aadd3a1bc5ed6a6 |
File details
Details for the file hisel-1.0.0-py3-none-any.whl.
File metadata
- Download URL: hisel-1.0.0-py3-none-any.whl
- Upload date:
- Size: 22.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5e510a57d34213238a1b9bd800d9e4d4d04c8e986be431a5f5a39d2b14262147 |
| MD5 | 1df5e55c332d08ab67811f88c5926dec |
| BLAKE2b-256 | 5fc6fbf817b9867ea0f5440c8059b4f71c6357582bba1f5a1aa1dce194126fdd |