Quantum Feature Seletion (QFS) algorithm
Project description
seleqt
- Quantum Feature Selection
This package contains a minimal implementation of the Quantum Feature Selection (QFS) algorithm described in Mücke et al., 2023. Given a labeled dataset, this package provides methods to
- discretize the data
- compute redundancy and importance values based on mutual information
- generate feature selection QUBO instances for a given α
- find an α such that a FS QUBO has exactly k bits in its minimizing binary vector
Installation
This package can be installed directly via pip
from the PyPi repository:
pip3 install seleqt
Usage
Data and labels must be given as numpy
arrays:
The data x
must have shape (n, d)
, where n
is the number of samples/observations and d
is the number of features.
The labels y
must have shape (n,)
.
Both x
and y
must be discrete, i.e., their type must be a subtype of integer.
Optimally, their values should correspond to indices (0
,1
,...,B-1
) of a finite number of bins B
.
from seleqt import *
Discretizing
This package provides a method discretize
that takes real-valued data and discretizes it by computing bins and returning the corresponding bin indices.
Its parameters are
x
: Input databins
: Number of bins (default 10)method
: Method to compute bin edges, must be either"equal"
or"quantile"
. For"equal"
, the value range is divided intobins
equally sized bins. For"quantile"
, the b/bins
-quantiles for b in 0,...,bins
are computed and used as bin edges. Default is"equal"
.share_bins
: Boolean that indicates if the bins should be computed over all values together vs. all columns separately. Default isTrue
.
x_ = discretize(x, bins=20, method='quantile')
Redundancy and Importance
The methods redundancy()
and importance()
compute what their names suggest.
As inputs, redundancy()
expects just x
, and importance()
expects both x
and y
.
All inputs must be discrete.
The result of redundancy()
is a numpy
array of shape (d, d)
, and importance()
returns a shape (d,)
array.
red = redundancy(x_)
imp = importance(x_, y)
Create QUBO Instance
Given redundancy and importance, the method feature_selection_qubo()
creates a QUBO parameter matrix according to Eq. 18 in the paper.
It takes the following arguments:
redundancy
:numpy.ndarray
of shape(d, d)
containing pairwise redundancy between the featuresimportance
:numpy.ndarray
of shape(d,)
containing importance for each featurealpha
: float between 0 and 1; weighting of importance against redundancy. Corresponds to α in Eq. 18.importance_threshold
: non-negative float; minimal meaningful importance value. Corresponds to ε in Eq. 18.threshold_penalty
: non-negative float; penalty value swapped in for importance when it is belowimportance_threshold
. Corresponds to μ in Eq. 18. Default is the
The return value is a numpy.ndarray
of shape (d, d)
containing the QUBO parameters as an upper-triangular matrix.
Q = feature_selection_qubo(red, imp, alpha=0.7)
Perform QFS Algorithm
To perform Alg. 1 from the paper, you can use qfs()
, which takes the following arguments:
redundancy
andimportance
: Same as forfeature_selection_qubo()
k
: int, with 0 <k
<d
; target number of features to select.importance_threshold
andthreshold_penalty
: Same as forfeature_selection_qubo()
precision_limit
: non-negative float; smallest allowed search interval for binary search. Default is1e-8
.qubo_solver
: Callable function that takes a(d, d)
QUBO parameter matrix and outputs a binary vectorx
that minimizesx @ Q @ x
. Corresponds to the QUBO oracle used in Alg. 1.
The result in a tuple (alpha, min_x)
containing the optimal α value and the minimizing binary vector, which has k
ones.
The following example uses the light-weight package qubolite
to implement the QUBO oracle.
Note that the brute-force solver is only feasible for dimensions up to about 30.
from qubolite import qubo
from qubolite.solving import brute_force
def brute_force_solver(q):
return brute_force(qubo(q))
alpha, min_x = qfs(red, imp, 5, qubo_solver=brute_force_solver)
Citation
If you use this package in your scientific work, please cite the following article:
Mücke, S., Heese, R., Müller, S. et al. Feature selection on quantum computers. Quantum Mach. Intell. 5, 11 (2023). https://doi.org/10.1007/s42484-023-00099-z
Bibtex:
@article{muecke2023,
title={Feature selection on quantum computers},
author={M{\"u}cke, Sascha and Heese, Raoul and M{\"u}ller, Sabine and Wolter, Moritz and Piatkowski, Nico},
journal={Quantum Machine Intelligence},
volume={5},
number={1},
pages={11},
year={2023},
publisher={Springer},
doi={10.1007/s42484-023-00099-z}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file seleqt-1.0.tar.gz
.
File metadata
- Download URL: seleqt-1.0.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 37144d5d0b3f3e934d0043f213fbf78e4781d327b4ee00ee226ea5e861a3929a |
|
MD5 | a43eea80ac3a5655cf910508f223fdc8 |
|
BLAKE2b-256 | 9057ce17c0da186e566eb44c8a4f0b8a757fbef8d23973c372376ba84d4661a0 |
File details
Details for the file seleqt-1.0-py3-none-any.whl
.
File metadata
- Download URL: seleqt-1.0-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | da5c5bfd681530ad8a4da2104973e84e2944f5301a33c923c19204dd6543cf1d |
|
MD5 | f5415d83b2c23d727a67e3fe0e0322a2 |
|
BLAKE2b-256 | 3bb76ca73b400599a1557fed603ea996a7b8d41cfd7581409940195ba0c1ef5c |