
A lightweight toolbox for multilabel classification algorithms based on k-nearest neighbors


multilabel_knn


multilabel_knn is a lightweight toolbox for multilabel classification based on k-nearest neighbor algorithms [Doc].

The following algorithms are implemented:

  • k-nearest neighbor classifier
  • multilabel k-nearest neighbor classifier (recommended for a small number of labels)
  • Binomial multilabel k-nearest neighbor classifier (recommended for large datasets with many labels)
  • Binomial multilabel graph neighbor classifier

Usage

k-nearest neighbor algorithm (Predict a single label per sample)

import multilabel_knn as mlk
model = mlk.kNN(k=10, metric="cosine")  # k: number of neighbors; metric: distance metric, one of {"euclidean", "cosine"}
model.fit(X, Y)  # X: 2D array of feature vectors. Y: label matrix, where Y[i,k] = 1 if sample i has label k.
Y_pred = model.predict(X_test)  # Y_pred[i,k] = 1 if sample i is predicted to have label k.
Y_prob = model.predict(X_test, return_prob=True)  # Y_prob[i,k] is the likelihood that sample i has label k.

This algorithm predicts one label per sample by majority vote: the predicted label is the one that occurs most often among the k nearest neighbors.
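The majority vote can be sketched in plain Python. This is only an illustration of the idea, not the package's implementation: the helper names `cosine_distance` and `knn_majority_vote` are hypothetical, and labels are given as one integer per sample rather than as a label matrix.

```python
import math
from collections import Counter


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_majority_vote(X_train, labels, query, k):
    """Return the most frequent label among the k training points nearest to query."""
    order = sorted(range(len(X_train)), key=lambda i: cosine_distance(X_train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (made up for illustration): five 2D samples with one label each.
X_train = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.3], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1, 1]

predicted = knn_majority_vote(X_train, labels, query=[1.0, 0.2], k=3)
```

With these toy values the three nearest neighbors carry labels 0, 1 and 0, so the vote goes to label 0 even though label 1 is more frequent overall.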

multilabel kNN (Can predict multiple labels per sample) [1]

import multilabel_knn as mlk
model = mlk.multilabel_kNN(k=10, metric = "cosine")
model.fit(X, Y)
Y_pred = model.predict(X_test)
Y_prob = model.predict(X_test, return_prob = True)
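For intuition, the ML-kNN procedure described in [1] can be sketched in plain Python. This is a minimal sketch based on the paper's description, not the package's code: the function names, the Euclidean distance, the brute-force neighbor search and the smoothing parameter `s` are all choices made for this illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(X, query, k, skip=None):
    """Indices of the k rows of X closest to query, optionally skipping one row."""
    candidates = [i for i in range(len(X)) if i != skip]
    return sorted(candidates, key=lambda i: euclidean(X[i], query))[:k]


def mlknn_fit(X, Y, k, s=1.0):
    """Estimate, per label, the prior and the smoothed likelihood of each neighbor count."""
    n, n_labels = len(Y), len(Y[0])
    nbrs = [nearest(X, X[i], k, skip=i) for i in range(n)]
    priors, like_pos, like_neg = [], [], []
    for l in range(n_labels):
        priors.append((s + sum(Y[i][l] for i in range(n))) / (2 * s + n))
        pos = [0] * (k + 1)  # pos[c]: samples WITH label l that have c such neighbors
        neg = [0] * (k + 1)  # neg[c]: samples WITHOUT label l that have c such neighbors
        for i in range(n):
            c = sum(Y[j][l] for j in nbrs[i])
            if Y[i][l] == 1:
                pos[c] += 1
            else:
                neg[c] += 1
        like_pos.append([(s + v) / (s * (k + 1) + sum(pos)) for v in pos])
        like_neg.append([(s + v) / (s * (k + 1) + sum(neg)) for v in neg])
    return priors, like_pos, like_neg


def mlknn_predict_prob(X, Y, model, query, k):
    """Posterior probability of each label for one query point."""
    priors, like_pos, like_neg = model
    nbrs = nearest(X, query, k)
    probs = []
    for l, prior in enumerate(priors):
        c = sum(Y[j][l] for j in nbrs)
        p1 = prior * like_pos[l][c]
        p0 = (1 - prior) * like_neg[l][c]
        probs.append(p1 / (p1 + p0))
    return probs


# Toy data (made up): two clusters; label 2 is shared by every sample.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
Y = [[1, 0, 1], [1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1], [0, 1, 1]]

model = mlknn_fit(X, Y, k=2)
probs = mlknn_predict_prob(X, Y, model, query=[0.1, 0.3], k=2)
labels = [1 if p > 0.5 else 0 for p in probs]
```

Each label is decided independently, which is why a sample can receive several labels at once. Thresholding at 0.5 corresponds to picking the larger of the two posteriors.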

Binomial multilabel kNN (Can predict multiple labels per sample)

import multilabel_knn as mlk
model = mlk.binomial_multilabel_kNN(k=10, metric = "cosine")
model.fit(X, Y)
Y_pred = model.predict(X_test)
Y_prob = model.predict(X_test, return_prob = True)

Binomial multilabel kNN is a modified version of multilabel kNN. It can perform well on data with a large number of samples and labels. See the docstring for details.

Binomial multilabel graph (Takes a graph as input. Can predict multiple labels per node)

import multilabel_knn as mlk
model = mlk.binomial_multilabel_graph()
model.fit(A, Y)  # A is the adjacency matrix of the training graph. A[i,j] = 1 if node i has a link to node j.
Y_pred = model.predict(B)  # B is the adjacency matrix of the bipartite network, where B[i,j] = 1 if node i has a link to node j in the training graph.
Y_prob = model.predict(B, return_prob=True)
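The shapes of A and B can be easier to see on a toy graph. The sketch below builds both matrices from edge lists and counts, for each new node, how many of its neighbors in the training graph carry each label. The helpers `adjacency` and `neighbor_label_counts` are hypothetical, nested lists are used only for readability, and the README does not state which matrix types the package accepts or how it turns such counts into probabilities.

```python
def adjacency(n_rows, n_cols, edges):
    """Dense 0/1 matrix with a 1 at [i][j] for every (i, j) in edges."""
    M = [[0] * n_cols for _ in range(n_rows)]
    for i, j in edges:
        M[i][j] = 1
    return M


def neighbor_label_counts(B, Y):
    """counts[i][k] = number of training-graph neighbors of row i that have label k."""
    n_train, n_labels = len(Y), len(Y[0])
    return [
        [sum(B[i][j] * Y[j][k] for j in range(n_train)) for k in range(n_labels)]
        for i in range(len(B))
    ]


# Training graph: a path 0 - 1 - 2 - 3, stored symmetrically (undirected).
train_edges = [(0, 1), (1, 2), (2, 3)]
A = adjacency(4, 4, train_edges + [(j, i) for i, j in train_edges])
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]  # Y[i][k] = 1 if training node i has label k

# Two new nodes (rows of B) linked to training nodes (columns of B).
B = adjacency(2, 4, [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3)])

counts = neighbor_label_counts(B, Y)
```

Here B has one row per node to be classified and one column per training node, so it is rectangular in general, while A is square.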

Evaluation metrics

multilabel_knn provides several evaluation metrics for multilabel classification:

from multilabel_knn import evaluations

# Y: label matrix. Y[i,k]=1 if i has label k
# Y_pred: predicted labels. Y_pred[i,k]=1 if i is predicted to have label k
evaluations.micro_f1score(Y, Y_pred) # micro f1

evaluations.macro_f1score(Y, Y_pred) # macro f1

evaluations.micro_hamming_loss(Y, Y_pred) # micro hamming loss

evaluations.macro_hamming_loss(Y, Y_pred) # macro hamming loss

# Y_score: probability or likelihood that i has label k
evaluations.average_precision(Y, Y_score) # average precision

evaluations.auc_roc(Y, Y_score) # roc-auc
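The difference between micro and macro averaging is easiest to see by computing both by hand. The sketch below uses the standard textbook definitions of F1 and Hamming loss in plain Python. It is not the package's code: the function names are hypothetical, and the package's own functions (in particular its micro and macro Hamming loss variants) may handle edge cases differently.

```python
def confusion_counts(y_true, y_pred):
    """True positives, false positives, false negatives for two 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0  # convention: 0 when undefined


def micro_f1(Y, Y_pred):
    """Pool the counts over all labels, then compute a single F1."""
    flat_true = [v for row in Y for v in row]
    flat_pred = [v for row in Y_pred for v in row]
    return f1_from_counts(*confusion_counts(flat_true, flat_pred))


def macro_f1(Y, Y_pred):
    """Compute F1 per label (column), then take the unweighted mean."""
    n_labels = len(Y[0])
    scores = []
    for k in range(n_labels):
        col_true = [row[k] for row in Y]
        col_pred = [row[k] for row in Y_pred]
        scores.append(f1_from_counts(*confusion_counts(col_true, col_pred)))
    return sum(scores) / n_labels


def hamming_loss(Y, Y_pred):
    """Fraction of (sample, label) entries that are predicted incorrectly."""
    wrong = sum(t != p for rt, rp in zip(Y, Y_pred) for t, p in zip(rt, rp))
    return wrong / (len(Y) * len(Y[0]))


# Toy label matrices (made up): 4 samples, 3 labels.
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

micro = micro_f1(Y, Y_pred)
macro = macro_f1(Y, Y_pred)
loss = hamming_loss(Y, Y_pred)
```

Micro averaging weights every (sample, label) entry equally, so frequent labels dominate. Macro averaging weights every label equally, so rare labels count as much as common ones.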

Install

Requirements: Python 3.7 or later

pip install multilabel_knn

For users without GPUs

Although the package is tested in multiple environments, you may still run into issues related to faiss, most commonly ones involving GPUs. If you don't have a GPU and run into trouble, try installing faiss-cpu instead:

conda install -c conda-forge faiss-cpu

or with pip:

pip install faiss-cpu

For users with GPUs

multilabel_knn uses only CPUs by default, but if you have GPUs, congratulations! You can get a substantial speedup. To enable the GPU, pass gpu_id as an input argument. For example:

model = mlk.binomial_multilabel_kNN(k=10, metric = "cosine", gpu_id="cuda:0") # or gpu_id=0 depending on the system
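If the same script has to run on machines with and without GPUs, one option is to decide at run time whether to pass gpu_id. The sketch below is an assumption-laden illustration: `pick_gpu_id` is a hypothetical helper, it relies on `faiss.get_num_gpus()` being exposed by the installed faiss build, and it uses the integer form `gpu_id=0`, which (as noted above) may need to be `"cuda:0"` on some systems.

```python
def pick_gpu_id():
    """Return 0 if faiss reports at least one GPU, otherwise None (hypothetical helper)."""
    try:
        import faiss  # may be faiss-cpu or faiss-gpu, or missing entirely
    except ImportError:
        return None
    get_num_gpus = getattr(faiss, "get_num_gpus", None)
    if get_num_gpus is None:
        return None
    return 0 if get_num_gpus() > 0 else None


# Build the constructor arguments; gpu_id is added only when a GPU was found,
# so the CPU default described above is left untouched otherwise.
model_kwargs = {"k": 10, "metric": "cosine"}
gpu_id = pick_gpu_id()
if gpu_id is not None:
    model_kwargs["gpu_id"] = gpu_id

# With the package installed, the model would then be created as:
# model = mlk.binomial_multilabel_kNN(**model_kwargs)
```

Leaving gpu_id out entirely on CPU-only machines avoids guessing what value the package treats as "no GPU".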

Maintenance

Code Linting:

conda install -y -c conda-forge pre-commit
pre-commit install

Docstring: Sphinx format

Test:

python -m unittest tests/simple_test.py

Reference

[1] Zhang, Min-Ling, and Zhi-Hua Zhou. 2007. “ML-KNN: A Lazy Learning Approach to Multi-Label Learning.” Pattern Recognition 40 (7): 2038–48.
