
Active feature selection method with a support vector classifier.

Project description

activeSVC

ActiveSVC selects features for large matrix data with reduced computational complexity or limited data acquisition. It approaches sequential feature selection through an active learning strategy with a support vector machine classifier. At each iteration, the procedure analyzes only the samples that classify poorly with the current feature set, and it extends the feature set by identifying features within the misclassified samples that will maximally shift the classification margin. There are two strategies, min_complexity and min_acquisition. The min_complexity strategy tends to use fewer samples in each iteration, while the min_acquisition strategy tends to re-use samples from previous iterations to minimize the total number of samples acquired during the procedure.
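A rough sketch of one round of this iteration is shown below, assuming a linear SVC from sklearn. The helper one_round and the use of coefficient magnitude as a stand-in for the margin-shift criterion are illustrative simplifications, not the package's actual implementation.

import numpy as np
from sklearn.svm import LinearSVC

def one_round(X_train, y_train, selected, num_samples, rng=np.random.default_rng()):
    # Fit an SVC on the currently selected (non-empty) feature set.
    clf = LinearSVC().fit(X_train[:, selected], y_train)
    # Keep only the samples that the current feature set classifies poorly.
    wrong = np.where(clf.predict(X_train[:, selected]) != y_train)[0]
    use = rng.choice(wrong, size=min(num_samples, len(wrong)), replace=False)
    # Refit on all features of the misclassified samples and pick the unselected
    # feature with the largest coefficient magnitude, a simple proxy for the
    # feature that maximally shifts the classification margin.
    clf_full = LinearSVC().fit(X_train[use], y_train[use])
    weight = np.abs(clf_full.coef_).sum(axis=0)
    weight[selected] = -np.inf    # never re-select a feature
    return int(np.argmax(weight))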

Why is activeSVC better than other feature selection methods?

  • Easy to use
  • Effective for large datasets
  • Reduces computational complexity
  • Minimizes the amount of data that must be acquired

Usage

ActiveSVC processes a dataset split into a training set and a test set, and returns the selected features, the training and test classification accuracy, the training and test mean squared error, and the cumulative number of samples acquired after each feature is selected.

Requires

numpy, random, math, os, time, parfor, sklearn, matplotlib

Import

from activeSVC import min_complexity
from activeSVC import min_acquisition

Function

  • min_complexity
  • min_acquisition

min_complexity

Parameters

X_train: {ndarray, sparse matrix} of shape (n_samples_train, n_features)
        Input data of the training set.
y_train: ndarray of shape (n_samples_train,)
        Classification labels of the training set.
X_test: {ndarray, sparse matrix} of shape (n_samples_test, n_features)
        Input data of the test set.
y_test: ndarray of shape (n_samples_test,)
        Classification labels of the test set.
num_features: integer
        The total number of features to select.
num_samples: integer
        The number of samples to use in each iteration (for each feature).
balance: bool, default=False
        If True, balance the misclassified samples across classes when sampling at each iteration; if False, sample misclassified samples at random.

Return

feature_selected: list of integer
        The sequence of features selected.
num_samples_list: list of integer
        The cumulative number of unique samples acquired after each feature is selected.
train_errors: list of float
        Mean squared error on the training set after each feature is selected.
test_errors: list of float
        Mean squared error on the test set after each feature is selected.
train_accuracy: list of float
        Classification accuracy on the training set after each feature is selected.
test_accuracy: list of float
        Classification accuracy on the test set after each feature is selected.
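A minimal usage sketch based on the parameters and return values documented above; the example dataset, the keyword-argument call style, and the assumption that values are returned in the order listed are illustrative, not guaranteed by the package.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from activeSVC import min_complexity

# Example data; any {ndarray, sparse matrix} of shape (n_samples, n_features) works.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

(feature_selected, num_samples_list, train_errors, test_errors,
 train_accuracy, test_accuracy) = min_complexity(
    X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test,
    num_features=20, num_samples=100, balance=False)

print(feature_selected)     # indices of the 20 selected features, in selection order
print(test_accuracy[-1])    # test accuracy once all 20 features are selected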

min_acquisition

Parameters

X_train: {ndarray, sparse matrix} of shape (n_samples_train, n_features)
        Input data of the training set.
y_train: ndarray of shape (n_samples_train,)
        Classification labels of the training set.
X_test: {ndarray, sparse matrix} of shape (n_samples_test, n_features)
        Input data of the test set.
y_test: ndarray of shape (n_samples_test,)
        Classification labels of the test set.
num_features: integer
        The total number of features to select.
num_samples: integer
        The number of misclassified samples randomly sampled at each iteration. These are unioned with the samples already acquired, and the union is used in the next iteration.

Return

feature_selected: list of integer
        The sequence of features selected.
num_samples_list: list of integer
        The cumulative number of unique samples acquired after each feature is selected.
samples_global: list of integer
        The indices of all samples that have been acquired.
train_errors: list of float
        Mean squared error on the training set after each feature is selected.
test_errors: list of float
        Mean squared error on the test set after each feature is selected.
train_accuracy: list of float
        Classification accuracy on the training set after each feature is selected.
test_accuracy: list of float
        Classification accuracy on the test set after each feature is selected.
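As above, a minimal usage sketch reusing X_train, y_train, X_test, and y_test from the previous example; the call style and the return order (following the list above) are assumptions.

from activeSVC import min_acquisition

(feature_selected, num_samples_list, samples_global, train_errors,
 test_errors, train_accuracy, test_accuracy) = min_acquisition(
    X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test,
    num_features=20, num_samples=100)

print(num_samples_list[-1])    # total number of unique samples acquired
print(samples_global[:10])     # indices of some of the acquired samples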
