Python implementation of Neighborhood Component Feature Selection (NCFS)

Neighborhood Component Feature Selection

This is a Python implementation of Neighborhood Component Feature Selection (NCFS), originally introduced in Yang et al. 2012. NCFS is an embedded feature selection method that learns feature weights by maximizing the regularized leave-one-out prediction accuracy of a nearest-neighbor classifier.
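To make the objective concrete, here is a minimal NumPy sketch of the quantity NCFS maximizes, following the formulation in the original paper: a weighted Manhattan distance, an exponential kernel, and an L2 penalty on the weights. The function name `ncfs_objective` and the parameters `sigma` and `reg` are assumptions made for illustration; this is not the package's internal code, and it only evaluates the objective rather than optimizing it.

```python
import numpy as np


def ncfs_objective(X, y, w, sigma=1.0, reg=1.0):
    """Regularized leave-one-out accuracy, as described in Yang et al. 2012.

    Illustrative only: the name and defaults are assumptions, not this
    package's API.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.asarray(w, dtype=float)
    # Weighted Manhattan distance: D[i, j] = sum_l w_l^2 * |x_il - x_jl|
    diffs = np.abs(X[:, None, :] - X[None, :, :])
    D = diffs @ (w ** 2)
    # Exponential kernel; a sample never picks itself as its reference point
    K = np.exp(-D / sigma)
    np.fill_diagonal(K, 0.0)
    P = K / K.sum(axis=1, keepdims=True)
    # Probability that each sample is assigned its own class
    same_class = y[:, None] == y[None, :]
    p_correct = (P * same_class).sum(axis=1)
    return p_correct.sum() - reg * np.sum(w ** 2)


# Feature 0 separates the classes; feature 1 does not.
X_demo = np.array([[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]])
y_demo = np.array([0, 0, 1, 1])
informative = ncfs_objective(X_demo, y_demo, [3.0, 0.0])
uninformative = ncfs_objective(X_demo, y_demo, [0.0, 3.0])
print(informative > uninformative)
```

Putting the weight on the informative feature gives the higher objective, which is the behavior a gradient-based optimizer would exploit when fitting the weights.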

Installation

The package can be installed with pip using the following command:

pip install ncfs

Example

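from NCFS import NCFS

# Load the toy dataset that ships with the package
X, y = NCFS.toy_dataset()
# Fit the feature weights
feature_select = NCFS.NCFS()
feature_select.fit(X, y)
# Count the features whose learned weight exceeds 1
print(sum(feature_select.coef_ > 1))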

Tests

To compare results to the original paper, run: python tests/generate_results.py

To perform unit tests ensuring accurate distance calculations, run: python tests/test_distances.py

Comparison with Original Results

Distance metric

The original paper uses the Manhattan distance when calculating distances between samples/features. While this implementation also defaults to the Manhattan distance, weights comparable with the published results were only obtained using the Euclidean distance. However, while the exact weights differed between distance metrics, the selected features did not. Unfortunately, the original paper did not link to the code used, and I've been unable to find a public implementation of the algorithm.
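For reference, the two metrics can be sketched as follows. The weighted Manhattan form is the one given in the original paper; the weighted Euclidean form is an assumed analogue, and the function names are hypothetical rather than taken from this package.

```python
import numpy as np


def weighted_manhattan(a, b, w):
    # sum_l w_l^2 * |a_l - b_l|, the form used in the original paper
    return float(np.sum(w ** 2 * np.abs(a - b)))


def weighted_euclidean(a, b, w):
    # Assumed analogue: sqrt(sum_l w_l^2 * (a_l - b_l)^2)
    return float(np.sqrt(np.sum(w ** 2 * (a - b) ** 2)))


a = np.array([0.0, 0.0, 0.0])
b = np.array([0.3, 0.4, 1.0])
w = np.array([1.0, 1.0, 0.0])  # third feature switched off by its weight

d_manhattan = weighted_manhattan(a, b, w)
d_euclidean = weighted_euclidean(a, b, w)
print(d_manhattan, d_euclidean)
```

The two values differ (roughly 0.7 and 0.5 here), but in both cases a feature with zero weight contributes nothing to the distance.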

Numerical stability

NCFS uses the kernel function from the original paper when calculating probabilities. However, with a large number of features, distances can easily grow large enough that the exponential of the negated distance rounds to zero. This leads to division by zero, and fitting fails. To get around this, small pseudocounts are added to distances when a division by zero would otherwise occur. To keep distances small, features should be scaled between 0 and 1 (enforced by NCFS).
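The failure mode and both mitigations can be sketched in a few lines. The helpers `min_max_scale` and `reference_probabilities` are hypothetical, as are the `sigma` and `pseudocount` parameters. Here the pseudocount is added to the kernel values of rows that underflowed entirely, which is one way to read the description above; the package may apply it differently.

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column of X to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid 0/0 for constant features
    return (X - lo) / span


def reference_probabilities(D, sigma=1.0, pseudocount=1e-10):
    """Row-normalize exp(-D / sigma), ignoring the diagonal.

    The pseudocount is applied only to rows whose kernel values all
    underflowed to zero. Hypothetical helper, not the package's code.
    """
    K = np.exp(-np.asarray(D, dtype=float) / sigma)
    np.fill_diagonal(K, 0.0)
    dead = K.sum(axis=1) == 0
    K[dead] += pseudocount
    np.fill_diagonal(K, 0.0)  # keep self-reference probability at zero
    return K / K.sum(axis=1, keepdims=True)


def manhattan_matrix(X):
    # Pairwise unweighted Manhattan distances between rows of X
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)


raw = np.array([[0.0, 100.0], [500.0, 900.0], [1000.0, 300.0]])
D_raw = manhattan_matrix(raw)
# Every off-diagonal distance exceeds 1000, so exp(-D) underflows to 0.0
underflowed = bool(np.all(np.exp(-D_raw[D_raw > 0]) == 0.0))

P_raw = reference_probabilities(D_raw)        # falls back to the pseudocount
D_scaled = manhattan_matrix(min_max_scale(raw))
P_scaled = reference_probabilities(D_scaled)  # no fallback needed
print(underflowed, D_scaled.max())
```

After scaling, each feature contributes at most 1 to a Manhattan distance, so the largest possible distance is bounded by the number of features (before weighting) and the kernel stays well away from underflow for moderate dimensionality.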
