Naive Feature Selection
NFS: Naive Feature Selection
This package solves the Naive Feature Selection problem described in the paper.
Installation
pip install git+https://github.com/aspremon/NaiveFeatureSelection
Usage
Minimal usage script
The DemoNFS.py script loads the 20 newsgroups text data set from scikit-learn and reports accuracy of Naive Feature Selection, followed by SVC using the selected features.
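The select-then-classify flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tiny corpus is hypothetical and stands in for the 20 newsgroups data (so nothing is downloaded), and scikit-learn's `SelectKBest` with `chi2` is used as a stand-in selector. It is not the NFS algorithm; the package's own transformer would take its place in DemoNFS.py.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for the sci.space / sci.med newsgroups.
train_docs = [
    "nasa launch orbit shuttle",
    "rocket launch moon mission",
    "satellite orbit nasa mission",
    "shuttle moon rocket satellite",
    "doctor patient disease treatment",
    "drug therapy patient symptoms",
    "disease symptoms doctor drug",
    "treatment therapy medical patient",
]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = space, 1 = med
test_docs = ["nasa rocket orbit", "patient drug doctor"]
test_labels = [0, 1]

# Sparse bag-of-words features, fitted on the training data only.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test = vectorizer.transform(test_docs)

# Stand-in selector (assumption): keep the k highest-scoring features.
k = 6
selector = SelectKBest(chi2, k=k)
X_train_sel = selector.fit_transform(X_train, train_labels)
X_test_sel = selector.transform(X_test)

# Train an SVC using only the selected features.
clf = SVC(kernel="linear")
clf.fit(X_train_sel, train_labels)
accuracy = accuracy_score(test_labels, clf.predict(X_test_sel))

# Map the selected column indices back to vocabulary terms.
names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
selected = [names[i] for i in selector.get_support(indices=True)]

print(f"accuracy: {accuracy:.3f}")
print("selected features:", selected)
```

Any transformer that exposes `fit_transform` and `transform` can replace the stand-in selector without changing the rest of the sketch.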
The package is compatible with scikit-learn's fit/transform paradigm. To demonstrate this, DemoNFS.py runs the same test using scikit-learn's pipeline module and chooses the number of selected features k by cross-validation, using GridSearchCV from sklearn.model_selection.
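A minimal sketch of that pipeline and cross-validation pattern is given here, under assumptions: the corpus is hypothetical toy data, the step names (`vect`, `select`, `clf`) and the grid of k values are illustrative, and `SelectKBest` with `chi2` again stands in for the NFS transformer, which would occupy the `select` step in the real demo.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus: four "space" and four "med" documents.
docs = [
    "nasa launch orbit shuttle",
    "rocket launch moon mission",
    "satellite orbit nasa mission",
    "shuttle moon rocket satellite",
    "doctor patient disease treatment",
    "drug therapy patient symptoms",
    "disease symptoms doctor drug",
    "treatment therapy medical patient",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Vectorize, select features, then classify, all as one estimator.
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("select", SelectKBest(chi2)),  # stand-in for the NFS transformer (assumption)
    ("clf", SVC(kernel="linear")),
])

# Cross-validate over the number of selected features k.
param_grid = {"select__k": [2, 4, 8]}
search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(docs, labels)

best_k = search.best_params_["select__k"]
print("Best cross validated k:", best_k)
```

Because the selector sits inside the pipeline, it is refitted on each training fold, so the held-out fold never influences which features are kept.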
To run the DemoNFS.py script, type
python DemoNFS.py
This should produce the following output:
Testing NFS ...
Loading 20 newsgroups dataset for categories:
['sci.med', 'sci.space']
Extracting features from the training data using a sparse vectorizer
n_samples: 1187, n_features: 21368
Extracting features from the test data using the same vectorizer
n_samples: 790, n_features: 21368
NFS accuracy: 0.843
Space features:
['aerospace', 'allen', 'ames', 'apollo', 'astronomy', 'billion', 'built', 'centaur', 'comet', 'command', 'commercial', 'cost', 'data', 'dc', 'dryden', 'earth', 'flight', 'funding', 'government', 'gravity', 'jupiter', 'landing', 'launch', 'launched', 'launches', 'lunar', 'mars', 'mary', 'mining', 'mission', 'missions', 'moon', 'nasa', 'orbit', 'orbital', 'pat', 'payload', 'planetary', 'program', 'project', 'proton', 'rocket', 'rockets', 'russian', 'satellite', 'satellites', 'shafer', 'shuttle', 'software', 'solar', 'space', 'spacecraft', 'ssto', 'station', 'sun', 'titan', 'vehicle']
Med features:
['allergic', 'banks', 'blood', 'brain', 'cadre', 'cancer', 'candida', 'chastity', 'diagnosed', 'diet', 'disease', 'diseases', 'doctor', 'doctors', 'drug', 'drugs', 'dsl', 'food', 'foods', 'geb', 'gordon', 'health', 'intellect', 'lyme', 'med', 'medical', 'medicine', 'msg', 'n3jxp', 'pain', 'patient', 'patients', 'pitt', 'seizures', 'shameful', 'skepticism', 'soon', 'surrender', 'symptoms', 'syndrome', 'therapy', 'treatment', 'yeast']
Pipeline accuracy: 0.843
Best cross validated k: 500
File details
Details for the file naive_feature_selection-0.0.1.tar.gz.
File metadata
- Download URL: naive_feature_selection-0.0.1.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.6.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5f39f17150c3e3624275dfac7dfd85fe8eb812930623d5ce4091efafa6e41789
MD5 | 6b285b46be6a93f2b26c2d2437e659ae
BLAKE2b-256 | 5dca0b2756cf50126970c0ebb001972d247c9dc0a32ca0b5a68287d06627b5b2
File details
Details for the file naive_feature_selection-0.0.1-py3-none-any.whl.
File metadata
- Download URL: naive_feature_selection-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.6.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3173155c1c890639c33bd07ec7dd282f793be4cdfa20cb1769b9627ff15f04a8
MD5 | 14e61726cd162b045f164e1e9d14b9ff
BLAKE2b-256 | 15122a4922ea96d6b7af413c1f16bd0c623525ca928fe531fdd2180dec269cab