Naive Feature Selection
NFS: Naive Feature Selection
This package solves the Naive Feature Selection problem described in the paper.
Installation
pip install git+https://github.com/aspremon/NaiveFeatureSelection
Usage
Minimal usage script
The DemoNFS.py script loads the 20 newsgroups text data set from scikit-learn and reports the test accuracy of Naive Feature Selection followed by an SVC trained on the selected features.
The package is compatible with scikit-learn's fit/transform paradigm. To demonstrate this, DemoNFS.py runs the same test using scikit-learn's sklearn.pipeline module and performs cross-validation using GridSearchCV from sklearn.model_selection.
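The pipeline and grid search pattern described above can be sketched roughly as follows. This is not the content of DemoNFS.py. Because this page does not show the package's class names, the sketch uses scikit-learn's own SelectKBest with the chi2 score as a stand-in for the NFS transformer, and a tiny made-up corpus instead of the 20 newsgroups data. The step names ("vect", "select", "clf") and the candidate values of k are likewise assumptions chosen for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Tiny hypothetical corpus standing in for the sci.space / sci.med newsgroups.
docs = [
    "nasa launch orbit shuttle",
    "shuttle orbit mission nasa",
    "rocket launch satellite orbit",
    "mission satellite nasa rocket",
    "doctor patient disease treatment",
    "treatment drug patient doctor",
    "disease symptoms drug therapy",
    "therapy symptoms doctor patient",
]
labels = ["space"] * 4 + ["med"] * 4

# Vectorize, select k features, then classify with a linear SVC.
# SelectKBest(chi2) is a stand-in: any transformer that follows the
# fit/transform convention can occupy the "select" step.
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("select", SelectKBest(chi2)),
    ("clf", SVC(kernel="linear")),
])

# Cross-validate over the number of selected features k.
search = GridSearchCV(pipe, {"select__k": [2, 4]}, cv=2)
search.fit(docs, labels)

best_k = search.best_params_["select__k"]
print("Best cross validated k:", best_k)
```

Placing the selector inside the pipeline means it is refit on each training fold, so the cross-validation score does not leak information from the held-out fold into the feature selection step.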
To run the DemoNFS.py script, type:
python DemoNFS.py
This should produce the following output:
Testing NFS ...
Loading 20 newsgroups dataset for categories:
['sci.med', 'sci.space']
Extracting features from the training data using a sparse vectorizer
n_samples: 1187, n_features: 21368
Extracting features from the test data using the same vectorizer
n_samples: 790, n_features: 21368
NFS accuracy: 0.843
Space features:
['aerospace', 'allen', 'ames', 'apollo', 'astronomy', 'billion', 'built', 'centaur', 'comet', 'command', 'commercial', 'cost', 'data', 'dc', 'dryden', 'earth', 'flight', 'funding', 'government', 'gravity', 'jupiter', 'landing', 'launch', 'launched', 'launches', 'lunar', 'mars', 'mary', 'mining', 'mission', 'missions', 'moon', 'nasa', 'orbit', 'orbital', 'pat', 'payload', 'planetary', 'program', 'project', 'proton', 'rocket', 'rockets', 'russian', 'satellite', 'satellites', 'shafer', 'shuttle', 'software', 'solar', 'space', 'spacecraft', 'ssto', 'station', 'sun', 'titan', 'vehicle']
Med features:
['allergic', 'banks', 'blood', 'brain', 'cadre', 'cancer', 'candida', 'chastity', 'diagnosed', 'diet', 'disease', 'diseases', 'doctor', 'doctors', 'drug', 'drugs', 'dsl', 'food', 'foods', 'geb', 'gordon', 'health', 'intellect', 'lyme', 'med', 'medical', 'medicine', 'msg', 'n3jxp', 'pain', 'patient', 'patients', 'pitt', 'seizures', 'shameful', 'skepticism', 'soon', 'surrender', 'symptoms', 'syndrome', 'therapy', 'treatment', 'yeast']
Pipeline accuracy: 0.843
Best cross validated k: 500
Hashes for naive_feature_selection-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 5f39f17150c3e3624275dfac7dfd85fe8eb812930623d5ce4091efafa6e41789
MD5 | 6b285b46be6a93f2b26c2d2437e659ae
BLAKE2b-256 | 5dca0b2756cf50126970c0ebb001972d247c9dc0a32ca0b5a68287d06627b5b2
Hashes for naive_feature_selection-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 3173155c1c890639c33bd07ec7dd282f793be4cdfa20cb1769b9627ff15f04a8
MD5 | 14e61726cd162b045f164e1e9d14b9ff
BLAKE2b-256 | 15122a4922ea96d6b7af413c1f16bd0c623525ca928fe531fdd2180dec269cab