Implementation of various algorithms for feature selection for text features, based on the wrapper method
Project description
What is it?
TextFeatureSelection is a Python package providing feature selection for text tokens through the wrapper method of feature selection; a threshold can be set to decide which words to include. There are 4 methods to help with feature selection; a sketch of how each can be computed follows the list below.
- Chi-square: Measures the lack of independence between a term (t) and a class (c). It has a natural value of zero if t and c are independent; higher values indicate stronger dependence. It is not reliable for low-frequency terms.
- Mutual information: Rare terms will have a higher score than common terms. For multi-class categories, the MI value is calculated for all categories and the Max(MI) value across all categories is taken at the word level.
- Proportional difference: Measures how close two numbers are to being equal. It helps find unigrams that occur mostly in one class of documents or the other.
- Information gain: Measures the discriminatory power of the word.
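To make the four metrics concrete, below is a minimal sketch of how each can be computed from document counts for one term and one class, following the standard formulas in the references at the end of this page. This is an illustration only, not the package's internal code; the count names A, B, C, D and the function names are hypothetical.

import math

# Illustrative contingency counts for a term t and a class c:
#   A: documents in class c that contain t
#   B: documents outside class c that contain t
#   C: documents in class c that do not contain t
#   D: documents outside class c that do not contain t

def chi_square(A, B, C, D):
    # CHI: lack of independence between t and c; zero when independent.
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0

def mutual_information(A, B, C, D):
    # MI: log of P(t, c) / (P(t) * P(c)); rare terms score higher.
    N = A + B + C + D
    return math.log((A * N) / ((A + B) * (A + C))) if A else 0.0

def proportional_difference(A, B):
    # PD: +1 when t occurs only in class c, -1 when it occurs only outside it.
    return (A - B) / (A + B) if (A + B) else 0.0

def information_gain(A, B, C, D):
    # IG (binary case): class entropy minus class entropy given t.
    N = A + B + C + D
    def entropy(*counts):
        total = sum(counts)
        return -sum(x / total * math.log(x / total) for x in counts if x) if total else 0.0
    return (entropy(A + C, B + D)
            - (A + B) / N * entropy(A, B)
            - (C + D) / N * entropy(C, D))

# Example: a term appearing in 3 of 4 class-c documents and 1 of 5 others.
print(chi_square(3, 1, 1, 4), proportional_difference(3, 1))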
Input parameters
- target: List object containing the labels (categories). For more than one category, there is no need to dummy code; instead, provide label-encoded values as a list object.
- input_doc_list: List object containing the text. Each element of the list is a text corpus. There is no need to tokenize, as text is tokenized in the module while processing. target and input_doc_list must have the same length.
- stop_words: Words for which you do not want metric values calculated. Default is blank.
- metric_list: List object specifying the metrics to be calculated. There are 4 metrics, identified as 'MI', 'CHI', 'PD', 'IG'; you can specify one or more as a list object. Default is ['MI','CHI','PD','IG']. Chi-square (CHI), Mutual information (MI), Proportional difference (PD), and Information gain (IG) are calculated for each tokenized word from the corpus to aid feature selection (see the example after this list).
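As a quick illustration of these parameters together, the call below passes stop_words and metric_list explicitly. The input texts and labels are made up for this example, but the parameter names are exactly those documented above.

from TextFeatureSelection import TextFeatureSelection

# Hypothetical data; stop_words and metric_list are optional parameters.
input_doc_list = ['the food was good', 'terrible wait times', 'good service and food']
target = [1, 0, 1]
fsOBJ = TextFeatureSelection(target=target,
                             input_doc_list=input_doc_list,
                             stop_words=['the', 'was', 'and'],
                             metric_list=['MI', 'PD'])  # compute only MI and PD
result_df = fsOBJ.getScore()
print(result_df)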
How to use it?
from TextFeatureSelection import TextFeatureSelection
# Multiclass classification problem
input_doc_list = ['i am very happy', 'i just had an awesome weekend', 'this is a very difficult terrain to trek. i wish i stayed back at home.', 'i just had lunch', 'Do you want chips?']
target = ['Positive', 'Positive', 'Negative', 'Neutral', 'Neutral']
fsOBJ = TextFeatureSelection(target=target, input_doc_list=input_doc_list)
result_df = fsOBJ.getScore()
print(result_df)
# Binary classification
input_doc_list = ['i am content with this location', 'i am having the time of my life', 'you cannot learn machine learning without linear algebra', 'i want to go to mars']
target = [1, 1, 0, 1]
fsOBJ = TextFeatureSelection(target=target, input_doc_list=input_doc_list)
result_df = fsOBJ.getScore()
print(result_df)
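Since getScore() returns a dataframe of per-word metric values, a threshold can then be applied to decide which words to keep, as mentioned above. Note that the column name used below is an assumption about the returned dataframe; inspect result_df.columns for the exact names in your installed version.

# Keep only words whose score exceeds a chosen threshold.
# NOTE: the column name 'Proportional Difference' is an assumption;
# check result_df.columns for the actual names.
selected_words = result_df[result_df['Proportional Difference'] > 0.5]
print(selected_words)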
Where to get it?
pip install TextFeatureSelection
Dependencies
References
- A Comparative Study on Feature Selection in Text Categorization by Yiming Yang and Jan O. Pedersen
- Entropy based feature selection for text categorization by Christine Largeron, Christophe Moulin, Mathias Géry
- Categorical Proportional Difference: A Feature Selection Method for Text Categorization by Mondelle Simeon, Robert J. Hilderman
- Feature Selection and Weighting Methods in Sentiment Analysis by Tim O'Keefe and Irena Koprinska