
faKy is a Python library for text analysis. It provides functions for readability, complexity, sentiment, and statistical analysis in the scope of fake news detection.

Project description

faKy: Feature Extraction Library for Fake News Analysis

faKy is an advanced feature extraction library explicitly designed for analyzing and detecting fake news. It provides a comprehensive set of functions to compute linguistic features essential for identifying fake news articles. With faKy, you can calculate readability scores and information complexity, perform sentiment analysis using VADER, extract named entities, and apply part-of-speech tags. Additionally, faKy offers a Dunn test function for testing the significance between multiple independent variables.

Our goal with faKy is to contribute to developing more sophisticated and interpretable machine learning models and deepen our understanding of the underlying linguistic features that define fake news.

Installation

Before using faKy, ensure that the necessary dependencies are installed; refer to the requirements.txt file for details. In particular, make sure spaCy and the en_core_web_md model are installed by running the following commands in your terminal or kernel:

pip install spacy
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-2.3.1/en_core_web_md-2.3.1.tar.gz

To verify that en_core_web_md installed successfully, you can list the installed packages:

pip list

Once en_core_web_md is correctly installed, install faKy itself:

pip install faKy==2.1.0

faKy automatically installs the required dependencies, including NLTK.

Getting Started with faKy

faKy allows you to compute features based on text objects, enabling you to extract features for all text objects within a dataframe. Here is an example code block demonstrating the usage:

First, import the required functions from the faKy library:

from faKy.faKy import process_text_readability, process_text_complexity

Next, apply the process_text_readability function to your dataframe:

    dummy_df['readability'] = dummy_df['text-object'].apply(process_text_readability)

After applying this function to your dataframe, the readability score is computed for each text object and added as a new column.
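process_text_readability delegates to spaCy and the Readability pipeline component, but the underlying Flesch Reading Ease formula is easy to illustrate. Below is a stdlib-only sketch with a naive syllable counter (an assumption for illustration; faKy's actual scores will differ):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: each run of consecutive vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    # Flesch Reading Ease:
    #   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    # Higher scores mean easier text.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

score = flesch_reading_ease("The cat sat on the mat. It was happy.")
```

Short, simple sentences like these score high (roughly 100+), while dense prose typically scores well below 50.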

faKy functionality

Here is a summary of the available functions in faKy:

| Function Name | Usage |
| --- | --- |
| readability_computation | Computes the Flesch-Kincaid Reading Ease score for a spaCy document using the Readability class. Returns the original document object. |
| process_text_readability | Takes a text string as input, processes it with spaCy's NLP pipeline, and computes the Flesch-Kincaid Reading Ease score. Returns the score. |
| compress_doc | Compresses the serialized form of a spaCy Doc object using gzip, calculates the compressed size, and sets it on the custom "compressed_size" attribute of the Doc object. Returns the Doc object. |
| process_text_complexity | Takes a text string as input, processes it with spaCy's custom NLP pipeline, and computes the compressed size. Returns the compressed size of the string in bits. |
| VADER_score | Takes a text input and calculates the sentiment scores using the VADER sentiment analysis tool. Returns a dictionary of sentiment scores. |
| process_text_vader | Takes a text input, applies the VADER sentiment analysis model, and returns the negative, neutral, positive, and compound sentiment scores as separate variables. |
| count_named_entities | Takes a text input, identifies named entities using spaCy, and returns the count of named entities in the text. |
| count_ner_labels | Takes a text input, identifies named entities using spaCy, and returns a dictionary of named entity label counts. |
| create_input_vector_NER | Takes a dictionary of named entity recognition (NER) label counts and creates an input vector with the count for each NER label. Returns the input vector. |
| count_pos | Counts the number of parts of speech (POS) in a given text. Returns a dictionary with the count of each POS. |
| create_input_vector_pos | Takes a dictionary of POS tag counts and creates an input vector of zeros. Returns the input vector. |
| values_by_label | Takes a DataFrame, a feature, a list of labels, and a label column name. Returns a list of lists containing the values of the feature for each label. |
| dunn_table | Takes a DataFrame of Dunn's test results and creates a new DataFrame with pairwise comparisons between groups. Returns the new DataFrame. |
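The complexity functions (compress_doc, process_text_complexity) measure information complexity as the gzip-compressed size of a document. The core idea can be sketched with the standard library alone; note this compresses raw UTF-8 text rather than a serialized spaCy Doc, so the absolute numbers will not match faKy's:

```python
import gzip

def compressed_size_bits(text: str) -> int:
    # Gzip-compress the UTF-8 bytes and report the size in bits.
    # Repetitive (low-complexity) text compresses to fewer bits.
    return len(gzip.compress(text.encode("utf-8"))) * 8

repetitive = compressed_size_bits("spam spam spam spam spam spam spam spam")
varied = compressed_size_bits("Quick brown foxes jump over seven lazy dogs")
```

The repetitive string compresses to fewer bits than a varied string of similar length, and that difference is the signal faKy exposes as a complexity feature.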

Download files

Source Distribution: faKy-2.1.0.tar.gz (12.0 kB)

Built Distribution: faKy-2.1.0-py3-none-any.whl (15.0 kB)
