
faKy is a Python library for text analysis. It provides functions for readability, complexity, sentiment, and statistical analysis in the context of fake news detection.

Project description

faKy: Feature Extraction Library for Fake News Analysis

faKy is a feature extraction library designed specifically for analyzing and detecting fake news. It provides a comprehensive set of functions to compute linguistic features that are useful for identifying fake news articles. With faKy, you can calculate readability scores and information complexity, perform sentiment analysis with VADER, count named entities, and tally part-of-speech tags. Additionally, faKy offers a Dunn's test helper for testing the significance of differences between multiple independent groups.

Our goal with faKy is to contribute to the development of more sophisticated and interpretable machine learning models and to deepen our understanding of the linguistic features that characterize fake news.

Installation

Before using faKy, ensure that the necessary dependencies are installed; see the requirements.txt file for details. In particular, make sure spaCy and the en_core_web_md model are installed by running the following commands in your terminal or notebook kernel:

pip install spacy
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_md-2.3.1/en_core_web_md-2.3.1.tar.gz

To verify that en_core_web_md was installed successfully, you can list the installed packages:

!pip list
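
As an additional sanity check (a minimal sketch, not part of faKy), you can confirm that the model actually loads with spaCy:

    import spacy

    # Attempt to load the medium English model installed above;
    # this raises OSError if en_core_web_md is missing.
    nlp = spacy.load("en_core_web_md")
    print(nlp.meta["lang"], nlp.meta["name"], nlp.meta["version"])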

Once en_core_web_md is correctly installed, you can proceed to install faKy using the following command:

!pip install faKy==2.1.0

faKy automatically installs the required dependencies, including NLTK.

Getting Started with faKy

faKy computes features on text objects, which lets you extract features for every text object in a dataframe. The example below demonstrates the usage.

First, import the required functions from the faKy library:

from faKy.faKy import process_text_readability, process_text_complexity

Next, apply the process_text_readability function to your dataframe:

    dummy_df['readability'] = dummy_df['text-object'].apply(process_text_readability)

After applying this function, the readability score of each text object is added to the dataframe as a new column.

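For a self-contained sketch of this workflow, assume a toy dataframe with one text per row (the dataframe contents and column names are illustrative; only process_text_readability and process_text_complexity come from faKy):

    import pandas as pd
    from faKy.faKy import process_text_readability, process_text_complexity

    # Toy dataframe with one text object per row (contents are illustrative).
    dummy_df = pd.DataFrame({
        "text-object": [
            "Scientists discover a new species of frog in the Amazon rainforest.",
            "BREAKING: Aliens have secretly replaced all world leaders, sources say!",
        ]
    })

    # Each helper takes a raw text string and returns a single numeric feature.
    dummy_df["readability"] = dummy_df["text-object"].apply(process_text_readability)
    dummy_df["complexity"] = dummy_df["text-object"].apply(process_text_complexity)

    print(dummy_df[["readability", "complexity"]])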

faKy functionality

Here is a summary of the available functions in faKy:

readability_computation: Computes the Flesch-Kincaid Reading Ease score for a spaCy document using the Readability class. Returns the original document object.

process_text_readability: Takes a text string as input, processes it with spaCy's NLP pipeline, and computes the Flesch-Kincaid Reading Ease score. Returns the score.

compress_doc: Compresses the serialized form of a spaCy Doc object using gzip, calculates the compressed size, and sets it on the custom "compressed_size" attribute of the Doc object. Returns the Doc object.

process_text_complexity: Takes a text string as input, processes it with spaCy's custom NLP pipeline, and computes the compressed size. Returns the compressed size of the string in bits.

VADER_score: Takes a text input and calculates the sentiment scores using the VADER sentiment analysis tool. Returns a dictionary of sentiment scores.

process_text_vader: Takes a text input, applies the VADER sentiment analysis model, and returns the negative, neutral, positive, and compound sentiment scores as separate variables.

count_named_entities: Takes a text input, identifies named entities using spaCy, and returns the count of named entities in the text.

count_ner_labels: Takes a text input, identifies named entities using spaCy, and returns a dictionary of named entity label counts.

create_input_vector_NER: Takes a dictionary of named entity recognition (NER) label counts and creates an input vector with the count for each NER label. Returns the input vector.

count_pos: Counts the number of parts of speech (POS) in a given text. Returns a dictionary with the count of each POS.

create_input_vector_pos: Takes a dictionary of POS tag counts and creates an input vector of zeros. Returns the input vector.

values_by_label: Takes a DataFrame, a feature, a list of labels, and a label column name. Returns a list of lists containing the values of the feature for each label.

dunn_table: Takes a DataFrame of Dunn's test results and creates a new DataFrame with pairwise comparisons between groups. Returns the new DataFrame.
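
Below is a hedged sketch of how several of these functions might be combined on a labelled dataframe. The call signatures follow the descriptions above (process_text_vader returning four scores, count_named_entities returning an integer count, values_by_label taking a DataFrame, feature, label list, and label column); the dataframe, column names, and the use of scikit-posthocs to produce the Dunn's test results passed to dunn_table are assumptions, not part of faKy's documented behavior:

    import pandas as pd
    import scikit_posthocs as sp
    from faKy.faKy import (
        process_text_vader,
        count_named_entities,
        values_by_label,
        dunn_table,
    )

    # Illustrative dataframe of text objects with a veracity label.
    df = pd.DataFrame({
        "text-object": [
            "The central bank raised interest rates by a quarter point on Tuesday.",
            "The local council published its annual budget report for public review.",
            "Miracle cure hidden by doctors finally revealed, experts stunned!",
            "You will not believe what this celebrity said about the moon landing!",
        ],
        "label": ["real", "real", "fake", "fake"],
    })

    # Sentiment: process_text_vader is described as returning the negative,
    # neutral, positive, and compound VADER scores as separate values.
    df[["neg", "neu", "pos", "compound"]] = df["text-object"].apply(
        lambda text: pd.Series(process_text_vader(text))
    )

    # Named entities: one count per text object.
    df["n_entities"] = df["text-object"].apply(count_named_entities)

    # Group the compound score by label and run Dunn's test on the groups
    # (here via scikit-posthocs), then reshape the results with dunn_table.
    groups = values_by_label(df, "compound", ["real", "fake"], "label")
    dunn_results = sp.posthoc_dunn(groups, p_adjust="bonferroni")
    print(dunn_table(dunn_results))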

Download files

Download the file for your platform.

Source Distribution

faKy-2.1.0.tar.gz (12.0 kB)


Built Distribution

faKy-2.1.0-py3-none-any.whl (15.0 kB)


File details

Details for the file faKy-2.1.0.tar.gz.

File metadata

  • Download URL: faKy-2.1.0.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for faKy-2.1.0.tar.gz:

  • SHA256: c84d8521ed2fe7b941996d7908e19f54013560be3d74c34681d004c14b08d6a9
  • MD5: 3529581ad344d6cf7bdb17bdb721d67f
  • BLAKE2b-256: 6c7df35ae476cbbc071b4a3e46866ab4eaeae3dffcb7c95b959824b79d184ba6


File details

Details for the file faKy-2.1.0-py3-none-any.whl.

File metadata

  • Download URL: faKy-2.1.0-py3-none-any.whl
  • Upload date:
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for faKy-2.1.0-py3-none-any.whl:

  • SHA256: 6f00e423a53ccfa486e9f9429205d253faf67dfb1e38b8a588b4f270fba4b706
  • MD5: 1b618f25b4dff675368dbb160c6410f9
  • BLAKE2b-256: ead6f09c8936967f3f298993052559f23896fb96ae396bb347cce99712cf397a

