Performs pre-processing of tweets

pytextprep

This is a Python package that offers additional text preprocessing functionality specifically designed for tweets. The package bundles functions to help with cleaning and gaining insight into tweet data, providing additional resources for EDA or enabling feature engineering.

The main functions of this package are:

  • remove_punct: Removes punctuation from a list of tweets

  • extract_ngram: Extracts n-grams from a list of tweets

  • extract_hashtags: Creates a list of hashtags from a list of tweets

  • generate_cloud: Creates a word cloud of the most frequent words from a list of tweets
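To make the n-gram idea concrete, here is a minimal plain-Python sketch of whitespace-based n-gram extraction. It is not the package's implementation: the name `sketch_ngrams` is hypothetical, and joining all tweets into one token stream before splitting is an assumption, chosen because it reproduces the sample output shown in the Usage section.

```python
def sketch_ngrams(tweets, n=2):
    """Return whitespace-delimited n-grams from a list of tweets (illustrative only)."""
    # Assumption: tweets are joined into a single token stream, so an n-gram
    # may span the boundary between two consecutive tweets.
    tokens = " ".join(tweets).split()
    # Slide a window of size n over the tokens.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tweets_list = ["Make America Great Again! @DonalTrump", "It's a new day in #America"]
trigrams = sketch_ngrams(tweets_list, n=3)
print(trigrams[0])   # Make America Great
print(len(trigrams)) # 9
```

Punctuation stays attached to tokens here because the split is purely on whitespace. Running a punctuation-removal step first would give cleaner n-grams.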

In the Python ecosystem, the only popular package focused on tweet data is tweet-preprocessor. Although that package is also tailored to Twitter data, its scope is limited to tokenizing and cleaning tweets. In contrast, our package can also be used to extract new features from tweets.
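As an illustration of what "extracting features" can mean, the sketch below turns hashtags into simple frequency counts using only the standard library. The helpers `sketch_hashtags` and `hashtag_counts` are hypothetical names, and the regular expression (a `#` followed by word characters) is an assumption about what counts as a hashtag. Neither is taken from the package's source.

```python
import re
from collections import Counter


def sketch_hashtags(tweets):
    """Collect hashtags (without the leading '#') from a list of tweets."""
    # Assumption: a hashtag is '#' followed by one or more word characters.
    return [tag for tweet in tweets for tag in re.findall(r"#(\w+)", tweet)]


def hashtag_counts(tweets):
    """Count case-insensitive hashtag occurrences, usable as a simple feature."""
    return Counter(tag.lower() for tag in sketch_hashtags(tweets))


sample = ["It's a new day in #America", "Good morning #america #news"]
counts = hashtag_counts(sample)
print(counts.most_common(1))  # [('america', 2)]
```

Counts like these could feed into EDA plots or serve as input columns for a model.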

Installation

Install using pip:

$ pip install pytextprep

Install from source:

$ git clone git@github.com:UBC-MDS/pytextprep.git
$ cd pytextprep
$ git checkout main  # latest release
$ pip install .

Usage

Documentation

Please follow the steps below:

Create a new conda environment named pytextprep:

conda create --name pytextprep python=3.9 -y

Activate the conda environment pytextprep:

conda activate pytextprep

Install the package:

pip install pytextprep

If the package fails to install due to the wordcloud package, please install wordcloud using the following command and then install pytextprep again.

conda install -c conda-forge wordcloud -y

Open Python:

python

You can now use the package functions as:

from pytextprep.extract_ngram import extract_ngram
from pytextprep.extract_hashtags import extract_hashtags
from pytextprep.remove_punct import remove_punct
from pytextprep.generate_cloud import generate_cloud
import matplotlib.pyplot as plt

tweets_list = ["Make America Great Again! @DonalTrump", "It's a new day in #America"]
extract_ngram(tweets_list, n=3)
# ['Make America Great', 'America Great Again!', 'Great Again! @DonalTrump', "Again! @DonalTrump It's", "@DonalTrump It's a", "It's a new", 'a new day', 'new day in', 'day in #America']
extract_hashtags(tweets_list)
# ['America']
remove_punct(tweets_list, skip=["'", "@", "#", '-'])
# ['Make America Great Again @DonalTrump', "It's a new day in #America"]
fig, wc = generate_cloud(tweets_list)
plt.show()

[Image: word cloud generated from tweets_list]
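The `skip` argument in the `remove_punct` call above appears to list punctuation characters that should be kept. The following is a minimal sketch of that behavior in plain Python, under that assumption. `sketch_remove_punct` is a hypothetical name and this is not the package's code; it only reproduces the sample output shown above.

```python
import string


def sketch_remove_punct(tweets, skip=None):
    """Strip ASCII punctuation from each tweet, keeping characters in `skip`."""
    # Assumption: `skip` is a list of punctuation characters to preserve.
    keep = set(skip or [])
    drop = "".join(ch for ch in string.punctuation if ch not in keep)
    # Build a translation table that deletes every character in `drop`.
    table = str.maketrans("", "", drop)
    return [tweet.translate(table) for tweet in tweets]


tweets_list = ["Make America Great Again! @DonalTrump", "It's a new day in #America"]
cleaned = sketch_remove_punct(tweets_list, skip=["'", "@", "#", "-"])
print(cleaned)
```

Keeping `@` and `#` preserves mentions and hashtags, which matters if hashtags are extracted after punctuation is removed.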

Contributing

Contributors: Arijeet Chatterjee, Joshua Sia, Melisa Maidana, Philson Chan (DSCI_524_GROUP21).

Interested in contributing? Check out the contributing guidelines.

Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

pytextprep was created by Arijeet Chatterjee, Joshua Sia, Melisa Maidana, Philson Chan (DSCI_524_GROUP21).

It is licensed under the terms of the MIT license.

Credits

pytextprep was created with cookiecutter and the py-pkgs-cookiecutter template.
