A small preprocessor for tweets.

# tweetprep

A simple Python library for preprocessing tweets so they are ready for training. Use it to clean tweets before feeding them to machine learning or deep learning models.

The code is Python 2 and 3 compatible.

# Installation

Fast install:

pip install tweetprep

For a manual install, first download and unpack the package:

$ wget https://github.com/garain/tweetprep/archive/master.zip
$ unzip master.zip
$ rm master.zip
$ cd tweetprep-master

Then install it from the unpacked directory:

python setup.py install
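
After either install route, it can be useful to confirm that the package is importable. The snippet below is a minimal sketch that assumes Python 3.4 or newer (it relies on `importlib.util.find_spec`); the helper name `is_installed` is hypothetical and not part of tweetprep.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no top-level module of that name exists.
    return importlib.util.find_spec(package_name) is not None


# Prints True once tweetprep has been installed in the active environment.
print(is_installed("tweetprep"))
```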

# Example

from tweetprep import preprocess
# from tweetprep import lang_translator

tweet = "#COVID-19 is the worst pandemic @2020!! :,("

# Get the translated tweet (here: Spanish)
lang = "es"
print(preprocess.lang_translator.translate(tweet, dest=lang).text)

# Get the processed version of the tweet
print(preprocess.clean(tweet))

Here is the output:

# COVID-19 es la peor pandemia @ 2020!! :,(
covid19 is the worst pandemic crying smiley
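
To give a rough idea of the kind of transformation shown in the second output line, here is a minimal standalone sketch of a tweet cleaner built on regular expressions. It is not tweetprep's implementation: the function `simple_clean`, the `EMOTICONS` table, and the order of the steps are all assumptions made for illustration, chosen so that the example tweet above yields the same cleaned text.

```python
import re

# Hypothetical emoticon table; the mapping tweetprep actually uses is not
# documented in this README.
EMOTICONS = {
    ":,(": "crying smiley",
    ":)": "happy smiley",
    ":(": "sad smiley",
}


def simple_clean(tweet):
    """Illustrative cleaner: emoticons to words, strip URLs, mentions, punctuation."""
    text = tweet
    # Replace known emoticons with words before punctuation is stripped.
    for emoticon, words in EMOTICONS.items():
        text = text.replace(emoticon, " " + words + " ")
    # Drop URLs and @mentions (note that "@2020" also counts as a mention here).
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    # Lowercase, then keep only ASCII letters, digits and whitespace.
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


print(simple_clean("#COVID-19 is the worst pandemic @2020!! :,("))
# covid19 is the worst pandemic crying smiley
```

This sketch discards non-ASCII characters, so it is only suitable for English text; the library itself may handle such cases differently.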

Please cite these publications if you find this library useful:

  • Ray, Biswarup, Avishek Garain, and Ram Sarkar. “An ensemble-based hotel recommender system using sentiment analysis and aspect categorization of hotel reviews.” Applied Soft Computing 98 (2021): 106935.

  • Garain, Avishek, and Sainik Kumar Mahata. “Sentiment Analysis at SEPLN (TASS)-2019: Sentiment Analysis at Tweet Level Using Deep Learning.” (2019).

  • Garain, Avishek, and Arpan Basu. “The titans at SemEval-2019 task 5: Detection of hate speech against immigrants and women in twitter.” Proceedings of the 13th International Workshop on Semantic Evaluation. 2019.

  • Garain, Avishek. “Humor Analysis based on Human Annotation (HAHA)-2019: Humor Analysis at Tweet Level using Deep Learning.” (2019).

  • Garain, Avishek, and Arpan Basu. “The titans at SemEval-2019 task 6: Offensive language identification, categorization and target identification.” Proceedings of the 13th International Workshop on Semantic Evaluation. 2019.
