
Small (magickal) Tweet Processor


Magick Tweet Preprocessor Vihaus Ljovan

Magick Tweet Processor is a small program that does some NLP magick on tweet strings. It comes with a command-line interface (CLI) where you choose the language (English or Spanish) and which modifications the program applies to the original string (tokenisation, hiding URLs, hiding @-mentions, etc.).
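To illustrate what modifications of this kind typically look like, here is a minimal sketch using regular expressions. It is not the package's own code. The patterns, the `@USER` placeholder and the helper names (`hide_urls`, `anonymize_mentions`, `remove_hashtags`, `tokenize`) are hypothetical choices made for this example.

```python
import re

# Hypothetical patterns; the package's real rules are not documented here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def hide_urls(text: str, placeholder: str = "") -> str:
    """Replace every URL with a placeholder (empty string = remove it)."""
    return URL_RE.sub(placeholder, text)


def anonymize_mentions(text: str, placeholder: str = "@USER") -> str:
    """Replace every @-mention with a generic placeholder."""
    return MENTION_RE.sub(placeholder, text)


def remove_hashtags(text: str) -> str:
    """Drop every #hashtag from the text."""
    return HASHTAG_RE.sub("", text)


def tokenize(text: str) -> list[str]:
    """Naive whitespace tokenisation (punctuation stays attached)."""
    return text.split()


if __name__ == "__main__":
    tweet = "Hey @alice, check https://example.com #magick"
    cleaned = remove_hashtags(anonymize_mentions(hide_urls(tweet)))
    print(tokenize(cleaned))  # -> ['Hey', '@USER,', 'check']
```

Whitespace splitting keeps the comma on `@USER,`. A real tokeniser (for example NLTK's `TweetTokenizer`) would separate punctuation more carefully.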

We chose the MIT Licence because "I want it simple and permissive" sounded perfect for our use case. We also read through LICENSE.txt and it sounded good to us.

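Example: processing the tweets in the file 'tweets.txt' without emoji removal, but with stopword, hashtag and URL removal as well as anonymization of mentions. Since every modification is applied unless its --no_* flag is given, only emoji removal has to be switched off: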

tpp --file tweets.txt --no_emoji_removal
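A rough sketch of how language-dependent stopword removal and optional emoji removal could fit together, with emoji removal switched off as in the command above. This is an assumption-laden illustration, not the tool's implementation. The tiny stopword sets, the emoji code-point ranges and the function names are all hypothetical. A real tool would use complete stopword lists, for example from NLTK or spaCy.

```python
import re

# Tiny illustrative stopword lists (hypothetical, far from complete).
STOPWORDS = {
    "english": {"the", "a", "an", "and", "is", "of", "to"},
    "spanish": {"el", "la", "los", "las", "y", "de", "es", "un", "una"},
}

# Rough emoji matcher over the main pictograph, symbol/dingbat and
# regional-indicator blocks; not exhaustive.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)


def remove_emojis(text: str) -> str:
    """Strip characters in the emoji ranges above."""
    return EMOJI_RE.sub("", text)


def remove_stopwords(tokens: list[str], language: str = "english") -> list[str]:
    """Drop tokens whose lowercase form is a stopword of the chosen language."""
    stop = STOPWORDS[language]
    return [t for t in tokens if t.lower() not in stop]


def process(tweet: str, language: str = "english",
            emoji_removal: bool = True, stopword_removal: bool = True) -> list[str]:
    """Apply the optional steps; each flag mirrors a --no_* switch."""
    if emoji_removal:
        tweet = remove_emojis(tweet)
    tokens = tweet.split()
    if stopword_removal:
        tokens = remove_stopwords(tokens, language)
    return tokens


if __name__ == "__main__":
    # Equivalent in spirit to passing --spanish --no_emoji_removal.
    print(process("La magia es real 🔮", language="spanish", emoji_removal=False))
    # -> ['magia', 'real', '🔮']
```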

All possible flags:

  -h, --help                 Show this help message and exit
  -f, --file                 Use file(s) instead of string.
  -u, --no_url_removal       Process without url-removal
  -E, --no_emoji_removal     Process without emoji-removal
  -H, --no_hashtag_removal   Process without hashtag-removal
  -a, --no_anonymize         Process without anonymization
  -S, --no_stopword_removal  Process without stopword-removal
  -e, --english              Set language to English (default)
  -s, --spanish              Set language to Spanish
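The flag list above maps naturally onto Python's `argparse`. Below is a minimal sketch of one way to declare it. It assumes, since the listing shows no argument values, that `--file` is an on/off switch and that the tweet string(s) or file path(s) are positional arguments. The real `tpp` may parse them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Assumption: --file is a switch and inputs are positional; the actual
    # tool's argument handling is not documented on this page.
    parser = argparse.ArgumentParser(prog="tpp")
    parser.add_argument("inputs", nargs="+",
                        help="tweet string(s), or file path(s) with --file")
    parser.add_argument("-f", "--file", action="store_true",
                        help="Use file(s) instead of string.")
    parser.add_argument("-u", "--no_url_removal", action="store_true",
                        help="Process without url-removal")
    parser.add_argument("-E", "--no_emoji_removal", action="store_true",
                        help="Process without emoji-removal")
    parser.add_argument("-H", "--no_hashtag_removal", action="store_true",
                        help="Process without hashtag-removal")
    parser.add_argument("-a", "--no_anonymize", action="store_true",
                        help="Process without anonymization")
    parser.add_argument("-S", "--no_stopword_removal", action="store_true",
                        help="Process without stopword-removal")
    # English and Spanish write to the same destination and cannot be combined.
    lang = parser.add_mutually_exclusive_group()
    lang.add_argument("-e", "--english", dest="language", action="store_const",
                      const="english", help="Set language to English (default)")
    lang.add_argument("-s", "--spanish", dest="language", action="store_const",
                      const="spanish", help="Set language to Spanish")
    parser.set_defaults(language="english")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--file", "tweets.txt", "--no_emoji_removal"])
    print(args.inputs, args.file, args.no_emoji_removal, args.language)
    # -> ['tweets.txt'] True True english
```

Because every processing flag is a `--no_*` switch defaulting to `False`, all modifications run unless explicitly disabled. That is why the example command only needs `--no_emoji_removal`.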

Download files


Source distribution: magick_tweet_preprocessor-0.0.1.tar.gz (5.3 kB)

Built distribution (Python 3): magick_tweet_preprocessor-0.0.1-py3-none-any.whl (7.2 kB)
