Small (magickal) Tweet Processor
Project description
Magick Tweet Preprocessor Vihaus Ljovan
Magick Tweet Processor is a small program that does some NLP-magick on tweet strings. It comes with a command-line interface that lets you choose the language (English or Spanish) and which modifications (tokenisation, hiding URLs, hiding @-mentions, etc.) the program should apply to the original string.
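To give a feel for the kind of modifications meant here, the following is a minimal sketch and not the package's actual code. The regular expressions, the `@USER` placeholder, the whitespace tokenisation and the function name `preprocess` are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the real package may use different rules.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


def preprocess(tweet, remove_urls=True, anonymize=True, remove_hashtags=True):
    """Apply the selected modifications and return a list of tokens."""
    if remove_urls:
        # Drop anything that looks like an http(s) link.
        tweet = URL_RE.sub("", tweet)
    if anonymize:
        # Replace every @-mention with a fixed placeholder (assumed: "@USER").
        tweet = MENTION_RE.sub("@USER", tweet)
    if remove_hashtags:
        # Drop hashtags entirely, including the word after '#'.
        tweet = HASHTAG_RE.sub("", tweet)
    # Naive whitespace tokenisation; a real tokeniser would be language-aware.
    return tweet.split()


print(preprocess("@bob check https://example.com #wow now"))
```

Each step is toggled by a keyword argument, which mirrors how the command-line flags switch individual modifications off.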
We used the MIT Licence because "I want it simple and permissive" sounded perfect for our use case. We also read through LICENSE.txt and it sounded good to us.
Example: processing the tweets in the file 'tweets.txt' without emoji removal, but with stopword, hashtag and URL removal as well as anonymization of mentions:
tpp --file tweets.txt --no_emoji_removal
All possible flags:
-h, --help Show this help message and exit
-f, --file Use file(s) instead of string.
-u, --no_url_removal Process without url-removal
-E, --no_emoji_removal Process without emoji-removal
-H, --no_hashtag_removal Process without hashtag-removal
-a, --no_anonymize Process without anonymization
-S, --no_stopword_removal Process without stopword-removal
-e, --english Set language to English (already the default)
-s, --spanish Set language to Spanish
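The flag list above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the package's real parser: the positional `inputs` argument, the treatment of `--file` as a boolean switch, and the `language` destination are guesses made for illustration. Only the flag names are taken from the list.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags (layout is assumed)."""
    parser = argparse.ArgumentParser(prog="tpp")
    # Assumption: the tweet string(s) or file name(s) are positional arguments.
    parser.add_argument("inputs", nargs="+", help="tweet string(s) or file name(s)")
    parser.add_argument("-f", "--file", action="store_true",
                        help="Use file(s) instead of string.")
    # Every processing step is on by default; each flag switches one off.
    parser.add_argument("-u", "--no_url_removal", action="store_true")
    parser.add_argument("-E", "--no_emoji_removal", action="store_true")
    parser.add_argument("-H", "--no_hashtag_removal", action="store_true")
    parser.add_argument("-a", "--no_anonymize", action="store_true")
    parser.add_argument("-S", "--no_stopword_removal", action="store_true")
    # Assumption: the two language flags exclude each other.
    lang = parser.add_mutually_exclusive_group()
    lang.add_argument("-e", "--english", dest="language",
                      action="store_const", const="english")
    lang.add_argument("-s", "--spanish", dest="language",
                      action="store_const", const="spanish")
    parser.set_defaults(language="english")
    return parser


args = build_parser().parse_args(["--file", "tweets.txt", "--no_emoji_removal"])
print(args.language, args.file, args.no_emoji_removal)
```

Because all removal steps default to enabled, the example command only needs to name the one step it wants to skip.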
Hashes for magick_tweet_preprocessor-0.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 04ee7690532c026bee9d9d57b58b7b1736d8d9f41e988fc581b3a9cb227ce90a
MD5 | 0206c268e369c3c6c2a07548b7ba4628
BLAKE2b-256 | e40c2e184883a3c7ce9c667f07e4bacc36a003a22802eec219d472f107937d10
Hashes for magick_tweet_preprocessor-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8a9c98a4c91d6aab08302985c77a9a252d547676c16fce2fec260efe893c010f
MD5 | 2434fa95f5f72dca2047538ce56a893f
BLAKE2b-256 | b44dd18b86a0cb0fc4b5f91db0a701ff3d32c6895694b2900c1607a1ac4d9cdf
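A downloaded file can be checked against the SHA256 digests listed above with a few lines of Python. The helper names `sha256_of` and `matches` are hypothetical, and the sketch assumes the file has already been saved locally.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare the file's digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("magick_tweet_preprocessor-0.0.1.tar.gz", "04ee7690532c026bee9d9d57b58b7b1736d8d9f41e988fc581b3a9cb227ce90a")` should return `True` for an intact copy of the source distribution.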