Tools for collecting, processing, and analyzing Twitter data

Project description

Twitter Toolbox

A suite of tools for collecting, pre-processing, analyzing, and sentiment-scoring Twitter data. An additional brief walkthrough can be found here.

Install:

pip install twitter-nlp-toolkit

To use the sentiment analysis package, you will also need to install spaCy's small English language model:

python -m spacy download en_core_web_sm
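If you want to confirm the model installed correctly, a quick sanity check (independent of this toolkit) is to load it directly in Python:

# Sanity check: load the spaCy model installed above
import spacy
nlp = spacy.load("en_core_web_sm")
print(nlp("Installation looks good!"))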

While the package is still under active development, the following functionality is expected to be stable:

Listener

twitter_nlp_toolkit.twitter_listener is the listener module, which can be used to monitor Twitter and stream tweets to disk in .json format.

from twitter_nlp_toolkit import twitter_listener

keywords = ["python"]
stream = twitter_listener.TwitterStreamListener(**credentials)
stream.collect_from_stream(max_tweets=10, output_json_name="python_tweets.json", target_words=keywords)

"keywords" uses the Twitter API. Documentation and tips for setting up smart keyword queries can be found here

"credentials" contains your Twitter API key, which can be obtained for free here

The package also includes tweet_json_parser, which converts the .json-formatted tweets into .csv for easy use (i.e., with pandas) or straight into a pandas DataFrame.

from twitter_nlp_toolkit import tweet_json_parser

parser = tweet_json_parser.json_parser()
parser.stream_json_file(json_file_name="python_tweets.json", output_file_name="parsed_python_tweets.csv")
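The resulting .csv can then be loaded with pandas as usual:

import pandas as pd

# Load the parsed tweets for analysis
df = pd.read_csv("parsed_python_tweets.csv")
print(df.head())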

Alternatively, the parser can convert the .json file directly into a pandas DataFrame:

parser = tweet_json_parser.json_parser()
df = parser.parse_json_file_into_dataframe(json_file_name="python_tweets.json")

Bulk Downloader

twitter_nlp_toolkit.twitter_REST_downloader is the bulk download module, which can be used to collect roughly the 200 most recent tweets from a single user.

from twitter_nlp_toolkit import twitter_REST_downloader

downloader = twitter_REST_downloader.bulk_downloader(**credentials)
downloader.get_tweets_csv_for_this_user("@nytimes", "nyt_tweet_output.csv")

Sentiment Analysis

twitter_nlp_toolkit.tweet_sentiment_classifier is the sentiment analysis module, which can be used to classify the sentiment of tweets.

from twitter_nlp_toolkit import tweet_sentiment_classifier

Classifier = tweet_sentiment_classifier.SentimentAnalyzer()
Classifier.load_small_ensemble()
Classifier.predict(['I am happy', 'I am sad', 'I am cheerful', 'I am mad'])  # returns [1, 0, 1, 0]
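predict returns one binary label per tweet (1 for positive, 0 for negative, judging from the example above), so pairing the labels back up with the inputs is straightforward:

tweets = ['I am happy', 'I am sad', 'I am cheerful', 'I am mad']
labels = Classifier.predict(tweets)
# Pair each tweet with its predicted sentiment
for tweet, label in zip(tweets, labels):
    print(tweet, '->', 'positive' if label else 'negative')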

Currently only two ensembles are provided: the small ensemble, which uses a bag-of-words logistic regression model and two long short-term memory (LSTM) neural networks, and the large ensemble, which uses the bag-of-words model, two larger LSTM networks, and a Google BERT model. The large ensemble is more accurate (and expected to become much more accurate), but it is extremely resource-intensive and is therefore not recommended for processing large numbers of tweets unless you have a powerful GPU.
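Loading the large ensemble presumably mirrors the small one; the method name below is an assumption based on load_small_ensemble, so verify it against the module before relying on it:

# Hypothetical counterpart to load_small_ensemble() -- verify the name before use
Classifier = tweet_sentiment_classifier.SentimentAnalyzer()
Classifier.load_large_ensemble()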

These ensembles were trained primarily on the Sent140 dataset and primarily tested against the US Airlines dataset previously hosted on Crowdflower.com.

Please see the Jupyter notebook (.ipynb) files in the root directory of the repository for further demonstrations of working code.

Advanced Use

If you have domain-specific training data, you can refine the ensembles:

Classifier.refine(train_x, train_y)
Classifier.save_models()
# Later, reload your saved models:
Classifier.load_models()
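The expected format of train_x and train_y is not documented here; based on the predict example above, a list of raw tweet strings with binary labels is a reasonable assumption:

# Assumed format: raw tweet strings and binary sentiment labels (1 = positive, 0 = negative)
train_x = ['Great flight, the crew was friendly', 'Delayed three hours with no updates']
train_y = [1, 0]
Classifier.refine(train_x, train_y)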

Other advanced uses, such as building your own models, are possible but not currently recommended, as the models are still in development. Further documentation will be added once development stabilizes.

Bugs, issues, contributions, and feature requests

The developers are always open to feature requests, bug reports, pull requests, and new opportunities to collaborate. Don't hesitate to reach out with questions, feedback, or requests.

Developers:

  • Moe Antar (@Moe520)

    • Twitter json parser and formatter
    • Twitter bulk downloader
    • Twitter Listener, in collaboration with Dr. Mirko Miorelli (https://github.com/mirkomiorelli)
    • File downloader
    • General maintenance and deployment procedures
  • Eric Schibli (@eschibli)

    • Data pre-processing pipeline
    • Natural language processing algorithms
    • Bag-of-words, LSTM neural networks, and BERT model assembly and training
    • General model optimization

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

twitter_nlp_toolkit-0.1.8.tar.gz (21.3 kB)


Built Distribution

twitter_nlp_toolkit-0.1.8-py3-none-any.whl (26.6 kB)


File details

Details for the file twitter_nlp_toolkit-0.1.8.tar.gz.

File metadata

  • Download URL: twitter_nlp_toolkit-0.1.8.tar.gz
  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for twitter_nlp_toolkit-0.1.8.tar.gz

  • SHA256: ac73ac76ec55ac6e57c2954c6c206cc524fda2380cb2b25a8fdf8b76273bcf15
  • MD5: ec729cb186cc05eda44b1c5d05dfd9ff
  • BLAKE2b-256: 38d7c291d6e22e628a18070fe5f2079d90cb254ac249715553a38811d20ccd1c

See more details on using hashes here.

File details

Details for the file twitter_nlp_toolkit-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: twitter_nlp_toolkit-0.1.8-py3-none-any.whl
  • Size: 26.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4

File hashes

Hashes for twitter_nlp_toolkit-0.1.8-py3-none-any.whl

  • SHA256: 55698411d742a55bdb56fd69cd10c19f6430497fe799c7227274b6143ab8d4bf
  • MD5: 225c22597d371ab49faf53a566f4d48f
  • BLAKE2b-256: 28b4758e9c22f4acb181f1c752344b1bca3f0d165048837016570b83352d1a39

See more details on using hashes here.
