A simple spark streaming handler.

Project description

Real-Time Tweets Sentiment Analysis Package

Overview

Retrieves real-time tweets using the Twitter API, Apache Kafka, and Apache Spark Streaming, then uses a TensorFlow deep learning model to classify each tweet as positive, negative, or neutral, all in a PyPI package.

TweetsAnalysis

The streamer and model package, available on PyPI as TweetsAnalysis.

Package Requirements

  • gensim
  • pandas
  • pyspark
  • kafka-python
  • streamlit
  • scikit-learn
  • seaborn
  • tensorflow
  • tweepy==3.9.0
  • pydantic
  • strictyaml
  • joblib


Model

The model architecture:

The model reaches about 85.5% accuracy on the training set and 84.4% on the test set of 160,000 tweets. The gap of roughly one percentage point suggests the model is not over-fitting.
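The over-fitting claim rests on the gap between the two accuracies. A quick check of the arithmetic, using only the figures quoted above (the function and variable names are illustrative and not part of the package):

```python
def accuracy_gap(train_acc: float, test_acc: float) -> float:
    """Return the train-minus-test accuracy gap in percentage points."""
    return round(train_acc - test_acc, 2)


TRAIN_ACC = 85.5      # percent, as reported above
TEST_ACC = 84.4       # percent, as reported above
TEST_SIZE = 160_000   # tweets in the test set

gap = accuracy_gap(TRAIN_ACC, TEST_ACC)
# Approximate number of test tweets classified correctly.
correct = round(TEST_SIZE * TEST_ACC / 100)

print(f"train/test gap: {gap} percentage points")
print(f"correctly classified test tweets: about {correct}")
```

A small gap like this is a common informal sign that the model generalizes, though it is not a formal test.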


Run

First, install the package and its requirements with:

 pip install TweetsAnalysis

Before training, specify the model and data directories in the config file. Then train the model with:

python train_model.py
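The config step can be pictured as a check on two directories before training starts. The sketch below is a standard-library illustration only: the key names model_dir and data_dir, the TrainPaths class, and load_paths are all hypothetical, and the package's real config schema (it lists strictyaml and pydantic as requirements) may differ.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TrainPaths:
    """Hypothetical container for the two directories the config must name."""
    model_dir: Path
    data_dir: Path


def load_paths(raw: dict) -> TrainPaths:
    """Validate an already-parsed config mapping and return the two directories."""
    # Assumed key names; the actual config file may use different ones.
    missing = [key for key in ("model_dir", "data_dir") if key not in raw]
    if missing:
        raise KeyError(f"config is missing: {', '.join(missing)}")
    paths = TrainPaths(Path(raw["model_dir"]), Path(raw["data_dir"]))
    # Training data must already exist.
    if not paths.data_dir.is_dir():
        raise FileNotFoundError(f"data directory not found: {paths.data_dir}")
    # The model directory is an output location, so create it if needed.
    paths.model_dir.mkdir(parents=True, exist_ok=True)
    return paths
```

Failing early on a missing key or data directory avoids discovering the problem only after a long training run.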

Streaming

From the Kafka installation directory, start ZooKeeper and then the Kafka broker, each in its own terminal:

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

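Then create a Kafka topic (tweets_stream). On Kafka 3.0 and later, where the --zookeeper option was removed, pass --bootstrap-server localhost:9092 instead: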

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic tweets_stream
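Once the topic exists, tweets can be published to it. The following is a minimal sketch of that step, not the package's own streamer: encode_tweet and publish are hypothetical helpers, the tweet fields are made up, and the broker address localhost:9092 is assumed from Kafka's default server.properties.

```python
import json
from typing import Iterable


def encode_tweet(tweet: dict) -> bytes:
    """Serialize one tweet as UTF-8 JSON, since Kafka message values are bytes."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def publish(producer, topic: str, tweets: Iterable[dict]) -> int:
    """Send each tweet to `topic` and return how many were sent.

    `producer` is any object with a kafka-python style `send(topic, value)` method.
    """
    count = 0
    for tweet in tweets:
        producer.send(topic, encode_tweet(tweet))
        count += 1
    return count


# Usage against the local broker started above (requires kafka-python):
#   from kafka import KafkaProducer
#   producer = KafkaProducer(bootstrap_servers="localhost:9092")
#   publish(producer, "tweets_stream", [{"id": 1, "text": "hello"}])
#   producer.flush()  # block until buffered messages are delivered
```

Passing the producer in as an argument keeps the helper testable without a running broker.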

Download files

Source Distribution

SparkStream-1.0.0.tar.gz (10.2 kB)

Uploaded Source

Built Distribution

SparkStream-1.0.0-py3-none-any.whl (10.5 kB)

Uploaded Python 3
