pytwanalysis - (Twitter Analysis)

A tool to gather, discover, and analyze Twitter data using a combination of graph-clustering and topic modeling techniques with the goal of semantically grouping tweet messages together.

Installation

pip install pytwanalysis

Initializing an object

import pytwanalysis as ta
from pymongo import MongoClient

# Set up your MongoDB connection here
mongoDBConnectionSTR = "mongodb://localhost:27017"
client = MongoClient(mongoDBConnectionSTR)
db = client.yourDB  # choose your DB name here
BASE_PATH = '[yourFolderPath]'  # path where you want to save your files
x = ta.TwitterAnalysis(BASE_PATH, db)

Requirements:

  1. Python 3.7

  2. Database: MongoDB (version 4.0+)

  3. Libraries:

  • pymongo
  • NLTK
  • numpy
  • networkx 2.3
  • matplotlib 3.2.1
  • gensim
  • scikit-learn (sklearn)
  • python-louvain
  • scipy
  • seaborn
  • pandas
  • wordcloud
  • Pyphen
  • requests-oauthlib

Prerequisites installation

pip install pymongo
pip install nltk
pip install numpy
pip install networkx==2.3
pip install matplotlib==3.2.1
pip install gensim
pip install -U scikit-learn
pip install python-louvain
pip install scipy
pip install seaborn
pip install pandas
pip install wordcloud
pip install Pyphen
pip install requests-oauthlib
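After installing, a quick way to confirm that every prerequisite is importable is a small check like the one below. It is a sketch, not part of pytwanalysis: the helper name `missing_modules` is hypothetical, and the list maps each pip package to the module name it is imported under (scikit-learn imports as `sklearn`, python-louvain as `community`, Pyphen as `pyphen`, requests-oauthlib as `requests_oauthlib`). It only checks that a module can be found, not that the pinned versions (networkx 2.3, matplotlib 3.2.1) are installed.

```python
import importlib.util
from typing import Iterable, List

# Import names for the prerequisites listed above. The pip name and the
# import name differ for scikit-learn, python-louvain, Pyphen and
# requests-oauthlib.
REQUIRED_MODULES = [
    "pymongo", "nltk", "numpy", "networkx", "matplotlib", "gensim",
    "sklearn", "community", "scipy", "seaborn", "pandas", "wordcloud",
    "pyphen", "requests_oauthlib",
]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All prerequisites are importable.")
```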

Things you can do with this library:

  • use MongoDB to store and process your Twitter data
  • export edges created based on user connections
  • create graphs and run time series, topic, and graph analyses of your Twitter data
  • create a folder structure to save all files (optionally split by period)
  • create the following files for each folder and subfolder:
    • nodes with degrees
    • edges
    • texts for topics
    • graph with LDA model
    • graph plot
    • graph plot with contracted nodes
    • hashtag & word frequency lists
    • hashtag & word bar charts
    • time series plot (tweet count & hashtag count)
    • word clouds (high-degree nodes, high-frequency hashtags, high-frequency words)
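The hashtag and word frequency lists above come down to counting tokens across tweets. The sketch below shows one way to do it with the standard library; it is not pytwanalysis's own code. The function names `hashtag_frequencies` and `word_frequencies` are hypothetical, and the tokenization rules (a simplified hashtag pattern and a tiny illustrative stopword list) are assumptions. The library lists NLTK among its requirements, so its real tokenization likely differs.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Simplified hashtag rule: '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")
# Words: runs of lowercase letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")
# Removes URLs, @mentions and #hashtags before counting plain words.
NOISE_RE = re.compile(r"https?://\S+|[@#]\w+")
# Illustrative stopword list (assumption); "rt" marks classic retweets.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "rt"}


def hashtag_frequencies(texts: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count hashtags case-insensitively and return the top_n most common."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(top_n)


def word_frequencies(texts: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count non-stopword words, ignoring URLs, mentions and hashtags."""
    counts: Counter = Counter()
    for text in texts:
        cleaned = NOISE_RE.sub(" ", text.lower())
        counts.update(w for w in WORD_RE.findall(cleaned) if w not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    tweets = [
        "RT @bob: Flooding downtown #storm #Houston",
        "Stay safe #storm https://t.co/x",
        "Power is out downtown #houston",
    ]
    print(hashtag_frequencies(tweets))
    print(word_frequencies(tweets, top_n=3))
```

Lowercasing hashtags merges `#Houston` and `#houston` into one entry, which is usually what a frequency list or word cloud wants.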

Data Management with MongoDB:

  • load JSON Twitter files into MongoDB

    *The logic is set up so that you can run it on the same files multiple times: it won't load the same file twice, and if something fails, it resumes from where it stopped.

  • create aggregation collections with data for EDA (e.g., tweetCountByFile, hashtagCount, tweetCountByLanguageAgg, tweetCountByPeriodAgg, tweetCountByUser)

  • break text into words

  • create collection with hashtags for each tweet

  • create collection with edges between users formed by replies, retweets, quotes and mentions

  • create collection with users info

  • export data into tab-delimited (\t) files that can be opened as CSV files

  • run different topic model analyses for hashtag groups
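The edge collection described above links users through replies, retweets, quotes and mentions. A minimal sketch of deriving such edges from a single tweet follows, assuming the classic Twitter API v1.1 JSON layout (`user.screen_name`, `in_reply_to_screen_name`, `retweeted_status`, `quoted_status`, `entities.user_mentions`). The helper names `extract_edges` and `write_edges_tsv` and the `(source, target, type)` tuple shape are hypothetical; the library's actual edge schema may differ. It also writes the edges as a tab-delimited file, mirroring the export bullet.

```python
import csv
import io
import sys
from typing import Dict, Iterable, List, TextIO, Tuple

Edge = Tuple[str, str, str]  # (source user, target user, edge type)


def extract_edges(tweet: Dict) -> List[Edge]:
    """Return user-to-user edges implied by one tweet (v1.1 JSON assumed)."""
    source = tweet["user"]["screen_name"]
    edges: List[Edge] = []

    # Reply: the field is null/absent for tweets that are not replies.
    reply_to = tweet.get("in_reply_to_screen_name")
    if reply_to:
        edges.append((source, reply_to, "reply"))

    # Retweet: the original tweet is embedded under "retweeted_status".
    retweeted = tweet.get("retweeted_status")
    if retweeted:
        edges.append((source, retweeted["user"]["screen_name"], "retweet"))

    # Quote: the quoted tweet is embedded under "quoted_status".
    quoted = tweet.get("quoted_status")
    if quoted:
        edges.append((source, quoted["user"]["screen_name"], "quote"))

    # Mentions: every @user listed in the tweet's entities.
    for mention in tweet.get("entities", {}).get("user_mentions", []):
        edges.append((source, mention["screen_name"], "mention"))

    return edges


def write_edges_tsv(edges: Iterable[Edge], out: TextIO) -> None:
    """Write edges as tab-delimited rows with a header line.

    When writing to a real file, open it with newline="" as the csv
    module recommends.
    """
    writer = csv.writer(out, delimiter="\t")
    writer.writerow(["source", "target", "type"])
    writer.writerows(edges)


if __name__ == "__main__":
    tweet = {
        "user": {"screen_name": "alice"},
        "in_reply_to_screen_name": "bob",
        "entities": {"user_mentions": [{"screen_name": "bob"},
                                       {"screen_name": "carol"}]},
    }
    buffer = io.StringIO()
    write_edges_tsv(extract_edges(tweet), buffer)
    sys.stdout.write(buffer.getvalue())
```

Because a v1.1 retweet's `entities.user_mentions` usually also lists the retweeted author, a retweet can yield both a retweet edge and a mention edge; deduplicate if that double counting is unwanted.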
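The loading note says a run can be repeated safely and resumes after a failure. One way to get that behavior is to checkpoint, per file, how many lines have been stored and which files are finished. The sketch below uses an in-memory `ImportTracker` as a stand-in for what would be a MongoDB collection in practice; `ImportTracker`, `load_file` and the one-JSON-object-per-line input format are assumptions for illustration, not the library's actual implementation.

```python
import json
from typing import Dict, Iterable, List, Set


class ImportTracker:
    """In-memory stand-in for a collection that records load progress."""

    def __init__(self) -> None:
        self.completed: Set[str] = set()   # files loaded end to end
        self.offsets: Dict[str, int] = {}  # file -> lines already processed


def load_file(name: str, lines: Iterable[str],
              tracker: ImportTracker, store: List[dict]) -> int:
    """Load one JSON-lines file, skipping finished files and done lines.

    Returns the number of tweets stored during this call. If a line fails
    to parse, the exception propagates and the checkpoint still points at
    the failing line, so the next call resumes there.
    """
    if name in tracker.completed:
        return 0  # already fully loaded: running again is a no-op
    start = tracker.offsets.get(name, 0)
    loaded = 0
    for line_no, line in enumerate(lines):
        if line_no < start:
            continue  # stored by an earlier, interrupted run
        if line.strip():
            store.append(json.loads(line))
            loaded += 1
        tracker.offsets[name] = line_no + 1  # checkpoint after each line
    tracker.completed.add(name)
    tracker.offsets.pop(name, None)
    return loaded


if __name__ == "__main__":
    tracker, store = ImportTracker(), []
    lines = ['{"id": 1}', '{"id": 2}']
    print(load_file("tweets.json", lines, tracker, store))  # first run
    print(load_file("tweets.json", lines, tracker, store))  # skipped
```

With a real database, the document write and the checkpoint update are two separate operations, so a crash between them could re-insert one tweet on resume; using the tweet id as `_id` lets MongoDB's unique index reject such duplicates.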

Graph Analysis

  • load a networkx graph from node/edge files
  • print measurements from the graph (Diameter, Radius, Extrema bounding, Centers with their degree, # Nodes, # Edges)
  • plot graph
  • plot graph with clusters (spectral clustering / Louvain Community)
  • contract nodes
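To make the graph measurements concrete: a node's eccentricity is its largest shortest-path distance to any other node; the diameter is the maximum eccentricity, the radius the minimum, and the centers are the nodes whose eccentricity equals the radius. The standard-library sketch below computes these with breadth-first search on an unweighted, undirected, connected graph, assuming no self-loops. With networkx you would normally call `nx.diameter`, `nx.radius` and `nx.center` instead; the helper names here (`graph_from_edges`, `eccentricity`, `measurements`) are hypothetical and not part of pytwanalysis. Running a BFS from every node is quadratic, which is the cost that extrema bounding aims to reduce on large graphs.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def graph_from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency map from (u, v) pairs."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def eccentricity(graph: Graph, source: Hashable) -> int:
    """Largest shortest-path distance from source, found with BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    if len(dist) != len(graph):
        raise ValueError("graph is not connected; eccentricity is infinite")
    return max(dist.values())


def measurements(graph: Graph) -> Dict[str, object]:
    """Node/edge counts, diameter, radius and centers (with degree)."""
    ecc = {node: eccentricity(graph, node) for node in graph}
    radius = min(ecc.values())
    centers = [node for node, e in ecc.items() if e == radius]
    return {
        "nodes": len(graph),
        # Each undirected edge appears in two adjacency sets (no self-loops).
        "edges": sum(len(nbrs) for nbrs in graph.values()) // 2,
        "diameter": max(ecc.values()),
        "radius": radius,
        "centers": {node: len(graph[node]) for node in centers},
    }


if __name__ == "__main__":
    path = graph_from_edges([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
    print(measurements(path))
```

For a Twitter user graph, which is rarely connected, these measures are typically computed on the largest connected component.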

Topic Analysis

  • train topic model
  • plot topic distribution
  • plot frequency lists (hashtags, word frequency)


Download files

Source Distribution

pytwanalysis-0.0.6.tar.gz (52.9 kB)

Built Distribution

pytwanalysis-0.0.6-py3-none-any.whl (54.7 kB)

File details

Details for the file pytwanalysis-0.0.6.tar.gz.

File metadata

  • Download URL: pytwanalysis-0.0.6.tar.gz
  • Size: 52.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for pytwanalysis-0.0.6.tar.gz
  • SHA256: ae907407fe8cc111d8d842879e793221318f93ecfc17267e3beae0b1e0836bcd
  • MD5: 564cccb8eb5b577112f63f01482a2f27
  • BLAKE2b-256: 7dd436de3b7e8f777dcfbff3031d2481883e00f7286799273e6f25faa8119778

File details

Details for the file pytwanalysis-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: pytwanalysis-0.0.6-py3-none-any.whl
  • Size: 54.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for pytwanalysis-0.0.6-py3-none-any.whl
  • SHA256: d6a106e6443a22bf9c4e80e3d3c3d0c1164eea9b5f7a2724affdd9e47f0ab62e
  • MD5: 23dd785f6568efdb921781fc2dc541e3
  • BLAKE2b-256: 6352d89b6f5170f776622232afd5a60acbf22689f5cc5e07099ba2e55c4cf4f0
