
Script to store the tweets of a list of users in a database for NLP processing.

Project description

Tweet Archiveur

This project aims to store tweets in a database, but you can also use it without a database.

  • Input: Twitter user IDs in a CSV file
  • Output: a database of tweets and hashtags

Our goal is to store the tweets of all members of the French Parliament to get an idea of the trending topics.

You can also use the project for other purposes and with other accounts.

How to install the package

TODO: push it to PyPI. Then:

pip install tweetarchiveur

How to use the package in your project

There are two classes:

  • A Scrapper() to query the Twitter API
  • A Database() to store tweets and hashtags

from os import environ

from tweet_archiveur.scrapper import Scrapper
from tweet_archiveur.database import Database

# Outside Docker, set the database connection variables by hand
environ["DATABASE_PORT"] = '8479'
environ["DATABASE_HOST"] = 'localhost'
environ["DATABASE_USER"] = 'tweet_archiveur_user'
environ["DATABASE_PASS"] = '1234leximpact'
environ["DATABASE_NAME"] = 'tweet_archiveur'

# Load the accounts to follow and extract their Twitter IDs
scrapper = Scrapper()
df_users = scrapper.get_users_accounts('../tests/sample-users.csv')
users_id = df_users.twitter_id.tolist()

# Create the tables if needed and register the users
database = Database()
database.create_tables_if_not_exist()
database.insert_twitter_users(df_users)

# Fetch and store the tweets of the first two users only
scrapper.get_all_tweet_and_store_them(database, users_id[0:2])

# Drop both objects once the work is done
del database
del scrapper
2021-03-22 10:21:59,837 -  tweet-archiveur INFO     Scrapper ready
2021-03-22 10:21:59,841 -  tweet-archiveur INFO     Loading database module...
2021-03-22 10:21:59,842 -  tweet-archiveur DEBUG    DEBUG : connect(user=tweet_archiveur_user, password=XXXX, host=localhost, port=8479, database=tweet_archiveur, url=None)
2021-03-22 10:22:03,915 -  tweet-archiveur INFO     Done scrapping, we got 400 tweets from 2 tweetos.
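
The DATABASE_* variables above are the only configuration the example sets. As a minimal sketch of how such variables can be turned into a PostgreSQL connection URL, the helper below reads them from a mapping. The function name build_database_url and the default host and port are assumptions for illustration, not part of the package.

```python
from os import environ
from urllib.parse import quote_plus

# Hypothetical defaults, used only when a variable is not set.
DEFAULTS = {
    "DATABASE_HOST": "localhost",
    "DATABASE_PORT": "5432",
}


def build_database_url(env=environ):
    """Assemble a PostgreSQL URL from the DATABASE_* variables (illustrative helper)."""

    def get(name):
        value = env.get(name, DEFAULTS.get(name))
        if value is None:
            raise KeyError(f"Missing required environment variable: {name}")
        return value

    # Escape the credentials so special characters cannot break the URL.
    user = quote_plus(get("DATABASE_USER"))
    password = quote_plus(get("DATABASE_PASS"))
    host = get("DATABASE_HOST")
    port = int(get("DATABASE_PORT"))  # fails early on a non-numeric port
    name = get("DATABASE_NAME")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

Passing a plain dict instead of os.environ makes the helper easy to check without touching the real environment.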

How we use it

We fetch the tweets of the 577 French Parliament members every 8 hours and store them in a PostgreSQL database.

We then explore them with Apache Superset.

How we deploy it

Prepare the environment :

git clone https://github.com/leximpact/tweet-archiveur.git
cd tweet-archiveur
cp docker/docker.env .env

Edit the .env to your needs.

Run the application :

docker-compose up -d

To view what's going on :

docker logs tweet-archiveur_tweet_archiveur_1 -f

The script archiveur.py uses the package to get the parliament accounts from https://github.com/regardscitoyens/twitter-parlementaires

The parameters are read from a .env file.

It is launched by the entrypoint.sh script every 8 hours.
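
The content of entrypoint.sh is not shown here, so the following is only a Python analogue of an "every 8 hours" loop, not the project's script. The names run_periodically, max_runs and the injectable sleep parameter are assumptions introduced to keep the sketch testable.

```python
import time

EIGHT_HOURS = 8 * 60 * 60  # interval in seconds


def run_periodically(job, interval=EIGHT_HOURS, max_runs=None, sleep=time.sleep):
    """Call job(), wait for interval seconds, and repeat.

    max_runs=None loops forever; a number stops after that many runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        # Do not wait after the final run.
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs
```

In a real deployment, job would be whatever launches archiveur.py, and max_runs would be left at None.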

To stop it :

docker-compose down

The data is kept in a Docker volume. To delete it as well:

docker-compose down -v

What to do with it?

  • Most used hashtags (per period, per person)
  • Most/least active users
  • Timeline of
  • NLP Topic detection
  • Word cloud
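
As an illustration of the first idea in the list above, here is a minimal sketch that counts hashtags in raw tweet texts. It assumes the tweets are available as plain strings; the package stores hashtags in the database, so in practice a query could replace the regular expression. The function name most_used_hashtags is hypothetical.

```python
import re
from collections import Counter

# A hashtag is "#" followed by word characters (accented letters included).
HASHTAG_RE = re.compile(r"#(\w+)")


def most_used_hashtags(tweets, top=3):
    """Return the `top` most frequent hashtags, compared case-insensitively."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(top)


sample = [
    "Vote on the #Budget today",
    "#budget and #Climate",
    "#climate #Budget",
]
print(most_used_hashtags(sample, top=2))  # [('budget', 3), ('climate', 2)]
```

Filtering the input list by author or by date before the call gives the per-person and per-period variants.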

Annexes

Exit codes:

  • 1 : Unknown error when storing tweets
  • 2 : Unknown error getting tweets
  • 3 : Failed more than 3 consecutive times
  • 4 : no env

If any step fails, no tweets are saved.
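
One common way to get this all-or-nothing behaviour is a single database transaction. The sketch below shows the idea with the standard-library sqlite3 module instead of PostgreSQL, purely so that it runs without a server. The table layout and the function name store_all_or_nothing are assumptions, and the package may implement this differently.

```python
import sqlite3


def store_all_or_nothing(conn, tweets):
    """Insert every (id, text) pair in one transaction.

    Returns True on success. On any database error the whole batch
    is rolled back and False is returned.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany(
                "INSERT INTO tweets (id, text) VALUES (?, ?)", tweets
            )
    except sqlite3.Error:
        return False
    return True


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT NOT NULL)")
```

If the second row of a batch violates a constraint, the first row of that batch is discarded too, which matches the behaviour described above.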

Status code 429: the Twitter API returns a 429 'Too many requests' error when you exceed the maximum number of requests allowed.
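
A typical response to a 429 error is to wait and retry, and to give up after too many consecutive failures. The sketch below combines that with exit code 3 from the list above. RateLimitError, fetch_with_backoff, the 60-second base delay and the doubling schedule are all assumptions for illustration; the package's actual retry logic is not documented here.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


MAX_CONSECUTIVE_FAILURES = 3


def fetch_with_backoff(fetch, base_delay=60, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each 429.

    Exits with code 3 once more than MAX_CONSECUTIVE_FAILURES
    consecutive attempts have failed.
    """
    failures = 0
    while True:
        try:
            return fetch()
        except RateLimitError:
            failures += 1
            if failures > MAX_CONSECUTIVE_FAILURES:
                raise SystemExit(3)
            # Wait 60 s, then 120 s, then 240 s with the default base delay.
            sleep(base_delay * 2 ** (failures - 1))
```

The sleep parameter is injectable so that the retry schedule can be checked without actually waiting.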
