Download posts and user metadata from the microblogging service Twitter

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Download a history of posts and user metadata from the microblogging service Twitter

This is a Python module to download a complete history of posts and user metadata from the microblogging service Twitter using version 2 of its API (the latest version as of 2021). Data are saved to an SQLAlchemy/GeoAlchemy2-compatible database (currently only PostgreSQL/PostGIS is fully supported; see also the documentation of GeoAlchemy2).

The script downloads all Twitter status messages up until the current time, and keeps track of already downloaded time periods in a cache file (default location ~/.cache/twitterhistory.yml). On subsequent runs, it attempts to fill gaps in the downloaded data and to catch up to the then-current time.
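The gap-filling idea can be illustrated with a minimal sketch. This is not twitterhistory's actual implementation; the function name `find_gaps` and the representation of time periods as `(start, end)` tuples are assumptions made for this example only.

```python
from datetime import datetime, timezone


def find_gaps(downloaded, start, end):
    """Return the (start, end) periods within [start, end] not yet covered.

    `downloaded` is an iterable of (start, end) tuples of datetimes;
    this representation is hypothetical, not the cache file format.
    """
    gaps = []
    cursor = start
    for span_start, span_end in sorted(downloaded):
        if span_end <= cursor:
            continue  # span lies entirely before the cursor
        if span_start >= end:
            break  # span lies entirely after the range of interest
        if span_start > cursor:
            gaps.append((cursor, span_start))  # uncovered stretch before this span
        cursor = max(cursor, span_end)
    if cursor < end:
        gaps.append((cursor, end))  # uncovered stretch after the last span
    return gaps


def day(n):
    """Helper for the example: the n-th day of January 2021 (UTC)."""
    return datetime(2021, 1, n, tzinfo=timezone.utc)


already_downloaded = [(day(3), day(5)), (day(8), day(9))]
for gap_start, gap_end in find_gaps(already_downloaded, day(1), day(10)):
    print(gap_start.date(), "to", gap_end.date())
```

Each returned period is a stretch that a subsequent run would still have to download.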

To use twitterhistory, your API key (see further down) must be associated with an account that has academic research access.

If you use twitterhistory for academic research, please cite it in your publication:
Fink, C. (2021): twitterhistory: a Python tool to download historical Twitter data. doi:10.5281/zenodo.4471195

Dependencies

The script is written in Python 3 and depends on the Python modules blessed, GeoAlchemy2, psycopg2, PyYAML, Requests and SQLAlchemy.

Installation

pip install twitterhistory

Configuration

Copy the example configuration file twitterhistory.yml.example to a suitable location, depending on your operating system:

- on Linux systems:
  - system-wide configuration: /etc/twitterhistory.yml
  - per-user configuration:
    - ~/.config/twitterhistory.yml OR
    - ${XDG_CONFIG_HOME}/twitterhistory.yml
- on macOS systems:
  - per-user configuration:
    - ${XDG_CONFIG_HOME}/twitterhistory.yml
- on Microsoft Windows systems:
  - per-user configuration: %APPDATA%\twitterhistory.yml
- in a custom file path specified on the command line (see further down)
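To check where a configuration file would be expected on a given system, the locations listed above can be enumerated as in the following sketch. The function `candidate_config_paths` is hypothetical and not part of twitterhistory, and the order in which it returns the paths is an assumption; the order in which twitterhistory itself searches these locations is not stated here.

```python
from pathlib import PurePosixPath, PureWindowsPath

CONFIG_FILE_NAME = "twitterhistory.yml"


def candidate_config_paths(system, env, home="~"):
    """List the documented configuration file locations for one system.

    `system` is a string as returned by platform.system() ("Linux",
    "Darwin" or "Windows"); `env` is a mapping such as os.environ.
    The order of the result is an assumption, not documented behaviour.
    """
    paths = []
    if system == "Windows":
        if "APPDATA" in env:
            paths.append(PureWindowsPath(env["APPDATA"]) / CONFIG_FILE_NAME)
        return paths

    # Linux and macOS: per-user location below XDG_CONFIG_HOME, if it is set
    if "XDG_CONFIG_HOME" in env:
        paths.append(PurePosixPath(env["XDG_CONFIG_HOME"]) / CONFIG_FILE_NAME)
    if system == "Linux":
        # Linux additionally has ~/.config and a system-wide location
        paths.append(PurePosixPath(home) / ".config" / CONFIG_FILE_NAME)
        paths.append(PurePosixPath("/etc") / CONFIG_FILE_NAME)
    return paths


for path in candidate_config_paths(
    "Linux", {"XDG_CONFIG_HOME": "/tmp/xdg"}, home="/home/alice"
):
    print(path)
```

On a real system, `platform.system()` and `os.environ` would be passed in; the pure path classes are used here only so that the example behaves the same on every platform.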

Adapt the configuration:

Configure a database connection string (connection_string), pointing to an existing database (with the PostGIS extension enabled).
Configure an API OAuth 2.0 Bearer token with access to the Twitter API v2 (twitter_oauth2_bearer_token).
Configure one or more search terms for the query (search_terms).
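Before starting a long download, it can help to confirm that all three settings are present. The sketch below assumes the configuration has already been loaded into a Python dictionary; the helper `missing_config_keys`, the placeholder values and the assumption that search_terms is a list are illustrative only. The example configuration file shipped with twitterhistory remains the reference for the actual layout.

```python
REQUIRED_KEYS = (
    "connection_string",
    "twitter_oauth2_bearer_token",
    "search_terms",
)


def missing_config_keys(config):
    """Return the required keys that are absent or empty in `config`."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values; the value formats shown here are assumptions
example_config = {
    "connection_string": "postgresql://user:password@localhost/twitterdata",
    "twitter_oauth2_bearer_token": "",
    "search_terms": ["helsinki"],
}

missing = missing_config_keys(example_config)
if missing:
    print("Missing configuration values:", ", ".join(missing))
```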

If you have a cache file from a previous installation in which already downloaded time periods are saved, copy it to ${XDG_CACHE_HOME}/twitterhistory.yml (on Linux or macOS) or to %LOCALAPPDATA%/twitterhistory.yml (on Microsoft Windows).

The cache file is currently also the best way to limit the temporal range of the data collection (by default, twitterhistory downloads the entire history of Tweets that correspond to the search terms). Run twitterhistory at least briefly for it to create an initial cache file. In this file, it marks the time spans for which it successfully downloaded data, per search term. Add one or more !TimeSpan objects that cover all periods between March 2006 and the current date except the temporal range you want to download; twitterhistory will then try to fill only this gap.
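Working out which periods to mark as already downloaded is simple date arithmetic, sketched below. The function `spans_to_mark_as_done` is hypothetical, and 1 March 2006 is used as an assumed start date for "March 2006". How the resulting periods are written as !TimeSpan objects is not shown here; copy the notation from the cache file that twitterhistory created.

```python
from datetime import datetime, timezone

# Assumed start of the period to cover ("March 2006" in the text above)
TWITTER_START = datetime(2006, 3, 1, tzinfo=timezone.utc)


def spans_to_mark_as_done(wanted_start, wanted_end, now=None):
    """Return the periods outside [wanted_start, wanted_end] as tuples.

    These are the periods to record in the cache file so that only the
    wanted range is left as a gap.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    spans = []
    if wanted_start > TWITTER_START:
        spans.append((TWITTER_START, wanted_start))  # everything before the range
    if wanted_end < now:
        spans.append((wanted_end, now))  # everything after the range
    return spans


wanted_start = datetime(2020, 1, 1, tzinfo=timezone.utc)
wanted_end = datetime(2020, 7, 1, tzinfo=timezone.utc)
for span_start, span_end in spans_to_mark_as_done(wanted_start, wanted_end):
    print(span_start.isoformat(), "to", span_end.isoformat())
```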

Usage

Command line executable

python -m twitterhistory

python -m twitterhistory --config /path/to/custom/config-file.yml

Python

Import the twitterhistory module. Instantiate a TwitterHistoryDownloader, and call its download() method.

import twitterhistory

downloader = twitterhistory.TwitterHistoryDownloader()
downloader.download()

Data privacy

By default, twitterhistory pseudonymises downloaded metadata, i.e., it replaces (direct) identifiers with randomised identifiers (generated using hashes, i.e., one-way ‘encryption’). This serves as one step of a responsible data processing workflow. However, other (meta-)data might nevertheless qualify as indirect identifiers, as they, combined or on their own, might allow re-identification of a person. If you want to use data downloaded using twitterhistory in a GDPR-compliant fashion, you have to follow up the data collection stage with data minimisation and further pseudonymisation or anonymisation efforts.
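The general idea of replacing an identifier with a one-way hash can be sketched as follows. This is not necessarily how twitterhistory generates its pseudonyms: the function `pseudonymise`, the choice of HMAC-SHA256 and the use of a secret salt are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier, salt):
    """Return a hex digest that stands in for `identifier`.

    The same identifier and salt always yield the same pseudonym, so
    records stay linkable, but the digest cannot be reversed directly.
    """
    message = str(identifier).encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()


salt = secrets.token_bytes(32)  # keep secret, or discard after processing

user_id = "1234567890"  # made-up identifier
pseudonym = pseudonymise(user_id, salt)
print(pseudonym == pseudonymise(user_id, salt))  # same input, same pseudonym
```

A keyed hash is used in this sketch because numeric identifiers come from a small value space: without a secret salt, plain hashes of all possible identifiers could be precomputed and matched.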

twitterhistory can keep original identifiers (i.e., skip pseudonymisation). To instruct it to do so, instantiate a TwitterHistoryDownloader with the parameter pseudonymise_identifiers=False or set the corresponding parameter in the configuration file. Ensure that you fulfil all legal and organisational requirements to handle personal information before you decide to collect non-pseudonymised data.

import twitterhistory

downloader = twitterhistory.TwitterHistoryDownloader(
    pseudonymise_identifiers=False  # get legal advice and ethics approval before doing this
)
downloader.download()

Release history

- 0.4.0 (Mar 2, 2022), this version
- 0.3.7 (Feb 3, 2022)
- 0.3.6 (Jul 1, 2021)
- 0.3.5 (Jun 28, 2021)
- 0.3.4 (Jun 21, 2021)
- 0.3.3 (Apr 28, 2021)
- 0.3.2 (Apr 12, 2021)
- 0.3.1 (Apr 9, 2021)
- 0.3.0 (Apr 9, 2021)
- 0.2.4 (Feb 10, 2021)
- 0.2.3 (Feb 2, 2021)
- 0.2.2 (Feb 2, 2021)
- 0.2.1 (Jan 27, 2021)
- 0.2.0 (Jan 27, 2021)

Download files

Source Distribution

twitterhistory-0.4.0.tar.gz (44.8 kB)

File metadata

Download URL: twitterhistory-0.4.0.tar.gz
Upload date: Mar 2, 2022
Size: 44.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.4.2 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.2

File hashes

Hashes for twitterhistory-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`c57164fee1916e6c1dc666cb930e2f8eab261c5c82792e4d32df76c24cd3578e`
MD5	`c7338682307cc7dc04fc37a737bc191a`
BLAKE2b-256	`c1b0c6ba6f3f9202b994dfb3a766f1cd5a8166430f65a9a083042311f498b469`
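A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper name `sha256_of_file` and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as archive:
        for chunk in iter(lambda: archive.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest as published in the table above
EXPECTED_SHA256 = "c57164fee1916e6c1dc666cb930e2f8eab261c5c82792e4d32df76c24cd3578e"

# Usage, assuming the archive was downloaded to the current directory:
# if sha256_of_file("twitterhistory-0.4.0.tar.gz") != EXPECTED_SHA256:
#     raise SystemExit("Hash mismatch: do not install this file")
```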
