Download posts and user metadata from the microblogging service Twitter
Twitterhistory is in BETA status. Using it in production might be a bad idea. If you encounter any issues, please report them and/or submit a merge request with a fix.
This is a Python module to download posts and user metadata from the microblogging service Twitter using version 2 of its API (the latest version as of 2021). Data are saved to an SQLAlchemy/GeoAlchemy2-compatible database (currently only PostgreSQL/PostGIS is fully supported; SQLite works with some limitations, see the documentation of GeoAlchemy2).
The script downloads all posts up until the current time and keeps track of already downloaded time periods in a cache file (default location: ~/.cache/twitterhistory.yml). When started the next time, it attempts to fill gaps in the downloaded data and catches up to the then-current time.
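The gap-filling behaviour described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that already downloaded periods are kept as (start, end) pairs of timezone-aware datetimes; `merge_periods()` and `find_gaps()` are hypothetical helpers, not part of twitterhistory, and the tool's actual cache format and logic may differ.

```python
from datetime import datetime, timezone

# Hypothetical representation: one already-downloaded period as a
# (start, end) pair of timezone-aware datetimes. This is an assumption
# for illustration, NOT twitterhistory's cache file format.
Period = tuple[datetime, datetime]


def merge_periods(periods: list[Period]) -> list[Period]:
    """Sort periods and merge any that overlap or touch."""
    merged: list[Period] = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous period: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def find_gaps(periods: list[Period], since: datetime, until: datetime) -> list[Period]:
    """Return the parts of [since, until] not covered by `periods`."""
    if since >= until:
        return []
    gaps: list[Period] = []
    cursor = since  # everything before `cursor` is known to be covered or reported
    for start, end in merge_periods(periods):
        if end <= cursor:
            continue  # lies entirely before the part still to be checked
        if start > cursor:
            gaps.append((cursor, min(start, until)))
        cursor = max(cursor, end)
        if cursor >= until:
            break
    if cursor < until:
        # Catch up from the end of the last downloaded period to `until`.
        gaps.append((cursor, until))
    return gaps


if __name__ == "__main__":
    utc = timezone.utc
    done = [
        (datetime(2021, 1, 1, tzinfo=utc), datetime(2021, 1, 10, tzinfo=utc)),
        (datetime(2021, 1, 15, tzinfo=utc), datetime(2021, 1, 20, tzinfo=utc)),
    ]
    for gap in find_gaps(done, datetime(2021, 1, 1, tzinfo=utc), datetime(2021, 1, 31, tzinfo=utc)):
        print(gap[0].date(), "to", gap[1].date())
```

With the two example periods, the gaps reported are 10 to 15 January (a hole to fill) and 20 to 31 January (the catch-up to "now").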
To use twitterhistory, your API keys (see below) need to be associated with an account that has academic research access.
If you use twitterhistory for academic research, please cite it in your publication:
Fink, C. (2021): twitterhistory: a Python tool to download historical Twitter data. doi:10.5281/zenodo.4471196
Dependencies
The script is written in Python 3 and depends on the Python modules blessed, GeoAlchemy2, psycopg2, PyYAML, Requests and SQLAlchemy.
Installation
- Download the latest release, and use pip to install twitterhistory and its dependencies:
  pip install twitterhistory-0.0.0.tar.gz
Configuration
Copy the example configuration file twitterhistory.yml.example to a suitable location, depending on your operating system:
- on Linux systems:
  - system-wide configuration: /etc/twitterhistory.yml
  - per-user configuration: ~/.config/twitterhistory.yml OR ${XDG_CONFIG_HOME}/twitterhistory.yml
- on macOS systems:
  - per-user configuration: ${XDG_CONFIG_HOME}/twitterhistory.yml
- on Microsoft Windows systems:
  - per-user configuration: %APPDATA%\twitterhistory.yml
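A minimal sketch of how the per-platform locations listed above could be resolved in Python. `candidate_config_paths()` and `find_config()` are hypothetical helpers, not part of twitterhistory, and checking the per-user locations before the system-wide file is an assumption, not documented behaviour.

```python
import os
import sys
from pathlib import Path
from typing import Optional

CONFIG_NAME = "twitterhistory.yml"


def candidate_config_paths() -> list[Path]:
    """List the configuration file locations named in this README.

    Assumption: per-user locations take precedence over /etc.
    """
    paths: list[Path] = []
    if sys.platform == "win32":
        appdata = os.environ.get("APPDATA")
        if appdata:
            paths.append(Path(appdata) / CONFIG_NAME)
    else:
        # Linux and macOS: respect XDG_CONFIG_HOME if it is set ...
        xdg_config = os.environ.get("XDG_CONFIG_HOME")
        if xdg_config:
            paths.append(Path(xdg_config) / CONFIG_NAME)
        if sys.platform.startswith("linux"):
            # ... and on Linux also check ~/.config and the system-wide file.
            paths.append(Path.home() / ".config" / CONFIG_NAME)
            paths.append(Path("/etc") / CONFIG_NAME)
    return paths


def find_config() -> Optional[Path]:
    """Return the first candidate file that exists, or None."""
    for path in candidate_config_paths():
        if path.is_file():
            return path
    return None


if __name__ == "__main__":
    print(find_config() or "no configuration file found")
```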
Adapt the configuration:
- Configure a database connection string (connection_string) pointing to an existing database with the PostGIS extension enabled.
- Configure an OAuth 2.0 Bearer token with access to the Twitter API v2 (twitter_oauth2_bearer_token).
- Configure one or more search terms for the query (search_terms).
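Before the first run, a parsed configuration could be sanity-checked along these lines. The three key names come from the list above; the expected value types (a string token, a list of strings for search_terms) are assumptions, and `config_problems()` is a hypothetical helper, not part of twitterhistory.

```python
from typing import Any

# Key names as given in this README; the value types checked below are assumptions.
REQUIRED_KEYS = ("connection_string", "twitter_oauth2_bearer_token", "search_terms")


def config_problems(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed configuration."""
    problems: list[str] = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"missing or empty: {key}")

    token = config.get("twitter_oauth2_bearer_token")
    if token and not isinstance(token, str):
        problems.append("twitter_oauth2_bearer_token should be a string")

    terms = config.get("search_terms")
    if terms and not (isinstance(terms, list) and all(isinstance(t, str) for t in terms)):
        problems.append("search_terms should be a list of strings")
    return problems


if __name__ == "__main__":
    example = {
        "connection_string": "postgresql://user:password@localhost/twitter",
        "twitter_oauth2_bearer_token": "YOUR-TOKEN-HERE",
        "search_terms": ["example"],
    }
    print(config_problems(example) or "configuration looks complete")
```

With PyYAML (already a dependency) installed, `config_problems(yaml.safe_load(path.read_text()))` would check an actual configuration file.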
If you have a cache file from a previous installation in which already downloaded time periods are saved, copy it to ${XDG_CACHE_HOME}/twitterhistory.yml on Linux or macOS, or to %LOCALAPPDATA%/twitterhistory.yml on Microsoft Windows.
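Restoring an old cache file can be scripted as below. This is a minimal sketch: `cache_file_path()` and `restore_cache()` are hypothetical helpers, not part of twitterhistory. Falling back to ~/.cache when XDG_CACHE_HOME is unset follows the XDG Base Directory convention and matches the default location mentioned at the top of this README.

```python
import os
import shutil
import sys
from pathlib import Path

CACHE_NAME = "twitterhistory.yml"


def cache_file_path() -> Path:
    """Return the cache file location described in this README."""
    if sys.platform == "win32":
        return Path(os.environ["LOCALAPPDATA"]) / CACHE_NAME
    # XDG_CACHE_HOME defaults to ~/.cache when unset or empty.
    base = os.environ.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / CACHE_NAME


def restore_cache(old_cache: Path, overwrite: bool = False) -> Path:
    """Copy a cache file from a previous installation into place."""
    target = cache_file_path()
    if target.exists() and not overwrite:
        # Refuse to clobber a cache the current installation already wrote.
        raise FileExistsError(f"{target} already exists; pass overwrite=True to replace it")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(old_cache, target)  # copies contents and file metadata
    return target
```

Refusing to overwrite by default avoids silently discarding download progress recorded by the current installation.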
Usage
Command line executable
python -m twitterhistory
Python
Import the twitterhistory module, instantiate a TwitterDownloader, and call its download() method:
import twitterhistory
downloader = twitterhistory.TwitterDownloader()
downloader.download()