Wayback Tweets
Retrieves archived tweets CDX data from the Wayback Machine, performs the necessary parsing (see Field Options), and saves the data in CSV, JSON, and HTML formats. The HTML output embeds the tweets in iframe tags for easy viewing.
Installation
Wayback Tweets is compatible with Python 3.10 and above. See the installation options.
pipx install waybacktweets
CLI
Usage:
  waybacktweets [OPTIONS] USERNAME

  USERNAME: The Twitter username without @

Options:
  -c, --collapse [urlkey|digest|timestamp:xx]
                                  Collapse results based on a field, or a
                                  substring of a field. XX in the timestamp
                                  value ranges from 1 to 14, comparing the
                                  first XX digits of the timestamp field. It
                                  is recommended to use from 4 onwards, to
                                  compare at least by years.
  -f, --from DATE                 Filtering by date range from this date.
                                  Format: YYYYmmdd
  -t, --to DATE                   Filtering by date range up to this date.
                                  Format: YYYYmmdd
  -l, --limit INTEGER             Query result limits.
  -rk, --resumption_key TEXT      Allows for a simple way to scroll through
                                  the results. Key to continue the query from
                                  the end of the previous query.
  -mt, --matchtype [exact|prefix|host|domain]
                                  Results matching a certain prefix, a certain
                                  host or all subdomains.
  -v, --verbose                   Shows the log.
  --version                       Show the version and exit.
  -h, --help                      Show this message and exit.

Examples:
  waybacktweets jack
  waybacktweets --from 20200305 --to 20231231 --limit 300 --verbose jack

Repository:
  https://github.com/claromes/waybacktweets

Documentation:
  https://waybacktweets.claromes.com
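The `timestamp:xx` value of `--collapse` is easier to reason about with a small illustration. The sketch below is not the package's or the Wayback Machine's implementation. It only mimics the comparison described in the help text, assuming that collapsing keeps the first capture in each run of adjacent captures whose timestamps share the same leading digits. The function name `collapse_by_timestamp` and the sample timestamps are hypothetical.

```python
from itertools import groupby


def collapse_by_timestamp(timestamps, digits):
    """Keep the first of each run of adjacent timestamps sharing the same leading digits.

    Hypothetical helper for illustration only; `timestamps` are 14-digit
    strings in YYYYmmddHHMMSS form, already sorted as a query would return them.
    """
    if not 1 <= digits <= 14:
        raise ValueError("digits must be between 1 and 14")
    # groupby only merges adjacent items, which matches the assumption above.
    return [next(group) for _, group in groupby(timestamps, key=lambda ts: ts[:digits])]


# Made-up capture timestamps for the example.
captures = [
    "20200305101500",
    "20200305184200",
    "20200417090000",
    "20210101000000",
]

print(collapse_by_timestamp(captures, 4))  # compares by year
print(collapse_by_timestamp(captures, 8))  # compares by day
```

With these sample values, comparing 4 digits leaves one capture per year, while comparing 8 digits only merges the two captures taken on the same day. This is why the help text recommends starting from 4.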
Module
from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter

USERNAME = "jack"

# Request the archived tweets CDX data for the username
api = WaybackTweets(USERNAME)
archived_tweets = api.get()

if archived_tweets:
    # Fields to include in the parsed and exported data
    field_options = [
        "archived_urlkey",
        "archived_timestamp",
        "parsed_archived_timestamp",
        "archived_tweet_url",
        "parsed_archived_tweet_url",
        "original_tweet_url",
        "parsed_tweet_url",
        "available_tweet_text",
        "available_tweet_is_RT",
        "available_tweet_info",
        "archived_mimetype",
        "archived_statuscode",
        "archived_digest",
        "archived_length",
        "resumption_key",
    ]

    # Parse the raw CDX data into the selected fields
    parser = TweetsParser(archived_tweets, USERNAME, field_options)
    parsed_tweets = parser.parse()

    # Save the parsed tweets in each supported format
    exporter = TweetsExporter(parsed_tweets, USERNAME, field_options)
    exporter.save_to_csv()
    exporter.save_to_json()
    exporter.save_to_html()
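Several of the field options above relate to the capture timestamp and the archived URL. As a rough sketch of how those values relate to each other, the snippet below parses a 14-digit Wayback Machine timestamp and builds the corresponding `web.archive.org/web/<timestamp>/<url>` address. The helper names and the sample values are hypothetical, and the format the package itself uses for `parsed_archived_timestamp` is not stated here, so treat this as an illustration rather than a description of the package's output.

```python
from datetime import datetime

WAYBACK_PREFIX = "https://web.archive.org/web"


def parse_wayback_timestamp(timestamp: str) -> datetime:
    """Convert a 14-digit YYYYmmddHHMMSS timestamp into a datetime (hypothetical helper)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def build_archived_url(timestamp: str, original_url: str) -> str:
    """Build the Wayback Machine address of a capture (hypothetical helper)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


# Made-up values for the example.
timestamp = "20200305101500"
original_url = "https://twitter.com/jack/status/20"

print(parse_wayback_timestamp(timestamp).isoformat())
print(build_archived_url(timestamp, original_url))
```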
Web App
A prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
Important: Starting from version 1.0, the web app will no longer receive all updates from the official package. To access all features, prefer using the package from PyPI.
Documentation
Acknowledgements
- Tristan Lee (Bellingcat's Data Scientist) for the idea.
- Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit team for the additional server resources on Streamlit Cloud.
- OSINT Community for recommending the package and the application.
License