Skip to main content

Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing, and saves the data.

Project description

Wayback Tweets

PyPI PyPI Downloads

Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see Field Options), and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.

Installation

It is compatible with Python versions 3.10 and above. See installation options.

pipx install waybacktweets

CLI

Usage:
  waybacktweets [OPTIONS] USERNAME
  USERNAME: The Twitter username without @

Options:
  -c, --collapse [urlkey|digest|timestamp:xx]
                                  Collapse results based on a field, or a
                                  substring of a field. XX in the timestamp
                                  value ranges from 1 to 14, comparing the
                                  first XX digits of the timestamp field. It
                                  is recommended to use from 4 onwards, to
                                  compare at least by years.
  -f, --from DATE                 Filtering by date range from this date.
                                  Format: YYYYmmdd
  -t, --to DATE                   Filtering by date range up to this date.
                                  Format: YYYYmmdd
  -l, --limit INTEGER             Query result limits.
  -rk, --resumption_key TEXT      Allows for a simple way to scroll through
                                  the results. Key to continue the query from
                                  the end of the previous query.
  -mt, --matchtype [exact|prefix|host|domain]
                                  Results matching a certain prefix, a certain
                                  host or all subdomains.
  -v, --verbose                   Shows the log.
  --version                       Show the version and exit.
  -h, --help                      Show this message and exit.

Examples:
  waybacktweets jack
  waybacktweets --from 20200305 --to 20231231 --limit 300 --verbose jack

Repository:
  https://github.com/claromes/waybacktweets

Documentation:
  https://waybacktweets.claromes.com

Module

Open In Collab

from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter

USERNAME = "jack"

api = WaybackTweets(USERNAME)
archived_tweets = api.get()

if archived_tweets:
    field_options = [
        "archived_urlkey",
        "archived_timestamp",
        "parsed_archived_timestamp",
        "archived_tweet_url",
        "parsed_archived_tweet_url",
        "original_tweet_url",
        "parsed_tweet_url",
        "available_tweet_text",
        "available_tweet_is_RT",
        "available_tweet_info",
        "archived_mimetype",
        "archived_statuscode",
        "archived_digest",
        "archived_length",
        "resumption_key",
    ]

    parser = TweetsParser(archived_tweets, USERNAME, field_options)
    parsed_tweets = parser.parse()

    exporter = TweetsExporter(parsed_tweets, USERNAME, field_options)
    exporter.save_to_csv()
    exporter.save_to_json()
    exporter.save_to_html()

Web App

Streamlit App

A prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.

Important: Starting from version 1.0, the web app will no longer receive all updates from the official package. To access all features, prefer using the package from PyPI.

Documentation

Acknowledgements

  • Tristan Lee (Bellingcat's Data Scientist) for the idea.
  • Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit team for the additional server resources on Streamlit Cloud.
  • OSINT Community for recommending the package and the application.

License

GPL-3.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

waybacktweets-1.0.tar.gz (29.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

waybacktweets-1.0-py3-none-any.whl (32.4 kB view details)

Uploaded Python 3

File details

Details for the file waybacktweets-1.0.tar.gz.

File metadata

  • Download URL: waybacktweets-1.0.tar.gz
  • Upload date:
  • Size: 29.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for waybacktweets-1.0.tar.gz
Algorithm Hash digest
SHA256 2687324bab89822f698fb3425a5f02ab30b02443d4e98e4e9817174e49c85b6f
MD5 b1c5b2300db2499dc95b02de74f08813
BLAKE2b-256 c3a12370f0934fe9c818ef093ff7a32abfa5771d3c96d4b75dfe67ef15ae38bd

See more details on using hashes here.

File details

Details for the file waybacktweets-1.0-py3-none-any.whl.

File metadata

  • Download URL: waybacktweets-1.0-py3-none-any.whl
  • Upload date:
  • Size: 32.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/5.15.167.4-microsoft-standard-WSL2

File hashes

Hashes for waybacktweets-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1b3e01b46921f9a803f7ef8e4379ccb49b53cff22a8d6a42e881ad99f915466d
MD5 df918b36c14a2a49a3ec28c425ea677c
BLAKE2b-256 710860f15f420429f2888ae7a802f39b04ee8ce96ea7dca04d747eef504719c3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page