Wayback Tweets

Retrieves CDX data for archived tweets from the Wayback Machine, performs the necessary parsing (see Field Options), and saves the data in CSV, JSON, and HTML formats. The HTML output uses iframe tags for easy viewing of the tweets.

Installation

Compatible with Python 3.10 and above. See the installation options.

pipx install waybacktweets

CLI

Usage:
  waybacktweets [OPTIONS] USERNAME
  USERNAME: The Twitter username without @

Options:
  -c, --collapse [urlkey|digest|timestamp:xx]
                                  Collapse results based on a field, or a
                                  substring of a field. XX in the timestamp
                                  value ranges from 1 to 14, comparing the
                                  first XX digits of the timestamp field. It
                                  is recommended to use from 4 onwards, to
                                  compare at least by years.
  -f, --from DATE                 Filtering by date range from this date.
                                  Format: YYYYmmdd
  -t, --to DATE                   Filtering by date range up to this date.
                                  Format: YYYYmmdd
  -l, --limit INTEGER             Query result limits.
  -rk, --resumption_key TEXT      Allows for a simple way to scroll through
                                  the results. Key to continue the query from
                                  the end of the previous query.
  -mt, --matchtype [exact|prefix|host|domain]
                                  Results matching a certain prefix, a certain
                                  host or all subdomains.
  -v, --verbose                   Shows the log.
  --version                       Show the version and exit.
  -h, --help                      Show this message and exit.

Examples:
  waybacktweets jack
  waybacktweets --from 20200305 --to 20231231 --limit 300 --verbose jack

Repository:
  https://github.com/claromes/waybacktweets

Documentation:
  https://waybacktweets.claromes.com
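
The `--collapse timestamp:XX` option described above can be easier to follow with a small illustration. The sketch below is not the package's code (collapsing is performed by the Wayback Machine CDX server); it only mimics the documented behaviour of comparing the first XX digits of adjacent 14-digit timestamps and keeping the first capture of each run. The function name `collapse_by_timestamp` and the sample timestamps are hypothetical.

```python
from itertools import groupby


def collapse_by_timestamp(timestamps, digits):
    """Keep the first of each run of adjacent timestamps sharing a prefix.

    Hypothetical helper for illustration only: it imitates how a
    `timestamp:XX` collapse compares the first XX digits of each
    14-digit timestamp (YYYYmmddHHMMSS).
    """
    if not 1 <= digits <= 14:
        raise ValueError("digits must be between 1 and 14")
    # groupby only merges *adjacent* items, so the input is assumed sorted.
    return [next(group) for _, group in groupby(timestamps, key=lambda ts: ts[:digits])]


# Made-up capture timestamps, already in chronological order.
captures = [
    "20200305101500",
    "20200305183000",
    "20200712090000",
    "20210101000000",
]

by_year = collapse_by_timestamp(captures, 4)  # compare YYYY
by_day = collapse_by_timestamp(captures, 8)   # compare YYYYmmdd
```

With these sample values, `timestamp:4` would leave one capture per year and `timestamp:8` one capture per day, which is why values from 4 onwards are recommended.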

Module

from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter

USERNAME = "jack"  # Twitter username, without the @

# Fetch the archived tweets CDX data from the Wayback Machine
api = WaybackTweets(USERNAME)
archived_tweets = api.get()

if archived_tweets:
    field_options = [
        "archived_urlkey",
        "archived_timestamp",
        "parsed_archived_timestamp",
        "archived_tweet_url",
        "parsed_archived_tweet_url",
        "original_tweet_url",
        "parsed_tweet_url",
        "available_tweet_text",
        "available_tweet_is_RT",
        "available_tweet_info",
        "archived_mimetype",
        "archived_statuscode",
        "archived_digest",
        "archived_length",
        "resumption_key",
    ]

    # Parse the CDX data into the selected fields
    parser = TweetsParser(archived_tweets, USERNAME, field_options)
    parsed_tweets = parser.parse()

    # Save the parsed tweets in each supported format
    exporter = TweetsExporter(parsed_tweets, USERNAME, field_options)
    exporter.save_to_csv()
    exporter.save_to_json()
    exporter.save_to_html()
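
For readers curious about what a CDX request looks like, here is a minimal sketch of how the CLI filters could map onto Wayback Machine CDX server query parameters (`url`, `from`, `to`, `limit`, `collapse`, `matchType`, `output`). It only builds a URL and makes no network call. The helper `build_cdx_url` and the `twitter.com/<username>/status/*` URL pattern are assumptions made for illustration; the package may construct its requests differently.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(username, date_from=None, date_to=None,
                  limit=None, collapse=None, matchtype=None):
    """Build a CDX query URL (hypothetical helper, not the package's API)."""
    # Assumed URL pattern for a user's tweets; not confirmed by the source.
    params = {"url": f"twitter.com/{username}/status/*", "output": "json"}
    optional = {
        "from": date_from,       # YYYYmmdd
        "to": date_to,           # YYYYmmdd
        "limit": limit,
        "collapse": collapse,    # e.g. "timestamp:4"
        "matchType": matchtype,  # exact, prefix, host or domain
    }
    # Only send the filters that were actually provided.
    params.update({k: v for k, v in optional.items() if v is not None})
    # Keep "/", "*" and ":" readable instead of percent-encoding them.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe='/*:')}"


url = build_cdx_url("jack", date_from="20200305", date_to="20231231", limit=300)
```

The arguments mirror the second CLI example (`--from 20200305 --to 20231231 --limit 300 jack`), so the resulting URL can be pasted into a browser to inspect the raw CDX rows.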

Web App

A prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.

Important: Starting from version 1.0, the web app will no longer receive all updates from the official package. To access all features, prefer using the package from PyPI.

Acknowledgements

  • Tristan Lee (Bellingcat's Data Scientist) for the idea.
  • Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit team for the additional server resources on Streamlit Cloud.
  • OSINT Community for recommending the package and the application.

License

GPL-3.0
