Social media archival with authenticity guarantees

Pusheen Archiver

Save social media posts before they disappear. Pusheen Archiver captures posts, profiles, and media from X, YouTube, TikTok, Instagram, SoundCloud, and Pinterest — all stored locally, with cryptographic hashes so you can prove the content is untampered.

No cloud account. No subscription. Just pip install pusheen-archiver and you're done.


Install

pip install pusheen-archiver
pusheen

That's it. The first time you run pusheen with no arguments, it'll ask where you want to store your archives and then set everything up.

Windows installer: Grab PusheenInstaller.exe from the releases page for a GUI installer that handles Python, PATH, and Chromium automatically.


Quick start

# archive a whole profile
pusheen save https://x.com/someuser
pusheen save https://www.tiktok.com/@someuser
pusheen save https://soundcloud.com/some_artist

# single post
pusheen save https://www.youtube.com/watch?v=dQw4w9WgXcQ

# keep it up to date
pusheen sync x someuser

Paste any supported URL and pusheen figures out what it is — profile, post, playlist, whatever.
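To make that detection concrete, here is a minimal sketch of how a URL could be sorted into (platform, kind) from its hostname and path. The function name classify_url, the host table, and every path rule are assumptions for illustration; pusheen's real detection logic isn't documented on this page and may work differently.

```python
from urllib.parse import urlparse

# Hypothetical host table; pusheen's real detection rules may differ.
HOSTS = {
    "x.com": "x", "twitter.com": "x",
    "youtube.com": "youtube", "youtu.be": "youtube",
    "tiktok.com": "tiktok",
    "instagram.com": "instagram",
    "soundcloud.com": "soundcloud",
    "pinterest.com": "pinterest",
}

def classify_url(url: str) -> tuple[str, str]:
    """Return (platform, kind), where kind is 'profile', 'post' or 'playlist'."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").removeprefix("www.").removeprefix("m.")
    platform = HOSTS.get(host)
    if platform is None:
        raise ValueError(f"unsupported URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]

    if platform == "youtube":
        if parsed.path == "/watch" or host == "youtu.be":
            return platform, "post"
        return platform, "playlist" if parsed.path == "/playlist" else "profile"
    if platform == "x":
        return platform, "post" if "status" in parts else "profile"
    if platform == "tiktok":
        return platform, "post" if "video" in parts else "profile"
    if platform == "instagram":
        return platform, "post" if parts and parts[0] in ("p", "reel") else "profile"
    if platform == "soundcloud":
        if "sets" in parts:
            return platform, "playlist"
        return platform, "post" if len(parts) >= 2 else "profile"
    # Pinterest: /pin/<id>/ is a single pin, /<user>/<board>/ is a board.
    if parts and parts[0] == "pin":
        return platform, "post"
    return platform, "playlist" if len(parts) >= 2 else "profile"
```

Unknown hosts raise ValueError instead of guessing, which matches the "any supported URL" wording: an unsupported link should fail loudly rather than be archived as the wrong kind.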


What gets saved

For each post:

  • The media (video, images, audio) at the best available quality
  • A metadata.json with the caption, stats, hashtags, and everything else
  • A full-page screenshot and rendered HTML snapshot (via Playwright)
  • A versions/ folder that records every edit the post goes through

For each profile run:

  • Avatar and banner images
  • A manifest.json listing every file with its SHA256 hash, plus an Ed25519 signature (manifest.sig) if you've run pusheen keygen
  • A receipt.txt you can attach to a legal filing
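A hash manifest like this is easy to reproduce with the standard library. The sketch below walks an archive folder and records a SHA256 for every file. The JSON layout ({"algorithm": ..., "files": {path: digest}}) and the function names are assumptions for illustration, not pusheen's actual manifest schema, and the Ed25519 signing step is left out.

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos never sit fully in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root: Path, exclude: str = "manifests") -> dict:
    """Map each file's POSIX-style relative path to its SHA256 hex digest.

    The manifests/ folder is skipped because the manifest can't hash itself.
    """
    files = {}
    for p in sorted(root.rglob("*")):
        rel = p.relative_to(root)
        if p.is_file() and rel.parts[0] != exclude:
            files[rel.as_posix()] = sha256_file(p)
    return {"algorithm": "sha256", "files": files}
```

Calling build_manifest(Path("archive/x/someuser")) and writing the result with json.dumps(..., indent=2) gives a file-by-file listing of the kind manifest.json is described as holding. Relative POSIX paths keep the manifest portable between Windows and Unix machines.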

Nothing ever gets deleted from disk. If a post disappears online, it gets flagged in the database but stays in your archive.
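Flagging instead of deleting is the usual soft-delete pattern. Here is a minimal SQLite sketch of it. The posts table and its post_id / removed_at columns are hypothetical (the real schema is described in docs/ARCHITECTURE.md, not here); the point is that a missing post only gets a timestamp, and no row or file is removed.

```python
import sqlite3
from datetime import datetime, timezone

def mark_removed(conn: sqlite3.Connection, live_ids: set[str]) -> list[str]:
    """Flag archived posts missing from the latest crawl; never DELETE rows or files."""
    now = datetime.now(timezone.utc).isoformat()
    rows = conn.execute(
        "SELECT post_id FROM posts WHERE removed_at IS NULL ORDER BY post_id"
    ).fetchall()
    gone = [post_id for (post_id,) in rows if post_id not in live_ids]
    conn.executemany(
        "UPDATE posts SET removed_at = ? WHERE post_id = ?",
        [(now, post_id) for post_id in gone],
    )
    conn.commit()
    return gone

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (post_id TEXT PRIMARY KEY, removed_at TEXT)")
    conn.executemany("INSERT INTO posts (post_id) VALUES (?)", [("1",), ("2",), ("3",)])
    print(mark_removed(conn, live_ids={"1", "3"}))  # ['2']
```

Already-flagged posts are skipped on later runs, so removed_at keeps the first time the disappearance was noticed.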


Platforms

Platform      Auth needed?                        Notes
X (Twitter)   No API key; browser cookies work    See cookie setup below
YouTube       Optional API key                    Works fine without one
TikTok        Optional                            Public profiles work without credentials
Instagram     Optional                            Public profiles work without credentials
SoundCloud    None                                client_id is auto-discovered
Pinterest     Optional                            Public boards work without credentials

Configuration

All settings are in a single TOML file — no scattered environment variables for desktop use:

OS        Location
Windows   %APPDATA%\pusheen-archiver\config.toml
macOS     ~/Library/Application Support/pusheen-archiver/config.toml
Linux     ~/.config/pusheen-archiver/config.toml

pusheen config edit   # opens it in your default editor

The file is fully commented so you know what everything does. The important bits:

[paths]
archive_root = "C:/Users/you/archive"

[archive]
capture_screenshots  = true
capture_html         = true
skip_media           = false   # true = metadata only, no downloads
save_info_json       = true    # yt-dlp .info.json sidecar files
save_thumbnail       = true    # thumbnail images alongside media
max_posts            = 0       # 0 = no limit

[media]
media_format  = "default"      # default | mp4 | webm | mp3 | m4a | flac | opus
media_quality = "best"         # best | high | medium | low | worst

You can also pass --no-info-json or --no-thumbnail on the command line to skip those for a single run without touching the config.


Cookie auth for X

X doesn't require an API key. Browser cookies are enough.

Option A — cookies file (more reliable)

  1. Install the "Get cookies.txt LOCALLY" browser extension
  2. Log into x.com, click the extension, export as cookies.txt
  3. In config.toml under [x]: cookies_file = "C:/path/to/cookies.txt"
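Before pointing cookies_file at an export, it can help to sanity-check the file. The sketch below loads a Netscape-format cookies.txt with the standard library's MozillaCookieJar and reports which session cookies are absent or expired. The cookie names auth_token and ct0 are an assumption about how X carries a logged-in session, not something pusheen documents, and missing_x_cookies is a hypothetical helper.

```python
from http.cookiejar import MozillaCookieJar

# Assumption: X's logged-in session is carried by these two cookies.
REQUIRED_X_COOKIES = {"auth_token", "ct0"}

def missing_x_cookies(cookies_file: str) -> set[str]:
    """Return required X cookie names that are absent or expired in a cookies.txt export."""
    jar = MozillaCookieJar()
    # Keep session cookies, but drop expired ones so a stale export gets reported.
    jar.load(cookies_file, ignore_discard=True, ignore_expires=False)
    present = set()
    for cookie in jar:
        domain = cookie.domain.lstrip(".")
        if domain in ("x.com", "twitter.com") or domain.endswith((".x.com", ".twitter.com")):
            present.add(cookie.name)
    return REQUIRED_X_COOKIES - present
```

An empty result means both cookies were found; anything else usually means logging in again and re-exporting.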

Option B — live browser (easier)

[x]
cookies_browser = "brave"   # chrome | firefox | edge | brave | chromium

The browser has to be closed when you run pusheen — Chrome and Brave lock their cookie database while they're open.


Archive structure

archive/
  x/
    someuser/
      profile/
        profile.json
        avatar.jpg
        banner.jpg
      posts/
        2026-06-10_1234567890/
          metadata.json
          screenshot.png
          page.html
          media/
            video.mp4
          versions/
            v1.json
            v2.json        ← created automatically when a post is edited
      manifests/
        manifest.json      ← every file + its SHA256
        manifest.sig       ← Ed25519 signature (if you've run `pusheen keygen`)
        receipt.txt

Verifying an archive

pusheen verify archive/x/someuser/manifests

Checks every file hash against the manifest. If you generated signing keys (pusheen keygen), it validates the Ed25519 signature too.
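The hash half of that check is just "recompute and compare". A minimal sketch, assuming a hypothetical manifest layout of {"files": {relative_path: sha256}} (pusheen's real format may differ) and leaving out the Ed25519 signature check, which would need a signing library:

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_manifest(manifest_path: Path, root: Path) -> dict[str, list[str]]:
    """Compare files on disk with recorded digests; list missing and modified paths."""
    recorded = json.loads(manifest_path.read_text(encoding="utf-8"))["files"]
    report: dict[str, list[str]] = {"missing": [], "modified": []}
    for rel, expected in sorted(recorded.items()):
        path = root / rel
        if not path.is_file():
            report["missing"].append(rel)
        elif sha256_file(path) != expected:
            report["modified"].append(rel)
    return report
```

An archive passes when both lists are empty. Note that a hash match alone only shows the files still match the manifest; the signature is what ties the manifest itself to your key.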


All commands

pusheen save <url>               archive anything — post, profile, playlist
  --no-media                     skip downloads, save metadata only
  --no-screenshots               skip Playwright screenshots
  --no-info-json                 skip yt-dlp .info.json sidecars
  --no-thumbnail                 skip thumbnail images
  --out <dir>                    save to a specific directory
  --watch                        re-archive a profile on a schedule

pusheen sync <platform> <user>   incremental sync (new posts only)
pusheen sync-all                 sync every account you've archived
pusheen daemon                   run sync-all on repeat until Ctrl-C

pusheen search <query>           search across all archived captions
pusheen history <platform> <user> show profile change timeline
pusheen export <platform> <user> pack to .zip or .tar.gz
pusheen status                   list archived accounts and stats
pusheen verify <manifest_dir>    check file hashes and signature
pusheen keygen                   generate Ed25519 signing keys

pusheen config edit              open config.toml in your editor
pusheen config show              print current settings
pusheen config update            add missing keys to an existing config
pusheen db init                  create database tables (first run)
pusheen db migrate               run Alembic migrations
pusheen install-browser          install Playwright browser
pusheen shell                    interactive REPL

Platform aliases: x/tw, yt, ig/insta, tt, sc, pin
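Alias handling is a small lookup table. A minimal sketch built from the alias list above; the canonical names (x, youtube, instagram, tiktok, soundcloud, pinterest) and the helper resolve_platform are assumptions for illustration.

```python
# Aliases transcribed from the list above; canonical names are assumed.
PLATFORM_ALIASES = {
    "tw": "x",
    "yt": "youtube",
    "ig": "instagram", "insta": "instagram",
    "tt": "tiktok",
    "sc": "soundcloud",
    "pin": "pinterest",
}
CANONICAL = {"x", "youtube", "instagram", "tiktok", "soundcloud", "pinterest"}

def resolve_platform(name: str) -> str:
    """Accept a canonical name or an alias, case-insensitively."""
    key = name.strip().lower()
    if key in CANONICAL:
        return key
    if key in PLATFORM_ALIASES:
        return PLATFORM_ALIASES[key]
    raise ValueError(f"unknown platform {name!r}; try one of {sorted(CANONICAL)}")
```

With this, pusheen sync tw someuser and pusheen sync x someuser would address the same account.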


Search

pusheen search "concert announcement"
pusheen search "cute" --platform tiktok
pusheen search "dropped" --username someuser --limit 50

Profile history

pusheen history x someuser
  2026-01-15  first seen     bio: "just a person"   followers: 1,204
  2026-03-02  bio changed    "just a person on the internet"   +185 followers
  2026-06-10  avatar changed   +113 followers
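A timeline like this can be produced by diffing consecutive profile snapshots. In the minimal sketch below, the snapshot fields (date, bio, avatar_hash, followers) and the function name are assumptions; it reproduces the example output above from made-up follower totals consistent with it.

```python
def profile_timeline(snapshots: list[dict]) -> list[str]:
    """Turn chronologically ordered profile snapshots into change lines."""
    lines: list[str] = []
    prev = None
    for snap in snapshots:
        if prev is None:
            lines.append(f'{snap["date"]}  first seen  bio: "{snap["bio"]}"  '
                         f'followers: {snap["followers"]:,}')
        else:
            changes = []
            if snap["bio"] != prev["bio"]:
                changes.append(f'bio changed  "{snap["bio"]}"')
            if snap["avatar_hash"] != prev["avatar_hash"]:
                changes.append("avatar changed")
            delta = snap["followers"] - prev["followers"]
            # Follower drift alone isn't an event; it's reported alongside real changes.
            if changes:
                lines.append(f'{snap["date"]}  {"; ".join(changes)}  {delta:+,} followers')
        prev = snap
    return lines
```

Comparing an avatar hash rather than the image URL avoids false "avatar changed" entries when a platform merely moves the same image to a new CDN address.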

Server mode

The default setup uses SQLite and runs entirely locally. If you want to run pusheen as a shared service with a REST API and async job queue:

pip install "pusheen-archiver[server]"
# set database_url in config.toml to your PostgreSQL connection string
docker-compose up -d db redis
pusheen db migrate
uvicorn pusheen_archiver.api.main:app --host 0.0.0.0 --port 8000

API docs at http://localhost:8000/docs. Requires PostgreSQL + Redis. This is for self-hosted or developer deployments — not needed for personal use.


Adding a platform

  1. Create src/pusheen_archiver/adapters/myplatform.py containing a subclass of BasePlatformAdapter
  2. Implement discover_account, discover_posts, fetch_metadata, download_media
  3. Register it in src/pusheen_archiver/adapters/__init__.py

Everything else — signing, manifests, version history, deduplication, CLI — works automatically.
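The shape of an adapter might look roughly like the sketch below. BasePlatformAdapter here is a stand-in ABC written for illustration: the four method names come from the steps above, but every parameter and return type is an assumption, so check the real base class (and docs/ARCHITECTURE.md) before copying anything. The example adapter serves canned data instead of making HTTP calls.

```python
from abc import ABC, abstractmethod
from pathlib import Path

class BasePlatformAdapter(ABC):
    """Stand-in for pusheen's base class; the real signatures may differ."""
    platform: str

    @abstractmethod
    def discover_account(self, username: str) -> dict: ...

    @abstractmethod
    def discover_posts(self, username: str) -> list[str]: ...

    @abstractmethod
    def fetch_metadata(self, post_id: str) -> dict: ...

    @abstractmethod
    def download_media(self, post_id: str, dest: Path) -> list[Path]: ...

class MyPlatformAdapter(BasePlatformAdapter):
    platform = "myplatform"

    def __init__(self, posts: dict[str, dict]):
        self._posts = posts  # canned data standing in for real HTTP calls

    def discover_account(self, username: str) -> dict:
        return {"username": username, "post_count": len(self._posts)}

    def discover_posts(self, username: str) -> list[str]:
        return sorted(self._posts)

    def fetch_metadata(self, post_id: str) -> dict:
        return {"id": post_id, **self._posts[post_id]}

    def download_media(self, post_id: str, dest: Path) -> list[Path]:
        dest.mkdir(parents=True, exist_ok=True)
        out = dest / f"{post_id}.txt"  # placeholder "media" file
        out.write_text(self._posts[post_id].get("caption", ""), encoding="utf-8")
        return [out]
```

Declaring the methods abstract means a half-finished adapter fails with a TypeError at instantiation instead of partway through an archive run.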


Development

git clone https://github.com/pusheenism/pusheen-archiver
cd pusheen-archiver
pip install -e ".[dev]"
pytest

For a deep dive into the internals — database schema, adapter interface, API endpoints, signing system — see docs/ARCHITECTURE.md.


License

MIT


Download files

Source Distribution

pusheen_archiver-0.1.2.tar.gz (111.8 kB)

Uploaded Source

Built Distribution

pusheen_archiver-0.1.2-py3-none-any.whl (127.1 kB)

Uploaded Python 3

File details

Details for the file pusheen_archiver-0.1.2.tar.gz.

File metadata

  • Download URL: pusheen_archiver-0.1.2.tar.gz
  • Upload date:
  • Size: 111.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.6

File hashes

Hashes for pusheen_archiver-0.1.2.tar.gz
Algorithm Hash digest
SHA256 8de99ec3cf680a73fad053f2d775e6cb4c60aca5e950470b2618ffa17ad3f088
MD5 272086cd67a568939484038f477a40fb
BLAKE2b-256 162391772f07d90855608b63ee0168b506a1c1adb19d9accffb255559fa880e2


File details

Details for the file pusheen_archiver-0.1.2-py3-none-any.whl.

File hashes

Hashes for pusheen_archiver-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 679971ae7c70735afc828cb6649b13e942ef71f6d242ff5ab5b47c9c2e3d6549
MD5 7db9e1239c1125db850d0ae2ea7b933a
BLAKE2b-256 207eb8be12c172a68b660e1d1e25e6f2e4f31d1ae078ed27fc69a307777428b7

