BlueSky bookmarks ingestion toolkit: fetch, hydrate (article text, self-thread context, images), and merge into a JSON inventory.

bsky-saves

A toolkit for ingesting your own BlueSky bookmarks ("saves") into a portable JSON inventory, with optional hydration of linked article text, self-thread context, and CDN image downloads.

Since v0.5.0 the wheel also ships the bsky-saves-gui static web app. Running bsky-saves serve --gui mounts it at http://127.0.0.1:47826/, so anyone who installed with pipx install bsky-saves can open a browser-based UI without provisioning anything else.

Why

The BlueSky web client lets you bookmark posts, but the saves are siloed inside the app. This tool pulls them out into a single JSON file you can read, archive, mirror, or build on top of.

It works for accounts hosted on bsky.social and on third-party AT Protocol PDSes (e.g. eurosky.social), because the bookmark fetch goes PDS-direct rather than through the AppView.

Install

pip install bsky-saves

Authenticate

Create a BlueSky app password, then set the following environment variables (the third is optional):

export BSKY_HANDLE=alice.bsky.social
export BSKY_APP_PASSWORD=xxxx-xxxx-xxxx-xxxx
# Required only for accounts hosted on a third-party PDS:
export BSKY_PDS=https://eurosky.social

The default BSKY_PDS is https://bsky.social.
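
As a rough illustration of how these variables map onto an AT Protocol login, here is a minimal sketch that builds (but does not send) a session request. It assumes the standard com.atproto.server.createSession XRPC endpoint on the PDS; the helper name build_create_session_request is hypothetical and not part of bsky-saves, whose internals may differ.

```python
import json
import os
import urllib.request

DEFAULT_PDS = "https://bsky.social"


def build_create_session_request(env=None):
    """Build (but do not send) a createSession request from the env vars above.

    Hypothetical helper for illustration; not part of the bsky-saves API.
    """
    env = os.environ if env is None else env
    # BSKY_PDS is optional and falls back to the documented default.
    pds = env.get("BSKY_PDS", DEFAULT_PDS).rstrip("/")
    body = {
        "identifier": env["BSKY_HANDLE"],
        "password": env["BSKY_APP_PASSWORD"],
    }
    return urllib.request.Request(
        url=f"{pds}/xrpc/com.atproto.server.createSession",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would perform the actual login; the sketch stops short of that so it can be inspected without touching the network.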

Use

# Pull all bookmarks → ./saves_inventory.json
bsky-saves fetch --inventory ./saves_inventory.json

# Hydrate every external-link bookmark with the linked article's text.
bsky-saves hydrate articles --inventory ./saves_inventory.json

# Hydrate every bookmark with same-author self-thread descendants.
bsky-saves hydrate threads --inventory ./saves_inventory.json

# Decode each save's post-creation timestamp from its rkey (offline).
bsky-saves enrich --inventory ./saves_inventory.json

# Download cdn.bsky.app images referenced by the inventory into ./images/
# (flat layout). Records url→path mappings as `local_images` on each entry.
# Use --uris FILE (newline-delimited at:// URIs) to limit to a subset.
bsky-saves hydrate images --inventory ./saves_inventory.json --out ./images

# Run a local HTTP helper daemon for bsky-saves-gui (CORS bridge).
# Binds 127.0.0.1:47826; pass --allow-origin for self-hosted GUI deployments.
bsky-saves serve

# Same daemon, plus serve the bundled GUI itself at http://127.0.0.1:47826/.
bsky-saves serve --gui

All commands are idempotent: running them again skips already-hydrated entries and adds only what's new. Failures are recorded inline (e.g. article_fetch_error) so subsequent runs don't pointlessly re-hit them.
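
The skip-on-rerun behaviour can be pictured as a filter over the inventory. The sketch below is an assumption about the shape of that logic, not the tool's actual code: it uses the field names from the inventory schema (embed, article_text) and the article_fetch_error marker mentioned above, and the function names are hypothetical.

```python
def needs_article_hydration(save):
    """True if the entry links out and has neither a result nor a recorded failure."""
    embed = save.get("embed") or {}
    if embed.get("type") != "external" or not embed.get("url"):
        return False  # nothing to fetch for this entry
    # Either key means a previous run already dealt with this entry.
    return "article_text" not in save and "article_fetch_error" not in save


def pending_articles(inventory):
    """Return the entries a rerun of article hydration would still process."""
    return [s for s in inventory.get("saves", []) if needs_article_hydration(s)]
```

Under this model a second run over an unchanged inventory selects nothing, which is what makes the command safe to repeat.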

bsky-saves serve

bsky-saves serve runs a small HTTP helper daemon on 127.0.0.1 for bsky-saves-gui, a static web app that runs bsky-saves in Pyodide. The GUI calls the daemon for operations the browser can't do directly, namely fetching image bytes and arbitrary article URLs (both blocked by CORS). It also routes bookmark enumeration, enrichment, and thread hydration through the helper instead of running them in Pyodide.

bsky-saves serve [--gui] [--port 47826] [--allow-origin ORIGIN]... [--verbose]

The daemon binds only to 127.0.0.1, writes nothing to disk, reads no config files, validates the Host header to reject DNS-rebinding attempts (421), enforces an Origin allowlist (403 for anything outside the defaults), caps request bodies at 10 MB, and exposes six endpoints:

Endpoint               Credentials  Purpose
GET /ping              -            Health check; advertises supported endpoints in a features array
POST /fetch-image      -            Download a cdn.bsky.app image; returns the bytes
POST /extract-article  -            Fetch + trafilatura-extract text from an article URL
POST /fetch            required     Paginated bookmark enumeration with opaque cursor
POST /enrich           -            Decode post_created_at offline from at-URI rkeys
POST /hydrate-threads  required     Concurrent same-author thread reply hydration

Endpoints that require credentials accept {handle, app_password, pds?} in the request body; the daemon does its own createSession per request and never persists anything. pds defaults to https://bsky.social when absent. /hydrate-threads validates credentials (to fail-fast on a bad app password) but reads threads from the public AppView unauthenticated.
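
For a sense of what a credentialed call looks like from a client, the following sketch builds a request carrying the {handle, app_password, pds?} body described above. The helper name credentialed_request is hypothetical, and endpoint-specific fields (such as a pagination cursor) are left out because their names are defined in the API contracts rather than here.

```python
import json
import urllib.request

HELPER = "http://127.0.0.1:47826"  # default bind address and port


def credentialed_request(endpoint, handle, app_password, pds=None):
    """Build (but do not send) a POST with the credential fields in its JSON body.

    Hypothetical helper for illustration; not part of bsky-saves.
    """
    body = {"handle": handle, "app_password": app_password}
    if pds is not None:
        # Omitting pds lets the daemon apply its https://bsky.social default.
        body["pds"] = pds
    return urllib.request.Request(
        url=f"{HELPER}{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Because the daemon creates its own session on every request, a client built this way has to send the credentials each time rather than holding a token.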

The default Origin allowlist is http://127.0.0.1:<port>, http://localhost:<port>, and https://saves.lightseed.net. Pass --allow-origin <url> (repeatable) to add origins to this list, for example if you self-host the GUI at a custom URL. The flag extends the defaults; it does not replace them.
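
The two header checks can be modelled in a few lines. This is a sketch of the documented behaviour (421 for a bad Host, 403 for a disallowed Origin), not the daemon's source. Two details are assumptions: which Host values count as valid, and that requests without an Origin header skip the allowlist. The function names are hypothetical.

```python
def allowed_origins(port, extra=()):
    """Default allowlist plus any origins added via --allow-origin."""
    return {
        f"http://127.0.0.1:{port}",
        f"http://localhost:{port}",
        "https://saves.lightseed.net",
        *extra,
    }


def check_request(host, origin, port=47826, extra_origins=()):
    """Return the HTTP status the two header checks would produce."""
    # Assumption: only loopback names on the daemon's own port are valid Hosts.
    if host not in {f"127.0.0.1:{port}", f"localhost:{port}"}:
        return 421  # Misdirected Request, e.g. a DNS-rebinding attempt
    # Assumption: a missing Origin header (non-browser client) is not rejected.
    if origin is not None and origin not in allowed_origins(port, extra_origins):
        return 403
    return 200
```

For example, check_request("127.0.0.1:47826", "https://gui.example") yields 403 until that origin is supplied through extra_origins, mirroring the additive --allow-origin flag.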

--gui mode

Pass --gui to also mount the bundled bsky-saves-gui static bundle at /. The GUI shares the same loopback port that serves the JSON API; API routes always take precedence over static files. Missing non-API paths fall back to the GUI's index.html so its SPA router takes over.
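
The routing order described here (API first, then static files, then the SPA fallback) can be sketched as a small resolver. The endpoint paths come from the table above; the resolve function and the set-of-filenames representation of the bundle are hypothetical simplifications.

```python
API_ROUTES = {
    "/ping",
    "/fetch-image",
    "/extract-article",
    "/fetch",
    "/enrich",
    "/hydrate-threads",
}


def resolve(path, static_files):
    """Decide what serves `path`: an API handler, a static file, or the SPA shell."""
    # API routes win even if the bundle contains a file with the same name.
    if path in API_ROUTES:
        return ("api", path)
    name = path.lstrip("/") or "index.html"
    if name in static_files:
        return ("static", name)
    # Unknown non-API paths fall back to index.html so the SPA router can handle them.
    return ("static", "index.html")
```

A client-side route such as /settings therefore resolves to index.html, while /fetch always reaches the JSON API.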

--gui is opt-in. Without it, the daemon behaves as a JSON-only CORS bridge for the hosted GUI at https://saves.lightseed.net (the v0.4.x behaviour). With it, you don't need a hosted GUI deployment at all — open http://127.0.0.1:47826/ directly. The wheel bundles a known-version GUI pinned at build time via SHA-256; bumping the pin requires a coordinated release with the bsky-saves-gui repo.
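
Pinning by SHA-256 amounts to comparing the digest of the downloaded GUI artifact against a value recorded in the repository. A minimal sketch of that comparison follows; the function name matches_pin is hypothetical, and the actual build hook may structure the check differently.

```python
import hashlib


def matches_pin(data: bytes, pinned_sha256: str) -> bool:
    """True if the SHA-256 hex digest of `data` equals the pinned value."""
    # Hex digests are lowercase; normalise the pin so case does not matter.
    return hashlib.sha256(data).hexdigest() == pinned_sha256.strip().lower()
```

A build step would typically refuse to bundle the artifact when this returns False, which is why changing the GUI version means updating the pin in the same release.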

If --gui is passed but the bundled GUI is missing (e.g. a broken install or an sdist build that didn't run the vendor hook), the daemon exits with code 2 and a clear error.

The full HTTP API contracts live in the consumer repo (bsky-saves-gui).

Inventory schema

{
  "fetched_at": "2026-04-30T14:00:00Z",
  "saves": [
    {
      "uri": "at://did:plc:.../app.bsky.feed.post/abc123",
      "saved_at": "2026-04-29T22:11:00Z",
      "post_created_at": "2026-04-29T17:43:51Z",  // decoded from rkey
      "post_text": "...",
      "embed": {
        "type": "external",
        "url": "https://example.org/article",
        "title": "...",
        "description": "..."
      },
      "author": { "handle": "...", "display_name": "...", "did": "..." },
      "images": [
        { "kind": "image", "url": "https://cdn.bsky.app/...", "alt": "..." }
      ],
      "quoted_post": { /* optional, when the save quote-posts another post */ },

      // Added by `hydrate articles`:
      "article_text": "...",
      "article_published_at": "2025-09-13",
      "article_fetched_at": "...",

      // Added by `hydrate threads`:
      "thread_replies": [
        { "uri": "...", "indexedAt": "...", "text": "...", "images": [...] }
      ],
      "thread_schema_version": 4,
      "thread_fetched_at": "...",

      // Added by `hydrate images`:
      "local_images": [
        { "url": "https://cdn.bsky.app/...", "path": "img-9f2c8e1b....jpg" }
      ]
    }
  ]
}
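
The post_created_at field is described as decoded from the rkey. A plausible way to do that, sketched below, relies on post record keys being AT Protocol TIDs: 13 base32-sortable characters encoding a 64-bit integer whose low 10 bits are a clock identifier and whose remaining bits are microseconds since the Unix epoch. The function names are hypothetical and the tool's own decoder may differ; note that the abc123 rkey in the example above is a placeholder, not a valid TID.

```python
from datetime import datetime, timedelta, timezone

TID_ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def rkey_to_datetime(rkey):
    """Decode the creation time embedded in a 13-character TID record key."""
    if len(rkey) != 13 or any(c not in TID_ALPHABET for c in rkey):
        raise ValueError(f"not a TID: {rkey!r}")
    if rkey[0] not in TID_ALPHABET[:16]:
        raise ValueError(f"TID high bit must be zero: {rkey!r}")
    value = 0
    for char in rkey:
        value = value * 32 + TID_ALPHABET.index(char)
    micros = value >> 10  # drop the 10-bit clock identifier
    # Integer timedelta arithmetic avoids float rounding on large timestamps.
    return EPOCH + timedelta(microseconds=micros)


def post_created_at(uri):
    """Format the decoded time the way the inventory stores it."""
    rkey = uri.rsplit("/", 1)[-1]
    return rkey_to_datetime(rkey).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Since everything needed is in the URI itself, this step requires no network access, which matches the offline label on the enrich command.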

What about OAuth?

bsky-saves only supports the app-password authentication path. The OAuth + DPoP machinery for third-party PDSes lives in a separate package, atproto-oauth-py, and exists primarily for AppView-targeted resource calls that aren't reachable via PDS-direct auth. For BlueSky bookmarks the PDS-direct path (which bsky-saves uses) works regardless of where your account is hosted.

License

MIT. See LICENSE.

Provenance

Extracted from the scripts/ directory of https://github.com/tenorune/tenorune.github.io, where it powered BlueSky save ingestion for the Stories of 47 archive. The Jekyll site itself stays in that repo; this is the reusable ingestion layer.
