tap-substack
A Singer tap for extracting data from Substack newsletters, built with the Meltano Singer SDK.
What's Working
Streams
| Stream | Endpoint | Auth | Status | Description |
|---|---|---|---|---|
| `posts` | `/api/v1/archive` | No | Working | All published posts (paginated, sorted by date) |
| `post_details` | `/api/v1/posts/{slug}` | No | Working | Full HTML content per post (child of `posts`) |
| `comments` | `/api/v1/post/{id}/comments` | No | Working | Comments per post (child of `posts`) |
| `dashboard_summary` | `/api/v1/publish-dashboard/summary` | Yes | Working | High-level publication metrics (subscribers, views, open rate) |
| `email_stats` | `/api/v1/publication/stats/email_stats` | Yes | Working | Per-post email delivery and engagement metrics |
| `recommendations_inbound` | `/api/v1/recommendations/stats/to` | Yes | Working | Publications recommending you, with signup stats |
| `recommendations_outbound` | `/api/v1/recommendations/stats/from` | Yes | Working | Publications you recommend, with signup stats |
| `subscribers` | `/api/v1/publication/subscribers` | Yes | Working | Subscriber list with email, type, and source |
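
To make the endpoint column concrete, here is a minimal sketch of how a stream name and subdomain could be turned into a full URL. The `build_url` helper and the `ENDPOINTS` mapping are hypothetical and not part of the tap; the `https://{subdomain}.substack.com` base is an assumption drawn from the subdomain example in the Configuration section.

```python
from urllib.parse import quote

# Endpoint templates copied from the streams table (subset shown).
ENDPOINTS = {
    "posts": "/api/v1/archive",
    "post_details": "/api/v1/posts/{slug}",
    "comments": "/api/v1/post/{id}/comments",
    "subscribers": "/api/v1/publication/subscribers",
}


def build_url(subdomain: str, stream: str, **path_params) -> str:
    """Hypothetical helper: return the full URL for one stream."""
    template = ENDPOINTS[stream]
    # Percent-encode path parameters so unusual slugs still form a valid path.
    safe = {key: quote(str(value), safe="") for key, value in path_params.items()}
    # Assumed base URL: https://<subdomain>.substack.com
    return f"https://{subdomain}.substack.com{template.format(**safe)}"


print(build_url("newsletter", "posts"))
print(build_url("newsletter", "post_details", slug="hello-world"))
print(build_url("newsletter", "comments", id=123))
```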
Features
- Public + authenticated modes: Runs without auth for public data; add `session_token` to unlock dashboard/analytics streams
- Pagination: Offset-based pagination for list endpoints (`posts`, `email_stats`, `subscribers`)
- Rate limiting: Built-in 0.5s request throttle to avoid 429s
- Graceful auth errors: Authenticated streams log warnings and skip on 403/404/429 instead of crashing
- Parent-child streams: `post_details` and `comments` automatically iterate over posts from the `posts` stream
- Incremental replication: `posts` stream supports `replication_key` on `post_date`
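
The pagination, throttling, and error-skipping behaviour listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tap's actual code, which is built on the Singer SDK and may be structured differently. The `paginate` function, the `fetch_page` callback, and the page size of 25 are hypothetical; only the 0.5s delay and the 403/404/429 status codes come from the feature list.

```python
import logging
import time
from typing import Callable, Iterator

logger = logging.getLogger(__name__)

# Statuses the feature list says are logged and skipped rather than raised.
SKIP_STATUSES = {403, 404, 429}


def paginate(
    fetch_page: Callable[[int, int], tuple[int, list[dict]]],
    limit: int = 25,  # assumed page size
    throttle_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield rows from an offset-paginated endpoint, pausing between requests.

    `fetch_page(offset, limit)` is a hypothetical callback that returns
    `(http_status, rows)` for one page.
    """
    offset = 0
    while True:
        status, rows = fetch_page(offset, limit)
        if status in SKIP_STATUSES:
            logger.warning("HTTP %s received, skipping stream", status)
            return
        if not rows:
            return  # an empty page marks the end of the list
        yield from rows
        offset += len(rows)
        sleep(throttle_seconds)  # throttle before requesting the next page


# Usage with an in-memory stand-in for the HTTP call.
def fake_fetch(offset: int, limit: int) -> tuple[int, list[dict]]:
    data = [{"id": i} for i in range(5)]
    return 200, data[offset:offset + limit]


records = list(paginate(fake_fetch, limit=2, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.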
Installation
    pip install -e .
Configuration
| Setting | Required | Description |
|---|---|---|
| `subdomain` | Yes | Your Substack subdomain (e.g. `newsletter` for `newsletter.substack.com`) |
| `session_token` | Authenticated streams only | Your `substack.sid` cookie (see below); public streams run without it |
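
As an illustration, the settings above could be assembled into a config file like this. The `build_config` helper is hypothetical, and treating `session_token` as optional is an assumption based on the Features section, which says public data can be synced without auth.

```python
import json
from typing import Optional


def build_config(subdomain: str, session_token: Optional[str] = None) -> dict:
    """Hypothetical helper: build the settings dict for tap-substack."""
    config = {"subdomain": subdomain}
    if session_token:
        # Assumption: only included when authenticated streams are wanted.
        config["session_token"] = session_token
    return config


# Placeholder values; save the printed JSON as e.g. config.json.
print(json.dumps(build_config("newsletter", "PASTE_SUBSTACK_SID_HERE"), indent=2))
```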
Getting your session token
- Log in to your Substack dashboard
- Open browser DevTools > Application > Cookies
- Copy the value of the `substack.sid` cookie
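
For context, a session cookie like this is normally sent back to the server in a `Cookie` request header. The sketch below shows one way that could look using only the standard library. How the tap itself attaches the token is not documented here, so the `authenticated_request` helper, the base URL, and the header layout are assumptions.

```python
import urllib.request


def authenticated_request(subdomain: str, path: str, session_token: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a request carrying the cookie."""
    url = f"https://{subdomain}.substack.com{path}"  # assumed base URL
    # The token is passed through unchanged, exactly as copied from DevTools.
    headers = {"Cookie": f"substack.sid={session_token}"}
    return urllib.request.Request(url, headers=headers)


req = authenticated_request("newsletter", "/api/v1/publication/subscribers", "abc123")
print(req.full_url)
print(req.get_header("Cookie"))
```

The cookie grants access to your logged-in session, so treat it like a password and keep it out of version control.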
Usage
    # Discovery mode
    tap-substack --config sample_config.json --discover

    # Sync public data only
    tap-substack --config sample_config.json

    # Pipe to a target
    tap-substack --config config.json | target-jsonl
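
Like any Singer tap, tap-substack writes newline-delimited JSON messages (`SCHEMA`, `RECORD`, `STATE`) to stdout, which is what the target consumes. The following sketch counts `RECORD` messages per stream as a quick sanity check on a sync. The `count_records` function is hypothetical and the sample messages are made up for illustration; they are not real tap output.

```python
import json
from collections import Counter
from typing import Iterable


def count_records(lines: Iterable[str]) -> Counter:
    """Count Singer RECORD messages per stream from JSON-lines input."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        message = json.loads(line)
        if message.get("type") == "RECORD":
            counts[message["stream"]] += 1
    return counts


# Made-up sample messages in Singer format.
sample = [
    '{"type": "SCHEMA", "stream": "posts", "schema": {"properties": {}}, "key_properties": ["id"]}',
    '{"type": "RECORD", "stream": "posts", "record": {"id": 1}}',
    '{"type": "RECORD", "stream": "posts", "record": {"id": 2}}',
    '{"type": "STATE", "value": {}}',
]
print(count_records(sample))
```

To run this against a real sync, pass `sys.stdin` to `count_records` and pipe the tap's output into the script.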
Development
    # Set up
    python -m venv .venv
    source .venv/bin/activate
    pip install -e ".[dev]"

    # Run against a test newsletter (public only)
    tap-substack --config test_config.json