ATProto AppView for science.alt.dataset
atdata-app
An ATProto AppView for the science.alt.dataset lexicon namespace. It indexes dataset metadata published across the AT Protocol network and serves it through XRPC endpoints — enabling discovery, search, and resolution of datasets, schemas, labels, and lenses.
Overview
In the AT Protocol architecture, an AppView is a service that subscribes to the network firehose, indexes records it cares about, and exposes query endpoints for clients. atdata-app does this for scientific and ML dataset metadata:
- Schemas define the structure of datasets (JSON Schema, Arrow schema, etc.)
- Dataset entries describe a dataset — its name, storage location, schema, tags, license, and size
- Labels are human-readable version tags pointing to a specific dataset entry (like git tags)
- Lenses are bidirectional schema transforms with getter/putter code for migrating data between schema versions
ATProto Network
    │
    ├── Jetstream (WebSocket firehose) ──► Real-time ingestion
    │                                            │
    └── BGS Relay (HTTP backfill) ──────► Historical backfill
                                                 │
                                                 ▼
                                            PostgreSQL
                                                 │
                                                 ▼
                                       XRPC Query Endpoints ──► Clients
Requirements
- Python 3.12+
- PostgreSQL 14+
- uv package manager
Quickstart
# Install dependencies
uv sync --dev
# Initialize the lexicon submodule
git submodule update --init
# Set up PostgreSQL (schema auto-applies on startup)
createdb atdata_app
# Start the server
uv run uvicorn atdata_app.main:app --reload
The server starts with dev-mode defaults: http://localhost:8000, DID did:web:localhost%3A8000. On startup it connects to Jetstream and begins indexing science.alt.dataset.* records, and runs a one-shot backfill of historical records from the BGS relay.
Configuration
All settings are environment variables prefixed with ATDATA_, managed by pydantic-settings.
| Variable | Default | Description |
|---|---|---|
| ATDATA_HOSTNAME | localhost | Public hostname, used to derive did:web identity |
| ATDATA_PORT | 8000 | Server port (included in DID in dev mode) |
| ATDATA_DEV_MODE | true | Dev mode uses http:// and includes port in DID; production uses https:// |
| ATDATA_DATABASE_URL | postgresql://localhost:5432/atdata_app | PostgreSQL connection string |
| ATDATA_JETSTREAM_URL | wss://jetstream2.us-east.bsky.network/subscribe | Jetstream WebSocket endpoint |
| ATDATA_JETSTREAM_COLLECTIONS | science.alt.dataset.* | Collections to subscribe to |
| ATDATA_RELAY_HOST | https://bsky.network | BGS relay for backfill DID discovery |
Identity
The service derives its did:web identity from the hostname and port:
- Dev mode: did:web:localhost%3A8000 with endpoint http://localhost:8000
- Production: did:web:datasets.example.com with endpoint https://datasets.example.com
The DID document is served at GET /.well-known/did.json and advertises the service as an AtprotoAppView.
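The derivation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's code: the function names are hypothetical, and the service `id` value `#atdata_appview` is a placeholder, since the README only states that the service type is AtprotoAppView.

```python
from urllib.parse import quote


def service_identity(hostname: str, port: int, dev_mode: bool) -> tuple[str, str]:
    """Return (did, endpoint) following the dev/production rules in the README."""
    if dev_mode:
        host_port = f"{hostname}:{port}"
        # did:web requires the port colon to be percent-encoded as %3A.
        return f"did:web:{quote(host_port, safe='')}", f"http://{host_port}"
    return f"did:web:{hostname}", f"https://{hostname}"


def did_document(did: str, endpoint: str) -> dict:
    """Build a minimal DID document advertising the AppView service."""
    return {
        "@context": ["https://www.w3.org/ns/did/v1"],
        "id": did,
        "service": [
            {
                "id": "#atdata_appview",  # hypothetical service id
                "type": "AtprotoAppView",
                "serviceEndpoint": endpoint,
            }
        ],
    }
```

For example, `service_identity("localhost", 8000, True)` yields the dev-mode pair listed above, and `service_identity("datasets.example.com", 8000, False)` yields the production pair with the port dropped.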
API Reference
See docs/api-reference.md for the full XRPC endpoint reference (queries, procedures, and other routes).
Data Model
See docs/data-model.md for the database schema (schemas, entries, labels, lenses).
Docker Deployment
The app ships with a multi-stage Dockerfile using uv for fast dependency installation.
Build and run locally
docker build -t atdata-app .
docker run -p 8000:8000 \
-e ATDATA_DATABASE_URL=postgresql://user:pass@host:5432/atdata_app \
-e ATDATA_HOSTNAME=localhost \
-e ATDATA_DEV_MODE=true \
atdata-app
Deploy on Railway
The repo includes a railway.toml that configures the Dockerfile builder, health checks at /health, and a restart-on-failure policy.
- Connect the repo to a Railway project
- Add a PostgreSQL service and link it
- Set the required environment variables:
| Variable | Value |
|---|---|
| ATDATA_DATABASE_URL | Provided by Railway's PostgreSQL plugin (${{Postgres.DATABASE_URL}}) |
| ATDATA_HOSTNAME | Your Railway public domain (e.g. atdata-app-production.up.railway.app) |
| ATDATA_DEV_MODE | false |
| ATDATA_PORT | Omit; Railway sets PORT automatically and the container respects it |
Optional variables for ingestion tuning:
| Variable | Default | Description |
|---|---|---|
| ATDATA_JETSTREAM_URL | wss://jetstream2.us-east.bsky.network/subscribe | Jetstream endpoint |
| ATDATA_RELAY_HOST | https://bsky.network | BGS relay for backfill |
Railway will auto-deploy on push, build the Docker image, and start the container.
Development
# Run tests (no database required)
uv run pytest
# Run a single test
uv run pytest tests/test_models.py::test_parse_at_uri -v
# Run with coverage
uv run pytest --cov=atdata_app
# Lint
uv run ruff check src/ tests/
Tests mock all external dependencies (database, HTTP, identity resolution) using unittest.mock.AsyncMock. HTTP endpoint tests use httpx ASGITransport for in-process testing without a running server.
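A test in this style might look like the sketch below. The helper `count_entries`, the `fetchval` method on the pool, and the SQL text are hypothetical and only illustrate how AsyncMock stands in for a database; `asyncio.run` is used here to keep the example free of pytest plugins.

```python
import asyncio
from unittest.mock import AsyncMock


async def count_entries(pool) -> int:
    """Hypothetical query helper: ask the database how many entries are indexed."""
    return await pool.fetchval("SELECT count(*) FROM entries")


def test_count_entries() -> None:
    # AsyncMock makes every attribute an awaitable mock, so no database is needed.
    pool = AsyncMock()
    pool.fetchval.return_value = 3

    assert asyncio.run(count_entries(pool)) == 3
    pool.fetchval.assert_awaited_once_with("SELECT count(*) FROM entries")
```

Because the pool is passed in as an argument, the test can swap it for a mock without patching module globals.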
Lexicon Definitions
The lexicons/ directory is a git submodule containing the authoritative science.alt.dataset.* lexicon schemas. Initialize it with:
git submodule update --init
The lexicons are for reference and CI validation. The Python source code uses hardcoded NSID constants and does not read the lexicon JSON files at runtime.
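One way a CI validation step could relate the hardcoded constants to the submodule is to check that every NSID has a lexicon file whose `id` field matches. This is a sketch of the idea only: the function `missing_lexicons` is hypothetical, and it assumes each lexicon JSON document carries its NSID in a top-level `id` field.

```python
import json
from pathlib import Path


def missing_lexicons(nsids: set[str], lexicon_dir: Path) -> set[str]:
    """Return the NSIDs that have no lexicon JSON file under `lexicon_dir`."""
    found: set[str] = set()
    for path in lexicon_dir.rglob("*.json"):
        doc = json.loads(path.read_text(encoding="utf-8"))
        # Lexicon documents declare their NSID in the top-level "id" field.
        if isinstance(doc, dict) and isinstance(doc.get("id"), str):
            found.add(doc["id"])
    return nsids - found
```

A CI job could fail when the returned set is non-empty, which would catch a constant that drifted from the lexicon definitions.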
License
MIT