ATProto AppView for science.alt.dataset

atdata-app

An ATProto AppView for the science.alt.dataset lexicon namespace. It indexes dataset metadata published across the AT Protocol network and serves it through XRPC endpoints, so clients can discover, search, and resolve datasets, schemas, labels, and lenses.

Overview

In the AT Protocol architecture, an AppView is a service that subscribes to the network firehose, indexes records it cares about, and exposes query endpoints for clients. atdata-app does this for scientific and ML dataset metadata:

- Schemas define the structure of datasets (JSON Schema, Arrow schema, etc.)
- Dataset entries describe a dataset: its name, storage location, schema, tags, license, and size
- Labels are human-readable version tags pointing to a specific dataset entry (like git tags)
- Lenses are bidirectional schema transforms with getter/putter code for migrating data between schema versions
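Records in the AT Protocol are addressed by AT URIs of the form at://<authority>/<collection>/<rkey>, which is presumably how a label points at a dataset entry. The following is a minimal sketch of splitting such a URI. The name parse_at_uri echoes the test name shown under Development, but this implementation, the AtUri class, and the example collection NSID are assumptions for illustration, not the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUri:
    authority: str   # DID (or handle) of the repository that owns the record
    collection: str  # NSID of the record type
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split at://<authority>/<collection>/<rkey> into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return AtUri(*parts)


# Hypothetical record address; the collection name is illustrative only.
example = parse_at_uri("at://did:plc:abc123/science.alt.dataset.entry/3kexample")
print(example.authority, example.collection, example.rkey)
```

This sketch only accepts full record URIs; a real parser may also need to handle URIs that name just a repository or a collection.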

ATProto Network
    │
    ├── Jetstream (WebSocket firehose) ──► Real-time ingestion
    │                                         │
    └── BGS Relay (HTTP backfill) ──────► Historical backfill
                                              │
                                              ▼
                                         PostgreSQL
                                              │
                                              ▼
                                     XRPC Query Endpoints ──► Clients
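To make the ingestion path in the diagram more concrete, here is a rough sketch of how a single Jetstream event could be filtered before it is written to PostgreSQL. The event layout (kind, did, commit.operation, commit.collection, commit.rkey) follows Jetstream's JSON format; the function name route_event and the sample values are hypothetical and not taken from the atdata-app source.

```python
from typing import Optional

NSID_PREFIX = "science.alt.dataset."  # namespace this AppView indexes


def route_event(event: dict) -> Optional[tuple[str, str]]:
    """Return (operation, at_uri) for commits in the namespace, else None."""
    if event.get("kind") != "commit":
        return None  # identity and account events carry no record
    commit = event.get("commit", {})
    collection = commit.get("collection", "")
    if not collection.startswith(NSID_PREFIX):
        return None  # some other application's record
    at_uri = f"at://{event['did']}/{collection}/{commit['rkey']}"
    return commit["operation"], at_uri


# Hypothetical event; the collection name and keys are illustrative only.
sample = {
    "did": "did:plc:abc123",
    "time_us": 1700000000000000,
    "kind": "commit",
    "commit": {
        "operation": "create",
        "collection": "science.alt.dataset.entry",
        "rkey": "3kexample",
        "record": {"name": "demo"},
    },
}
print(route_event(sample))
```

An indexer built this way would upsert on create and update operations and remove the row on delete, keyed by the AT URI.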

Requirements

Python 3.12+
PostgreSQL 14+
uv package manager

Quickstart

# Install dependencies
uv sync --dev

# Initialize the lexicon submodule
git submodule update --init

# Set up PostgreSQL (schema auto-applies on startup)
createdb atdata_app

# Start the server
uv run uvicorn atdata_app.main:app --reload

The server starts with dev-mode defaults: http://localhost:8000, DID did:web:localhost%3A8000. On startup it connects to Jetstream and begins indexing science.alt.dataset.* records, and runs a one-shot backfill of historical records from the BGS relay.

Configuration

All settings are environment variables prefixed with ATDATA_, managed by pydantic-settings.

| Variable | Default | Description |
| --- | --- | --- |
| `ATDATA_HOSTNAME` | `localhost` | Public hostname, used to derive `did:web` identity |
| `ATDATA_PORT` | `8000` | Server port (included in DID in dev mode) |
| `ATDATA_DEV_MODE` | `true` | Dev mode uses `http://` and includes port in DID; production uses `https://` |
| `ATDATA_DATABASE_URL` | `postgresql://localhost:5432/atdata_app` | PostgreSQL connection string |
| `ATDATA_JETSTREAM_URL` | `wss://jetstream2.us-east.bsky.network/subscribe` | Jetstream WebSocket endpoint |
| `ATDATA_JETSTREAM_COLLECTIONS` | `science.alt.dataset.*` | Collections to subscribe to |
| `ATDATA_RELAY_HOST` | `https://bsky.network` | BGS relay for backfill DID discovery |
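The project itself loads these settings through pydantic-settings. To show the prefix-and-default behavior without that dependency, here is a standard-library stand-in; the Settings dataclass and load_settings function are hypothetical names, and the accepted boolean spellings are an assumption rather than the project's exact rules.

```python
import os
from dataclasses import dataclass
from typing import Mapping

PREFIX = "ATDATA_"


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the configuration table above.
    hostname: str = "localhost"
    port: int = 8000
    dev_mode: bool = True
    database_url: str = "postgresql://localhost:5432/atdata_app"
    jetstream_url: str = "wss://jetstream2.us-east.bsky.network/subscribe"
    jetstream_collections: str = "science.alt.dataset.*"
    relay_host: str = "https://bsky.network"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from ATDATA_-prefixed variables, falling back to defaults."""
    overrides = {}
    for name, default in vars(Settings()).items():
        raw = env.get(PREFIX + name.upper())
        if raw is None:
            continue  # variable not set, keep the default
        if isinstance(default, bool):  # check bool first: bool is a subclass of int
            overrides[name] = raw.strip().lower() in {"1", "true", "yes", "on"}
        elif isinstance(default, int):
            overrides[name] = int(raw)
        else:
            overrides[name] = raw
    return Settings(**overrides)


settings = load_settings({"ATDATA_PORT": "9000", "ATDATA_DEV_MODE": "false"})
print(settings.port, settings.dev_mode, settings.hostname)
```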

Identity

The service derives its did:web identity from the hostname and port:

Dev mode: did:web:localhost%3A8000 with endpoint http://localhost:8000
Production: did:web:datasets.example.com with endpoint https://datasets.example.com

The DID document is served at GET /.well-known/did.json and advertises the service as an AtprotoAppView.
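The derivation above can be sketched as follows. The did:web method requires the colon before a port to be percent-encoded as %3A, which is why the dev-mode DID looks the way it does. The function names and the service fragment id "#atdata_appview" are assumptions made for illustration; only the AtprotoAppView service type and the two example identities come from this README.

```python
def derive_identity(hostname: str, port: int, dev_mode: bool) -> tuple[str, str]:
    """Return (did, endpoint) for the service."""
    if dev_mode:
        # did:web percent-encodes the colon that separates host and port.
        return f"did:web:{hostname}%3A{port}", f"http://{hostname}:{port}"
    return f"did:web:{hostname}", f"https://{hostname}"


def did_document(did: str, endpoint: str) -> dict:
    """Build a minimal DID document advertising the AppView service."""
    return {
        "@context": ["https://www.w3.org/ns/did/v1"],
        "id": did,
        "service": [
            {
                "id": "#atdata_appview",  # hypothetical fragment id
                "type": "AtprotoAppView",
                "serviceEndpoint": endpoint,
            }
        ],
    }


did, endpoint = derive_identity("localhost", 8000, dev_mode=True)
print(did, endpoint)
print(did_document(did, endpoint)["service"][0]["serviceEndpoint"])
```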

API Reference

See docs/api-reference.md for the full XRPC endpoint reference (queries, procedures, and other routes).

Data Model

See docs/data-model.md for the database schema (schemas, entries, labels, lenses).

Docker Deployment

The app ships with a multi-stage Dockerfile using uv for fast dependency installation.

Build and run locally

docker build -t atdata-app .

docker run -p 8000:8000 \
  -e ATDATA_DATABASE_URL=postgresql://user:pass@host:5432/atdata_app \
  -e ATDATA_HOSTNAME=localhost \
  -e ATDATA_DEV_MODE=true \
  atdata-app

Deploy on Railway

The repo includes a railway.toml that configures the Dockerfile builder, health checks at /health, and a restart-on-failure policy.

1. Connect the repo to a Railway project
2. Add a PostgreSQL service and link it
3. Set the required environment variables:

| Variable | Value |
| --- | --- |
| `ATDATA_DATABASE_URL` | Provided by Railway's PostgreSQL plugin (`${{Postgres.DATABASE_URL}}`) |
| `ATDATA_HOSTNAME` | Your Railway public domain (e.g. `atdata-app-production.up.railway.app`) |
| `ATDATA_DEV_MODE` | `false` |
| `ATDATA_PORT` | Omit; Railway sets `PORT` automatically and the container respects it |

Optional variables for ingestion tuning:

| Variable | Default | Description |
| --- | --- | --- |
| `ATDATA_JETSTREAM_URL` | `wss://jetstream2.us-east.bsky.network/subscribe` | Jetstream endpoint |
| `ATDATA_RELAY_HOST` | `https://bsky.network` | BGS relay for backfill |

On each push, Railway builds the Docker image and starts the container automatically.

Development

# Run tests (no database required)
uv run pytest

# Run a single test
uv run pytest tests/test_models.py::test_parse_at_uri -v

# Run with coverage
uv run pytest --cov=atdata_app

# Lint
uv run ruff check src/ tests/

Tests mock all external dependencies (database, HTTP, identity resolution) using unittest.mock.AsyncMock. HTTP endpoint tests use httpx ASGITransport for in-process testing without a running server.

Lexicon Definitions

The lexicons/ directory is a git submodule containing the authoritative science.alt.dataset.* lexicon schemas. Initialize it with:

git submodule update --init

The lexicons are for reference and CI validation. The Python source code uses hardcoded NSID constants and does not read the lexicon JSON files at runtime.
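One way a CI check could keep hardcoded constants in step with the submodule is to compare them against the "id" field that every lexicon JSON file declares. This is a sketch under stated assumptions: the NSIDs in KNOWN_NSIDS are placeholders, the function names are invented here, and the project's real validation may work differently.

```python
import json
from pathlib import Path

# Hypothetical NSID constants; the real names live in the project source.
KNOWN_NSIDS = {
    "science.alt.dataset.schema",
    "science.alt.dataset.entry",
}


def lexicon_ids(root: Path) -> set[str]:
    """Collect the top-level "id" of every lexicon JSON file under root."""
    ids: set[str] = set()
    for path in root.rglob("*.json"):
        doc = json.loads(path.read_text(encoding="utf-8"))
        # Lexicon documents are objects with "lexicon" and "id" keys.
        if isinstance(doc, dict) and "lexicon" in doc and "id" in doc:
            ids.add(doc["id"])
    return ids


def missing_from_lexicons(root: Path, constants: set[str]) -> set[str]:
    """Return constants that have no matching lexicon file under root."""
    return constants - lexicon_ids(root)


if __name__ == "__main__":
    missing = missing_from_lexicons(Path("lexicons"), KNOWN_NSIDS)
    print("missing:", sorted(missing))
```

A CI job could fail when the returned set is non-empty, which would catch a renamed or removed lexicon before the constants drift.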

License

MIT

Release history

- 0.4.0b1 (pre-release, this version): Feb 26, 2026
- 0.3.0b1 (pre-release): Feb 22, 2026
- 0.2.2b1 (pre-release): Feb 18, 2026
- 0.2.1b1 (pre-release): Feb 18, 2026
- 0.2.0b1 (pre-release): Feb 18, 2026
- 0.1.0b1 (pre-release): Feb 17, 2026

