GA4GH refget - reference sequence and sequence collection tools
Refget
User-facing documentation is hosted at refgenie.org/refget.
This repository includes:
- /refget: The refget Python package, which provides a Python interface to both remote and local use of refget standards. It has clients and functions for both refget sequences and refget sequence collections (seqcol).
- /seqcolapi: Sequence collections API software, a FastAPI wrapper built on top of the refget package. It provides a bare-bones Sequence Collections API service.
- /deployment: Server configurations for demo instances and public deployed instances. There are also GitHub workflows (in .github/workflows) that deploy the demo server instance from this repository.
- /test_fasta and /test_api: Dummy data and a compliance test, to test external implementations of the Refget Sequence Collections API.
- /frontend: a React seqcolapi front-end.
Deploy to AWS ECS
To deploy the public demo instance, you can either:
- Create a GitHub release - This triggers the deploy_release_software.yml workflow, which builds and pushes the Docker image to DockerHub. After that completes, it automatically triggers deploy_primary.yml to deploy to AWS ECS.
- Manual dispatch - You can manually trigger either workflow from the GitHub Actions tab.

Either way, this builds seqcolapi, pushes it to DockerHub, and deploys to ECS.
Testing
Unit tests
pytest
Integration tests (requires Docker)
Integration tests run against an ephemeral PostgreSQL database in Docker:
./scripts/test-integration.sh
This starts the test database, runs tests, and cleans up automatically.
Development and deployment: Backend
Store-backed (no database)
The store-backed seqcolapi uses a RefgetStore (local files) instead of PostgreSQL. This is the simplest way to run the API:
Quick start
bash deployment/store_demo_up.sh
This will:
- Build a local RefgetStore from test FASTA files
- Run the store-backed seqcolapi with uvicorn
- Block the terminal until you press Ctrl+C, which cleans up
No Docker or database required.
Step-by-step
- Build a store from FASTA files:
python data_loaders/demo_build_store.py test_fasta /tmp/refget_demo_store
- Start the store-backed API:
REFGET_STORE_PATH=/tmp/refget_demo_store uvicorn seqcolapi.main:store_app --reload --port 8100
Remote store
To run against a remote (S3) store:
REFGET_STORE_URL=https://example.com/store uvicorn seqcolapi.main:store_app --port 8100
DB-backed (PostgreSQL)
If you need a database-backed instance (e.g., for mutable data or advanced queries), use the DB-backed workflow. The individual steps are shown below, but if you're in a hurry, the quickest way to get a development API running for testing is the provided shell script (no data persistence; it just loads demo data):
bash deployment/demo_up.sh
This will:
- populate environment variables
- launch a Postgres container with Docker
- run the refget service with uvicorn
- load the demo data
- block the terminal until you press Ctrl+C, which shuts down all services
Step-by-step process (DB-backed)
Alternatively, if you want to run each step separately to see what's really going on, start here.
Setting up a database connection
First configure a database connection through environment variables. Choose one of these:
source deployment/local_demo/local_demo.env # local demo (see below to create the database using docker)
source deployment/seqcolapi.databio.org/production.env # connect to production database
If you're using the local demo, launch a local Postgres database service with Docker like this:
docker run --rm --name refget-postgres -p 127.0.0.1:5432:5432 \
-e POSTGRES_PASSWORD \
-e POSTGRES_USER \
-e POSTGRES_DB \
-e POSTGRES_HOST \
postgres:17.0
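The docker run command above passes the POSTGRES_* variables through to the container, so the env file sourced earlier must export them. An illustrative sketch with placeholder values (not the repo's actual local_demo.env):

```shell
# Illustrative only: the real file lives at deployment/local_demo/local_demo.env
export POSTGRES_USER=postgres
export POSTGRES_PASSWORD=changeme
export POSTGRES_DB=refget_demo
export POSTGRES_HOST=localhost
```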
If you need to load test data into your server, first install gtars (pip install gtars), a Python package for computing GA4GH digests. Then load the test data like this:
PYTHONPATH=. python data_loaders/load_demo_seqcols.py
or:
refget add-fasta -p test_fasta/test_fasta_metadata.csv -r test_fasta
Running the seqcolapi API backend
Run the demo seqcolapi service like this:
uvicorn seqcolapi.main:app --reload --port 8100
Running with docker
To run with Docker, first build the image from the root of this repository:
docker build -f deployment/dockerhub/Dockerfile -t databio/seqcolapi seqcolapi
To run in container:
source deployment/seqcolapi.databio.org/production.env
docker run --rm -p 8000:80 --name seqcolapi \
--env "POSTGRES_USER" \
--env "POSTGRES_DB" \
--env "POSTGRES_PASSWORD" \
--env "POSTGRES_HOST" \
databio/seqcolapi
Deploying container to dockerhub
Use the GitHub Action in this repo, which deploys on release or through manual dispatch.
Running the frontend
Once you have a backend running, you can run a frontend to interact with it.
Local client with local server
cd frontend
npm i
VITE_API_BASE="http://localhost:8100" npm run dev
Local client with production server
cd frontend
npm i
VITE_API_BASE="https://seqcolapi.databio.org" npm run dev
Development with local WASM
The /digest feature uses @databio/gtars for WASM-based FASTA processing. To use a local gtars-wasm build instead of the npm package:
LOCAL_GTARS=../../gtars/gtars-wasm/pkg npm run dev
The LOCAL_GTARS env var should point to the pkg/ directory of a built gtars-wasm package (run wasm-pack build --target web in gtars-wasm to build it).
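An override like this is commonly wired as a conditional module alias in the Vite config; the sketch below shows the idea (the repository's actual vite.config.ts may differ):

```typescript
// vite.config.ts (sketch): point '@databio/gtars' at a local build when LOCAL_GTARS is set
import { defineConfig } from 'vite';
import path from 'node:path';

export default defineConfig({
  resolve: {
    alias: process.env.LOCAL_GTARS
      ? { '@databio/gtars': path.resolve(process.env.LOCAL_GTARS) }
      : {},
  },
});
```

With no alias set, the import resolves to the published npm package as usual.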
gtars WASM API Reference
The streaming API handles files of any size:
import * as gtars from '@databio/gtars';
await gtars.default(); // Initialize WASM
// Streaming API (for large files)
const handle = gtars.fastaHasherNew();
gtars.fastaHasherUpdate(handle, chunk); // Feed Uint8Array chunks
const result = gtars.fastaHasherFinish(handle); // Get SeqColResult
// Batch API (for small files)
const result = gtars.digestSeqcol(fastaBytes);
Result object:
interface SeqColResult {
digest: string; // Collection digest (SHA512t24u)
names_digest: string;
sequences_digest: string;
lengths_digest: string;
n_sequences: number;
sequences: Array<{
name: string;
length: number;
alphabet: string; // dna2bit, dna3bit, etc.
sha512t24u: string;
md5: string;
description?: string;
}>;
}
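The sha512t24u strings above are GA4GH truncated digests: the SHA-512 of the input, cut to its first 24 bytes and base64url-encoded, which always yields a 32-character string with no padding. A minimal sketch of the algorithm (the function name is illustrative):

```python
import base64
import hashlib

def sha512t24u(data: bytes) -> str:
    """GA4GH sha512t24u: SHA-512 digest, truncated to 24 bytes, base64url-encoded."""
    truncated = hashlib.sha512(data).digest()[:24]
    return base64.urlsafe_b64encode(truncated).decode("ascii")

# 24 bytes encode to exactly 32 base64 characters, with no '=' padding
print(sha512t24u(b"ACGT"))
```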
Deploying
- Ensure the refget package master branch is as you want it.
- Deploy the updated seqcolapi app to DockerHub (via manual dispatch, or automatically on GitHub release).
- Finally, deploy the instance with manual dispatch using the included GitHub action.
Developer notes
Models
The objects and attributes are represented as SQLModel objects in refget/models.py. To add a new attribute:
- Create a new model. This will create a table for that model.
- Update the function that creates the objects so it populates the new attribute.
Example of loading reference fasta datasets:
refget add-fasta -p ref_fasta.csv -r $BRICKYARD/datasets_downloaded/pangenome_fasta/reference_fasta