Python client for refget

Refget

User-facing documentation is hosted at refgenie.org/refget.

This repository includes:

  1. /refget: The refget Python package, which provides a Python interface to both remote and local use of refget standards. It has clients and functions for both refget sequences and refget sequence collections (seqcol).
  2. /seqcolapi: Sequence collections API software, a FastAPI wrapper built on top of the refget package. It provides a bare-bones Sequence Collections API service.
  3. /deployment: Server configurations for demo instances and public deployed instances. There are also GitHub workflows (in .github/workflows) that deploy the demo server instance from this repository.
  4. /test_fasta and /test_api: Dummy data and a compliance test, to test external implementations of the Refget Sequence Collections API.
  5. /frontend: A React front-end for seqcolapi.

Deploy to AWS ECS

To deploy the public demo instance, you can either:

  1. Create a GitHub release - This triggers the deploy_release_software.yml workflow, which builds and pushes the Docker image to DockerHub. After that completes, it automatically triggers deploy_primary.yml to deploy to AWS ECS.

  2. Manual dispatch - You can manually trigger either workflow from the GitHub Actions tab.

This builds seqcolapi, pushes to DockerHub, and deploys to ECS.

Testing

Unit tests

pytest

Integration tests (requires Docker)

Integration tests run against an ephemeral PostgreSQL database in Docker:

./scripts/test-integration.sh

This starts the test database, runs tests, and cleans up automatically.

Development and deployment: Backend

Easy-peasy way

In a moment I'll show you how to do these steps individually, but if you're in a hurry, the easy way to get a development API running for testing is to use this very simple shell script (no data persistence, it just loads demo data):

bash deployment/demo_up.sh

This will:

  • populate env vars
  • launch postgres container with docker
  • run the refget service with uvicorn
  • load up the demo data
  • block the terminal until you press Ctrl+C, which will shut down all services.

Step-by-step process

Alternatively, if you want to run each step separately to see what's really going on, start here.

Setting up a database connection

First configure a database connection through environment variables. Choose one of these:

source deployment/local_demo/local_demo.env # local demo (see below to create the database using docker)
source deployment/seqcolapi.databio.org/production.env # connect to production database

If you're using the local_demo, then use docker to launch a local postgres database service like this:

docker run --rm --name refget-postgres -p 127.0.0.1:5432:5432 \
  -e POSTGRES_PASSWORD \
  -e POSTGRES_USER \
  -e POSTGRES_DB \
  -e POSTGRES_HOST \
  postgres:17.0
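Before starting the service, it can help to confirm that the env file was actually sourced. The following is a minimal sketch, not part of refget: the function name `postgres_url` and the default port 5432 are assumptions for illustration, and refget's own configuration code may build its connection differently. It only checks that the four variables passed to the container above are set and assembles them into a standard PostgreSQL URL.

```python
import os
from urllib.parse import quote_plus

# The variables forwarded to the postgres container in the command above.
REQUIRED_VARS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_HOST", "POSTGRES_DB")


def postgres_url(env=None, port=5432):
    """Build a PostgreSQL URL from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    # Percent-encode credentials so characters like '@' do not break the URL.
    user = quote_plus(env["POSTGRES_USER"])
    password = quote_plus(env["POSTGRES_PASSWORD"])
    return f"postgresql://{user}:{password}@{env['POSTGRES_HOST']}:{port}/{env['POSTGRES_DB']}"
```

Calling `postgres_url()` with no arguments reads `os.environ`, so a `KeyError` here usually means the `source ...env` step was skipped in the current shell.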

If you need to load test data into your server, then you have to install gtars (with pip install gtars), a Python package for computing GA4GH digests. You can then load test data like this:

PYTHONPATH=. python data_loaders/load_demo_seqcols.py

or:

refget add-fasta -p test_fasta/test_fasta_metadata.csv -r test_fasta

Running the seqcolapi API backend

Run the demo seqcolapi service like this:

uvicorn seqcolapi.main:app --reload --port 8100
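Once the service is up, you can query it from Python with only the standard library. This is a hedged sketch: the `/collection/{digest}` and `/comparison/{digest1}/{digest2}` paths follow the GA4GH Sequence Collections specification, and it is an assumption here that this service exposes them at the root of port 8100. The helper names are hypothetical and not part of the refget package.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Matches the --port value in the uvicorn command above.
API_BASE = "http://localhost:8100"


def collection_url(digest, level=2, base=API_BASE):
    """URL for retrieving one sequence collection at the given detail level."""
    path = f"/collection/{quote(digest, safe='')}"
    return f"{base.rstrip('/')}{path}?{urlencode({'level': level})}"


def comparison_url(digest_a, digest_b, base=API_BASE):
    """URL for comparing two sequence collections by digest."""
    return f"{base.rstrip('/')}/comparison/{quote(digest_a, safe='')}/{quote(digest_b, safe='')}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the JSON body (requires a running server)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json(collection_url(some_digest))` would return the collection as a dictionary, where `some_digest` is a digest of one of the demo collections you loaded.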

Running with docker

To build the Docker image, run this from the root of this repository:

docker build -f deployment/dockerhub/Dockerfile -t databio/seqcolapi seqcolapi

To run in container:

source deployment/seqcolapi.databio.org/production.env
docker run --rm -p 8000:80 --name seqcolapi \
  --env "POSTGRES_USER" \
  --env "POSTGRES_DB" \
  --env "POSTGRES_PASSWORD" \
  --env "POSTGRES_HOST" \
  databio/seqcolapi

Deploying container to dockerhub

Use the GitHub Action in this repo, which deploys on release or through manual dispatch.

Running the frontend

Once you have a backend running, you can run a frontend to interact with it.

Local client with local server

cd frontend
npm i
VITE_API_BASE="http://localhost:8100" npm run dev

Local client with production server

cd frontend
npm i
VITE_API_BASE="https://seqcolapi.databio.org" npm run dev

Development with local WASM

The /digest feature uses @databio/gtars for WASM-based FASTA processing. To use a local gtars-wasm build instead of the npm package:

LOCAL_GTARS=../../gtars/gtars-wasm/pkg npm run dev

The LOCAL_GTARS env var should point to the pkg/ directory of a built gtars-wasm package (run wasm-pack build --target web in gtars-wasm to build it).

gtars WASM API Reference

The streaming API handles files of any size; the batch API is a shortcut for small files that fit in memory:

import * as gtars from '@databio/gtars';
await gtars.default();  // Initialize WASM

// Streaming API (for large files)
const handle = gtars.fastaHasherNew();
gtars.fastaHasherUpdate(handle, chunk);  // Feed Uint8Array chunks; call once per chunk
const streamResult = gtars.fastaHasherFinish(handle);  // Get SeqColResult

// Batch API (for small files)
const batchResult = gtars.digestSeqcol(fastaBytes);  // Also returns a SeqColResult
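The new/update/finish pattern above is the usual shape of an incremental hasher. As a language-neutral illustration, here is a minimal Python sketch using `hashlib`. It is an analogy only: it hashes raw bytes rather than parsing FASTA records, so its output is not a sequence collection digest, and the function names are hypothetical.

```python
import hashlib
import io


def digest_batch(data: bytes) -> str:
    """Hash everything at once; needs the whole input in memory."""
    return hashlib.sha512(data).hexdigest()


def digest_streaming(stream, chunk_size=64 * 1024) -> str:
    """Hash a file-like object chunk by chunk, in constant memory."""
    hasher = hashlib.sha512()                 # like fastaHasherNew()
    while chunk := stream.read(chunk_size):
        hasher.update(chunk)                  # like fastaHasherUpdate(handle, chunk)
    return hasher.hexdigest()                 # like fastaHasherFinish(handle)


fasta = b">chr1\nACGT\n>chr2\nGGCC\n"
# Chunk boundaries do not affect the result, even when they split a line.
same = digest_streaming(io.BytesIO(fasta), chunk_size=5) == digest_batch(fasta)
```

Because the two paths agree, the choice between them is purely about memory: stream when the input may not fit in memory, batch otherwise.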

Result object:

interface SeqColResult {
  digest: string;           // Collection digest (SHA512t24u)
  names_digest: string;
  sequences_digest: string;
  lengths_digest: string;
  n_sequences: number;
  sequences: Array<{
    name: string;
    length: number;
    alphabet: string;       // dna2bit, dna3bit, etc.
    sha512t24u: string;
    md5: string;
    description?: string;
  }>;
}
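The `sha512t24u` fields above use the GA4GH digest scheme: take the SHA-512 of the data, keep the first 24 bytes, and encode them as URL-safe base64, which gives a 32-character string. The sketch below shows that computation in plain Python. The `describe_sequence` helper is hypothetical; it mirrors a few of the per-sequence fields and assumes the sequence is uppercased before hashing, as the refget sequence standard describes.

```python
import base64
import hashlib


def sha512t24u(data: bytes) -> str:
    """GA4GH digest: SHA-512, truncated to 24 bytes, base64url-encoded."""
    truncated = hashlib.sha512(data).digest()[:24]
    # 24 bytes encode to exactly 32 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(truncated).decode("ascii")


def describe_sequence(name: str, sequence: str) -> dict:
    """Compute a subset of the per-sequence fields (illustrative only)."""
    raw = sequence.upper().encode("ascii")
    return {
        "name": name,
        "length": len(raw),
        "sha512t24u": sha512t24u(raw),
        "md5": hashlib.md5(raw).hexdigest(),
    }


record = describe_sequence("chr1", "acgt")
```

The collection-level digests (`digest`, `names_digest`, and so on) are built from these values following the Sequence Collections specification, which this sketch does not attempt to reproduce.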

Deploying

  1. Ensure the refget package master branch is as you want it.
  2. Deploy the updated seqcolapi app to DockerHub (using manual dispatch, or deploy on GitHub release).
  3. Finally, deploy the instance with manual dispatch using the included GitHub action.

Developer notes

Models

The objects and attributes are represented as SQLModel objects in refget/models.py. To add a new attribute:

  1. Create a new model. This will create a table for that model, etc.
  2. Change the function that creates the objects so that it populates the new attribute.

Example of loading reference fasta datasets:

refget add-fasta -p ref_fasta.csv -r $BRICKYARD/datasets_downloaded/pangenome_fasta/reference_fasta
