Python API for interacting with the Sample API system (MS version)
Sample Metadata
The sample-metadata system is a database that stores de-identified metadata.
There are three components to the sample-metadata system:
- System-versioned MariaDB database,
- Python web API to manage permissions, and store frequently used queries,
- An installable python library that wraps the Python web API (using OpenAPI generator)
Every resource in sample-metadata belongs to a project. All resources are access
controlled through membership of the Google groups $dataset-sample-metadata-main-{read,write}.
Note that the members of these Google groups are cached in a secret, as group-membership identity checks are slow.
Structure
Sample IDs
To reduce our dependency on potentially mutable external sample IDs with inconsistent formats, the sample-metadata server generates an internal sample ID for every sample. Internally these are incrementing integers, but externally they are given a prefix and a checksum. This guards against mistypes when transcribing sample IDs and makes it quick to check whether a sample ID is valid.
NB: The prefix and checksums are modified per environment (production, development, local) to avoid duplicates from these environments.
For example, consider the production environment, which uses the prefix CPG and a checksum offset of 0:
A sample is given the internal ID 12345, and we calculate the Luhn checksum to be 5 (with no offset applied). Concatenating the results gives the final sample ID CPG123455.
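The worked example above can be reproduced with a short sketch. The function names are hypothetical, and the way the per-environment offset is applied (added to the check digit, modulo 10) is an assumption; the server's actual implementation may differ.

```python
def luhn_checksum(number: int, offset: int = 0) -> int:
    """Return the Luhn check digit for `number`, shifted by `offset` (mod 10).

    How the offset is applied is an assumption made for illustration.
    """
    total = 0
    # Walk the digits right to left, doubling every second digit starting
    # with the rightmost one (the check digit would sit to its right).
    for position, char in enumerate(reversed(str(number))):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    check_digit = (10 - total % 10) % 10
    return (check_digit + offset) % 10


def format_sample_id(internal_id: int, prefix: str = "CPG", offset: int = 0) -> str:
    """Concatenate prefix, internal ID and check digit into an external ID."""
    return f"{prefix}{internal_id}{luhn_checksum(internal_id, offset)}"


def is_valid_sample_id(sample_id: str, prefix: str = "CPG", offset: int = 0) -> bool:
    """Check the prefix and recompute the check digit of an external ID."""
    if not sample_id.startswith(prefix):
        return False
    body = sample_id[len(prefix):]
    if len(body) < 2 or not body.isdigit():
        return False
    return format_sample_id(int(body[:-1]), prefix, offset) == sample_id


print(format_sample_id(12345))  # CPG123455
```

A single mistyped digit changes the expected check digit, so is_valid_sample_id("CPG123456") returns False without a database lookup.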
Reporting sex
To avoid ambiguity in the reporting of gender, sex and karyotype, the sample-metadata system
stores these values separately on the participant as:
- reported_gender (string, expected male | female | other values)
- reported_sex (follows pedigree convention: unknown=0 | null, male=1, female=2)
- inferred_karyotype (string, e.g. XX | XY | other karyotypes)
If you import a pedigree, the sex value is written to the reported_sex attribute.
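As an illustration of the pedigree convention, the sketch below normalises a raw pedigree sex column into a reported_sex value. The helper name and its handling of empty cells are assumptions and not part of the sample-metadata code base.

```python
from typing import Optional, Union

# Pedigree convention described above: 0 (or null) = unknown, 1 = male, 2 = female.
PEDIGREE_SEX_LABELS = {0: "unknown", 1: "male", 2: "female"}


def parse_pedigree_sex(value: Union[int, str, None]) -> Optional[int]:
    """Normalise a pedigree sex cell to 0, 1, 2 or None (hypothetical helper)."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "":
        # Treat an empty cell as null, i.e. unknown.
        return None
    if not text.isdigit() or int(text) not in PEDIGREE_SEX_LABELS:
        raise ValueError(f"Unexpected pedigree sex value: {value!r}")
    return int(text)


for raw in ("1", 2, "0", ""):
    code = parse_pedigree_sex(raw)
    print(repr(raw), "->", code, PEDIGREE_SEX_LABELS.get(code, "unknown"))
```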
Local development of SM
The recommended way to develop the sample-metadata system is to run a local copy of SM.
There have been some reported issues with running a local SM environment on an M1 Mac.
You can run MariaDB from a local installation, or from within a Docker container. You can configure the MariaDB connection with environment variables.
Creating the environment
Dependencies for the sample-metadata API package are listed in setup.py.
Additional dev requirements are listed in requirements-dev.txt, and packages for
the server-side code are listed in requirements.txt.
To create the full dev environment, run:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install --editable .
Default DB set-up
These are the default values for the SM database connection. Please alter them if you use any different values when setting up the database.
export SM_DEV_DB_USER=root
export SM_DEV_DB_PASSWORD= # empty password
export SM_DEV_DB_HOST=127.0.0.1
export SM_DEV_DB_PORT=3306 # default mariadb port
Create the database in MariaDB (by default, we call it sm_dev). Sample-metadata stores all metadata in one database (previously: one database per project):
mysql -u root --execute 'CREATE DATABASE sm_dev'
If you use a different database name, also set the following:
export SM_DEV_DB_NAME=sm_database_name
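To make the defaults concrete, here is a minimal sketch of how the SM_DEV_DB_* variables could be resolved in Python. The DevDbConfig class and load_dev_db_config function are hypothetical names for illustration; the server's own configuration code may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DevDbConfig:
    user: str
    password: str
    host: str
    port: int
    dbname: str


def load_dev_db_config(environ: Optional[Mapping[str, str]] = None) -> DevDbConfig:
    """Read the SM_DEV_DB_* variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return DevDbConfig(
        user=env.get("SM_DEV_DB_USER", "root"),
        password=env.get("SM_DEV_DB_PASSWORD", ""),  # empty password
        host=env.get("SM_DEV_DB_HOST", "127.0.0.1"),
        port=int(env.get("SM_DEV_DB_PORT", "3306")),  # default MariaDB port
        dbname=env.get("SM_DEV_DB_NAME", "sm_dev"),
    )


print(load_dev_db_config({}))
```

Passing a plain dictionary instead of os.environ makes it easy to check an alternative set-up, for example {"SM_DEV_DB_PORT": "3307"} for a Docker container with a remapped port.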
Download the mariadb-java-client and create the schema using liquibase:
pushd db/
wget https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/3.0.3/mariadb-java-client-3.0.3.jar
liquibase \
--changeLogFile project.xml \
--url jdbc:mariadb://localhost/sm_dev \
--driver org.mariadb.jdbc.Driver \
--classpath mariadb-java-client-3.0.3.jar \
--username root \
update
popd
Using the MariaDB Docker image
Pull the mariadb image:
docker pull mariadb
Run a mariadb container that will serve your database. -p 3307:3306 remaps the port to 3307 in case your local MySQL is already using 3306:
docker stop mysql-p3307 # stop and remove if the container already exists
docker rm mysql-p3307
# run with an empty root password
docker run -p 3307:3306 --name mysql-p3307 -e MYSQL_ALLOW_EMPTY_PASSWORD=true -d mariadb
mysql --host=127.0.0.1 --port=3307 -u root -e 'CREATE DATABASE sm_dev;'
mysql --host=127.0.0.1 --port=3307 -u root -e 'show databases;'
Go into the db/ subdirectory, download the mariadb-java-client and create the schema using liquibase:
pushd db/
wget https://repo1.maven.org/maven2/org/mariadb/jdbc/mariadb-java-client/3.0.3/mariadb-java-client-3.0.3.jar
liquibase \
--changeLogFile project.xml \
--url jdbc:mariadb://127.0.0.1:3307/sm_dev \
--driver org.mariadb.jdbc.Driver \
--classpath mariadb-java-client-3.0.3.jar \
--username root \
update
popd
Finally, make sure you configure the server (using the environment variables) to point to your local MariaDB server:
export SM_DEV_DB_PORT=3307
Running the server
You'll want to set the following environment variables (permanently) in your local development environment.
# ensures the SWAGGER page (localhost:8000/docs) points to your local environment
export SM_ENVIRONMENT=LOCAL
# skips permission checks in your local environment
export SM_ALLOWALLACCESS=true
# start the server
python3 -m api.server
# OR
# uvicorn --port 8000 --host 0.0.0.0 api.server:app
In a different terminal, execute the following request to create a new project called 'dev':
curl -X 'PUT' \
'http://localhost:8000/api/v1/project/?name=dev&dataset=dev&gcp_id=dev&create_test_project=false' \
-H 'accept: application/json' \
-H "Authorization: Bearer $(gcloud auth print-identity-token)"
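The same request can be built from Python using only the standard library. This is a sketch rather than part of the project: build_create_project_request is a hypothetical helper, and the token is expected to come from `gcloud auth print-identity-token` as in the curl command.

```python
import urllib.parse
import urllib.request


def build_create_project_request(
    name: str, token: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates a project."""
    query = urllib.parse.urlencode(
        {
            "name": name,
            "dataset": name,
            "gcp_id": name,
            "create_test_project": "false",
        }
    )
    return urllib.request.Request(
        url=f"{base_url}/api/v1/project/?{query}",
        method="PUT",
        headers={
            "accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


request = build_create_project_request("dev", token="<identity-token>")
print(request.get_method(), request.full_url)
```

To actually send it against a running local server, pass the request to urllib.request.urlopen(request).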
Quickstart: Generate and install the installable API
It's best to do this with an already running server:
python3 regenerate_api.py \
&& pip install .
Debugging the server in VSCode
VSCode allows you to debug Python modules. We can debug the web API at api/server.py by using the following launch.json:
{
"version": "0.2.0",
"configurations": [
{
"name": "API server",
"type": "python",
"request": "launch",
"module": "api.server"
}
]
}
We can now place breakpoints on the sample route (i.e. api/routes/sample.py), and debug requests as they come in.
Developing the UI
# Ensure you have already started SM locally, then open the UI in another tab.
# This will automatically proxy requests to the server.
cd web
npm install
npm start
Unauthenticated access
You'll want to set the SM_LOCALONLY_DEFAULTUSER environment variable along with SM_ALLOWALLACCESS to allow access to a local sample-metadata server without providing a bearer token. This will allow you to test the front-end components that access data. On the production instance, authentication happens automatically through the Google identity-aware proxy.
export SM_ALLOWALLACCESS=1
export SM_LOCALONLY_DEFAULTUSER=$(whoami)
OpenAPI and Swagger
The Web API uses apispec with OpenAPI3 annotations on each route to describe interactions with the server. We can generate a Swagger UI and an installable Python module based on these annotations.
The web API exposes this schema in two ways:
- Swagger UI: http://localhost:8000/docs
  - You can use this to construct requests to the server
  - Make sure you fill in the Bearer token (at the top right)
- OpenAPI schema: http://localhost:8000/schema.json
  - Returns a JSON with the full OpenAPI 3 compliant schema.
  - You could put this into the Swagger editor to see the same "Swagger UI" that /api/docs exposes.
  - We generate the sample_metadata installable Python API based on this schema.
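Because the schema is plain JSON, it is easy to inspect from Python. The sketch below lists the operations in an OpenAPI 3 document; list_operations is a hypothetical helper, and the example schema is a hand-written fragment rather than the server's real output.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def list_operations(schema: dict) -> list:
    """Return sorted (METHOD, path) pairs from an OpenAPI 3 schema dictionary."""
    operations = []
    for path, path_item in sorted(schema.get("paths", {}).items()):
        for key in sorted(path_item):
            # Path items can also hold non-operation keys such as "parameters".
            if key.lower() in HTTP_METHODS:
                operations.append((key.upper(), path))
    return operations


# Hand-written fragment for illustration only.
example_schema = {
    "openapi": "3.0.2",
    "paths": {
        "/api/v1/project/": {"put": {}, "parameters": []},
    },
}

for method, path in list_operations(example_schema):
    print(method, path)
```

With a local server running, the real schema could be loaded with json.load(urllib.request.urlopen("http://localhost:8000/schema.json")) and passed to the same function.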
Generating the installable API
The installable API is automatically generated through the package.yml GitHub action and uploaded to PyPI.
To generate the Python API you'll need to install openapi-generator v5.x.x.
To install a specific version of openapi-generator, do the following:
npm install @openapitools/openapi-generator-cli -g
openapi-generator-cli version-manager set 5.3.0
Then set your environment variable OPENAPI_COMMAND to the following. You can also add an alias to your ~/.bash_profile or equivalent for running in the terminal.
export OPENAPI_COMMAND="npx @openapitools/openapi-generator-cli"
alias openapi-generator="npx @openapitools/openapi-generator-cli"
You could generate the installable API and install it with pip by running:
# this will start the api.server, so make sure you have the dependencies installed
python regenerate_api.py \
&& pip install .
Or you can build the Docker image and specify it through SM_DOCKER:
# SM_DOCKER is a known env variable to regenerate_api.py
export SM_DOCKER="cpg/sample-metadata-server:dev"
docker build --build-arg SM_ENVIRONMENT=local -t $SM_DOCKER -f deploy/api/Dockerfile .
python regenerate_api.py
Deployment
The sample-metadata server
You'll want to complete the following steps:
- Ensure there is a database created for each project (with the database name being the project),
- Ensure there are secrets in projects/sample_metadata/secrets/databases/versions/latest, stored as an array of objects with the keys dbname, host, port, username, password.
- Ensure google-cloud is installed.
export SM_ENVIRONMENT='PRODUCTION'
# OR, point to the dev instance with
export SM_ENVIRONMENT='DEVELOPMENT'
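Before deploying, it can help to check that the databases secret has the shape described above. The following sketch validates a JSON payload against the listed keys; parse_database_secret is a hypothetical helper and the example values are placeholders, not real credentials.

```python
import json

REQUIRED_KEYS = {"dbname", "host", "port", "username", "password"}


def parse_database_secret(payload: str) -> list:
    """Parse the secret and check each entry has the documented keys."""
    entries = json.loads(payload)
    if not isinstance(entries, list):
        raise ValueError("Expected an array of database objects")
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"Entry {index} is not an object")
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"Entry {index} is missing keys: {sorted(missing)}")
    return entries


# Placeholder values for illustration only.
example_payload = json.dumps(
    [{"dbname": "dev", "host": "127.0.0.1", "port": 3306,
      "username": "root", "password": ""}]
)
print(len(parse_database_secret(example_payload)), "database entry validated")
```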