
VICC normalization routine for variations


Variation Normalization

Services and guidelines for normalizing variation terms

Backend Services

Variation Normalization relies on some local data caches that you will need to set up. It uses pipenv to manage its environment, so you will also need to install pipenv.

Installation

Variation Normalization relies on seqrepo, which you must download yourself.

From the root directory:

pipenv shell
pipenv lock
pipenv sync
cd variation
pip install seqrepo
mkdir -p data/seqrepo
seqrepo -r data/seqrepo pull -i 2021-01-29
sudo chmod -R u+w data/seqrepo
cd data/seqrepo
seqrepo_date_dir=$(ls -d */)
sudo mv $seqrepo_date_dir latest
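The last three commands rename the dated snapshot directory that seqrepo pull creates (here 2021-01-29) to latest. The sketch below shows that rename step in stdlib Python. It is only an illustration: promote_seqrepo_snapshot is a hypothetical helper that is not part of this package, it assumes exactly one snapshot directory is present, and it does not reproduce the sudo chmod step.

```python
from pathlib import Path


def promote_seqrepo_snapshot(seqrepo_root: str) -> Path:
    """Rename the single dated snapshot directory under seqrepo_root to 'latest'.

    Hypothetical helper mirroring `ls -d */` followed by `mv $seqrepo_date_dir latest`.
    Assumes exactly one snapshot directory exists.
    """
    root = Path(seqrepo_root)
    # Collect snapshot directories, ignoring an existing 'latest' entry
    snapshots = [p for p in root.iterdir() if p.is_dir() and p.name != "latest"]
    if len(snapshots) != 1:
        raise RuntimeError(f"expected one snapshot directory, found {len(snapshots)}")
    target = root / "latest"
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # Path.rename returns the new path (Python 3.8+)
    return snapshots[0].rename(target)
```

Failing loudly when there is more than one directory avoids the shell version's silent failure mode, where ls -d */ would return several names and mv would misbehave.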

Variation Normalization also uses UTA (the Universal Transcript Archive).

The following commands will likely need modification appropriate for the installation environment.

  1. Install PostgreSQL

  2. Create users and a database.

    $ createuser -U postgres uta_admin
    $ createuser -U postgres anonymous
    $ createdb -U postgres -O uta_admin uta
    
  3. To install locally, from the variation/data directory:

export UTA_VERSION=uta_20210129.pgd.gz
curl -O http://dl.biocommons.org/uta/$UTA_VERSION
gzip -cdq ${UTA_VERSION} | grep -v "^REFRESH MATERIALIZED VIEW" | psql -h localhost -U uta_admin --echo-errors --single-transaction -v ON_ERROR_STOP=1 -d uta -p 5433
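The pipeline above decompresses the dump and drops every line that starts with REFRESH MATERIALIZED VIEW before handing the rest to psql. The filtering step is sketched below in Python for clarity. filtered_uta_dump is a hypothetical helper, and the assumption that the dump is UTF-8 text is ours, not something this README states.

```python
import gzip
from typing import Iterator


def filtered_uta_dump(path: str) -> Iterator[str]:
    """Yield lines of a gzipped UTA dump, skipping REFRESH MATERIALIZED VIEW lines.

    Hypothetical equivalent of: gzip -cdq FILE | grep -v "^REFRESH MATERIALIZED VIEW"
    Assumes the dump is UTF-8 encoded.
    """
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            # grep's "^" anchor means only lines *starting* with the statement are dropped
            if not line.startswith("REFRESH MATERIALIZED VIEW"):
                yield line
```

Because it is a generator, the multi-gigabyte dump is streamed line by line rather than loaded into memory, which matches the behavior of the shell pipe.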

To connect to the UTA database, you can use the default URL (postgresql://uta_admin@localhost:5433/uta/uta_20210129). If you use the default URL, you must set the password either with the environment variable UTA_PASSWORD or with the db_pwd parameter of the UTA class.
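The default URL has no password component, so one has to be supplied. The sketch below shows where a password would sit in that URL and why it should be percent-encoded. It is an illustration only: default_uta_url_with_password is a hypothetical helper, and the UTA class may handle UTA_PASSWORD and db_pwd differently internally.

```python
import os
from typing import Optional
from urllib.parse import quote

# Default connection URL quoted in this README (no password component)
DEFAULT_UTA_URL = "postgresql://uta_admin@localhost:5433/uta/uta_20210129"


def default_uta_url_with_password(password: Optional[str] = None) -> str:
    """Insert a password into the default UTA URL.

    Hypothetical helper: uses the explicit argument if given, otherwise UTA_PASSWORD.
    """
    pw = password if password is not None else os.environ.get("UTA_PASSWORD")
    if not pw:
        raise ValueError("set UTA_PASSWORD or pass a password explicitly")
    user_part, host_part = DEFAULT_UTA_URL.split("@", 1)
    # Percent-encode so characters such as '@' or '/' cannot break the URL
    return f"{user_part}:{quote(pw, safe='')}@{host_part}"
```

For example, default_uta_url_with_password("secret") yields postgresql://uta_admin:secret@localhost:5433/uta/uta_20210129.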

If you do not wish to use the default, you must set the environment variable UTA_DB_URL, which has the format driver://user:pass@host/database/schema.
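To make the UTA_DB_URL format concrete, the sketch below splits such a URL into its parts with the standard library. It is not the package's own parser: parse_uta_db_url and UtaDbUrl are hypothetical names, and the optional port is handled because the default URL includes one (5433) even though the stated format omits it.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlsplit


class UtaDbUrl(NamedTuple):
    driver: str
    user: str
    password: str
    host: str
    port: Optional[int]
    database: str
    schema: str


def parse_uta_db_url(url: str) -> UtaDbUrl:
    """Split driver://user:pass@host[:port]/database/schema into its components."""
    parts = urlsplit(url)
    # The path carries "/database/schema"
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 2:
        raise ValueError(f"expected /database/schema in path, got {parts.path!r}")
    database, schema = segments
    return UtaDbUrl(
        driver=parts.scheme,
        user=unquote(parts.username or ""),
        password=unquote(parts.password or ""),  # undo percent-encoding
        host=parts.hostname or "",
        port=parts.port,
        database=database,
        schema=schema,
    )
```

Note that the schema name (for example uta_20210129) is a path segment, not a query parameter, which is what distinguishes this format from a plain PostgreSQL connection URL.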

Data

Variation Normalization uses Ensembl BioMart to retrieve variation/data/transcript_mappings.tsv. We currently use the Human Genes (GRCh38.p13) dataset with the following attributes: Gene stable ID, Gene stable ID version, Transcript stable ID, Transcript stable ID version, Protein stable ID, Protein stable ID version, RefSeq match transcript (MANE Select), Gene name.
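As an illustration of what this export contains, the sketch below reads a tab-separated file with those attributes and maps each MANE Select RefSeq transcript to its versioned Ensembl transcript ID. It assumes the header row uses the attribute names exactly as listed above; the real file's headers may differ. mane_to_ensembl is a hypothetical helper, not how the package itself consumes the file.

```python
import csv
from typing import Dict, TextIO

# Assumed column headers, taken from the BioMart attribute names listed above
MANE_COLUMN = "RefSeq match transcript (MANE Select)"
TRANSCRIPT_COLUMN = "Transcript stable ID version"


def mane_to_ensembl(tsv: TextIO) -> Dict[str, str]:
    """Map MANE Select RefSeq transcripts to versioned Ensembl transcript IDs."""
    reader = csv.DictReader(tsv, delimiter="\t")
    mapping: Dict[str, str] = {}
    for row in reader:
        refseq = (row.get(MANE_COLUMN) or "").strip()
        # Rows without a MANE Select match leave this column empty; skip them
        if refseq:
            mapping[refseq] = row[TRANSCRIPT_COLUMN]
    return mapping
```

Usage would look like: with open("variation/data/transcript_mappings.tsv", newline="") as f: mapping = mane_to_ensembl(f).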


Setting up Gene Normalizer

The Variation Normalization normalize endpoint relies on data from the Gene Normalizer. To install it:

pip install gene-normalizer

To set it up, follow the instructions in the Gene Normalizer README.

You must have the Gene Normalizer DynamoDB running for the variation normalize endpoint to work.

Initialize coding style tests

Code style is managed by flake8 and checked prior to commit.

We use pre-commit to run conformance tests.

The pre-commit hooks:

  • Check code style
  • Check for added large files
  • Detect AWS Credentials
  • Detect Private Key

Before your first commit, run:

pre-commit install

Testing

From the root directory of the repository:

pytest tests/

Starting the Variation Normalization Service

The Gene Normalizer DynamoDB must be running. Then run the following:

docker-compose -f docker-compose.yml up

From the root directory of the repository:

uvicorn variation.main:app --reload

Next, view the OpenAPI docs on your local machine: http://127.0.0.1:8000/variation
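Once the service is running, it can also be queried programmatically. The sketch below uses only the standard library. The /normalize path under /variation and the q query parameter are assumptions based on the endpoint name mentioned above; confirm the actual path and parameters on the OpenAPI docs page before relying on them. normalize_url and normalize are hypothetical helpers.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path served by `uvicorn variation.main:app` in this README
BASE_URL = "http://127.0.0.1:8000/variation"


def normalize_url(query: str, base_url: str = BASE_URL) -> str:
    """Build a request URL for the normalize endpoint (path and 'q' are assumptions)."""
    return f"{base_url}/normalize?{urlencode({'q': query})}"


def normalize(query: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """GET the (assumed) normalize endpoint and decode its JSON response."""
    with urlopen(normalize_url(query, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

urlencode takes care of escaping, so a query such as "BRAF V600E" becomes q=BRAF+V600E in the request URL.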



Download files


Source Distribution

variation-normalizer-0.2.5.tar.gz (84.1 kB)

Uploaded Source

Built Distribution

variation_normalizer-0.2.5-py3-none-any.whl (189.9 kB)

Uploaded Python 3

File details

Details for the file variation-normalizer-0.2.5.tar.gz.

File metadata

  • Download URL: variation-normalizer-0.2.5.tar.gz
  • Upload date:
  • Size: 84.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.9.5

File hashes

Hashes for variation-normalizer-0.2.5.tar.gz
Algorithm Hash digest
SHA256 3e0cca5bec49358f3220a1335a8caf4550d4a707d88352c74462bb17da741655
MD5 b0f1f46260ea00f69be1cfb89e450c42
BLAKE2b-256 08d39203a0a5dec346a320b175a61cbd50267723b108f9da90420f15070b9dbf


File details

Details for the file variation_normalizer-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: variation_normalizer-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 189.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.6.0 pkginfo/1.7.0 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.55.0 CPython/3.9.5

File hashes

Hashes for variation_normalizer-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 177ebcef8306e6ba79ebefc3b90358622a87da90b4eea6f547bbf8b146384b88
MD5 de449a266e3b02394cc600f277853fd2
BLAKE2b-256 7d781362bb982e75563e84a32176fb9ff336b8bcdb4aadd332086b47d9f0579e

