
A search interface for cancer variant interpretations assembled by aggregating and harmonizing across multiple cancer variant interpretation knowledgebases.

Project description


metakb

The intent of the project is to leverage the collective knowledge of the disparate existing resources of the VICC to improve the comprehensiveness of clinical interpretation of genomic variation. An ongoing goal will be to provide and improve upon standards and guidelines by which other groups with clinical interpretation data may make it accessible and visible to the public. We have released a preprint discussing our initial harmonization effort and observed disparities in the structure and content of variant interpretations.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

  • A recent version of Python 3, preferably 3.8 or later. To confirm the version on your system, run:
python3 --version
  • Pipenv, for package management.
pip3 install --user pipenv
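The two prerequisites can also be checked from Python. This is a minimal sketch, not part of MetaKB: the helper names `python_is_recent` and `pipenv_on_path` are hypothetical, and the 3.8 minimum is taken from the guidance above.

```python
import shutil
import sys

# Assumption: treat the "preferably 3.8 or later" guidance as a minimum.
MIN_PYTHON = (3, 8)


def python_is_recent(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


def pipenv_on_path():
    """Return True when a `pipenv` executable can be found on PATH."""
    return shutil.which("pipenv") is not None


if __name__ == "__main__":
    print("Python OK:", python_is_recent())
    print("Pipenv found:", pipenv_on_path())
```

If Pipenv was installed with `--user` and is reported as missing, the user-level scripts directory may not be on your PATH.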

Installing

Once Pipenv is installed, clone the repo and install the package requirements into a Pipenv environment:

git clone https://github.com/cancervariants/metakb
cd metakb
pipenv lock
pipenv sync

If you intend to provide development support, install the development dependencies:

pipenv lock --dev
pipenv sync --dev

Setting up Neo4j

The MetaKB uses Neo4j for its database backend. To run a local MetaKB instance, you'll need to run a Neo4j database instance as well. The easiest way to do this is from Neo4j Desktop.

First, follow the desktop setup instructions to download, install, and open Neo4j Desktop for the first time.

Once you have opened Neo4j Desktop, use the "New" button in the upper-left region of the window to create a new project. Within that project, click the "Add" button in the upper-right region of the window and select "Local DBMS". The name of the DBMS doesn't matter, but the password will be used later to connect the database to MetaKB (we have been using "admin" by default). Click "Create". Then, click the row within the project screen corresponding to your newly created DBMS, and click the green "Start" button to start the database service.

The graph will initially be empty, but once you have successfully loaded data, Neo4j Desktop provides an interface for exploring and visualizing relationships within the graph. To access it, click the blue "Open" button. The prompt at the top of this window processes Cypher queries; to start, try MATCH (n:Statement {id:"civic.eid:5818"}) RETURN n. Buttons on the left-hand edge of the results pane let you select graph, tabular, or textual output.
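The same lookup can be run from Python instead of the Neo4j Desktop prompt. The following is a sketch under stated assumptions: it uses the official `neo4j` Python driver (which must be installed separately if it is not already in your environment), and the helper names `build_statement_query` and `fetch_statement` are hypothetical, not part of MetaKB. The statement ID is passed as a query parameter rather than embedded in the Cypher text.

```python
def build_statement_query(statement_id):
    """Return the Cypher text and parameters for looking up one Statement node."""
    query = "MATCH (n:Statement {id: $id}) RETURN n"
    return query, {"id": statement_id}


def fetch_statement(uri, user, password, statement_id):
    """Run the lookup against a live Neo4j instance and return node properties."""
    # Imported lazily so build_statement_query works without the driver installed.
    from neo4j import GraphDatabase

    query, params = build_statement_query(statement_id)
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # Each record holds the matched node under the key "n".
            return [dict(record["n"]) for record in session.run(query, params)]
    finally:
        driver.close()
```

With the database started and data loaded, a call such as `fetch_statement("bolt://localhost:7687", "neo4j", "<neo4j-password-here>", "civic.eid:5818")` should return a list of property dictionaries, or an empty list if no node matches.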

Setting up normalizers

The MetaKB calls a number of normalizer libraries to transform resource data and resolve incoming search queries. These will be installed as part of the package requirements, but require additional setup.

First, download and install the local (downloadable) version of Amazon's DynamoDB. Once installed, in a separate terminal instance, navigate to the directory containing DynamoDBLocal.jar and run the following to start the database instance:

java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

Next, navigate to the site-packages directory of your virtual environment. Assuming Pipenv is installed to your user directory, this should be something like:

cd ~/.local/share/virtualenvs/metakb-<various characters>/lib/python3.8/site-packages/  # replace <various characters>, and python3.8 with your Python version
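Because the exact path depends on the environment name and Python version, it may be easier to ask the interpreter for it. This is a small sketch using the standard library's `sysconfig` module; the function name `site_packages_dir` is hypothetical. Run it with the Pipenv environment's interpreter (for example after `pipenv shell`), otherwise it reports the system Python's directory.

```python
import sysconfig
from pathlib import Path


def site_packages_dir():
    """Return the active interpreter's pure-Python package directory."""
    return Path(sysconfig.get_paths()["purelib"])


if __name__ == "__main__":
    print(site_packages_dir())
```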

Next, initialize the Variation Normalizer by following the instructions in its README.

The MetaKB can acquire all other needed normalizer data, except for that of OMIM, which must be manually placed:

cd disease/  # starting from the site-packages dir of your virtual environment's Python instance
mkdir -p data/omim
cp ~/YOUR/PATH/TO/mimTitles.txt data/omim/omim_<date>.tsv  # replace <date> with date of data acquisition formatted as YYYYMMDD
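The OMIM placement step can be scripted so the date stamp is always formatted as YYYYMMDD. This is a sketch only: `omim_target_path` and `place_omim_file` are hypothetical helpers, and the directory layout simply mirrors the shell commands above.

```python
import shutil
from datetime import date
from pathlib import Path


def omim_target_path(site_packages, acquired):
    """Build <site-packages>/disease/data/omim/omim_<YYYYMMDD>.tsv."""
    stamp = acquired.strftime("%Y%m%d")
    return Path(site_packages) / "disease" / "data" / "omim" / f"omim_{stamp}.tsv"


def place_omim_file(source, site_packages, acquired):
    """Copy mimTitles.txt into place, creating data/omim if needed."""
    target = omim_target_path(site_packages, acquired)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(source, target)
    return target
```

For example, `place_omim_file("mimTitles.txt", site_packages, date.today())` would use today's date as the acquisition date, where `site_packages` is the site-packages directory of your virtual environment.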

Loading data

Once the Neo4j and DynamoDB instances are both active, and the necessary normalizer data has been placed, run the MetaKB CLI with the --load_normalizers_db flag to acquire all other necessary normalizer source data, and to execute harvest, transform, and load operations into the graph datastore.

In the MetaKB project root, run the following:

pipenv shell
python3 -m metakb.cli --db_url=bolt://localhost:7687 --db_username=neo4j --db_password=<neo4j-password-here> --load_normalizers_db
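Since the load depends on both databases being up, it can help to confirm that their ports accept connections before running the command above. This is a minimal sketch, not part of MetaKB: `port_open` and `services_ready` are hypothetical helpers, port 7687 comes from the bolt URL above, and port 8000 is assumed because it is the default for DynamoDB Local when no port option is given.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def services_ready(neo4j_port=7687, dynamodb_port=8000, host="localhost"):
    """Report reachability of the two local services by name."""
    # Assumption: both services run locally on their default ports.
    return {
        "neo4j": port_open(host, neo4j_port),
        "dynamodb": port_open(host, dynamodb_port),
    }


if __name__ == "__main__":
    print(services_ready())
```

An open port only shows that something is listening; it does not confirm credentials or that the expected service owns the port.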

Starting the server

Once data has been loaded successfully, use the following to start the service on localhost port 8000:

uvicorn metakb.main:app --reload

Navigate to http://localhost:8000/api/v2 in your browser to enter queries.

Running tests

Unit tests

Run the automated unit tests with pytest from the project root:

python3 -m pytest

Code style tests

Code style is managed by flake8 and checked prior to commit.

See .flake8 for the configuration.

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Committing

We use pre-commit to run conformance tests.

This runs the following checks:

  • Code style
  • Added large files
  • AWS credentials
  • Private keys

Before your first commit, run:

pre-commit install

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Project details


Download files


Source Distribution

metakb-1.1.0a6.tar.gz (53.2 kB)

Uploaded Source

Built Distribution

metakb-1.1.0a6-py3-none-any.whl (56.8 kB)

Uploaded Python 3

File details

Details for the file metakb-1.1.0a6.tar.gz.

File metadata

  • Download URL: metakb-1.1.0a6.tar.gz
  • Upload date:
  • Size: 53.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for metakb-1.1.0a6.tar.gz
Algorithm Hash digest
SHA256 e966e89a4ba161431594723c5df5e0f0f48571a026e5a6edcd434318b8d24356
MD5 8ba93712224b23198a92f52ee6ef9d77
BLAKE2b-256 ed5c592695fdfc5e25af0d0d83921daf41d82091bb1e422024a7fa38c24e61b4


File details

Details for the file metakb-1.1.0a6-py3-none-any.whl.

File metadata

  • Download URL: metakb-1.1.0a6-py3-none-any.whl
  • Upload date:
  • Size: 56.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for metakb-1.1.0a6-py3-none-any.whl
Algorithm Hash digest
SHA256 0a1db8ccae25a61d1af1278280f2c56c5b684c4b1dfb528b85742a35bbbe264c
MD5 bd4c6a8b0ba88e8acf5b30ec3d5514d4
BLAKE2b-256 7e4f1663a4bbef4e9647c8d76337b2e1c0bda2650feb9e20c0a0b613337e1e16

