
Configure your data for visualization with dms-viz.github.io


configure_dms_viz



Overview

configure_dms_viz is a Python command-line utility that formats your data for the visualization tool dms-viz. dms-viz helps you analyze quantitative data associated with mutations to a protein using intuitive visual summaries and an interactive 3D protein structure. Visualizations created with dms-viz are flexible, customizable, and shareable.

For more information on getting started with dms-viz, check out the documentation.

Prerequisites

To use configure-dms-viz, you need Python 3.9 or later installed on your system. You can check your version by running:

python --version

The version number displayed should be 3.9 or later.
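If you'd rather check the version from a script, something like the following would work. This is a minimal sketch using only the standard library; the helper name meets_minimum_python is made up for illustration and is not part of configure-dms-viz.

```python
import sys

# Minimum version stated in the prerequisites above.
MINIMUM_PYTHON = (3, 9)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) Python version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); the patch level does not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    status = "OK" if meets_minimum_python() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
```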

Installation

configure-dms-viz is distributed on PyPI, so you can install the latest version with pip:

pip install configure-dms-viz

configure-dms-viz should now be installed. You can double-check that the installation worked by running the following command:

configure-dms-viz --help

You should see a help message printed to the terminal.
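The same check can be scripted, for example in a setup step of a pipeline. The sketch below uses only the standard library; the two helper names are illustrative, and it assumes the distribution and the command are both named configure-dms-viz, as in the pip command above.

```python
import shutil
from importlib import metadata


def installed_version(distribution="configure-dms-viz"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def on_path(executable="configure-dms-viz"):
    """Return True if the command-line entry point can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    print("installed version:", installed_version())
    print("command on PATH:", on_path())
```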

Basic Usage

configure-dms-viz takes input data consisting of a quantitative metric associated with mutations to a protein sequence and returns a .json specification file that you upload to dms-viz to create an interactive visualization. Below is a simple tutorial on configure-dms-viz; for a detailed guide to the configure-dms-viz API, check out the documentation.

configure-dms-viz has two commands, format and join. To format a single dataset for dms-viz, run the configure-dms-viz format command with the required arguments and any optional arguments you need:

configure-dms-viz format \
    --name <experiment_name> \
    --input <input_csv> \
    --metric <metric_column> \
    --structure <pdb_structure> \
    --output <output_json> \
    [optional_arguments]

The information that is required to make a visualization file for dms-viz is as follows:

  1. --name: The name of your dataset as you'd like it to appear in the visualization.
  2. --input: The file path to your input data.
  3. --metric: The name of the column that contains the metric you want to visualize.
  4. --structure: The protein structure that you want to use as a 3D model.
  5. --output: The file path of the output .json file.

The remaining arguments are optional and configure the protein, appearance, and data included in your final visualization.
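If you call configure-dms-viz from a Python script or workflow, it can help to assemble the command as a list of arguments. The sketch below only uses the five required flags listed above; the function name build_format_command, the dataset name, and the file paths are placeholders for illustration.

```python
import shlex


def build_format_command(name, input_csv, metric, structure, output_json, extra_args=()):
    """Assemble the argument list for `configure-dms-viz format`."""
    command = [
        "configure-dms-viz", "format",
        "--name", name,
        "--input", input_csv,
        "--metric", metric,
        "--structure", structure,
        "--output", output_json,
    ]
    # Optional arguments are passed through unchanged.
    command.extend(extra_args)
    return command


# Placeholder values; substitute your own dataset name and paths.
command = build_format_command(
    name="My experiment",
    input_csv="data/my_data.csv",
    metric="my_metric",
    structure="6XDG",
    output_json="my_experiment.json",
)

# shlex.join quotes arguments with spaces so the line can be pasted into a shell.
print(shlex.join(command))
# To run it from Python instead: subprocess.run(command, check=True)
```

Passing a list rather than a single string avoids quoting problems when the dataset name contains spaces.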

Now, let's use configure-dms-viz with a minimal example. The example data is included in this GitHub repository under tests/. If you want to follow along, clone the repository and run the following command from the root of the repository.

Input

configure-dms-viz format \
   --name "REGN mAb Cocktail" \
   --input tests/SARS2-RBD-REGN-DMS/input/REGN_escape.csv \
   --metric "mut_escape" \
   --metric-name "Escape" \
   --sitemap tests/SARS2-RBD-REGN-DMS/sitemap/sitemap.csv \
   --structure "6XDG" \
   --included-chains "E" \
   --condition "condition" \
   --condition-name "Antibody" \
   --output ./REGN_escape.json

First, we've specified that we want the name of the dataset as it appears in dms-viz to be REGN mAb Cocktail (named after the Regeneron antibody cocktail therapeutic for SARS-CoV-2). This isn't so crucial when there is only a single dataset; however, when combining multiple datasets with the join command, it's necessary to have unique and descriptive names.

Next, we've pointed to the input data containing quantitative scores that measure the degree of antibody escape from the REGN mAb Cocktail. For details on the specific requirements for input data, check out the Data Requirements guide in the documentation. In addition to specifying the input data, we told configure-dms-viz which column contains the escape scores (mut_escape) and what to call that column in the plots (Escape).

Then, we've specified a sitemap. This is optional information that describes how the sites in your input data correspond to your 3D protein structure. If you do not provide a sitemap, the sites in the input data are assumed to correspond one-to-one with the sites in the protein structure.
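To make the default behavior concrete, here is a small conceptual sketch. It is not how configure-dms-viz is implemented: the function resolve_protein_sites is hypothetical, a plain dictionary stands in for the sitemap file, and the site numbers are invented.

```python
def resolve_protein_sites(data_sites, sitemap=None):
    """Return a {data site: protein site} mapping.

    `sitemap` is a plain dict standing in for the sitemap file; None means
    that no sitemap was provided.
    """
    if sitemap is None:
        # No sitemap: assume a one-to-one correspondence.
        return {site: site for site in data_sites}
    # Sitemap given: look each data site up (raises KeyError if it is absent).
    return {site: sitemap[site] for site in data_sites}


sites = ["331", "332", "333"]

# Without a sitemap, every site maps to itself.
print(resolve_protein_sites(sites))

# With a (made-up) sitemap, the numbering in the structure can differ.
print(resolve_protein_sites(sites, {"331": "12", "332": "13", "333": "14"}))
```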

After that, we specified a protein structure. In this case, we're fetching 6XDG from the RCSB PDB and only showing our data on chain E of that structure.

Finally, this dataset has multiple 'conditions' for each mutation: there are multiple measurements (mut_escape) for each mutation/position, corresponding to escape from different antibodies. We specify the column that contains these conditions (condition) and the name to display for it (Antibody). In dms-viz, an interactive legend will let you toggle between conditions.
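To picture what "multiple conditions per mutation" means, here is a sketch with made-up data in long format, where each mutation has one row per condition. The condition and mut_escape column names come from the command above; the site, wildtype, and mutant column names, the condition labels, and all values are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict

# Invented example rows: each mutation appears once per condition.
example_csv = """site,wildtype,mutant,condition,mut_escape
417,K,N,antibody_A,0.80
417,K,N,antibody_B,0.05
484,E,K,antibody_A,0.10
484,E,K,antibody_B,0.65
"""


def measurements_by_condition(csv_text, condition_column, metric_column):
    """Group the metric by condition, keyed by a mutation label such as K417N."""
    grouped = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        mutation = f"{row['wildtype']}{row['site']}{row['mutant']}"
        grouped[row[condition_column]][mutation] = float(row[metric_column])
    return dict(grouped)


grouped = measurements_by_condition(example_csv, "condition", "mut_escape")
for condition, scores in grouped.items():
    print(condition, scores)
```

Each key of the result corresponds to one entry you would toggle in the legend.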

The command should print a message to the terminal with some basic information about the run, like this:

Output

Formatting data for visualization using the 'mut_escape' column from 'tests/SARS2-RBD-REGN-DMS/input/REGN_escape.csv'...

Using sitemap from 'tests/SARS2-RBD-REGN-DMS/sitemap/sitemap.csv'.

'protein_site' column is not present in the sitemap. Assuming that the reference sites correspond to protein sites.

About 95.98% of the wildtype residues in the data match the corresponding residues in the structure.
About 4.02% of the data sites are missing from the structure.

Success! The visualization JSON was written to './REGN_escape.json'
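The two percentages in this message summarize how well the data lines up with the structure. The sketch below shows one way such numbers could be computed; it is an assumption about the calculation, not the tool's actual code, and the function name, site numbers, and residues are invented.

```python
def summarize_structure_coverage(data_wildtypes, structure_residues):
    """Return (percent matching, percent missing) over all data sites.

    Both arguments are {site: one-letter residue} dictionaries.
    """
    total = len(data_wildtypes)
    if total == 0:
        raise ValueError("no data sites provided")
    # A site matches when the structure has it and the residues agree.
    matched = sum(
        1
        for site, residue in data_wildtypes.items()
        if structure_residues.get(site) == residue
    )
    # A site is missing when the structure has no residue at that position.
    missing = sum(1 for site in data_wildtypes if site not in structure_residues)
    return 100 * matched / total, 100 * missing / total


# Invented example: four data sites, one of which is absent from the structure.
data = {331: "N", 332: "I", 333: "T", 334: "N"}
structure = {331: "N", 332: "I", 333: "T"}

percent_matched, percent_missing = summarize_structure_coverage(data, structure)
print(f"About {percent_matched:.2f}% of the wildtype residues match.")
print(f"About {percent_missing:.2f}% of the data sites are missing.")
```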

That's how you use configure-dms-viz to format a single dataset! You can also combine multiple datasets into a single .json specification file using the configure-dms-viz join command. For more details on combining datasets to jointly visualize with dms-viz, check out the API.

Developing

configure-dms-viz was developed using Python (>=3.9) and the click library.

To contribute to configure-dms-viz, follow the instructions here for setting up a development environment.

Testing

pytest is the testing framework for configure-dms-viz.

The command line interface (CLI) of configure-dms-viz is tested using four example datasets from different projects and labs that cover 100% of its flags and features. These four examples are:

  1. Deep mutational scanning of the SARS-CoV-2 Spike protein
     Authors: Bernadeta Dadonaite, Katharine H D Crawford, Caelan E Radford, Ariana G Farrell, Timothy C Yu, William W Hannon, Panpan Zhou, Raiees Andrabi, Dennis R Burton, Lihong Liu, David D. Ho, Richard A. Neher, Jesse D Bloom
     Manuscript: https://www.sciencedirect.com/science/article/pii/S0092867423001034?via%3Dihub

  2. Deep mutational scanning of the HIV BF520 strain Envelope protein
     Authors: Caelan E. Radford, Philipp Schommers, Lutz Gieselmann, Katharine H. D. Crawford, Bernadeta Dadonaite, Timothy C. Yu, Adam S. Dingens, Julie Overbaugh, Florian Klein, Jesse D. Bloom
     Manuscript: https://www.sciencedirect.com/science/article/pii/S1931312823002184?via%3Dihub

  3. Phylogenetic fitness estimates of every SARS-CoV-2 protein
     Authors: Jesse D. Bloom, Richard A. Neher
     Manuscript: https://www.biorxiv.org/content/10.1101/2023.01.30.526314v2

  4. Deep mutational scanning of the Influenza PB1 polymerase subunit
     Authors: Yuan Li, Sarah Arcos, Kimberly R. Sabsay, Aartjan J.W. te Velthuis, Adam S. Lauring
     Manuscript: https://www.biorxiv.org/content/10.1101/2023.08.27.554986v1.full

In addition to these test datasets, there are specific tests using dummy data for the key formatting functions. To run the tests, execute the following command from the root of the repository:

poetry run pytest
