Configure your data for visualization with dms-viz.github.io

configure_dms_viz

configure_dms_viz is a Python utility created by the Bloom Lab that configures your data for the web-based visualization tool dms-viz.

Introduction

configure_dms_viz is a command-line tool designed to create a JSON file for the web-based visualization tool dms-viz. You can use dms-viz to visualize site-level mutation data in the context of a 3D protein structure. With configure_dms_viz, users can generate a compatible JSON file that can be uploaded to the dms-viz website for interactive analysis of their protein mutation data.

Installation

Before using configure_dms_viz, ensure that you have the following software installed:

  • python >=3.10
  • pip

You can use the Python package manager pip to install configure_dms_viz like so:

pip install configure-dms-viz

You can check that the installation worked by running:

configure-dms-viz --help

Usage

To use configure_dms_viz, execute the configure-dms-viz command with the required and optional arguments as needed:

configure-dms-viz \
    --name <experiment_name> \
    --input <input_csv> \
    --sitemap <sitemap_csv> \
    --metric <metric_column> \
    --structure <pdb_structure> \
    --output <output_json> \
    [optional_arguments]

Arguments

Required arguments

  • --input : Path to a CSV file with site- and mutation-level data to visualize on a protein structure. See details below for required columns and format.
  • --name : Name of the experiment/selection for the tool. For example, the antibody name or serum ID. This property is necessary for combining multiple experiments into a single file.
  • --sitemap : Path to a CSV file containing a map between reference sites in the experiment and sequential sites. See details below for required columns and format.
  • --metric : Name of the column that contains the value to visualize on the protein structure.
  • --structure : Either an RCSB PDB ID, if the structure can be fetched directly from the PDB (e.g. "6xr8"), or a path to a locally downloaded PDB file (e.g. ./pdb/my_custom_structure.pdb).
  • --output : Path to save the *.json file containing the data for the visualization tool.

Optional configuration arguments

  • --condition : If there are multiple measurements per mutation, the name of the column that contains the condition distinguishing these measurements.
  • --metric-name : The name that will show up for your metric in the plot. This lets you customize the names of your columns in your visualization. For example, if your metric column is called escape_mean, you can rename it to Escape for the visualization.
  • --condition-name : The name that will show up for your condition column in the title of the plot legend. For example, if your condition column is 'epitope', you might capitalize it as 'Epitope' in the legend title.
  • --join-data : A comma-separated list of CSV files with data to join to the visualization data. This data can then be used in the visualization tooltips or filters. See details below for formatting requirements.
  • --tooltip-cols : A dictionary that establishes the columns that you want to show up in the tooltip in the visualization (e.g. "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}").
  • --filter-cols : A dictionary that establishes the columns that you want to use as filters in the visualization (e.g. "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}").
  • --included-chains : A space-delimited string of the chain names in your PDB structure that correspond to the reference sites in your data (e.g. 'C F M G J P'). This is only necessary if your PDB structure contains chains for which you do not have site- and mutation-level measurements.
  • --excluded-chains : A space-delimited string of chain names that should not be shown on the protein structure (e.g. 'B L R').
  • --alphabet : A string with no spaces containing all the amino acids in your experiment in their desired order (e.g. "RKHDEQNSTYWFAILMVGPC-*").
  • --colors : A comma-separated list of HEX format colors for representing different epitopes (e.g. "#0072B2, #CC79A7, #4C3549, #009E73").
  • --check-pdb : Whether to perform checks on the provided PDB structure, including checking whether the 'included chains' are present, what % of data sites are missing, and what % of wildtype residues in the data match at corresponding sites in the structure.
  • --exclude-amino-acids : A comma-separated list of amino acids that shouldn't be used to calculate the summary statistics (e.g. "*, -").
  • --description : A short description of the dataset that will show up in the tool if the user clicks a button for more information.
  • --title : A short title to appear above the plot.
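
The dictionary-valued options above (--tooltip-cols and --filter-cols) are passed as a single quoted string. The following is a minimal sketch, assuming the string is written as a Python-style dictionary literal like the examples above; how configure-dms-viz parses the string internally is not described here, and the helper name dict_option is hypothetical. It builds the string from a Python dict and checks that it parses back to the same mapping:

```python
import ast


def dict_option(mapping):
    """Render a column -> label mapping as a Python dict literal string."""
    text = str(dict(mapping))
    # Round-trip check: the string must parse back to the same mapping.
    if ast.literal_eval(text) != dict(mapping):
        raise ValueError(f"mapping does not round-trip: {text}")
    return text


# Column names and labels are taken from the examples in the option list.
tooltip_cols = dict_option({"times_seen": "# Obsv", "effect": "Func Eff."})
filter_cols = dict_option({"effect": "Functional Effect", "times_seen": "Times Seen"})

print(tooltip_cols)
print(filter_cols)
```

On the command line, the resulting string is wrapped in double quotes so that the shell passes it as one argument.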

Input Data Format

The main inputs for configure_dms_viz are described below. Example files for each are located in the tests directory:

  1. An input CSV: Example CSV files containing site- and mutation-level data to visualize on a protein structure can be found in the tests/sars2/escape directory. The CSV must contain the following columns in addition to the specified metric_column:
    • site or reference_site: These will be the sites that show up on the x-axis of the visualization.
    • wildtype: The wildtype amino acid at a given reference site.
    • mutant: The mutant amino acid for a given measurement.
    • condition: Optionally, if there are multiple measurements for the same site (e.g. multiple epitopes), a unique string delineating these measurements.
  2. A Sitemap: An example sitemap, which is a CSV file containing a map between reference sites on the protein and their sequential order, can be found at tests/sars2/site_numbering_map.csv. It contains the following columns:
    • reference_site: This must correspond to the site or reference_site column in your input CSV.
    • sequential_site: This is the sequential order of the reference sites and must be a numeric column.
    • protein_site: Optional. This column is only necessary if the reference_site sites are different from the sites in your PDB structure.
  3. Optional Join Data: An example dataframe that you could join with your data, if desired, is provided at tests/sars2/muteffects_observed.csv. The CSV is joined to your input CSV by the site, wildtype, and mutant columns.

Make sure your input data follows the same format as the provided examples to ensure compatibility with the configure_dms_viz tool.
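
Before running the tool, it can help to check the CSV headers against the column requirements listed above. The following is a minimal sketch using only the Python standard library; it is not part of configure_dms_viz, the function names are hypothetical, and the two small CSVs are made-up illustrations rather than real data:

```python
import csv
import io

REQUIRED_INPUT = {"wildtype", "mutant"}
SITE_COLUMNS = {"site", "reference_site"}
REQUIRED_SITEMAP = {"reference_site", "sequential_site"}


def check_input_columns(handle, metric):
    """Return a list of problems found in the header of an input CSV."""
    header = set(csv.DictReader(handle).fieldnames or [])
    problems = [
        f"missing column: {col}"
        for col in sorted(REQUIRED_INPUT | {metric})
        if col not in header
    ]
    # Either 'site' or 'reference_site' is acceptable.
    if not header & SITE_COLUMNS:
        problems.append("missing column: site or reference_site")
    return problems


def check_sitemap(handle):
    """Return a list of problems found in a sitemap CSV."""
    reader = csv.DictReader(handle)
    header = set(reader.fieldnames or [])
    problems = [
        f"missing column: {col}" for col in sorted(REQUIRED_SITEMAP) if col not in header
    ]
    if problems:
        return problems
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            float(row["sequential_site"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: sequential_site is not numeric")
    return problems


# Made-up example data for illustration only.
input_csv = io.StringIO("site,wildtype,mutant,escape_mean\n484,E,K,0.9\n")
sitemap_csv = io.StringIO("reference_site,sequential_site\n484,484\n214a,x\n")

input_problems = check_input_columns(input_csv, metric="escape_mean")
sitemap_problems = check_sitemap(sitemap_csv)
print(input_problems)
print(sitemap_problems)
```

For real files, pass an open file handle (for example, open(path, newline="")) in place of the io.StringIO objects.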

Output Data Format

The output is a single JSON file per experiment that can be uploaded to dms-viz for visualization. You can combine these into a single JSON file if you want to visualize multiple experiments in the same session.
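
One way to combine several outputs is sketched below. It rests on an assumption that is not stated in this document: that each output file is a JSON object whose top-level keys are experiment names (the value given to --name). If the real layout differs, the merge step would need to change. The function name and the placeholder contents are hypothetical:

```python
import json


def merge_experiments(datasets):
    """Merge per-experiment dicts into one, refusing duplicate experiment names.

    Assumption: each dict is keyed by experiment name at the top level.
    """
    combined = {}
    for data in datasets:
        duplicates = combined.keys() & data.keys()
        if duplicates:
            raise ValueError(f"duplicate experiment names: {sorted(duplicates)}")
        combined.update(data)
    return combined


# Placeholder contents standing in for dicts loaded with json.load().
first = {"LyCoV-1404": {"placeholder": 1}}
second = {"another-antibody": {"placeholder": 2}}

combined = merge_experiments([first, second])
# json.dumps gives the text that would be written to the combined file.
combined_text = json.dumps(combined)
print(sorted(combined))
```

Rejecting duplicate names avoids silently overwriting one experiment with another that was given the same --name.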

Examples

An example dataset is included within the tests directory of the repo. After installing the tool, you can run the following example:

configure-dms-viz \
   --name LyCoV-1404 \
   --input tests/sars2/escape/LyCoV-1404_avg.csv \
   --sitemap tests/sars2/site_numbering_map.csv \
   --metric escape_mean \
   --structure 6xr8 \
   --output LyCoV-1404.json \
   --metric-name Escape \
   --join-data tests/sars2/muteffects_observed.csv \
   --filter-cols "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}" \
   --tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}"

To see an example of what this would look like applied over multiple datasets, look in the provided Snakefile. You can run this example pipeline using the following command from within the configure_dms_viz directory:

snakemake --cores 1

The output will be located in the tests directory in a folder called output. You can upload the example output into dms-viz.
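
For a handful of datasets, the single-experiment command shown earlier can also be assembled from Python. The following is a minimal sketch rather than a replacement for the Snakefile: the helper build_command is hypothetical, it assumes optional keyword names map to flags by swapping "_" for "-", and it only prints the commands instead of running them:

```python
import shlex


def build_command(name, input_csv, sitemap, metric, structure, output, **optional):
    """Assemble a configure-dms-viz argument list (nothing is executed here)."""
    cmd = [
        "configure-dms-viz",
        "--name", name,
        "--input", input_csv,
        "--sitemap", sitemap,
        "--metric", metric,
        "--structure", structure,
        "--output", output,
    ]
    # Assumption: keyword names map to flags by swapping "_" for "-",
    # e.g. metric_name -> --metric-name.
    for key, value in optional.items():
        cmd += [f"--{key.replace('_', '-')}", str(value)]
    return cmd


# Experiment name -> input CSV; add more entries to cover more datasets.
experiments = {"LyCoV-1404": "tests/sars2/escape/LyCoV-1404_avg.csv"}

commands = [
    build_command(
        name=name,
        input_csv=path,
        sitemap="tests/sars2/site_numbering_map.csv",
        metric="escape_mean",
        structure="6xr8",
        output=f"{name}.json",
        metric_name="Escape",
        join_data="tests/sars2/muteffects_observed.csv",
    )
    for name, path in experiments.items()
]

for cmd in commands:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(cmd))
    # To actually run the tool: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with values that contain spaces or quotes, such as the dictionary strings for --tooltip-cols and --filter-cols.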

Troubleshooting

If you have any questions about formatting your data or run into any issues with this tool, post a GitHub issue in this repo.
