Configure your data for visualization with dms-viz.github.io

Project description

`configure_dms_viz`

configure_dms_viz is a Python utility created by the Bloom Lab that configures your data for the web-based visualization tool dms-viz.

Introduction
Prerequisites
Installation
Usage
Input Data Format
Output Data Format
Examples
Troubleshooting

Introduction

configure_dms_viz is a command-line tool designed to create a JSON file for the web-based visualization tool dms-viz. You can use dms-viz to visualize site-level mutation data in the context of a 3D protein structure. With configure_dms_viz, users can generate a compatible JSON file that can be uploaded to the dms-viz website for interactive analysis of their protein mutation data.

Installation

Before using configure_dms_viz, ensure that you have the following software installed:

python >=3.10
pip

You can use the Python package manager pip to install configure_dms_viz like so:

pip install configure-dms-viz

You can check that the installation worked by running:

configure-dms-viz --help
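
If you would rather script this check, for example as a setup step in a pipeline, a minimal sketch might look like the following. The helper name `check_environment` is hypothetical and not part of configure_dms_viz. It only reports whether the running interpreter meets the python >=3.10 requirement and whether a `configure-dms-viz` executable can be found on your PATH.

```python
import shutil
import sys


def check_environment(min_version=(3, 10), command="configure-dms-viz"):
    """Report whether the prerequisites appear to be satisfied.

    Hypothetical helper for illustration; not part of configure_dms_viz.
    """
    return {
        # True if the running interpreter is at least min_version
        "python_ok": sys.version_info >= min_version,
        # Full path to the executable if it is on PATH, otherwise None
        "command_path": shutil.which(command),
    }


print(check_environment())
```

A `command_path` of `None` suggests the install did not put the command on your PATH, or that you are in a different environment from the one you installed into.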

Usage

To use configure_dms_viz, execute the configure-dms-viz command with the required and optional arguments as needed:

configure-dms-viz \
    --name <experiment_name> \
    --input <input_csv> \
    --sitemap <sitemap_csv> \
    --metric <metric_column> \
    --structure <pdb_structure> \
    --output <output_json> \
    [optional_arguments]

Arguments

Required arguments

--input : Path to a CSV file with site- and mutation-level data to visualize on a protein structure. See details below for required columns and format.
--name : Name of the experiment/selection for the tool. For example, the antibody name or serum ID. This property is necessary for combining multiple experiments into a single file.
--sitemap : Path to a CSV file containing a map between reference sites in the experiment and sequential sites. See details below for required columns and format.
--metric : Name of the column that contains the value to visualize on the protein structure.
--structure : Either an RCSB PDB ID, if the structure can be fetched directly from the PDB (e.g. "6xr8"), or a path to a locally downloaded PDB file (e.g. ./pdb/my_custom_structure.pdb).
--output : Path to save the *.json file containing the data for the visualization tool.

Optional configuration arguments

--condition : If there are multiple measurements per mutation, the name of the column that contains the condition distinguishing these measurements.
--metric-name : The name that will show up for your metric in the plot. This lets you customize the names of your columns in your visualization. For example, if your metric column is called escape_mean, you can rename it to Escape for the visualization.
--condition-name : The name that will show up for your condition column in the title of the plot legend. For example, if your condition column is 'epitope', you might rename it to be capitalized as 'Epitope' in the legend title.
--join-data : A comma-separated list of CSV files with data to join to the visualization data. This data can then be used in the visualization tooltips or filters. See details below for formatting requirements.
--tooltip-cols : A dictionary that establishes the columns that you want to show up in the tooltip in the visualization (e.g. "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}").
--filter-cols : A dictionary that establishes the columns that you want to use as filters in the visualization (e.g. "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}").
--filter-limits : A dictionary that establishes the range for each filter (e.g. "{'effect': [min, max], 'times_seen': [min, max]}").
--included-chains : A space-delimited string of the chain names in your PDB structure that correspond to the reference sites in your data (e.g. 'C F M G J P'). This is only necessary if your PDB structure contains chains that you do not have site- and mutation-level measurements for.
--excluded-chains : A space-delimited string of chain names that should not be shown on the protein structure (e.g. 'B L R').
--alphabet : A string with no spaces containing all the amino acids in your experiment and their desired order (e.g. "RKHDEQNSTYWFAILMVGPC-*").
--colors : A comma-separated list of HEX format colors for representing different conditions (e.g. "#0072B2, #CC79A7, #4C3549, #009E73").
--negative-colors : A comma-separated list of HEX format colors for representing the negative end of the scale for different conditions (e.g. "#0072B2, #CC79A7, #4C3549, #009E73"). If not provided, the inverse of each color is automatically calculated.
--check-pdb : Whether to perform checks on the provided PDB structure, including whether the 'included chains' are present, what % of data sites are missing, and what % of wildtype residues in the data match at corresponding sites in the structure.
--exclude-amino-acids : A comma-separated list of amino acids that shouldn't be used to calculate the summary statistics (e.g. "*, -").
--description : A short description of the dataset that will show up in the tool if the user clicks a button for more information.
--title : A short title to appear above the plot.
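
Several of the optional arguments above take a dictionary or a list written out as a single string. If you assemble these values in Python before calling the tool, a minimal sketch like the one below can help avoid quoting mistakes. The helper names `as_dict_argument` and `as_list_argument` are hypothetical, and the numeric filter limits are made-up placeholders. The `ast.literal_eval` round-trip only checks that the string is a well-formed Python literal in the same single-quoted style as the examples above; it is not a claim about how configure_dms_viz parses these arguments internally.

```python
import ast


def as_dict_argument(mapping):
    """Format a dict in the single-quoted string style used in the examples.

    Hypothetical helper for illustration; not part of configure_dms_viz.
    """
    text = str(mapping)
    # Sanity check: the string must read back as the same Python literal
    if ast.literal_eval(text) != mapping:
        raise ValueError(f"Cannot round-trip {mapping!r}")
    return text


def as_list_argument(items, separator=", "):
    """Join items into one delimited string (comma-separated by default)."""
    return separator.join(items)


tooltip_cols = as_dict_argument({"times_seen": "# Obsv", "effect": "Func Eff."})
# The limits below are placeholder numbers, not recommended values
filter_limits = as_dict_argument({"effect": [-5, 2], "times_seen": [0, 10]})
colors = as_list_argument(["#0072B2", "#CC79A7", "#4C3549", "#009E73"])
included_chains = as_list_argument(["C", "F", "M"], separator=" ")

print(tooltip_cols)  # → {'times_seen': '# Obsv', 'effect': 'Func Eff.'}
print(colors)        # → #0072B2, #CC79A7, #4C3549, #009E73
```

When you pass these strings on the command line, wrap each one in double quotes so the shell treats it as a single argument.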

Input Data Format

Example files for each of the main inputs to configure_dms_viz are located in the tests directory:

An input CSV: Example CSV files containing site- and mutation-level data to visualize on a protein structure can be found in the tests/sars2/escape directory. The CSV must contain the following columns in addition to the specified metric_column:
- site or reference_site: These will be the sites that show up on the x-axis of the visualization.
- wildtype: The wildtype amino acid at a given reference site.
- mutant: The mutant amino acid for a given measurement.
- condition: Optionally, if there are multiple measurements for the same site (e.g. multiple epitopes), a unique string delineating these measurements.
A Sitemap: An example sitemap, which is a CSV file containing a map between reference sites on the protein and their sequential order, can be found at tests/sars2/site_numbering_map.csv. The sitemap contains the following columns:
- reference_site: This must correspond to the site or reference_site column in your input CSV.
- sequential_site: This is the sequential order of the reference sites and must be a numeric column.
- protein_site: Optional, this column is only necessary if the reference_site sites are different from the sites in your PDB structure.
Optional Join Data: An example dataframe that you could join with your data, if desired, is provided at tests/sars2/muteffects_observed.csv. The CSV is joined to your input CSV by the site, wildtype, and mutant columns.

Make sure your input data follows the same format as the provided examples to ensure compatibility with the configure_dms_viz tool.
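
Before running the tool, you may want to confirm that your files have the columns listed above. The following is a minimal sketch using only the Python standard library. The function names are hypothetical and the in-memory CSV rows are made-up placeholder data. It checks column names only and does not replicate any validation that configure_dms_viz itself performs.

```python
import csv
import io

REQUIRED_SITEMAP_COLUMNS = {"reference_site", "sequential_site"}


def read_header(handle):
    """Return the first row of a CSV file handle as a list of column names."""
    return next(csv.reader(handle))


def missing_input_columns(header, metric_column):
    """Return a sorted list of required columns absent from an input CSV header."""
    header = set(header)
    missing = {"wildtype", "mutant", metric_column} - header
    # The site column may be named either `site` or `reference_site`
    if not ({"site", "reference_site"} & header):
        missing.add("site or reference_site")
    return sorted(missing)


def missing_sitemap_columns(header):
    """Return a sorted list of required columns absent from a sitemap header."""
    return sorted(REQUIRED_SITEMAP_COLUMNS - set(header))


# Placeholder data standing in for real files opened with open(path, newline="")
example_input = io.StringIO("site,wildtype,mutant,escape_mean\n1,A,C,0.5\n")
example_sitemap = io.StringIO("reference_site,sequential_site\n1,1\n")

print(missing_input_columns(read_header(example_input), "escape_mean"))  # → []
print(missing_sitemap_columns(read_header(example_sitemap)))  # → []
```

An empty list means every required column was found. Any names in the list are the columns you still need to add or rename.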

Output Data Format

The output is a single JSON file per experiment that can be uploaded to dms-viz for visualizing. You can combine these into a single JSON file if you want to visualize multiple experiments in the same session.
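
One way to combine per-experiment files is sketched below. It rests on an assumption that is not stated in this README: that each output file holds a JSON object whose top-level keys are experiment names (the value given to --name), so that merging files amounts to merging those objects. Check one of your own output files before relying on this. The function name `combine_experiments` is hypothetical.

```python
import json


def combine_experiments(json_paths, combined_path):
    """Merge several per-experiment JSON files into one file.

    ASSUMPTION: each file contains a JSON object keyed by experiment name.
    Hypothetical helper for illustration; not part of configure_dms_viz.
    """
    combined = {}
    for path in json_paths:
        with open(path) as handle:
            data = json.load(handle)
        # Refuse to silently overwrite an experiment that was already merged
        duplicates = combined.keys() & data.keys()
        if duplicates:
            raise ValueError(f"Duplicate experiment names: {sorted(duplicates)}")
        combined.update(data)
    with open(combined_path, "w") as handle:
        json.dump(combined, handle)
    return combined
```

Raising an error on duplicate names is a deliberate choice here: giving each experiment a unique --name is what keeps the experiments distinguishable once they share a file.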

Examples

An example dataset is included within the tests directory of the repo. After installing the tool, you can run the following example:

configure-dms-viz \
   --name LyCoV-1404 \
   --input tests/sars2/escape/LyCoV-1404_avg.csv \
   --sitemap tests/sars2/site_numbering_map.csv \
   --metric escape_mean \
   --structure 6xr8 \
   --output LyCoV-1404.json \
   --metric-name Escape \
   --join-data tests/sars2/muteffects_observed.csv \
   --filter-cols "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}" \
   --tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}"

To see an example of what this would look like applied over multiple datasets, look in the provided Snakefile. You can run this example pipeline using the following command from within the configure_dms_viz directory:

snakemake --cores 1

The output will be located in the tests directory in a folder called output. You can upload the example output into dms-viz.
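
If you are not using Snakemake, the same idea of one command per experiment can be sketched in plain Python. This is not a reproduction of the provided Snakefile. The `build_command` helper is hypothetical, and the second experiment (`my-antibody` and its path) is a made-up placeholder. Only flags documented above are used.

```python
import shlex


def build_command(name, input_csv, sitemap, metric, structure, output, extra=None):
    """Assemble the configure-dms-viz argument list for one experiment.

    Hypothetical helper for illustration; not part of configure_dms_viz.
    """
    command = [
        "configure-dms-viz",
        "--name", name,
        "--input", input_csv,
        "--sitemap", sitemap,
        "--metric", metric,
        "--structure", structure,
        "--output", output,
    ]
    # Optional flags are appended as flag/value pairs
    for flag, value in (extra or {}).items():
        command += [flag, value]
    return command


# Experiment name -> input CSV. The second entry is a placeholder.
experiments = {
    "LyCoV-1404": "tests/sars2/escape/LyCoV-1404_avg.csv",
    "my-antibody": "path/to/my-antibody.csv",
}

commands = [
    build_command(
        name=name,
        input_csv=input_csv,
        sitemap="tests/sars2/site_numbering_map.csv",
        metric="escape_mean",
        structure="6xr8",
        output=f"{name}.json",
        extra={"--metric-name": "Escape"},
    )
    for name, input_csv in experiments.items()
]

for command in commands:
    # Print a shell-safe version of each command for inspection
    print(shlex.join(command))
```

To actually run a command, pass the list to `subprocess.run(command, check=True)`. Building the arguments as a list rather than a single string avoids shell quoting problems with the dictionary-style arguments.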

Troubleshooting

If you have any questions about formatting your data or run into any issues with this tool, open an issue in this repo.
