Configure your data for visualization with dms-viz.github.io
configure_dms_viz
configure_dms_viz is a Python utility created by the Bloom Lab that configures your data for the web-based visualization tool dms-viz.
Table of Contents
- Introduction
- Installation
- Usage
- Input Data Format
- Output Data Format
- Examples
- Troubleshooting
Introduction
configure_dms_viz is a command-line tool designed to create a JSON file for the web-based visualization tool dms-viz. You can use dms-viz to visualize site-level mutation data in the context of a 3D protein structure. With configure_dms_viz, users can generate a compatible JSON file that can be uploaded to the dms-viz website for interactive analysis of their protein mutation data.
Installation
Before using configure_dms_viz, ensure that you have the following software installed:
- python >=3.10
- pip
You can use the Python package manager pip to install configure_dms_viz like so:
pip install configure-dms-viz
You can check that the installation worked by running:
configure-dms-viz --help
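The same prerequisites can also be checked from Python. The sketch below is illustrative only: the helper names python_is_supported and cli_is_installed are hypothetical and are not part of configure_dms_viz, and it uses nothing beyond the standard library.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the prerequisites above


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.10."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def cli_is_installed(command="configure-dms-viz"):
    """Return True if the command can be found on the current PATH."""
    return shutil.which(command) is not None


print("Python >= 3.10:", python_is_supported())
print("configure-dms-viz on PATH:", cli_is_installed())
```

If the second check reports False after a successful pip install, the environment that pip installed into is probably not the one on your PATH.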
Usage
To use configure_dms_viz, execute the configure-dms-viz command with the required and optional arguments as needed:
configure-dms-viz \
--name <experiment_name> \
--input <input_csv> \
--sitemap <sitemap_csv> \
--metric <metric_column> \
--structure <pdb_structure> \
--output <output_json> \
[optional_arguments]
Arguments
Required arguments
- --input: Path to a CSV file with site- and mutation-level data to visualize on a protein structure. See details below for required columns and format.
- --name: Name of the experiment/selection for the tool, for example the antibody name or serum ID. This property is necessary for combining multiple experiments into a single file.
- --sitemap: Path to a CSV file containing a map between reference sites in the experiment and sequential sites. See details below for required columns and format.
- --metric: Name of the column that contains the value to visualize on the protein structure.
- --structure: Either an RCSB PDB ID, if the structure can be fetched directly from the PDB (e.g. "6xr8"), or a path to a locally downloaded PDB file (e.g. ./pdb/my_custom_structure.pdb).
- --output: Path to save the *.json file containing the data for the visualization tool.
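When the tool is driven from a script, it can help to assemble the required arguments as a list rather than a single shell string. The following is a minimal sketch: build_command is a hypothetical helper, not part of configure_dms_viz, and the values are taken from the example in the Examples section of this README.

```python
import shlex


def build_command(name, input_csv, sitemap_csv, metric, structure, output_json):
    """Assemble the argument list for the six required options."""
    return [
        "configure-dms-viz",
        "--name", name,
        "--input", input_csv,
        "--sitemap", sitemap_csv,
        "--metric", metric,
        "--structure", structure,
        "--output", output_json,
    ]


cmd = build_command(
    name="LyCoV-1404",
    input_csv="tests/sars2/escape/LyCoV-1404_avg.csv",
    sitemap_csv="tests/sars2/site_numbering_map.csv",
    metric="escape_mean",
    structure="6xr8",
    output_json="LyCoV-1404.json",
)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with values that contain spaces or quotes.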
Optional configuration arguments
- --condition: If there are multiple measurements per mutation, the name of the column that contains the condition distinguishing these measurements.
- --metric-name: The name that will show up for your metric in the plot. This lets you customize the names of your columns in your visualization. For example, if your metric column is called escape_mean you can rename it to Escape for the visualization.
- --condition-name: The name that will show up for your condition column in the title of the plot legend. For example, if your condition column is 'epitope', you might rename it to be capitalized as 'Epitope' in the legend title.
- --join-data: A comma-separated list of CSV files with data to join to the visualization data. This data can then be used in the visualization tooltips or filters. See details below for formatting requirements.
- --tooltip-cols: A dictionary that establishes the columns that you want to show up in the tooltip in the visualization (e.g. "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}").
- --filter-cols: A dictionary that establishes the columns that you want to use as filters in the visualization (e.g. "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}").
- --included-chains: A space-delimited string of chain names in your PDB structure that correspond to the reference sites in your data (e.g. 'C F M G J P'). This is only necessary if your PDB structure contains chains that you do not have site- and mutation-level measurements for.
- --excluded-chains: A space-delimited string of chain names that should not be shown on the protein structure (e.g. 'B L R').
- --alphabet: A string with no spaces containing all the amino acids in your experiment and their desired order (e.g. "RKHDEQNSTYWFAILMVGPC-*").
- --colors: A comma-separated list of HEX format colors for representing different epitopes (e.g. "#0072B2, #CC79A7, #4C3549, #009E73").
- --check-pdb: Whether to perform checks on the provided PDB structure, including checking if the 'included chains' are present, what % of data sites are missing, and what % of wildtype residues in the data match at corresponding sites in the structure.
- --exclude-amino-acids: A comma-separated list of amino acids that shouldn't be used to calculate the summary statistics (e.g. "*, -").
- --description: A short description of the dataset that will show up in the tool if the user clicks a button for more information.
- --title: A short title to appear above the plot.
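The dictionary strings expected by --tooltip-cols and --filter-cols are easy to mistype by hand. One way to produce them is to format a Python dict, as sketched below. The helper to_cli_dict is hypothetical, and how the tool parses the string internally is not documented here; the sketch only assumes that the Python-literal form shown in the examples above is accepted.

```python
import ast


def to_cli_dict(columns):
    """Format a {column: display name} mapping as a dictionary string."""
    return str(dict(columns))


tooltip_cols = to_cli_dict({"times_seen": "# Obsv", "effect": "Func Eff."})
print(tooltip_cols)

# Round-trip check: the string parses back into the same mapping.
parsed = ast.literal_eval(tooltip_cols)
assert parsed == {"times_seen": "# Obsv", "effect": "Func Eff."}
```

The resulting string can be passed as a single argument, for example as one element of a subprocess argument list, so no extra shell quoting is needed.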
Input Data Format
The main inputs for configure_dms_viz include the following example files located in the tests directory:
- An input CSV: Example CSV files containing site- and mutation-level data to visualize on a protein structure can be found in the tests/sars2/escape directory. The CSV must contain the following columns in addition to the specified metric_column:
  - site or reference_site: These will be the sites that show up on the x-axis of the visualization.
  - wildtype: The wildtype amino acid at a given reference site.
  - mutant: The mutant amino acid for a given measurement.
  - condition: Optionally, if there are multiple measurements for the same site (e.g. multiple epitopes), a unique string delineating these measurements.
- A Sitemap: An example sitemap, which is a CSV file containing a map between reference sites on the protein and their sequential order, can be found at tests/sars2/site_numbering_map.
  - reference_site: This must correspond to the site or reference_site column in your input CSV.
  - sequential_site: This is the sequential order of the reference sites and must be a numeric column.
  - protein_site: Optional; this column is only necessary if the reference_site sites are different from the sites in your PDB structure.
- Optional Join Data: An example dataframe that you could join with your data, if desired, is provided at tests/sars2/muteffects_observed.csv. The CSV is joined to your input CSV by the site, wildtype, and mutant columns.
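To illustrate what joining on the site, wildtype, and mutant columns means, here is a small standard-library sketch. It is not the tool's implementation: left_join is a hypothetical helper, the rows are toy values, and keeping every row of the main data (a left join) is an assumption, since the join type is not stated above.

```python
import csv
import io

KEY_COLUMNS = ("site", "wildtype", "mutant")


def left_join(main_rows, extra_rows, keys=KEY_COLUMNS):
    """Attach columns from extra_rows to main_rows where the key columns match."""
    lookup = {tuple(row[k] for k in keys): row for row in extra_rows}
    joined = []
    for row in main_rows:
        match = lookup.get(tuple(row[k] for k in keys), {})
        extra = {col: val for col, val in match.items() if col not in keys}
        joined.append({**row, **extra})
    return joined


# Toy data for illustration only.
main_csv = "site,wildtype,mutant,escape_mean\n484,E,K,0.9\n501,N,Y,0.1\n"
extra_csv = "site,wildtype,mutant,effect,times_seen\n484,E,K,-0.2,12\n"

main_rows = list(csv.DictReader(io.StringIO(main_csv)))
extra_rows = list(csv.DictReader(io.StringIO(extra_csv)))
joined_rows = left_join(main_rows, extra_rows)
for row in joined_rows:
    print(row)
```

In this sketch, rows without a match in the join data simply lack the extra columns.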
Make sure your input data follows the same format as the provided examples to ensure compatibility with the configure_dms_viz tool.
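A quick header check before running the tool can catch formatting problems early. The sketch below tests only for the columns listed above; missing_columns is a hypothetical helper, the sample data is made up, and the tool itself may perform different or additional checks.

```python
import csv
import io

REQUIRED = {"wildtype", "mutant"}
SITE_COLUMNS = {"site", "reference_site"}  # at least one of these must be present


def missing_columns(csv_text, metric):
    """Return the set of required columns absent from the CSV header."""
    header = set(next(csv.reader(io.StringIO(csv_text))))
    missing = (REQUIRED | {metric}) - header
    if not (SITE_COLUMNS & header):
        missing.add("site or reference_site")
    return missing


# Made-up sample data for illustration only.
good = "site,wildtype,mutant,escape_mean\n484,E,K,0.9\n"
bad = "position,wildtype,escape_mean\n484,E,0.9\n"

print(missing_columns(good, "escape_mean"))
print(missing_columns(bad, "escape_mean"))
```

For a file on disk, read its text first, for example with pathlib.Path(path).read_text(), and pass the result to the function.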
Output Data Format
The output is a single JSON file per experiment that can be uploaded to dms-viz for visualization. You can combine these into a single JSON file if you want to visualize multiple experiments in the same session.
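Combining several output files could be sketched as follows. This assumes that each output file is a single JSON object whose top-level keys are the experiment names given by --name; that layout is an assumption and is not stated in this README, so inspect your own output files before relying on it. The helper combine_json and the file contents are hypothetical placeholders.

```python
import json
import tempfile
from pathlib import Path


def combine_json(paths, output_path):
    """Merge several JSON objects into one, refusing duplicate top-level keys."""
    combined = {}
    for path in paths:
        data = json.loads(Path(path).read_text())
        duplicates = combined.keys() & data.keys()
        if duplicates:
            raise ValueError(f"duplicate experiment names: {sorted(duplicates)}")
        combined.update(data)
    Path(output_path).write_text(json.dumps(combined))
    return combined


# Toy demonstration with placeholder contents, not real tool output.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.json"
    second = Path(tmp) / "second.json"
    first.write_text(json.dumps({"experiment_A": {}}))
    second.write_text(json.dumps({"experiment_B": {}}))
    merged = combine_json([first, second], Path(tmp) / "combined.json")
    print(sorted(merged))
```

Raising an error on duplicate names avoids silently overwriting one experiment with another.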
Examples
An example dataset is included within the tests directory of the repo. After installing the tool, you can run the following example:
configure-dms-viz \
--name LyCoV-1404 \
--input tests/sars2/escape/LyCoV-1404_avg.csv \
--sitemap tests/sars2/site_numbering_map.csv \
--metric escape_mean \
--structure 6xr8 \
--output LyCoV-1404.json \
--metric-name Escape \
--join-data tests/sars2/muteffects_observed.csv \
--filter-cols "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}" \
--tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}"
To see an example of what this would look like applied over multiple datasets, look in the provided Snakefile. You can run this example pipeline using the following command from within the configure_dms_viz directory:
snakemake --cores 1
The output will be located in the tests directory in a folder called output. You can upload the example output into dms-viz.
Troubleshooting
If you have any questions about formatting your data or run into any issues with this tool, open an issue in this repo.