Configure your data for visualization with dms-viz.github.io
configure_dms_viz
configure_dms_viz is a Python utility created by the Bloom Lab that configures your data for the web-based visualization tool dms-viz.
Table of Contents
- Introduction
- Installation
- Usage
- Input Data Format
- Output Data Format
- Examples
- Troubleshooting
Introduction
configure_dms_viz is a command-line tool designed to create a JSON file for the web-based visualization tool dms-viz. You can use dms-viz to visualize site-level mutation data in the context of a 3D protein structure. With configure_dms_viz, users can generate a compatible JSON file that can be uploaded to the dms-viz website for interactive analysis of their protein mutation data.
Installation
Before using configure_dms_viz, ensure that you have the following software installed:
- Python >= 3.10
- pip
You can use the Python package manager pip to install configure_dms_viz like so:
pip install configure-dms-viz
You can check that the installation worked by running:
configure-dms-viz --help
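If you prefer to check from inside Python, the standard-library sketch below reports whether the interpreter meets the Python >= 3.10 prerequisite and which version of the package is installed. It assumes the installed distribution is named configure-dms-viz, matching the pip install command; the helper name installed_version is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "configure-dms-viz") -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed.

    Assumption: the distribution name matches the `pip install` name.
    """
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


# The README lists Python >= 3.10 as a prerequisite.
python_ok = sys.version_info >= (3, 10)
print("Python >= 3.10:", python_ok)
print("configure-dms-viz:", installed_version() or "not installed")
```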
Usage
To use configure_dms_viz, execute the configure-dms-viz command with the required and optional arguments as needed:
configure-dms-viz \
--name <experiment_name> \
--input <input_csv> \
--sitemap <sitemap_csv> \
--metric <metric_column> \
--structure <pdb_structure> \
--output <output_json> \
[optional_arguments]
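If you drive the tool from a Python script, building the argument list programmatically avoids quoting mistakes. The sketch below uses a hypothetical helper, build_command, that is not part of configure_dms_viz. It assumes every optional argument takes a single value, except that a value of True is emitted as a bare switch; that is an assumption to check against `configure-dms-viz --help` (for example for --check-pdb).

```python
import shlex


def build_command(name, input_csv, sitemap, metric, structure, output, **optional):
    """Assemble the argv list for a configure-dms-viz call (hypothetical helper).

    Optional arguments are given as keyword arguments with underscores,
    e.g. metric_name="Escape" becomes "--metric-name Escape".
    """
    cmd = [
        "configure-dms-viz",
        "--name", name,
        "--input", input_csv,
        "--sitemap", sitemap,
        "--metric", metric,
        "--structure", structure,
        "--output", output,
    ]
    for key, value in optional.items():
        flag = "--" + key.replace("_", "-")
        if value is None or value is False:
            continue  # option not requested
        if value is True:
            # Assumption: boolean options are bare switches with no value.
            cmd.append(flag)
        else:
            cmd += [flag, str(value)]
    return cmd


cmd = build_command(
    "LyCoV-1404",
    "tests/sars2/escape/LyCoV-1404_avg.csv",
    "tests/sars2/site_numbering_map.csv",
    "escape_mean",
    "6xr8",
    "LyCoV-1404.json",
    metric_name="Escape",
)
# Print a shell-safe version; to run it, pass `cmd` to subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```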
Arguments
Required arguments
- --input: Path to a CSV file with site- and mutation-level data to visualize on a protein structure. See details below for required columns and format.
- --name: Name of the experiment or selection for the tool, for example the antibody name or serum ID. This property is necessary for combining multiple experiments into a single file.
- --sitemap: Path to a CSV file containing a map between reference sites in the experiment and sequential sites. See details below for required columns and format.
- --metric: Name of the column containing the values you want to visualize on the protein structure.
- --structure: Either an RCSB PDB ID for a structure that can be fetched directly from the PDB (e.g. "6xr8"), or a path to a locally downloaded PDB file (e.g. ./pdb/my_custom_structure.pdb).
- --output: Path where the *.json file containing the data for the visualization tool will be saved.
Optional configuration arguments
- --condition: If there are multiple measurements per mutation, the name of the column that contains the condition distinguishing these measurements.
- --metric-name: The name that will show up for your metric in the plot. This lets you customize how your columns are named in the visualization. For example, if your metric column is called escape_mean, you can rename it to Escape for the visualization.
- --condition-name: The name that will show up for your condition column in the title of the plot legend. For example, if your condition column is 'epitope', you might capitalize it as 'Epitope' in the legend title.
- --join-data: A comma-separated list of CSV files with data to join to the visualization data. This data can then be used in the visualization tooltips or filters. See details below for formatting requirements.
- --tooltip-cols: A dictionary of the columns you want to show in the visualization tooltip, mapped to the names they should display (e.g. "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}").
- --filter-cols: A dictionary of the columns you want to use as filters in the visualization, mapped to the names they should display (e.g. "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}").
- --included-chains: A space-delimited string of the chain names in your PDB structure that correspond to the reference sites in your data (e.g. 'C F M G J P'). This is only necessary if your PDB structure contains chains that you do not have site- and mutation-level measurements for.
- --excluded-chains: A space-delimited string of chain names that should not be shown on the protein structure (e.g. 'B L R').
- --alphabet: A string with no spaces containing all the amino acids in your experiment in their desired order (e.g. "RKHDEQNSTYWFAILMVGPC-*").
- --colors: A comma-separated list of HEX colors for representing different epitopes (e.g. "#0072B2, #CC79A7, #4C3549, #009E73").
- --check-pdb: Whether to perform checks on the provided PDB structure, including checking whether the included chains are present, what percentage of data sites are missing from the structure, and what percentage of wildtype residues in the data match the corresponding sites in the structure.
- --exclude-amino-acids: A comma-separated list of amino acids that shouldn't be used to calculate the summary statistics (e.g. "*, -").
- --description: A short description of the dataset that will show up in the tool if the user clicks a button for more information.
- --title: A short title to appear above the plot.
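Several optional arguments take values in a specific textual form: quoted dictionaries for --tooltip-cols and --filter-cols, space-delimited chain names, and comma-separated lists for --colors and --exclude-amino-acids. A minimal sketch of producing these strings from Python objects follows. The helper names are hypothetical, and it assumes the dictionary arguments are read as Python literals, as the single-quoted examples suggest.

```python
import ast


def dict_option(mapping: dict[str, str]) -> str:
    """Render a column -> display-name mapping as a Python dict literal.

    repr() of a dict of plain strings uses single quotes, matching the
    --tooltip-cols / --filter-cols examples.
    """
    return repr(dict(mapping))


def space_list(items: list[str]) -> str:
    """Space-delimited form used by --included-chains / --excluded-chains."""
    return " ".join(items)


def comma_list(items: list[str]) -> str:
    """Comma-separated form used by --colors and --exclude-amino-acids."""
    return ", ".join(items)


tooltips = dict_option({"times_seen": "# Obsv", "effect": "Func Eff."})
chains = space_list(["C", "F", "M", "G", "J", "P"])
colors = comma_list(["#0072B2", "#CC79A7", "#4C3549", "#009E73"])

print(tooltips)  # {'times_seen': '# Obsv', 'effect': 'Func Eff.'}
print(chains)    # C F M G J P
print(colors)    # #0072B2, #CC79A7, #4C3549, #009E73

# Sanity check: the dictionary string parses back to the same mapping.
assert ast.literal_eval(tooltips) == {"times_seen": "# Obsv", "effect": "Func Eff."}
```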
Input Data Format
The main inputs for configure_dms_viz are listed below, together with example files located in the tests directory:
- An input CSV: Example CSV files containing site- and mutation-level data to visualize on a protein structure can be found in the tests/sars2/escape directory. In addition to the specified metric column, the CSV must contain the following columns:
  - site or reference_site: The sites that show up on the x-axis of the visualization.
  - wildtype: The wildtype amino acid at a given reference site.
  - mutant: The mutant amino acid for a given measurement.
  - condition: Optional. If there are multiple measurements for the same site (e.g. multiple epitopes), a unique string delineating these measurements.
- A sitemap: An example sitemap, a CSV file mapping reference sites on the protein to their sequential order, can be found at tests/sars2/site_numbering_map.csv. It contains the following columns:
  - reference_site: This must correspond to the site or reference_site column in your input CSV.
  - sequential_site: The sequential order of the reference sites. This must be a numeric column.
  - protein_site: Optional. This column is only necessary if the reference_site sites are different from the sites in your PDB structure.
- Optional join data: An example dataframe that you could join with your data is provided at tests/sars2/muteffects_observed.csv. The CSV is joined to your input CSV by the site, wildtype, and mutant columns.
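The join on site, wildtype, and mutant can be pictured as a lookup keyed on those three columns. The standard-library sketch below is an illustration only: it assumes a left join (every input row is kept, and rows without a match get no extra columns), assumes columns already present in the input CSV win over same-named join columns, and uses made-up toy rows rather than the real test files. configure_dms_viz may handle unmatched or duplicated rows differently.

```python
import csv
import io

JOIN_KEY = ("site", "wildtype", "mutant")


def left_join(input_rows, join_rows):
    """Add columns from join_rows to input_rows, matching on JOIN_KEY."""
    # If several join rows share a key, the last one wins.
    lookup = {tuple(row[k] for k in JOIN_KEY): row for row in join_rows}
    joined = []
    for row in input_rows:
        match = lookup.get(tuple(row[k] for k in JOIN_KEY), {})
        merged = dict(row)
        for column, value in match.items():
            # Keep the input's value if both files have the same column.
            merged.setdefault(column, value)
        joined.append(merged)
    return joined


# Toy data for illustration only (values are made up).
escape_csv = "site,wildtype,mutant,escape_mean\n484,E,K,0.8\n484,E,Q,0.1\n"
effects_csv = "site,wildtype,mutant,effect,times_seen\n484,E,K,-0.2,12\n"

rows = left_join(
    list(csv.DictReader(io.StringIO(escape_csv))),
    list(csv.DictReader(io.StringIO(effects_csv))),
)
for row in rows:
    print(row)
```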
Make sure your input data follows the same format as the provided examples to ensure compatibility with the configure_dms_viz tool.
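Before running the tool, you can catch common format problems with a quick check of the column requirements listed above. This is a hypothetical pre-flight sketch using only the standard library and is not part of configure_dms_viz. It checks only the rules stated in this README: the required input columns, a numeric sequential_site, and input sites that appear in the sitemap's reference_site column. The toy CSV text is made up.

```python
import csv
import io


def check_inputs(input_fh, sitemap_fh, metric, condition=None):
    """Return a list of problems found in an input CSV and a sitemap CSV."""
    problems = []

    data = csv.DictReader(input_fh)
    columns = set(data.fieldnames or [])
    site_col = next((c for c in ("site", "reference_site") if c in columns), None)
    if site_col is None:
        problems.append("input: needs a 'site' or 'reference_site' column")
    required = ["wildtype", "mutant", metric] + ([condition] if condition else [])
    problems += [f"input: missing '{c}' column" for c in required if c not in columns]
    input_sites = {row[site_col] for row in data if row[site_col]} if site_col else set()

    sitemap = csv.DictReader(sitemap_fh)
    missing = [c for c in ("reference_site", "sequential_site")
               if c not in (sitemap.fieldnames or [])]
    if missing:
        return problems + [f"sitemap: missing '{c}' column" for c in missing]

    mapped_sites = set()
    for line_no, row in enumerate(sitemap, start=2):  # line 1 is the header
        mapped_sites.add(row["reference_site"])
        try:
            float(row["sequential_site"])  # sequential_site must be numeric
        except (TypeError, ValueError):
            problems.append(f"sitemap line {line_no}: sequential_site "
                            f"{row['sequential_site']!r} is not numeric")

    unmapped = sorted(input_sites - mapped_sites)
    if unmapped:
        problems.append(f"input: sites missing from sitemap: {', '.join(unmapped)}")
    return problems


# Toy example (made-up values).
input_csv = "site,wildtype,mutant,escape_mean\n331,N,K,0.2\n999,E,K,0.5\n"
sitemap_csv = "reference_site,sequential_site\n331,1\n332,x\n"
problems = check_inputs(io.StringIO(input_csv), io.StringIO(sitemap_csv), "escape_mean")
for problem in problems:
    print(problem)
```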
Output Data Format
The output is a single JSON file per experiment that can be uploaded to dms-viz for visualization. You can combine these into a single JSON file if you want to visualize multiple experiments in the same session.
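If each output file is a JSON object keyed by its --name value (an assumption suggested by --name being required for combining experiments, not something this README spells out), combining files amounts to merging their top-level keys. The hypothetical helper below does that and refuses duplicate experiment names; check the structure of your own output files before relying on it.

```python
import json
import tempfile
from pathlib import Path


def combine_experiments(paths, output):
    """Merge several configure-dms-viz JSON outputs into one file.

    Assumption: each file is a JSON object whose top-level keys are
    experiment names (the --name values).
    """
    combined = {}
    for path in paths:
        data = json.loads(Path(path).read_text())
        for name, experiment in data.items():
            if name in combined:
                raise ValueError(f"duplicate experiment name {name!r} in {path}")
            combined[name] = experiment
    Path(output).write_text(json.dumps(combined))
    return combined


# Demo with toy files in a temporary directory (contents are placeholders).
with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    (tmp / "a.json").write_text(json.dumps({"LyCoV-1404": {"placeholder": 1}}))
    (tmp / "b.json").write_text(json.dumps({"another-experiment": {"placeholder": 2}}))
    combined = combine_experiments([tmp / "a.json", tmp / "b.json"], tmp / "combined.json")
    print(sorted(combined))
```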
Examples
An example dataset is included within the tests directory of the repo. After installing the tool, you can run the following example:
configure-dms-viz \
--name LyCoV-1404 \
--input tests/sars2/escape/LyCoV-1404_avg.csv \
--sitemap tests/sars2/site_numbering_map.csv \
--metric escape_mean \
--structure 6xr8 \
--output LyCoV-1404.json \
--metric-name Escape \
--join-data tests/sars2/muteffects_observed.csv \
--filter-cols "{'effect': 'Functional Effect', 'times_seen': 'Times Seen'}" \
--tooltip-cols "{'times_seen': '# Obsv', 'effect': 'Func Eff.'}"
To see an example of this applied to multiple datasets, look at the provided Snakefile. You can run this example pipeline with the following command from within the configure_dms_viz directory:
snakemake --cores 1
The output will be located in the tests directory in a folder called output. You can upload the example output into dms-viz.
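The Snakefile is the project's way of processing many datasets. If you would rather stay in plain Python, a rough equivalent is to generate one command per escape CSV, as sketched below. The helper batch_commands is hypothetical, and its naming rule (dropping a trailing _avg from the file name to get the experiment name) is an assumed convention based on the LyCoV-1404_avg.csv example. The shared arguments mirror the single-dataset example command.

```python
import shlex
from pathlib import Path


def batch_commands(csv_paths, sitemap, metric, structure, outdir):
    """Build one configure-dms-viz command per input CSV (hypothetical helper)."""
    commands = []
    for path in map(Path, csv_paths):
        # Assumed convention: "LyCoV-1404_avg.csv" -> "LyCoV-1404".
        name = path.stem.removesuffix("_avg")
        commands.append([
            "configure-dms-viz",
            "--name", name,
            "--input", path.as_posix(),
            "--sitemap", sitemap,
            "--metric", metric,
            "--structure", structure,
            "--output", (Path(outdir) / f"{name}.json").as_posix(),
        ])
    return commands


# In practice you might pass sorted(Path("tests/sars2/escape").glob("*.csv")).
commands = batch_commands(
    ["tests/sars2/escape/LyCoV-1404_avg.csv"],
    sitemap="tests/sars2/site_numbering_map.csv",
    metric="escape_mean",
    structure="6xr8",
    outdir="tests/output",
)
for cmd in commands:
    print(shlex.join(cmd))  # or subprocess.run(cmd, check=True)
```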
Troubleshooting
If you have any questions about formatting your data or run into any issues with this tool, open an issue in this repo.
Download files
File details
Details for the file configure_dms_viz-0.1.4.tar.gz.
File metadata
- Download URL: configure_dms_viz-0.1.4.tar.gz
- Size: 15.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.4 Linux/5.15.0-1042-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3820595f9c22c31039d5c7e4f35fcfbf7e5d4ffb34cbc5e62a27e880b57dfb84 |
| MD5 | 8d270ec0f4ea2edf1a7cebf39e51b7b9 |
| BLAKE2b-256 | fe63be1c8b5dc9973a4a122ac3be147b4073e27069bd6feb153404fecff44fc6 |
File details
Details for the file configure_dms_viz-0.1.4-py3-none-any.whl.
File metadata
- Download URL: configure_dms_viz-0.1.4-py3-none-any.whl
- Size: 14.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.4 Linux/5.15.0-1042-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 676760d42f83a8363f02a15cf313d534c6f3f3a217881e22a9d1e42aa2f94dd0 |
| MD5 | 92418662a5f8c3d06f3e0ce0bb53f0a6 |
| BLAKE2b-256 | 90275caec39ca895ec138a7ee01a34dd54951fa783c47f2e23f35e3c6e73bc80 |
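To confirm that a downloaded file matches the listed digests, compute its SHA256 hash locally and compare. A minimal standard-library sketch: the file names and expected digests are copied from the file details, and verify is a hypothetical helper that assumes the file's name is unchanged from the listing.

```python
import hashlib
import io
from pathlib import Path

# SHA256 digests copied from the file details for version 0.1.4.
EXPECTED_SHA256 = {
    "configure_dms_viz-0.1.4.tar.gz":
        "3820595f9c22c31039d5c7e4f35fcfbf7e5d4ffb34cbc5e62a27e880b57dfb84",
    "configure_dms_viz-0.1.4-py3-none-any.whl":
        "676760d42f83a8363f02a15cf313d534c6f3f3a217881e22a9d1e42aa2f94dd0",
}


def sha256_hex(fh, chunk_size=1 << 16):
    """Hash a binary file object in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's SHA256 matches the listed digest for its name."""
    path = Path(path)
    with path.open("rb") as fh:
        return sha256_hex(fh) == EXPECTED_SHA256[path.name]


# Sanity check on known input: the SHA256 of empty input.
assert sha256_hex(io.BytesIO(b"")) == (
    "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
)
# Usage: verify("configure_dms_viz-0.1.4.tar.gz")
```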