Skip to main content

Subfamily specific residue (ssr) detection and visualization toolbox

Project description

Install

SSR-viz in implemented as a standalone GUI framework. It is entirely written in Python 3 and therefore the easiest way to obtain it is trough PIP - the official python repository.

Additionally we implemented standalone executables for Windows (tested on windows 10) and Linux (tested on Ubunt 16.04). These a much bigger then the pure python module, but ship everything needed out of the Box.

The only external tool needed is mafft, an excellent alignment tool which is required to map protein structure indices to the alignment. SSR-viz runs without mafft, but for the Add_pdb tool the mafft executable needs to be assigned (see section [add_p]).

Getting started

The SSR-viz algorithm is based on a multiple sequence alignment (MSA) file in FASTA format, which can be generated with various tools, such as Clustalo and Mafft or with a Webserver such as ().

The topic of sequence alignment is beyond the scope of this manual. Nevertheless one should keep in mind that the quality of the alignment is crucial for the detection algorithm. (Is is difficult to interpret the importance of a position, which has more gaps then amino acids.)

The first step is the classification of the sequences into subfamilies. This is undoubtedly the most difficult part, as it often requires to identify the specific functionality based on scientific literature or even undertake experimental validation. Dedicated databases such as can help to identify detailed protein functionality.

Even though there are various tools available that can cluster protein sequences, these clustering methods always apply some kind of similarity scoring, which leads in most cases to a clustering based on evolutionary relationship rather then functional similarity. This is demonstrated on an example in section [example].

Ones you collected the class information of your sequences you can add them to your alignment. The CSV_Builder tool allows to creates a comma separated value (CSV) file which can be used to add the class label to the sequences (see section [csv_b]).

An alignment and the CSV file is everything thats needed to detect subfamily specific residues in the sequences. The SSR_plot tool handles the actual execution of the detection algorithm, the output can be a mathplotlib style plot (see section [ssr_p]) as pdf, a Javlview annotation file (which can show the results together with the alignment) as well as a ’stats.csv’ file which summarizes the SSRs.

In many cases it is desired to observe the SSRs inside a protein structure (if available). Therefore, we also developed a tool Add_pdb which allows to map the indices of a protein structure file (*.pdb) to the indices of the alignment in the ”stats.csv’ file (see section [add_p])

An overview chart which explains the setup of the three tools is shown in figure 

CSV builder {#csv_b}

The CSV_Builder handles the input and takes care, that the alignment and CSV class label file have the right formating.

The mapping scheme is shown in Fig. .

Arguments

Input sequence alignment file

:
The alignment file with the sequences of the family. The deisred format is in FASTA fromat (clustalo), see section [example] for an example.

Inplace FASTA conversion / Temporary alignment file name

:
The CSV_Builder routine will remove duplicates from the alignment, as multiple identical sequences will overestimate the importance of this subfamily. The alignment can be converted inplace, meaning the original alignment is overwritten or a new alignment can be created.

Regex extraction of the class label

:
The normal CSV_Builder routine will create a CSV file, with an empty column for the class labels. Which must be manually filled. In some cases the class label is part of the sequence names, this labels can be extracted using regular expressions (regex) patterns. The entire scope of regex is to big for this manual, but the set of examples in the appendix [appendix] should help to get started. There are various tools available which can be used to test regex before usage, most text editors support regex as a search option. You can for example load the alignment file into sublime or notepad, then search with regex and if the pattern is correct, it should only highlight the desired class label. An example is shown in appendix [appendix].

Output

:
The name of the CSV file, by default it will be created in the same folder as the sequence file.

Delete

:
Allows to overwrite existing CSV files.

SSR plot {#ssr_p}

The

Algorithm

Arguments

Output

Add pdb {#add_p}

Arguments

Example

Appendix

Regex examples

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

ssrviz-0.1.2.10-py3-none-any.whl (61.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page