Automatic prediction and classification of protein domain architectures
synthaser
synthaser parses the results of a batch NCBI conserved domain search and determines
the domain architecture of secondary metabolite synthases.
Installation
Install from PyPI using pip:
$ pip install --user synthaser
or clone the repo and install locally:
$ git clone https://www.github.com/gamcil/synthaser
$ cd synthaser
$ pip install .
Dependencies
synthaser is written in pure Python (3.6+), and requires only the following dependencies for
remote searches:
- requests, for interaction with NCBI's CD-Search API
- biopython, for retrieving sequences from NCBI Entrez
If you want to do local searches, you'll need:
- RPS-BLAST, for performing local domain searches
- rpsbproc, for formatting RPS-BLAST results like CD-Search
These can be obtained from the NCBI FTP.
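To see which of these dependencies are already available on your system, a small check along the following lines may help. This is a minimal sketch and not part of synthaser: the function name check_dependencies is made up for illustration, and it assumes the local search tools are installed on your PATH under the executable names rpsblast and rpsbproc.

```python
import importlib.util
import shutil


def check_dependencies():
    """Report which of the dependencies listed above can be found."""
    # Package name -> import name (biopython is imported as "Bio").
    modules = {"requests": "requests", "biopython": "Bio"}
    # Assumption: the local search tools are on PATH under these names.
    programs = ["rpsblast", "rpsbproc"]

    found = {}
    for package, module in modules.items():
        found[package] = importlib.util.find_spec(module) is not None
    for program in programs:
        found[program] = shutil.which(program) is not None
    return found


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Only the first two entries matter for remote searches; the last two are needed only if you plan to search locally.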
Usage
A full synthaser search can be performed as simply as:
$ synthaser -qf sequences.fasta
where sequences.fasta is a FASTA format file containing the protein sequences
that you would like to search.
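If you are unsure whether your input file is well formed, you can check it before starting a search. The sketch below is a generic FASTA reader written for illustration; read_fasta and check_fasta are hypothetical helper names, not functions provided by synthaser, and the check only covers basic structure (every sequence has a header and is not empty).

```python
def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def check_fasta(path):
    """Return the number of records in a FASTA file, or raise ValueError."""
    with open(path) as handle:
        records = list(read_fasta(handle))
    empty = [header for header, sequence in records if not sequence]
    if empty:
        raise ValueError("records without sequence: " + ", ".join(empty))
    return len(records)
```

For example, check_fasta("sequences.fasta") returns the number of sequences that would be submitted.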
For a full listing of available arguments, enter:
$ synthaser -h
Visualising your results
synthaser can generate fully interactive, annotated visualisations
so you can easily explore your results. All that is required is one
extra argument:
$ synthaser -qf sequences.fasta -p
This will generate a figure like so:
Click here to play around with the full version of this example.
Saving your search session
synthaser allows you to save your search results so that they can be easily
reloaded for further visualisation or exploration without having to re-do
the search.
To do this, use the --json_file argument:
$ synthaser -qf sequences.fasta --json_file sequences.json
This will save all of your results, in JSON format, to the file
sequences.json. Loading this session back into synthaser is then as easy as:
$ synthaser --json_file sequences.json ...
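If you call synthaser from a script, the two commands above can be combined so that a search is only run when no saved session exists yet. The following is a rough sketch using only the -qf and --json_file arguments shown above; build_command and run_synthaser are hypothetical helper names, and it assumes the synthaser executable is on your PATH.

```python
import subprocess
from pathlib import Path


def build_command(fasta, session):
    """Return the synthaser command line for a new or a saved session."""
    if Path(session).exists():
        # Saved session found: reload it rather than searching again.
        return ["synthaser", "--json_file", str(session)]
    # No session yet: run the search and save the results to the session file.
    return ["synthaser", "-qf", str(fasta), "--json_file", str(session)]


def run_synthaser(fasta, session, extra_args=()):
    """Run synthaser, appending any further arguments (for example -p)."""
    command = build_command(fasta, session) + list(extra_args)
    # check=True raises CalledProcessError if synthaser exits with an error.
    return subprocess.run(command, check=True)
```

Calling run_synthaser("sequences.fasta", "sequences.json", ["-p"]) would then search on the first call and reload the saved session on later calls.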
Using your own rules
Though synthaser was originally designed to analyse secondary metabolite synthases,
it can easily be repurposed to analyse the domain architectures of any type of protein sequence.
Under the hood, synthaser uses a central rule file which contains:
- Domain types, containing specific families to save in CD-Search results, corresponding to domain 'islands';
- Rules for classifying the sequences based on domain architecture predictions; and
- A hierarchy which determines the order of evaluation for the rules.
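To make the interplay of these three parts more concrete, here is a toy sketch of the general idea. It does not reproduce synthaser's actual rule file format or classification logic (see the documentation for those); the structures DOMAIN_TYPES, RULES and HIERARCHY, the family names in them, and the functions below are all simplified assumptions made up for illustration.

```python
# Hypothetical structures for illustration only; NOT synthaser's rule file schema.

# Domain types: each type lists the conserved domain families that count as it.
DOMAIN_TYPES = {
    "KS": ["PKS_KS"],
    "AT": ["PKS_AT"],
    "A": ["A_NRPS"],
    "C": ["Condensation"],
}

# Rules: a classification applies when all of its required domain types are present.
RULES = {
    "Hybrid": {"KS", "AT", "A"},
    "PKS": {"KS", "AT"},
    "NRPS": {"A", "C"},
}

# Hierarchy: rules are evaluated in this order, most specific first.
HIERARCHY = ["Hybrid", "PKS", "NRPS"]


def domain_types(families):
    """Map family hits to domain types, dropping families no type lists."""
    lookup = {
        family: dtype
        for dtype, members in DOMAIN_TYPES.items()
        for family in members
    }
    return [lookup[family] for family in families if family in lookup]


def classify(families):
    """Return the first rule in HIERARCHY that is satisfied, or None."""
    present = set(domain_types(families))
    for name in HIERARCHY:
        if RULES[name] <= present:  # all required types were found
            return name
    return None
```

In this toy version, the order of HIERARCHY matters: a sequence with KS, AT and A domains also satisfies the PKS rule, but is reported as Hybrid because that rule is evaluated first.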
We distribute our fungal megasynthase rule file as the default, but providing your own rule file is as simple as:
$ synthaser -qf sequences.fasta --rule_file my_rules.json
We also provide a web application for assembling your own rule files, which can be found here.
For a detailed explanation of how the rule file works, as well as API documentation, please refer to the documentation.
Citations
If you found synthaser helpful, please cite:
1. <pending>