A command-line tool that analyses the diversity and motifs of protein sequences
DiMA - Diversity Motif Analyser
What is DiMA?
Protein sequence diversity is one of the major challenges in the design of diagnostic, prophylactic and therapeutic interventions against viruses. DiMA is a tool designed to facilitate the dissection of protein sequence diversity dynamics for viruses. It quantifies sequence diversity using Shannon’s entropy, applied via a sliding window of user-defined k-mer length, and corrects the entropy value for sample size bias by applying a statistical adjustment. DiMA then interrogates the diversity further by classifying the distinct k-mer sequences at each k-mer position into the following motifs, based on their incidence:
- Index: The predominant sequence.
- Major: The sequence with the second-highest incidence, after the Index.
- Minor: k-mers with an incidence between those of the Major and Unique motifs.
- Unique: k-mers that are seen only once at a particular k-mer position.
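The motif definitions above can be sketched in a few lines of Python. This is a minimal illustration, not DiMA's actual implementation: the helper names `kmers_at_position` and `classify_motifs` are hypothetical, gaps and ambiguous residues are ignored, and the rule that a k-mer seen once is always Unique (even when it is the runner-up) is an assumption. The counts are taken from the sample results shown later in this document.

```python
from collections import Counter


def kmers_at_position(aligned, start, k=9):
    """Collect the k-mer starting at a 0-based column from every aligned sequence."""
    return [seq[start:start + k] for seq in aligned if len(seq) >= start + k]


def classify_motifs(counts):
    """Label each distinct k-mer as Index, Major, Minor or Unique from its count."""
    labels = {}
    major_assigned = False
    # most_common() yields (k-mer, count) pairs, highest count first.
    for rank, (kmer, count) in enumerate(counts.most_common()):
        if rank == 0:
            labels[kmer] = "Index"
        elif count == 1:
            # Assumption: a single occurrence is Unique, never Major or Minor.
            labels[kmer] = "Unique"
        elif not major_assigned:
            labels[kmer] = "Major"
            major_assigned = True
        else:
            labels[kmer] = "Minor"
    return labels


# A toy alignment, used only to show the sliding-window step.
toy_alignment = ["MKTIIALSYI", "MKTIIALSYI", "MKNIIALSYI"]
first_window = kmers_at_position(toy_alignment, start=0, k=9)

# Counts for k-mer position 1 in the sample results.
counts = Counter({
    "MKTIIALSY": 105,
    "MKNIIALSY": 13,
    "MKTIIALSH": 3,
    "MKTIIALSC": 2,
    "METISLISM": 1,
})
labels = classify_motifs(counts)
for kmer, label in labels.items():
    print(kmer, counts[kmer], label)
```

With these counts the labels agree with the `motif_long` values in the sample output: one Index, one Major, two Minor and one Unique.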
The description line (FASTA header) of each sequence in the alignment can also be formatted to carry metadata, which DiMA tags to the diversity motifs. DiMA enables comparative analysis of diversity dynamics within and between proteins of a virus species, and between proteomes of different viral species.
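To make the header convention concrete, here is a hedged sketch of how a pipe-delimited description line could be mapped onto a format string such as "accession|strain|country|date". The function `parse_header` is hypothetical and is not part of the dima package; the example header is assembled from values in the sample results and assumes that the fields appear in the same order as in the format string. The `fillna` argument mirrors the documented behaviour of raising an error on missing values unless a fill value is given.

```python
def parse_header(header, header_format, fillna=None):
    """Map a pipe-delimited FASTA header onto the field names in header_format."""
    names = header_format.split("|")
    values = header.lstrip(">").strip().split("|")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} fields, found {len(values)}")
    record = {}
    for name, value in zip(names, values):
        value = value.strip()
        if not value:
            if fillna is None:
                raise ValueError(f"missing value for field {name!r}")
            value = fillna  # fill the gap instead of failing
        record[name] = value
    return record


header_format = "accession|strain|country|date"

# Hypothetical header built from values in the sample results.
record = parse_header(
    ">MN955496|A/India/Pun_1922030/2019|India|08/04/2019", header_format
)

# The same header with the strain missing, filled with a placeholder.
filled = parse_header(
    ">MN955496||India|08/04/2019", header_format, fillna="Unknown"
)
print(record)
print(filled)
```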
Installation
pip install dima-cli
Basic Usage
Shell Command
dima-cli -i aligned_sequences.afa -o results.json
Python
from dima import Dima
results = Dima(sequences="aligned_sequences.afa").run()
Results
{
"sequence_count":203,
"support_threshold":30,
"low_support_count":15,
"protein_name":"Unknown Protein",
"kmer_length":9,
"results":[
{
"position":1,
"low_support":false,
"entropy":0.8383740426713246,
"support":124,
"distinct_variants_count":4,
"distinct_variants_incidence":3.2258062,
"variants":[
{
"sequence":"MKTIIALSC",
"count":2,
"incidence":1.6129031,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"MKTIIALSH",
"count":3,
"incidence":2.4193547,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"METISLISM",
"count":1,
"incidence":0.80645156,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MKNIIALSY",
"count":13,
"incidence":10.4838705,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"MKTIIALSY",
"count":105,
"incidence":84.67742,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
}
]
}
]
}
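The fields in this output are related to one another, and a short script can check that. The sketch below copies only the counts and the support value from the sample above into a plain dictionary, recomputes each variant's incidence as a percentage of the support, and computes the plain (uncorrected) Shannon entropy in bits. The function name `shannon_entropy` is illustrative. The uncorrected value comes out close to, but not equal to, the reported 0.838; that gap would be consistent with the sample-size correction DiMA applies, although this is an inference rather than something the output itself states.

```python
import math

# Counts and support copied from k-mer position 1 of the sample results.
position = {
    "support": 124,
    "variants": [
        {"sequence": "MKTIIALSC", "count": 2},
        {"sequence": "MKTIIALSH", "count": 3},
        {"sequence": "METISLISM", "count": 1},
        {"sequence": "MKNIIALSY", "count": 13},
        {"sequence": "MKTIIALSY", "count": 105},
    ],
}


def shannon_entropy(counts):
    """Plain Shannon entropy in bits, with no sample-size correction."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


counts = [variant["count"] for variant in position["variants"]]
entropy = shannon_entropy(counts)

# Incidence is the share of the supporting sequences, as a percentage.
incidences = {
    variant["sequence"]: 100 * variant["count"] / position["support"]
    for variant in position["variants"]
}

print(f"uncorrected entropy: {entropy:.4f}")
for sequence, incidence in incidences.items():
    print(sequence, f"{incidence:.4f}")
```

The counts sum to the reported support of 124, and the recomputed incidences match the `incidence` fields to within floating-point precision.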
Advanced Usage
Shell Command
dima-cli -i aligned_sequences.afa -o results.json -f "accession|strain|country|date"
Python
from dima import Dima
results = Dima(sequences="aligned_sequences.afa", header_format="accession|strain|country|date").run()
Results
{
"sequence_count":203,
"support_threshold":30,
"low_support_count":15,
"protein_name":"Unknown Protein",
"kmer_length":9,
"results":[
{
"position":1,
"low_support":false,
"entropy":0.8361476856397749,
"support":124,
"distinct_variants_count":4,
"distinct_variants_incidence":3.2258062,
"variants":[
{
"sequence":"MKNIIALSY",
"count":13,
"incidence":10.4838705,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":{
"strain":[
"A/India/Pun_1922030/2019",
"A/India/Pun_1922292/2019",
"A/India/Pun_1921693/2019",
"A/India/Pun_1922218/2019",
"A/India/Pun_1922278/2019",
"A/India/Pun_1924667/2019",
"A/India/Pun_1923708/2019",
"A/India/Pun_1921994/2019",
"A/India/Pun_1922260/2019",
"A/India/Pun_1922016/2019",
"A/India/Pun_1923690/2019",
"A/India/Pun_1922295/2019",
"A/India/Pun_1923665/2019"
],
"country":[
"India",
"India",
"India",
"India",
"India",
"India",
"India",
"India",
"India",
"India",
"India",
"India",
"India"
],
"accession":[
"MN955496",
"MN955492",
"MN955499",
"MN955502",
"MN955493",
"MN955488",
"MN955487",
"MN955498",
"MN955494",
"MN955497",
"MN955489",
"MN955491",
"MN955490"
],
"date":[
"08/04/2019",
"08/19/2019",
"07/17/2019",
"08/09/2019",
"08/18/2019",
"08/01/2019",
"09/07/2019",
"07/26/2019",
"08/16/2019",
"07/30/2019",
"08/31/2019",
"08/20/2019",
"09/01/2019"
]
}
},
{
"sequence":"MKTIIALSY",
"count":105,
"incidence":84.67742,
"motif_short":"I",
"motif_long":"Index",
"metadata":{
"date":[
"01/02/2019",
"02/17/2019",
"01/14/2019",
"02/17/2019",
"01/17/2019",
"03/14/2019",
"02/13/2019",
"01/02/2019",
"02/06/2019",
"01/18/2019",
"10/11/2019",
"11/15/2019",
"01/10/2019",
"01/17/2019",
"01/17/2019",
"01/24/2019",
"02/01/2019",
"02/01/2019",
"02/01/2019",
"02/14/2019",
"03/14/2019",
"07/25/2019",
"08/21/2019",
"09/05/2019",
"09/05/2019",
"03/2019",
"02/2019",
"01/28/2019",
"10/08/2019",
"03/27/2019",
"02/14/2019",
"03/14/2019",
"01/15/2019",
"01/19/2019",
"01/28/2019",
"01/18/2019",
"02/14/2019",
"01/04/2019",
"01/08/2019",
"01/07/2019",
"01/28/2019",
"01/2019",
"01/10/2019",
"01/11/2019",
"01/13/2019",
"01/24/2019",
"01/08/2019",
"01/09/2019",
"01/14/2019",
"01/10/2019",
"02/2019",
"01/2019",
"09/05/2019",
"01/19/2019",
"01/03/2019",
"01/23/2019",
"02/01/2019",
"02/21/2019",
"02/28/2019",
"02/05/2019",
"01/07/2019",
"01/08/2019",
"01/08/2019",
"01/28/2019",
"01/28/2019",
"01/29/2019",
"01/29/2019",
"01/29/2019",
"01/30/2019",
"01/30/2019",
"01/30/2019",
"01/31/2019",
"01/31/2019",
"01/31/2019",
"01/09/2019",
"02/27/2019",
"03/05/2019",
"03/05/2019",
"03/05/2019",
"03/05/2019",
"03/08/2019",
"03/08/2019",
"03/04/2019",
"03/12/2019",
"01/05/2019",
"01/28/2019",
"01/29/2019",
"01/31/2019",
"02/22/2019",
"03/05/2019",
"01/23/2019",
"02/19/2019",
"04/14/2019",
"01/17/2019",
"04/04/2019",
"02/01/2019",
"02/01/2019",
"03/21/2019",
"05/24/2019",
"08/13/2019",
"08/05/2019",
"01/08/2019",
"01/14/2019",
"01/21/2019",
"01/12/2019"
],
"country":[
"Iran",
"Iran",
"Turkey",
"Iran",
"China",
"China",
"India",
"Iran",
"India",
"Japan",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"China",
"Japan",
"China",
"Japan",
"China",
"China",
"Japan",
"Japan",
"Japan",
"South_Korea",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"China",
"Japan",
"Japan",
"Japan",
"Japan",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"China",
"China",
"China",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"Japan",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea",
"Japan",
"Japan",
"Japan",
"Japan",
"South_Korea",
"South_Korea",
"Japan",
"South_Korea",
"South_Korea",
"Japan",
"South_Korea",
"China",
"China",
"China",
"India",
"India",
"India",
"South_Korea",
"South_Korea",
"South_Korea",
"South_Korea"
],
"accession":[
"MK592790",
"MK648247",
"MK840323",
"MK648248",
"MT102500",
"MT102510",
"MK592841",
"MK592791",
"MK592842",
"MK785815",
"MT102520",
"MT102521",
"MT102498",
"MT102499",
"MT102501",
"MT102502",
"MT102504",
"MT102506",
"MT102507",
"MT102508",
"MT102512",
"MT102514",
"MT102515",
"MT102516",
"MT102517",
"MN594842",
"MN594840",
"MK869211",
"MT102519",
"MN074410",
"MT102509",
"MT102511",
"MK785807",
"MK785831",
"MK633673",
"MK763864",
"MN873980",
"MK905306",
"MK633641",
"MK576906",
"MK633649",
"MN594838",
"MK743434",
"MK743442",
"MK743450",
"MK785847",
"MK763014",
"MK786319",
"MK869555",
"MK869563",
"MN594841",
"MN594839",
"MT102518",
"MK785823",
"MK576890",
"MK785839",
"MK869203",
"MK912758",
"MK912766",
"MK927223",
"MK633617",
"MK633625",
"MK633633",
"MK633657",
"MK633681",
"MK633689",
"MK633697",
"MK898645",
"MK633705",
"MK633713",
"MK633721",
"MK633729",
"MK633737",
"MK898652",
"MK868723",
"MK913110",
"MK913126",
"MK913134",
"MK913142",
"MK913158",
"MK913166",
"MK913174",
"MK913182",
"MN169149",
"MK576898",
"MK633665",
"MK898639",
"MK898657",
"MK913102",
"MK913150",
"MK762978",
"MK913118",
"MN081410",
"MK742954",
"MN074010",
"MT102503",
"MT102505",
"MT102513",
"MN955500",
"MN955495",
"MN955501",
"MK763848",
"MK763856",
"MK763872",
"MK869539"
],
"strain":[
"A/Alborz/153084/2019",
"A/Iran/Clinical_Sample/2019",
"A/Turkey/8543/2019",
"A/Iran/Clinical_Sample/2019",
"A/Homo_sapien/China/LS320/2019",
"A/Homo_sapien/China/LS330/2019",
"A/India/Pun_19615/2019",
"A/Alborz/153427/2019",
"A/India/Pun_19533/2019",
"A/Japan/8262/2019",
"A/Homo_sapien/China/LS340/2019",
"A/Homo_sapien/China/LS341/2019",
"A/Homo_sapien/China/LS318/2019",
"A/Homo_sapien/China/LS319/2019",
"A/Homo_sapien/China/LS321/2019",
"A/Homo_sapien/China/LS322/2019",
"A/Homo_sapien/China/LS324/2019",
"A/Homo_sapien/China/LS326/2019",
"A/Homo_sapien/China/LS327/2019",
"A/Homo_sapien/China/LS328/2019",
"A/Homo_sapien/China/LS332/2019",
"A/Homo_sapien/China/LS334/2019",
"A/Homo_sapien/China/LS335/2019",
"A/Homo_sapien/China/LS336/2019",
"A/Homo_sapien/China/LS337/2019",
"A/Wuhan/11193/2019",
"A/Wuhan/1120/2019",
"A/Japan/8604/2019",
"A/Homo_sapien/China/LS339/2019",
"A/Japan/9505/2019",
"A/Homo_sapien/China/LS329/2019",
"A/Homo_sapien/China/LS331/2019",
"A/Japan/8261/2019",
"A/Japan/8264/2019",
"A/Japan/NHRC_OID_FDX70576/2019",
"A/South_Korea/8207/2019",
"A/Yokosuka/NHRC_OID_FDX70622/2019",
"A/Japan/NHRC_OID_FDX70557/2019",
"A/Japan/NHRC_OID_FDX70566/2019",
"A/Japan/7848/2019",
"A/Japan/NHRC_OID_FDX70571/2019",
"A/Wuhan/345/2019",
"A/Japan/8000/2019",
"A/Japan/8001/2019",
"A/Japan/8002/2019",
"A/Japan/8266/2019",
"A/South_Korea/8203/2019",
"A/South_Korea/8352/2019",
"A/South_Korea/8671/2019",
"A/South_Korea/8674/2019",
"A/Wuhan/5413/2019",
"A/Wuhan/877/2019",
"A/Homo_sapien/China/LS338/2019",
"A/Japan/8263/2019",
"A/Japan/7846/2019",
"A/Japan/8265/2019",
"A/Japan/8603/2019",
"A/Japan/8768/2019",
"A/Japan/8769/2019",
"A/Japan/8957/2019",
"A/Japan/NHRC_OID_FDX70561/2019",
"A/Japan/NHRC_OID_FDX70563/2019",
"A/Japan/NHRC_OID_FDX70564/2019",
"A/Japan/NHRC_OID_FDX70572/2019",
"A/Japan/NHRC_OID_FDX70577/2019",
"A/Japan/NHRC_OID_FDX70579/2019",
"A/Japan/NHRC_OID_FDX70583/2019",
"A/Japan/NHRC_OID_FDX70584/2019",
"A/Japan/NHRC_OID_FDX70586/2019",
"A/Japan/NHRC_OID_FDX70587/2019",
"A/Japan/NHRC_OID_FDX70589/2019",
"A/Japan/NHRC_OID_FDX70590/2019",
"A/Japan/NHRC_OID_FDX70591/2019",
"A/Japan/NHRC_OID_FDX70592/2019",
"A/South_Korea/8667/2019",
"A/South_Korea/8823/2019",
"A/South_Korea/8825/2019",
"A/South_Korea/8826/2019",
"A/South_Korea/8827/2019",
"A/South_Korea/8829/2019",
"A/South_Korea/8830/2019",
"A/South_Korea/8831/2019",
"A/South_Korea/8832/2019",
"A/South_Korea/9116/2019",
"A/Japan/7847/2019",
"A/Japan/NHRC_OID_FDX70574/2019",
"A/Japan/NHRC_OID_FDX70581/2019",
"A/Japan/NHRC_OID_FDX70593/2019",
"A/South_Korea/8822/2019",
"A/South_Korea/8828/2019",
"A/Japan/8142/2019",
"A/South_Korea/8824/2019",
"A/South_Korea/9704/2019",
"A/Japan/8003/2019",
"A/South_Korea/9578/2019",
"A/Homo_sapien/China/LS323/2019",
"A/Homo_sapien/China/LS325/2019",
"A/Homo_sapien/China/LS333/2019",
"A/India/Pun_1920970/2019",
"A/India/Pun_1922253/2019",
"A/India/Pun_1922052/2019",
"A/South_Korea/8204/2019",
"A/South_Korea/8206/2019",
"A/South_Korea/8208/2019",
"A/South_Korea/8668/2019"
]
}
},
{
"sequence":"MKTIIALSC",
"count":2,
"incidence":1.6129031,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":{
"accession":[
"MN169648",
"MN873990"
],
"country":[
"Japan",
"Japan"
],
"strain":[
"A/Japan/9070/2019",
"A/Yokosuka/NHRC_OID_FDX70722/2019"
],
"date":[
"03/12/2019",
"04/17/2019"
]
}
},
{
"sequence":"METISLISM",
"count":1,
"incidence":0.80645156,
"motif_short":"U",
"motif_long":"Unique",
"metadata":{
"accession":[
"MN853423"
],
"country":[
"China"
],
"strain":[
"A/Beijing/16/2019"
],
"date":[
"01/2019"
]
}
},
{
"sequence":"MKTIIALSH",
"count":3,
"incidence":2.4193547,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":{
"country":[
"South_Korea",
"South_Korea",
"South_Korea"
],
"date":[
"04/05/2019",
"04/10/2019",
"04/11/2019"
],
"strain":[
"A/South_Korea/9579/2019",
"A/South_Korea/9645/2019",
"A/South_Korea/9646/2019"
],
"accession":[
"MN074938",
"MN078683",
"MN078691"
]
}
}
]
}
]
}
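Once metadata is attached, each variant carries parallel lists that can be summarised directly. The following sketch tallies one metadata field per variant with `collections.Counter`. The helper names `tally` and `month_of` are hypothetical, the data is a trimmed copy of three variants from the output above, and reading the first date component as the month is an assumption based on values such as "08/19/2019" and "01/2019".

```python
from collections import Counter

# Trimmed copy of three variants from the sample results.
variants = [
    {"sequence": "MKTIIALSC", "motif_long": "Minor",
     "metadata": {"country": ["Japan", "Japan"],
                  "date": ["03/12/2019", "04/17/2019"]}},
    {"sequence": "METISLISM", "motif_long": "Unique",
     "metadata": {"country": ["China"], "date": ["01/2019"]}},
    {"sequence": "MKTIIALSH", "motif_long": "Minor",
     "metadata": {"country": ["South_Korea", "South_Korea", "South_Korea"],
                  "date": ["04/05/2019", "04/10/2019", "04/11/2019"]}},
]


def month_of(date):
    """Return the leading component of a date (assumed to be the month)."""
    return date.split("/")[0]


def tally(variants, field, transform=lambda value: value):
    """Count the values of one metadata field for every variant that has metadata."""
    return {
        variant["sequence"]: Counter(
            transform(value) for value in variant["metadata"][field]
        )
        for variant in variants
        if variant.get("metadata")
    }


by_country = tally(variants, "country")
by_month = tally(variants, "date", transform=month_of)

for sequence, countries in by_country.items():
    print(sequence, dict(countries), dict(by_month[sequence]))
```

Variants whose `metadata` is null, as in the basic usage output, are skipped rather than raising an error.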
Command-Line Arguments
Argument | Type | Required | Default | Example | Description |
---|---|---|---|---|---|
-h | N/A | False | N/A | dima-cli -h | Prints a summary of all available command-line arguments. |
-n | String | False | N/A (raises an error) | dima-cli -i sequences.afa -f "accession|strain|country" -n "Unknown" | Silently fills missing values in the FASTA header with the given value. |
-v | N/A | False | N/A | dima-cli -v | Prints the version of dima-cli that is currently installed. |
-p | String | False | Unknown Protein | dima-cli -p "Coronavirus Surface Protein" -i sequences.afa | The name of the protein that will appear in the results. |
-i | String | True | N/A | dima-cli -i sequences.afa | The path to the FASTA Multiple Sequence Alignment file. |
-o | String | False | stdout (prints the results) | dima-cli -i sequences.afa -o results.json | The location where the results will be saved. |
-l | Integer | False | 9 | dima-cli -i sequences.afa -l 12 | The length of the k-mers generated. |
-f | String | False | N/A | dima-cli -i sequences.afa -f "accession|strain|country" | The format of the FASTA header. Used to label where each variant at a k-mer position originated. |
-s | Integer | False | 30 | dima-cli -i sequences.afa -l 12 -s 40 | The minimum required support for each k-mer position. |
Module Parameters
Parameter | Type | Required | Default | Description |
---|---|---|---|---|
sequences | String/StringIO | True | N/A | The path to a FASTA Multiple Sequence Alignment file (MSA), or a StringIO object containing FASTA MSA. |
kmer_length | Integer | False | 9 | The length of the k-mers generated. |
header_fillna | String | False | None | Silently fills missing values in the FASTA header with the given value (only relevant when header_format is given). |
json | Boolean | False | False | Whether the result is returned as a JSON string (True) or as a Python object (False). |
header_format | String | False | N/A | The format of the FASTA header. Used to label where each variant at a k-mer position originated. |
support_threshold | Integer | False | 30 | The minimum required support for each k-mer position. |
protein_name | String | False | Unknown Protein | The name of the protein that will appear in the results. |
json_save_path | String | False | stdout (prints to console) | The location where the results will be saved (only relevant when json = True). |