DiMA - Diversity Motif Analyser
What is DiMA?
Protein sequence diversity is one of the major challenges in the design of diagnostic, prophylactic and therapeutic
interventions against viruses. DiMA is a tool designed to facilitate the dissection of protein sequence diversity
dynamics for viruses. DiMA provides a quantitative measure of sequence diversity using Shannon’s entropy,
applied via a user-defined k-mer sliding window. The entropy value at each position is then corrected for
sample-size bias by a statistical adjustment.
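The k-mer sliding window and entropy calculation can be sketched in Python as follows. This is a minimal, hypothetical illustration (`kmers_at` and `shannon_entropy` are not part of the dima package). It computes the plain, uncorrected Shannon entropy in bits and does not reproduce DiMA's sample-size correction. Its treatment of gaps is also an assumption.

```python
import math
from collections import Counter


def kmers_at(sequences, position, k=9):
    """Return the k-mers that start at 0-based `position` in each aligned sequence.

    Hypothetical helper: windows that run past the end of a sequence or
    contain a gap ("-") are skipped, which may not match DiMA's own handling.
    """
    windows = (seq[position:position + k] for seq in sequences)
    return [w for w in windows if len(w) == k and "-" not in w]


def shannon_entropy(kmers):
    """Uncorrected Shannon entropy, in bits, of the k-mer distribution."""
    counts = Counter(kmers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# One entropy value per sliding-window position (0-based here; DiMA reports 1-based positions).
alignment = ["MKTIIALSYIF", "MKTIIALSYIF", "MKNIIALSYIF", "MKTIIALSHIF"]
profile = [shannon_entropy(kmers_at(alignment, i, k=9))
           for i in range(len(alignment[0]) - 9 + 1)]
```

For the counts at position 1 of the basic example below (105, 13, 3, 2 and 1), `shannon_entropy` gives about 0.826. This is slightly below the bias-corrected 0.838 that DiMA reports, which is the expected direction, since the plain estimate tends to underestimate entropy for small samples.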
DiMA further interrogates the diversity by dissecting the entropy value at each k-mer position into diversity
motifs. The distinct k-mer sequences at each position are classified into the following motifs based on their
incidence:
- Index: The predominant (most frequent) sequence.
- Major: The sequence with the second-highest incidence after the Index.
- Minor: Sequences with an incidence between those of the Major and Unique motifs.
- Unique: Sequences seen only once at a particular k-mer position.
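One way to turn these definitions into code is sketched below. The tie and edge-case rules are assumptions chosen to match the example output further down, not taken from DiMA's source. Under these rules, all k-mers sharing the highest non-Index count are labelled Major, and a count of 1 is always Unique. `classify_motifs` is a hypothetical name.

```python
from collections import Counter


def classify_motifs(kmers):
    """Label each distinct k-mer at one position as Index, Major, Minor or Unique.

    Assumed rules (a sketch, not DiMA's implementation):
      - Index: the most frequent k-mer (ties broken by first occurrence);
      - Unique: any other k-mer seen exactly once;
      - Major: every other k-mer tied for the highest remaining count;
      - Minor: everything in between.
    """
    ranked = Counter(kmers).most_common()  # [(kmer, count), ...], highest count first
    if not ranked:
        return {}
    labels = {ranked[0][0]: "Index"}
    repeated = [count for _, count in ranked[1:] if count > 1]
    major_count = repeated[0] if repeated else None
    for kmer, count in ranked[1:]:
        if count == 1:
            labels[kmer] = "Unique"
        elif count == major_count:
            labels[kmer] = "Major"
        else:
            labels[kmer] = "Minor"
    return labels
```

Given the k-mers from position 1 of the basic example below, the function labels the motifs as follows, matching DiMA's output:

- MKTIIALSY: Index
- MKNIIALSY: Major
- MKTIIALSH and MKTIIALSC: Minor
- METISLISM: Unique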
The description line (FASTA header) of each sequence in the alignment can also be
formatted to include metadata, which is then tagged to the diversity motifs. DiMA enables comparative analysis of diversity
dynamics within and between proteins of a virus species, and between the proteomes of different viral species.
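To illustrate the header format, the sketch below splits a '|'-separated description line into named fields. Empty or missing fields are filled with a placeholder, in the spirit of the -n / header_fillna option. `parse_header` is a hypothetical helper. The rule that fields are split strictly on '|' in the order given by the format string is an assumption, not DiMA's documented parsing behaviour.

```python
def parse_header(description, header_format, fillna="Unknown"):
    """Split a FASTA description line into named metadata fields.

    Assumption: fields are '|'-separated in the order given by header_format
    (e.g. "accession|strain|country|date"); empty or missing fields are
    replaced with `fillna`, and any extra fields beyond the format are ignored.
    """
    names = header_format.split("|")
    values = description.lstrip(">").split("|")
    values += [""] * (len(names) - len(values))  # pad headers with too few fields
    return {name: (value.strip() or fillna) for name, value in zip(names, values)}
```

For example, `parse_header(">AB123|H1N1||", "accession|strain|country|date")` fills both the country and date fields with "Unknown".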
Installation
pip install dima-cli
Basic Usage
Shell Command
dima-cli -i aligned_sequences.afa -o results.json
Python
from dima import Dima
results = Dima(sequences="aligned_sequences.afa").run()
Results
{
"sequence_count":203,
"support_threshold":30,
"low_support_count":15,
"protein_name":"Unknown Protein",
"kmer_length":9,
"results":[
{
"position":1,
"low_support": null,
"entropy":0.8383740426713246,
"support":124,
"distinct_variants_count":4,
"distinct_variants_incidence":3.2258062,
"diversity_motifs":[
{
"sequence":"MKTIIALSC",
"count":2,
"incidence":1.6129031,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"MKTIIALSH",
"count":3,
"incidence":2.4193547,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"METISLISM",
"count":1,
"incidence":0.80645156,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MKNIIALSY",
"count":13,
"incidence":10.4838705,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"MKTIIALSY",
"count":105,
"incidence":84.67742,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
}
]
}
]
}
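The JSON report can be post-processed with the standard library. The sketch below lists the Index motif at each position; its helper names are hypothetical, and its field names are taken from the example above. In the example, each motif's incidence is its count as a percentage of the position's support. For instance, MKTIIALSY has 105 / 124 × 100 ≈ 84.677.

```python
import json


def load_report(path="results.json"):
    """Parse the JSON report written by `dima-cli -i ... -o results.json`."""
    with open(path) as handle:
        return json.load(handle)


def index_summary(report):
    """Return (position, entropy, index sequence, index incidence %) for each position.

    Field names follow the example output. Positions with no Index motif
    (assumed possible for low-support positions) are skipped.
    """
    rows = []
    for entry in report["results"]:
        motifs = entry.get("diversity_motifs") or []
        index = next((m for m in motifs if m["motif_long"] == "Index"), None)
        if index is not None:
            rows.append((entry["position"], entry["entropy"], index["sequence"], index["incidence"]))
    return rows
```

Typical use is `index_summary(load_report("results.json"))` after running the shell command above.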
Advanced Usage
Shell Command
dima-cli -i aligned_sequences.afa -o results.json -f "accession|strain|country|date"
Python
from dima import Dima
results = Dima(sequences="aligned_sequences.afa", header_format="accession|strain|country|date").run()
Results
{
"sequence_count":346,
"support_threshold":30,
"low_support_count":0,
"protein_name":"Unknown Protein",
"kmer_length":9,
"results":[
{
"position":1,
"low_support":null,
"entropy":0.4155993859186796,
"support":324,
"distinct_variants_count":10,
"distinct_variants_incidence":3.0864198,
"diversity_motifs":[
{
"sequence":"MERIEELRD",
"count":3,
"incidence":0.9259259,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"MERIRELRD",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MERIQELRD",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"MERKKELRD",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MERIKELRY",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MERIKELKD",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MERIKELGD",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MESIKELRD",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MERIKELRN",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"MERTKELRD",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MERIKELRD",
"count":310,
"incidence":95.679016,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
}
]
},
{
"position":2,
"low_support":null,
"entropy":0.39924295895426243,
"support":324,
"distinct_variants_count":10,
"distinct_variants_incidence":3.0864198,
"diversity_motifs":[
{
"sequence":"ERIKELGDL",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ERIKELRYL",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ESIKELRDL",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ERTKELRDL",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ERIKELRNL",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"ERIQELRDL",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"ERIKELKDL",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ERIRELRDL",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ERIKELRDL",
"count":310,
"incidence":95.679016,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"ERIEELRDL",
"count":3,
"incidence":0.9259259,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"ERKKELRDL",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":3,
"low_support":null,
"entropy":0.3786091248201827,
"support":324,
"distinct_variants_count":10,
"distinct_variants_incidence":3.0864198,
"diversity_motifs":[
{
"sequence":"RIKELRYLM",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RIRELRDLM",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RKKELRDLM",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"SIKELRDLM",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RTKELRDLM",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RIKELRDLM",
"count":310,
"incidence":95.679016,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"RIKELRNLM",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"RIEELRDLM",
"count":3,
"incidence":0.9259259,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"RIKELKDLM",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RIQELRDLM",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"RIKELGDLM",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":4,
"low_support":null,
"entropy":0.321556750210603,
"support":324,
"distinct_variants_count":9,
"distinct_variants_incidence":2.777778,
"diversity_motifs":[
{
"sequence":"IQELRDLMS",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"IKELGDLMS",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"IKELRNLMS",
"count":2,
"incidence":0.61728394,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"IKELRDLMS",
"count":311,
"incidence":95.987656,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"IEELRDLMS",
"count":3,
"incidence":0.9259259,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"IKELRYLMS",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"KKELRDLMS",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"TKELRDLMS",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"IRELRDLMS",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"IKELKDLMS",
"count":1,
"incidence":0.30864197,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":5,
"low_support":null,
"entropy":0.42804434263685753,
"support":331,
"distinct_variants_count":10,
"distinct_variants_incidence":3.021148,
"diversity_motifs":[
{
"sequence":"KELRNLMSQ",
"count":2,
"incidence":0.6042296,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"RELRDLMSQ",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"KELKDLMSQ",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"KELRDLMSQ",
"count":314,
"incidence":94.864044,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"EELRDLMSQ",
"count":3,
"incidence":0.9063444,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"QELRDLMSQ",
"count":2,
"incidence":0.6042296,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"KELGDLMSQ",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"KELRDLMSL",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"KELRYLMSQ",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"KGLRDLMSQ",
"count":2,
"incidence":0.6042296,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"KDLRDLMSQ",
"count":3,
"incidence":0.9063444,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
}
]
},
{
"position":6,
"low_support":null,
"entropy":0.3179789989386376,
"support":331,
"distinct_variants_count":7,
"distinct_variants_incidence":2.1148038,
"diversity_motifs":[
{
"sequence":"DLRDLMSQS",
"count":3,
"incidence":0.9063444,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"ELRYLMSQS",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"GLRDLMSQS",
"count":2,
"incidence":0.6042296,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"ELRNLMSQS",
"count":2,
"incidence":0.6042296,
"motif_short":"Mi",
"motif_long":"Minor",
"metadata":null
},
{
"sequence":"ELKDLMSQS",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ELRDLMSLS",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ELGDLMSQS",
"count":1,
"incidence":0.3021148,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ELRDLMSQS",
"count":320,
"incidence":96.676735,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
}
]
},
{
"position":7,
"low_support":null,
"entropy":0.29910562747260794,
"support":339,
"distinct_variants_count":8,
"distinct_variants_incidence":2.3598819,
"diversity_motifs":[
{
"sequence":"LRDLMSQSP",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"LRYLMSQSR",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LRVLMSQSR",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LRVLLSQSR",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LRDLMSQSR",
"count":329,
"incidence":97.05015,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"LKDLMSQSR",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LGDLMSQSR",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LRNLMSQSR",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"LRDLMSLSR",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":8,
"low_support":null,
"entropy":0.2430201596843649,
"support":339,
"distinct_variants_count":8,
"distinct_variants_incidence":2.3598819,
"diversity_motifs":[
{
"sequence":"RVLLSQSRT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RNLMSQSRT",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"RVLMSQSRT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RDLMSQSRT",
"count":329,
"incidence":97.05015,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"GDLMSQSRT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RDLMSLSRT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RYLMSQSRT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RDLMSQSPT",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"KDLMSQSRT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":9,
"low_support":null,
"entropy":0.24307001384944973,
"support":340,
"distinct_variants_count":6,
"distinct_variants_incidence":1.764706,
"diversity_motifs":[
{
"sequence":"VLLSQSRTR",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"YLMSQSRTR",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"DLMSQSRTR",
"count":332,
"incidence":97.64706,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"DLMSQSPTR",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"NLMSQSRTR",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"VLMSQSRTR",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"DLMSLSRTR",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":10,
"low_support":null,
"entropy":0.12749531957967702,
"support":340,
"distinct_variants_count":3,
"distinct_variants_incidence":0.882353,
"diversity_motifs":[
{
"sequence":"LMSQSRTRE",
"count":336,
"incidence":98.82353,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"LLSQSRTRE",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LMSQSPTRE",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"LMSLSRTRE",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":11,
"low_support":null,
"entropy":0.13449723636627878,
"support":341,
"distinct_variants_count":5,
"distinct_variants_incidence":1.4662757,
"diversity_motifs":[
{
"sequence":"MSQFRTREI",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LSQSRTREI",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MSQSRTREI",
"count":335,
"incidence":98.24047,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"MSQSPTREI",
"count":2,
"incidence":0.58651024,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"MSLSRTREI",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MSQSRTREM",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":12,
"low_support":null,
"entropy":0.11944211390961701,
"support":341,
"distinct_variants_count":4,
"distinct_variants_incidence":1.1730205,
"diversity_motifs":[
{
"sequence":"SQFRTREIL",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"SQSRTREML",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"SQSPTREIL",
"count":2,
"incidence":0.58651024,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"SLSRTREIL",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"SQSRTREIL",
"count":336,
"incidence":98.53372,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
}
]
},
{
"position":13,
"low_support":null,
"entropy":0.1790721384219538,
"support":341,
"distinct_variants_count":6,
"distinct_variants_incidence":1.7595308,
"diversity_motifs":[
{
"sequence":"QSRTREILT",
"count":334,
"incidence":97.94722,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"QFRTREILK",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"QSRTREILK",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"QSRTREMLT",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"QSRTREILA",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"QSPTREILT",
"count":2,
"incidence":0.58651024,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"LSRTREILT",
"count":1,
"incidence":0.29325512,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":14,
"low_support":null,
"entropy":0.22601277726095098,
"support":339,
"distinct_variants_count":6,
"distinct_variants_incidence":1.7699115,
"diversity_motifs":[
{
"sequence":"FRTREILKK",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"SPTREILTK",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"SRTREMLTK",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"SRTREILTK",
"count":331,
"incidence":97.640114,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"SRTREILTR",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"SRTREILKK",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"SRTREILAK",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":15,
"low_support":null,
"entropy":0.18314376315390937,
"support":339,
"distinct_variants_count":5,
"distinct_variants_incidence":1.4749262,
"diversity_motifs":[
{
"sequence":"RTREILTRT",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"RTREMLTKT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"RTREILTKT",
"count":331,
"incidence":97.640114,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"RTREILKKT",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"RTREILAKT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"PTREILTKT",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
}
]
},
{
"position":16,
"low_support":null,
"entropy":0.14519461444101875,
"support":339,
"distinct_variants_count":4,
"distinct_variants_incidence":1.1799409,
"diversity_motifs":[
{
"sequence":"TREMLTKTT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"TREILTRTT",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"TREILKKTT",
"count":2,
"incidence":0.58997047,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"TREILTKTT",
"count":333,
"incidence":98.23009,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"TREILAKTT",
"count":1,
"incidence":0.29498523,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":17,
"low_support":null,
"entropy":0.12872153445634518,
"support":340,
"distinct_variants_count":4,
"distinct_variants_incidence":1.1764706,
"diversity_motifs":[
{
"sequence":"REILKKTTV",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"REILTKTTV",
"count":334,
"incidence":98.23529,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"REMLTKTTV",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"REILAKTTV",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"REILTRTTV",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
}
]
},
{
"position":18,
"low_support":null,
"entropy":0.14544968829880628,
"support":340,
"distinct_variants_count":5,
"distinct_variants_incidence":1.4705882,
"diversity_motifs":[
{
"sequence":"EILTKTTVD",
"count":334,
"incidence":98.23529,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"EILTRTTVD",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"EILKKTTVA",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"EILAKTTVD",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"EILKKTTVD",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"EMLTKTTVD",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":19,
"low_support":null,
"entropy":0.16340842363172586,
"support":340,
"distinct_variants_count":5,
"distinct_variants_incidence":1.4705882,
"diversity_motifs":[
{
"sequence":"ILKKTTVDH",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ILTKTTVDH",
"count":334,
"incidence":98.23529,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"ILKKTTVAH",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"MLTKTTVDH",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"ILTRTTVDH",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"ILAKTTVDH",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
},
{
"position":20,
"low_support":null,
"entropy":0.12946172258125144,
"support":340,
"distinct_variants_count":4,
"distinct_variants_incidence":1.1764706,
"diversity_motifs":[
{
"sequence":"LKKTTVAHM",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LTKTTVDHM",
"count":335,
"incidence":98.52941,
"motif_short":"I",
"motif_long":"Index",
"metadata":null
},
{
"sequence":"LAKTTVDHM",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
},
{
"sequence":"LTRTTVDHM",
"count":2,
"incidence":0.5882353,
"motif_short":"Ma",
"motif_long":"Major",
"metadata":null
},
{
"sequence":"LKKTTVDHM",
"count":1,
"incidence":0.29411766,
"motif_short":"U",
"motif_long":"Unique",
"metadata":null
}
]
}
]
}
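In both example outputs, distinct_variants_count equals the number of motifs other than the Index. Likewise, distinct_variants_incidence is that number as a percentage of support; for position 1 above, 10 / 324 × 100 ≈ 3.086. This relationship is inferred from the examples rather than documented. The hypothetical helpers below check it and rank positions by their reported entropy.

```python
def variant_fields_consistent(entry, tol=1e-4):
    """Recompute distinct_variants_count/_incidence for one position from its motifs.

    Inferred from the example output, not from DiMA's source: every motif other
    than the Index is counted, and that count is expressed as a percentage of support.
    """
    non_index = [m for m in entry["diversity_motifs"] if m["motif_long"] != "Index"]
    incidence = 100 * len(non_index) / entry["support"]
    return (len(non_index) == entry["distinct_variants_count"]
            and abs(incidence - entry["distinct_variants_incidence"]) < tol)


def most_diverse(report, n=3):
    """Return the n positions with the highest reported entropy, highest first."""
    ranked = sorted(report["results"], key=lambda entry: entry["entropy"], reverse=True)
    return [entry["position"] for entry in ranked[:n]]
```

Applied to the advanced example above, `most_diverse(report)` returns [5, 1, 2].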
Command-Line Arguments
| Argument | Type | Required | Default | Example | Description |
| --- | --- | --- | --- | --- | --- |
| -h | N/A | False | N/A | dima-cli -h | Prints a summary of all available command-line arguments. |
| -n | String | False | Unknown | dima-cli -i sequences.afa -o results.json -f "accession\|strain\|country" -n "NA" | Value used to silently fill missing fields in the FASTA header (only relevant when -f is given). |
| -v | N/A | False | N/A | dima-cli -v | Prints the version of dima-cli that is currently installed. |
| -p | String | False | Unknown Protein | dima-cli -p "Coronavirus Surface Protein" -i sequences.afa -o results.json | The name of the protein that will appear in the results. |
| -i | String | True | N/A | dima-cli -i sequences.afa -o results.json | The path to the FASTA multiple sequence alignment file. |
| -o | String | True | N/A | dima-cli -i sequences.afa -o results.json | The location where the results will be saved. |
| -l | Integer | False | 9 | dima-cli -i sequences.afa -l 12 -o results.json | The length of the generated k-mers. |
| -f | String | False | N/A | dima-cli -i sequences.afa -f "accession\|strain\|country" -o results.json | The format of the FASTA header, used to label where each variant at a k-mer position originated. |
| -s | Integer | False | 30 | dima-cli -i sequences.afa -l 12 -s 40 -o results.json | The minimum required support for each k-mer position. |
| -a | nucleotide/protein | False | protein | dima-cli -i dna_sequences.afa -a nucleotide -o results.json | The alphabet of the sequences (protein or nucleotide). |
| -t | json/xlsx | False | json | dima-cli -i dna_sequences.afa -a nucleotide -o results.xlsx -t xlsx | The output format (json or xlsx). |
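To drive dima-cli from a Python script, the flags in the table can be assembled into an argument list for `subprocess.run`. This is a sketch with hypothetical helper names. It passes every default explicitly and uses only the flags documented above. Because it passes a list rather than a shell string, the '|' characters in a -f header format need no quoting.

```python
import subprocess


def dima_cli_command(input_path, output_path, kmer_length=9, support=30,
                     header_format=None, fillna=None, protein_name=None,
                     alphabet="protein", output_format="json"):
    """Build a dima-cli argument list using only the documented flags (hypothetical helper)."""
    cmd = ["dima-cli", "-i", input_path, "-o", output_path,
           "-l", str(kmer_length), "-s", str(support),
           "-a", alphabet, "-t", output_format]
    if header_format:
        cmd += ["-f", header_format]
    if fillna:
        cmd += ["-n", fillna]
    if protein_name:
        cmd += ["-p", protein_name]
    return cmd


def run_dima_cli(*args, **kwargs):
    """Run dima-cli (must be on PATH); raises CalledProcessError on a non-zero exit."""
    return subprocess.run(dima_cli_command(*args, **kwargs), check=True)
```

For example, `run_dima_cli("sequences.afa", "results.json", header_format="accession|strain|country")` corresponds to the -f row of the table.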
Module Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| sequences | String/StringIO | True | N/A | The path to a FASTA multiple sequence alignment (MSA) file, or a StringIO object containing a FASTA MSA. |
| kmer_length | Integer | False | 9 | The length of the generated k-mers. |
| header_fillna | String | False | Unknown | Value used to silently fill missing fields in the FASTA header (only used when header_format is given). |
| header_format | String | False | N/A | The format of the FASTA header, used to label where each variant at a k-mer position originated. |
| support_threshold | Integer | False | 30 | The minimum required support for each k-mer position. |
| protein_name | String | False | Unknown Protein | The name of the protein that will appear in the results. |
| alphabet | String | False | protein | The alphabet of the sequences (protein or nucleotide). |
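The support_threshold (-s) parameter is described only as the minimum required support per k-mer position. The sketch below assumes a position counts as low-support when its support is strictly below that threshold. The exact rule, and how DiMA fills the low_support field, are not documented here. Under that assumption, low-support positions can be listed from a parsed report; `low_support_positions` is a hypothetical helper.

```python
def low_support_positions(report, threshold=None):
    """List positions whose support falls below the threshold.

    Assumption: "low support" means support < support_threshold; DiMA's own
    comparison and its low_support field may differ. Defaults to the
    support_threshold recorded in the report.
    """
    if threshold is None:
        threshold = report["support_threshold"]
    return [entry["position"] for entry in report["results"] if entry["support"] < threshold]
```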