A command-line tool that analyses the diversity and motifs of protein sequences

DiMA - Diversity Motif Analyser

What is DiMA?

Protein sequence diversity is one of the major challenges in the design of diagnostic, prophylactic and therapeutic interventions against viruses. DiMA is a tool designed to facilitate the dissection of protein sequence diversity dynamics for viruses. It provides a quantitative measure of sequence diversity using Shannon’s entropy, applied via a sliding window of user-defined k-mer length. The entropy value is corrected for sample size bias by applying a statistical adjustment. DiMA further interrogates the diversity by dissecting the entropy value at each k-mer position into diversity motifs. The distinct k-mer sequences at each position are classified into the following motifs based on their incidence.

  • Index: The predominant sequence.
  • Major: The sequence with the second highest incidence after the Index.
  • Minor: K-mers with an incidence between those of the Major and Unique motifs.
  • Unique: K-mers seen only once at a particular k-mer position.
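
The sliding window, the entropy measure and the motif classification described above can be sketched in a few lines of Python. This is a minimal illustration, not DiMA's actual implementation. The function names are hypothetical, the entropy is the plain Shannon entropy in bits without the sample-size correction, and both the exclusion of gapped windows and the handling of ties (several k-mers sharing the Major label) are assumptions.

```python
from collections import Counter
from math import log2


def kmers_at(sequences, position, k=9):
    """Return the k-mer starting at a 1-based position in each aligned sequence."""
    start = position - 1
    windows = (seq[start:start + k] for seq in sequences)
    # Assumption: windows that are too short or contain a gap ('-') are skipped.
    return [w for w in windows if len(w) == k and "-" not in w]


def shannon_entropy(kmers):
    """Plain Shannon entropy in bits (no sample-size correction)."""
    counts = Counter(kmers)
    total = sum(counts.values())
    return sum((c / total) * log2(total / c) for c in counts.values())


def classify_motifs(kmers):
    """Label each distinct k-mer as Index, Major, Minor or Unique."""
    ranked = Counter(kmers).most_common()
    if not ranked:
        return {}
    index_seq = ranked[0][0]
    # Assumption: Major is the highest count among the non-Index k-mers
    # that occur more than once; k-mers tied at that count share the label.
    repeated = [count for _, count in ranked[1:] if count > 1]
    major_count = max(repeated) if repeated else None
    labels = {}
    for seq, count in ranked:
        if seq == index_seq:
            labels[seq] = "Index"
        elif count == 1:
            labels[seq] = "Unique"
        elif count == major_count:
            labels[seq] = "Major"
        else:
            labels[seq] = "Minor"
    return labels


# One k-mer position, rebuilt from counts (105, 13, 3, 2, 1) for illustration.
column = (["MKTIIALSY"] * 105 + ["MKNIIALSY"] * 13 + ["MKTIIALSH"] * 3
          + ["MKTIIALSC"] * 2 + ["METISLISM"])
print(classify_motifs(column))
print(round(shannon_entropy(column), 3))
```

Because the sketch omits the sample-size correction, its entropy for a given position will generally differ slightly from the value DiMA reports.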

Moreover, the description line of each sequence in the alignment can be formatted to include metadata, which can then be tagged to the diversity motifs. DiMA enables comparative analysis of diversity dynamics within and between proteins of a virus species, and between proteomes of different viral species.
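
To illustrate how a formatted description line could map onto named metadata fields, the sketch below splits a FASTA header on `|` and pairs the parts with the field names of a format string such as `accession|strain|country|date`. The helper name and the example header are hypothetical and invented for this illustration. DiMA's own parsing may differ.

```python
def parse_header(header, header_format, sep="|"):
    """Pair the fields of a FASTA description line with the names in header_format."""
    names = header_format.split(sep)
    values = header.lstrip(">").strip().split(sep)
    if len(names) != len(values):
        raise ValueError(
            f"expected {len(names)} fields, got {len(values)}: {header!r}"
        )
    return dict(zip(names, values))


# Hypothetical header, invented for this example.
meta = parse_header(
    ">ABC12345|A/Example/1/2009|Malaysia|2009-05-01",
    "accession|strain|country|date",
)
print(meta["country"])  # Malaysia
```

Raising an error on a field-count mismatch is a design choice of this sketch: it surfaces malformed headers early instead of silently misaligning metadata.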

Installation

pip install dima-cli

Basic Usage

Shell Command

dima-cli -i aligned_sequences.afa -o results.json

Python

from dima import Dima
results = Dima(sequences="aligned_sequences.afa").run()

Results

{
   "sequence_count":203,
   "support_threshold":30,
   "low_support_count":15,
   "protein_name":"Unknown Protein",
   "kmer_length":9,
   "results":[
      {
         "position":1,
         "low_support": null,
         "entropy":0.8383740426713246,
         "support":124,
         "distinct_variants_count":4,
         "distinct_variants_incidence":3.2258062,
         "diversity_motifs":[
            {
               "sequence":"MKTIIALSC",
               "count":2,
               "incidence":1.6129031,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"MKTIIALSH",
               "count":3,
               "incidence":2.4193547,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"METISLISM",
               "count":1,
               "incidence":0.80645156,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MKNIIALSY",
               "count":13,
               "incidence":10.4838705,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"MKTIIALSY",
               "count":105,
               "incidence":84.67742,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            }
         ]
      }
   ]
}
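
The fields in this output appear to be related in a simple way: each `incidence` is the motif's `count` as a percentage of the position's `support`, and `distinct_variants_count` is the number of distinct k-mers other than the Index. This reading is inferred from the sample above rather than from a documented schema, so the sketch below is a consistency check under that assumption. It uses only the standard library and a trimmed copy of the sample. The function name is hypothetical.

```python
import json
from math import isclose

# Trimmed copy of the sample output above (only the fields used here).
sample = json.loads("""
{
  "results": [
    {
      "position": 1,
      "support": 124,
      "distinct_variants_count": 4,
      "distinct_variants_incidence": 3.2258062,
      "diversity_motifs": [
        {"sequence": "MKTIIALSC", "count": 2, "incidence": 1.6129031, "motif_long": "Minor"},
        {"sequence": "MKTIIALSH", "count": 3, "incidence": 2.4193547, "motif_long": "Minor"},
        {"sequence": "METISLISM", "count": 1, "incidence": 0.80645156, "motif_long": "Unique"},
        {"sequence": "MKNIIALSY", "count": 13, "incidence": 10.4838705, "motif_long": "Major"},
        {"sequence": "MKTIIALSY", "count": 105, "incidence": 84.67742, "motif_long": "Index"}
      ]
    }
  ]
}
""")


def find_inconsistencies(position, rel_tol=1e-5):
    """Return a list of messages for fields that do not match the assumed relations."""
    problems = []
    motifs = position["diversity_motifs"]
    support = position["support"]
    if sum(m["count"] for m in motifs) != support:
        problems.append("motif counts do not sum to support")
    for m in motifs:
        expected = 100 * m["count"] / support
        if not isclose(m["incidence"], expected, rel_tol=rel_tol):
            problems.append(f"incidence mismatch for {m['sequence']}")
    variants = sum(1 for m in motifs if m["motif_long"] != "Index")
    if variants != position["distinct_variants_count"]:
        problems.append("distinct_variants_count mismatch")
    expected = 100 * variants / support
    if not isclose(position["distinct_variants_incidence"], expected, rel_tol=rel_tol):
        problems.append("distinct_variants_incidence mismatch")
    return problems


for position in sample["results"]:
    print(position["position"], find_inconsistencies(position))
```

To run the same check on a real output file, replace the embedded sample with `json.load` on the file written by `dima-cli -o results.json`.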

Advanced Usage

Shell Command

dima-cli -i aligned_sequences.afa -o results.json -f "accession|strain|country|date"

Python

from dima import Dima
results = Dima(sequences="aligned_sequences.afa", header_format="accession|strain|country|date").run()

Results

{
   "sequence_count":346,
   "support_threshold":30,
   "low_support_count":0,
   "protein_name":"Unknown Protein",
   "kmer_length":9,
   "results":[
      {
         "position":1,
         "low_support":null,
         "entropy":0.4155993859186796,
         "support":324,
         "distinct_variants_count":10,
         "distinct_variants_incidence":3.0864198,
         "diversity_motifs":[
            {
               "sequence":"MERIEELRD",
               "count":3,
               "incidence":0.9259259,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"MERIRELRD",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MERIQELRD",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"MERKKELRD",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MERIKELRY",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MERIKELKD",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MERIKELGD",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MESIKELRD",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MERIKELRN",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"MERTKELRD",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MERIKELRD",
               "count":310,
               "incidence":95.679016,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            }
         ]
      },
      {
         "position":2,
         "low_support":null,
         "entropy":0.39924295895426243,
         "support":324,
         "distinct_variants_count":10,
         "distinct_variants_incidence":3.0864198,
         "diversity_motifs":[
            {
               "sequence":"ERIKELGDL",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ERIKELRYL",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ESIKELRDL",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ERTKELRDL",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ERIKELRNL",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"ERIQELRDL",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"ERIKELKDL",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ERIRELRDL",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ERIKELRDL",
               "count":310,
               "incidence":95.679016,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"ERIEELRDL",
               "count":3,
               "incidence":0.9259259,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"ERKKELRDL",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":3,
         "low_support":null,
         "entropy":0.3786091248201827,
         "support":324,
         "distinct_variants_count":10,
         "distinct_variants_incidence":3.0864198,
         "diversity_motifs":[
            {
               "sequence":"RIKELRYLM",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RIRELRDLM",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RKKELRDLM",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"SIKELRDLM",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RTKELRDLM",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RIKELRDLM",
               "count":310,
               "incidence":95.679016,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"RIKELRNLM",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"RIEELRDLM",
               "count":3,
               "incidence":0.9259259,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"RIKELKDLM",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RIQELRDLM",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"RIKELGDLM",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":4,
         "low_support":null,
         "entropy":0.321556750210603,
         "support":324,
         "distinct_variants_count":9,
         "distinct_variants_incidence":2.777778,
         "diversity_motifs":[
            {
               "sequence":"IQELRDLMS",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"IKELGDLMS",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"IKELRNLMS",
               "count":2,
               "incidence":0.61728394,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"IKELRDLMS",
               "count":311,
               "incidence":95.987656,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"IEELRDLMS",
               "count":3,
               "incidence":0.9259259,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"IKELRYLMS",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"KKELRDLMS",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"TKELRDLMS",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"IRELRDLMS",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"IKELKDLMS",
               "count":1,
               "incidence":0.30864197,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":5,
         "low_support":null,
         "entropy":0.42804434263685753,
         "support":331,
         "distinct_variants_count":10,
         "distinct_variants_incidence":3.021148,
         "diversity_motifs":[
            {
               "sequence":"KELRNLMSQ",
               "count":2,
               "incidence":0.6042296,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"RELRDLMSQ",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"KELKDLMSQ",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"KELRDLMSQ",
               "count":314,
               "incidence":94.864044,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"EELRDLMSQ",
               "count":3,
               "incidence":0.9063444,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"QELRDLMSQ",
               "count":2,
               "incidence":0.6042296,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"KELGDLMSQ",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"KELRDLMSL",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"KELRYLMSQ",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"KGLRDLMSQ",
               "count":2,
               "incidence":0.6042296,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"KDLRDLMSQ",
               "count":3,
               "incidence":0.9063444,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            }
         ]
      },
      {
         "position":6,
         "low_support":null,
         "entropy":0.3179789989386376,
         "support":331,
         "distinct_variants_count":7,
         "distinct_variants_incidence":2.1148038,
         "diversity_motifs":[
            {
               "sequence":"DLRDLMSQS",
               "count":3,
               "incidence":0.9063444,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"ELRYLMSQS",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"GLRDLMSQS",
               "count":2,
               "incidence":0.6042296,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"ELRNLMSQS",
               "count":2,
               "incidence":0.6042296,
               "motif_short":"Mi",
               "motif_long":"Minor",
               "metadata":null
            },
            {
               "sequence":"ELKDLMSQS",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ELRDLMSLS",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ELGDLMSQS",
               "count":1,
               "incidence":0.3021148,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ELRDLMSQS",
               "count":320,
               "incidence":96.676735,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            }
         ]
      },
      {
         "position":7,
         "low_support":null,
         "entropy":0.29910562747260794,
         "support":339,
         "distinct_variants_count":8,
         "distinct_variants_incidence":2.3598819,
         "diversity_motifs":[
            {
               "sequence":"LRDLMSQSP",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"LRYLMSQSR",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LRVLMSQSR",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LRVLLSQSR",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LRDLMSQSR",
               "count":329,
               "incidence":97.05015,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"LKDLMSQSR",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LGDLMSQSR",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LRNLMSQSR",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"LRDLMSLSR",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":8,
         "low_support":null,
         "entropy":0.2430201596843649,
         "support":339,
         "distinct_variants_count":8,
         "distinct_variants_incidence":2.3598819,
         "diversity_motifs":[
            {
               "sequence":"RVLLSQSRT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RNLMSQSRT",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"RVLMSQSRT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RDLMSQSRT",
               "count":329,
               "incidence":97.05015,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"GDLMSQSRT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RDLMSLSRT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RYLMSQSRT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RDLMSQSPT",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"KDLMSQSRT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":9,
         "low_support":null,
         "entropy":0.24307001384944973,
         "support":340,
         "distinct_variants_count":6,
         "distinct_variants_incidence":1.764706,
         "diversity_motifs":[
            {
               "sequence":"VLLSQSRTR",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"YLMSQSRTR",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"DLMSQSRTR",
               "count":332,
               "incidence":97.64706,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"DLMSQSPTR",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"NLMSQSRTR",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"VLMSQSRTR",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"DLMSLSRTR",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":10,
         "low_support":null,
         "entropy":0.12749531957967702,
         "support":340,
         "distinct_variants_count":3,
         "distinct_variants_incidence":0.882353,
         "diversity_motifs":[
            {
               "sequence":"LMSQSRTRE",
               "count":336,
               "incidence":98.82353,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"LLSQSRTRE",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LMSQSPTRE",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"LMSLSRTRE",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":11,
         "low_support":null,
         "entropy":0.13449723636627878,
         "support":341,
         "distinct_variants_count":5,
         "distinct_variants_incidence":1.4662757,
         "diversity_motifs":[
            {
               "sequence":"MSQFRTREI",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LSQSRTREI",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MSQSRTREI",
               "count":335,
               "incidence":98.24047,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"MSQSPTREI",
               "count":2,
               "incidence":0.58651024,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"MSLSRTREI",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MSQSRTREM",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":12,
         "low_support":null,
         "entropy":0.11944211390961701,
         "support":341,
         "distinct_variants_count":4,
         "distinct_variants_incidence":1.1730205,
         "diversity_motifs":[
            {
               "sequence":"SQFRTREIL",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"SQSRTREML",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"SQSPTREIL",
               "count":2,
               "incidence":0.58651024,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"SLSRTREIL",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"SQSRTREIL",
               "count":336,
               "incidence":98.53372,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            }
         ]
      },
      {
         "position":13,
         "low_support":null,
         "entropy":0.1790721384219538,
         "support":341,
         "distinct_variants_count":6,
         "distinct_variants_incidence":1.7595308,
         "diversity_motifs":[
            {
               "sequence":"QSRTREILT",
               "count":334,
               "incidence":97.94722,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"QFRTREILK",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"QSRTREILK",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"QSRTREMLT",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"QSRTREILA",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"QSPTREILT",
               "count":2,
               "incidence":0.58651024,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"LSRTREILT",
               "count":1,
               "incidence":0.29325512,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":14,
         "low_support":null,
         "entropy":0.22601277726095098,
         "support":339,
         "distinct_variants_count":6,
         "distinct_variants_incidence":1.7699115,
         "diversity_motifs":[
            {
               "sequence":"FRTREILKK",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"SPTREILTK",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"SRTREMLTK",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"SRTREILTK",
               "count":331,
               "incidence":97.640114,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"SRTREILTR",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"SRTREILKK",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"SRTREILAK",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":15,
         "low_support":null,
         "entropy":0.18314376315390937,
         "support":339,
         "distinct_variants_count":5,
         "distinct_variants_incidence":1.4749262,
         "diversity_motifs":[
            {
               "sequence":"RTREILTRT",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"RTREMLTKT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"RTREILTKT",
               "count":331,
               "incidence":97.640114,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"RTREILKKT",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"RTREILAKT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"PTREILTKT",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            }
         ]
      },
      {
         "position":16,
         "low_support":null,
         "entropy":0.14519461444101875,
         "support":339,
         "distinct_variants_count":4,
         "distinct_variants_incidence":1.1799409,
         "diversity_motifs":[
            {
               "sequence":"TREMLTKTT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"TREILTRTT",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"TREILKKTT",
               "count":2,
               "incidence":0.58997047,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"TREILTKTT",
               "count":333,
               "incidence":98.23009,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"TREILAKTT",
               "count":1,
               "incidence":0.29498523,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":17,
         "low_support":null,
         "entropy":0.12872153445634518,
         "support":340,
         "distinct_variants_count":4,
         "distinct_variants_incidence":1.1764706,
         "diversity_motifs":[
            {
               "sequence":"REILKKTTV",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"REILTKTTV",
               "count":334,
               "incidence":98.23529,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"REMLTKTTV",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"REILAKTTV",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"REILTRTTV",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            }
         ]
      },
      {
         "position":18,
         "low_support":null,
         "entropy":0.14544968829880628,
         "support":340,
         "distinct_variants_count":5,
         "distinct_variants_incidence":1.4705882,
         "diversity_motifs":[
            {
               "sequence":"EILTKTTVD",
               "count":334,
               "incidence":98.23529,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"EILTRTTVD",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"EILKKTTVA",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"EILAKTTVD",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"EILKKTTVD",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"EMLTKTTVD",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":19,
         "low_support":null,
         "entropy":0.16340842363172586,
         "support":340,
         "distinct_variants_count":5,
         "distinct_variants_incidence":1.4705882,
         "diversity_motifs":[
            {
               "sequence":"ILKKTTVDH",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ILTKTTVDH",
               "count":334,
               "incidence":98.23529,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"ILKKTTVAH",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"MLTKTTVDH",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"ILTRTTVDH",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"ILAKTTVDH",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      },
      {
         "position":20,
         "low_support":null,
         "entropy":0.12946172258125144,
         "support":340,
         "distinct_variants_count":4,
         "distinct_variants_incidence":1.1764706,
         "diversity_motifs":[
            {
               "sequence":"LKKTTVAHM",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LTKTTVDHM",
               "count":335,
               "incidence":98.52941,
               "motif_short":"I",
               "motif_long":"Index",
               "metadata":null
            },
            {
               "sequence":"LAKTTVDHM",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            },
            {
               "sequence":"LTRTTVDHM",
               "count":2,
               "incidence":0.5882353,
               "motif_short":"Ma",
               "motif_long":"Major",
               "metadata":null
            },
            {
               "sequence":"LKKTTVDHM",
               "count":1,
               "incidence":0.29411766,
               "motif_short":"U",
               "motif_long":"Unique",
               "metadata":null
            }
         ]
      }
   ]
}
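
The structure of the results can be explored with a few lines of Python. The sketch below is not part of DiMA: `summarise` and `plugin_entropy` are hypothetical helpers, and the sample is a trimmed copy of the position 20 entry shown above. It assumes that entropy is measured in bits (log base 2) and applies no sample-size correction, so its value is not expected to equal the reported `entropy` field.

```python
import json
import math

# Trimmed copy of one entry from the example output above (position 20).
SAMPLE = """
{
  "results": [
    {
      "position": 20,
      "entropy": 0.12946172258125144,
      "support": 340,
      "diversity_motifs": [
        {"sequence": "LKKTTVAHM", "count": 1, "motif_long": "Unique"},
        {"sequence": "LTKTTVDHM", "count": 335, "motif_long": "Index"},
        {"sequence": "LAKTTVDHM", "count": 1, "motif_long": "Unique"},
        {"sequence": "LTRTTVDHM", "count": 2, "motif_long": "Major"},
        {"sequence": "LKKTTVDHM", "count": 1, "motif_long": "Unique"}
      ]
    }
  ]
}
"""


def plugin_entropy(counts):
    """Uncorrected Shannon entropy of k-mer counts (log base 2 is an assumption)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def summarise(report):
    """Return one summary dict per k-mer position (hypothetical helper)."""
    rows = []
    for entry in report["results"]:
        # Guard against positions that carry no motif list.
        motifs = entry.get("diversity_motifs") or []
        counts = [m["count"] for m in motifs]
        index = next(
            (m["sequence"] for m in motifs if m["motif_long"] == "Index"), None
        )
        rows.append(
            {
                "position": entry["position"],
                "index": index,
                "total_count": sum(counts),
                "reported_entropy": entry["entropy"],
                "plugin_entropy": plugin_entropy(counts) if counts else None,
            }
        )
    return rows


rows = summarise(json.loads(SAMPLE))
for row in rows:
    print(row)
```

For this entry the motif counts add up to the `support` value of 340, and the uncorrected estimate comes out near 0.139 against the reported 0.129. That gap is consistent with DiMA adjusting entropy for sample size, although the exact adjustment is not described here.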

Command-Line Arguments

Argument Type Required Default Example Description
-h N/A False N/A dima-cli -h Prints a summary of all available command-line arguments.
-n String False Unknown dima-cli -i sequences.afa -o results.json -f "accession|strain|country" -n "NA" Silently replaces missing values in the FASTA header with the given value.
-v N/A False N/A dima-cli -v Prints the version of dima-cli that is currently installed.
-p String False Unknown Protein dima-cli -p "Coronavirus Surface Protein" -i sequences.afa -o results.json The name of the protein that will appear in the results.
-i String True N/A dima-cli -i sequences.afa -o results.json The path to the FASTA Multiple Sequence Alignment file.
-o String True N/A dima-cli -i sequences.afa -o results.json The location where the results will be saved.
-l Integer False 9 dima-cli -i sequences.afa -l 12 -o results.json The length of the kmers generated.
-f String False N/A dima-cli -i sequences.afa -f "accession|strain|country" -o results.json The format of the FASTA header. Used to label where each variant at a kmer position originated.
-s Integer False 30 dima-cli -i sequences.afa -l 12 -s 40 -o results.json The minimum required support for each kmer position.
-a nucleotide/protein False protein dima-cli -i dna_sequences.afa -a nucleotide -o results.json The alphabet of the sequences (protein or nucleotide; default: protein).
-t json/xlsx False json dima-cli -i dna_sequences.afa -a nucleotide -o results.json -t xlsx The output format (json or xlsx; default: json).
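
When dima-cli is driven from a script, the arguments in the table above can be assembled as a list rather than a single shell string, so that the `|` characters in a header format are never interpreted by a shell. The following is a minimal sketch: `build_dima_command` is a hypothetical helper, not something shipped with DiMA, and it uses only the flags listed above.

```python
import shlex


def build_dima_command(input_path, output_path, kmer_length=None, support=None,
                       protein_name=None, header_format=None,
                       header_fillna=None, alphabet=None, output_format=None):
    """Assemble a dima-cli argument list (hypothetical helper, not part of DiMA)."""
    # -i and -o are the only required arguments.
    cmd = ["dima-cli", "-i", str(input_path), "-o", str(output_path)]
    optional = [
        ("-l", kmer_length),
        ("-s", support),
        ("-p", protein_name),
        ("-f", header_format),
        ("-n", header_fillna),
        ("-a", alphabet),
        ("-t", output_format),
    ]
    for flag, value in optional:
        # Omitted options fall back to the defaults in the table above.
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_dima_command(
    "sequences.afa",
    "results.json",
    kmer_length=12,
    support=40,
    header_format="accession|strain|country",
    header_fillna="NA",
)
# shlex.join gives a copy-pasteable form of the command for logging.
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` without `shell=True` keeps each value as a single argument, which is why no manual quoting of the header format is needed.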

Module Parameters

Parameter Type Required Default Description
sequences String/StringIO True N/A The path to a FASTA Multiple Sequence Alignment file (MSA), or a StringIO object containing FASTA MSA.
kmer_length Integer False 9 The length of the kmers generated.
header_fillna String False Unknown Silently replaces missing values in the FASTA header with the given value (only required when header_format is given).
header_format String False N/A The format of the FASTA header. Used to label where each variant at a kmer position originated.
support_threshold Integer False 30 The minimum required support for each kmer position.
protein_name String False Unknown Protein The name of the protein that will appear in the results.
alphabet String False protein The alphabet of the sequences (protein or nucleotide; default: protein).
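
The parameters above can be collected and checked before they are handed to `Dima`. This is a sketch under assumptions: `dima_options` and `analyse` are hypothetical helpers, the validation rules are illustrative rather than DiMA's own, and the call pattern `Dima(...).run()` follows the Basic Usage example. The header options are passed only when `header_format` is given, since `header_fillna` is described as relevant only in that case.

```python
from io import StringIO

VALID_ALPHABETS = {"protein", "nucleotide"}


def dima_options(kmer_length=9, support_threshold=30,
                 protein_name="Unknown Protein", alphabet="protein",
                 header_format=None, header_fillna="Unknown"):
    """Build keyword arguments for Dima (hypothetical helper, not part of DiMA)."""
    if kmer_length < 1:
        raise ValueError("kmer_length must be a positive integer")
    if alphabet not in VALID_ALPHABETS:
        raise ValueError(f"alphabet must be one of {sorted(VALID_ALPHABETS)}")
    options = {
        "kmer_length": kmer_length,
        "support_threshold": support_threshold,
        "protein_name": protein_name,
        "alphabet": alphabet,
    }
    # Header options are only meaningful together.
    if header_format is not None:
        options["header_format"] = header_format
        options["header_fillna"] = header_fillna
    return options


def analyse(msa_text, **overrides):
    """Run DiMA on an in-memory FASTA alignment (requires `pip install dima-cli`)."""
    from dima import Dima  # imported lazily so the helper above works without it

    return Dima(sequences=StringIO(msa_text), **dima_options(**overrides)).run()


opts = dima_options(
    kmer_length=12,
    header_format="accession|strain|country",
    header_fillna="NA",
)
print(opts)
```

`analyse` wraps the alignment text in a `StringIO` object, which the `sequences` parameter is documented to accept as an alternative to a file path.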

Project details


This version

3.0.0

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

dima_cli-3.0.0-cp310-none-win_amd64.whl (412.1 kB)

Uploaded CPython 3.10 Windows x86-64

dima_cli-3.0.0-cp310-cp310-manylinux_2_24_x86_64.whl (482.8 kB)

Uploaded CPython 3.10 manylinux: glibc 2.24+ x86-64

dima_cli-3.0.0-cp310-cp310-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (857.1 kB)

Uploaded CPython 3.10 macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

dima_cli-3.0.0-cp39-none-win_amd64.whl (412.7 kB)

Uploaded CPython 3.9 Windows x86-64

dima_cli-3.0.0-cp39-cp39-manylinux_2_24_x86_64.whl (482.9 kB)

Uploaded CPython 3.9 manylinux: glibc 2.24+ x86-64

dima_cli-3.0.0-cp39-cp39-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (857.6 kB)

Uploaded CPython 3.9 macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

dima_cli-3.0.0-cp38-none-win_amd64.whl (411.8 kB)

Uploaded CPython 3.8 Windows x86-64

dima_cli-3.0.0-cp38-cp38-manylinux_2_24_x86_64.whl (483.4 kB)

Uploaded CPython 3.8 manylinux: glibc 2.24+ x86-64

dima_cli-3.0.0-cp38-cp38-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (857.1 kB)

Uploaded CPython 3.8 macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64

dima_cli-3.0.0-cp37-none-win_amd64.whl (412.1 kB)

Uploaded CPython 3.7 Windows x86-64

dima_cli-3.0.0-cp37-cp37m-manylinux_2_24_x86_64.whl (483.4 kB)

Uploaded CPython 3.7m manylinux: glibc 2.24+ x86-64

dima_cli-3.0.0-cp37-cp37m-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (856.8 kB)

Uploaded CPython 3.7m macOS 10.9+ universal2 (ARM64, x86-64) macOS 10.9+ x86-64 macOS 11.0+ ARM64
