
Piecing together complete genetic coverage for biomonitoring.


mozaiko: Piecing Together Complete Genetic Coverage for Biomonitoring

License: GPL v3


mozaiko is a bioinformatics tool designed to help researchers select optimized sets of primers for complete coverage in biomonitoring studies. Taking inspiration from mosaic art, where small pieces fit together to form a whole, mozaiko supports comprehensive genetic marker analysis by suggesting a fitting combination of primers.

The name comes from the Esperanto word 'Mozaiko', reflecting the idea of bringing different elements together. With mozaiko, researchers can efficiently select primer sets for a range of applications, making biomonitoring and ecological studies more reliable and comparable.

Installation instructions

Prerequisites

  • Python 3.x
  • Conda (Miniconda or Anaconda)
  • Git

Installation

  1. Clone the repository:

    git clone git@github.com:CIBIO-BU/mozaiko.git
    
    cd mozaiko
    
  2. Run the installation script:

    chmod +x conda_env_setup.sh
    
    ./conda_env_setup.sh
    
  3. Activate the environment:

    conda activate mozaiko
    
  4. Install mozaiko:

    pip install -e .
    

The installation script will:

  • Check if Conda is installed;
  • Create a new Conda environment named "mozaiko", if it does not yet exist;
  • Activate the Conda environment;
  • Clone the mozaiko repository, if not already cloned;
  • Install the mozaiko package;
  • Install required dependencies and tools.
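After step 4, one way to confirm that the package is visible inside the active environment is to query its installed version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `mozaiko` is taken from the package files published for this project.

```python
from importlib import metadata


def installed_version(dist_name="mozaiko"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("mozaiko is not installed in this environment")
else:
    print(f"mozaiko {version} is installed")
```

If the check reports that the package is missing, the Conda environment is probably not activated, or step 4 was run in a different environment.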

mozaiko Metrics System

mozaiko groups its metrics into three modules, which are used to evaluate and rank primer sets:

Module 1: Reference Database Quality

  • barcoded_taxa_one_plus: percentage of taxa in OTL with more than one barcode. A barcode must include the target insert to be considered.
  • ratio_barcoded_taxa: proportion of taxa in OTL with high barcode coverage (more than five barcodes) relative to taxa with minimal barcode coverage (at least one barcode). The value ranges from 0 to 1, 1 representing the optimal scenario.
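The two Module 1 metrics can be illustrated with a small sketch. It is not mozaiko's implementation: the function bodies, the input format (a dictionary mapping each taxon to its number of barcodes containing the target insert), and the handling of empty input are assumptions made for illustration. The thresholds follow the wording of the definitions above.

```python
def barcoded_taxa_one_plus(barcode_counts):
    """Percentage of taxa with more than one barcode (threshold as worded above)."""
    if not barcode_counts:
        return 0.0  # assumption: an empty list scores 0
    hits = sum(1 for n in barcode_counts.values() if n > 1)
    return 100.0 * hits / len(barcode_counts)


def ratio_barcoded_taxa(barcode_counts):
    """Taxa with more than five barcodes divided by taxa with at least one."""
    high = sum(1 for n in barcode_counts.values() if n > 5)
    minimal = sum(1 for n in barcode_counts.values() if n >= 1)
    return high / minimal if minimal else 0.0  # assumption: 0 when nothing is barcoded


# Hypothetical barcode counts for four taxa.
counts = {"Taxon A": 0, "Taxon B": 1, "Taxon C": 3, "Taxon D": 8}

print(barcoded_taxa_one_plus(counts))  # 50.0 (Taxon C and Taxon D out of four taxa)
print(ratio_barcoded_taxa(counts))     # one taxon above five, three with at least one
```

Because every taxon with more than five barcodes also has at least one, the ratio cannot exceed 1, which matches the stated range.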

Module 2: Binding

  • mismatch_score: for each taxon, the maximum number of mismatches between the forward primer and its binding site and between the reverse primer and its binding site is recorded. These per-taxon maxima are then summed to give the score for the OTL. Lower values indicate fewer mismatches between primers and their binding sites, which facilitates amplification.
  • priming_ratio_sum: sum of the priming ratio across taxa. The priming ratio is computed as the ratio of the maximum number of mismatches at the 3’ end of the primer binding site to the maximum number of mismatches across the entire primer binding site. Lower values indicate fewer mismatches at the 3’ end of the primer binding site, hence higher binding strength.
  • gc_matches_across_taxon: sum of G-C matches at the 3’ end across all taxa present in the OTL. Higher values are preferable, as a content of 40-60% of GC matches promotes binding.
  • min_tm_cv: The minimum melting temperature (Tm) between each pair of forward and reverse primers is calculated for each taxon. The coefficient of variation across taxa is then determined. Lower values indicate a more consistent thermal performance and are preferable.
  • tm_score: proportion of taxa whose Tm variation is 2 ºC or less. Higher values are preferable as they indicate a better thermal performance across taxa in the OTL.
  • amplification_success_percent: the ratio of taxa that amplify to the total number of taxa with sequences containing primer binding sites, expressed as a percentage. Higher values represent higher amplification success across taxa.
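As a rough illustration of how mismatch_score, priming_ratio_sum and min_tm_cv could be derived from per-taxon data, consider the sketch below. It is not mozaiko's code. Several details are assumptions: binding sites are given in the same orientation and length as the primers, mismatches are counted by plain character comparison (no degenerate bases), the 3’ end is taken to be the last five bases, a taxon with no mismatches gets a priming ratio of 0, Tm values are supplied rather than computed, and the coefficient of variation uses the population standard deviation.

```python
import statistics

THREE_PRIME_LEN = 5  # hypothetical size of the 3' end window


def count_mismatches(primer, site):
    """Count positions where the primer and its binding site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    return sum(1 for p, s in zip(primer, site) if p != s)


def taxon_maxima(fwd, rev, fwd_site, rev_site, window=THREE_PRIME_LEN):
    """Return (max mismatches over the whole site, max mismatches at the 3' end)."""
    whole = max(count_mismatches(fwd, fwd_site), count_mismatches(rev, rev_site))
    end = max(
        count_mismatches(fwd[-window:], fwd_site[-window:]),
        count_mismatches(rev[-window:], rev_site[-window:]),
    )
    return whole, end


def mismatch_score(fwd, rev, sites):
    """Sum of the per-taxon maximum mismatch counts."""
    return sum(taxon_maxima(fwd, rev, f, r)[0] for f, r in sites.values())


def priming_ratio_sum(fwd, rev, sites):
    """Sum over taxa of (3' end maximum) / (whole-site maximum)."""
    total = 0.0
    for f, r in sites.values():
        whole, end = taxon_maxima(fwd, rev, f, r)
        total += end / whole if whole else 0.0  # assumption: perfect match scores 0
    return total


def min_tm_cv(tm_pairs):
    """Coefficient of variation of the per-taxon minimum Tm (forward vs reverse)."""
    minima = [min(pair) for pair in tm_pairs.values()]
    return statistics.pstdev(minima) / statistics.mean(minima)


# Hypothetical primers and binding sites: (forward site, reverse site) per taxon.
fwd, rev = "ACGTACGTAC", "TTGCAAGGCC"
sites = {
    "Taxon A": ("ACGTACGTAC", "TTGCAAGGCC"),  # perfect match
    "Taxon B": ("ACGAACGTAC", "TTGCAAGGTT"),  # 1 forward, 2 reverse (both at 3' end)
    "Taxon C": ("TCGTACGTAA", "TTGCAAGGCC"),  # 2 forward (1 at 3' end), 0 reverse
}
tm_pairs = {"Taxon A": (58.0, 60.0), "Taxon B": (55.0, 57.0), "Taxon C": (61.0, 59.0)}

print(mismatch_score(fwd, rev, sites))     # 4   (0 + 2 + 2)
print(priming_ratio_sum(fwd, rev, sites))  # 1.5 (0 + 2/2 + 1/2)
print(min_tm_cv(tm_pairs))
```

In this toy data, Taxon B contributes the most to priming_ratio_sum because all of its worst-case mismatches sit in the 3’ window, which is the situation the metric is meant to penalise.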

Module 3: Traits and Resolution

  • taxonomic_resolution: percentage of taxa whose genetic divergence is higher than 2%. Higher values are preferable as they indicate an increased possibility of distinguishing between closely related taxa.
  • final_rank: the final ranking position is determined based on the individual ranking scores for each metric, presented in the output file intermediate_ranks, with all metrics weighted equally. Each metric is ranked based on whether higher or lower values are more desirable:
    • Descending (higher is better):
      • barcoded_taxa_one_plus
      • ratio_barcoded_taxa
      • gc_matches_across_taxon
      • tm_score
      • amplification_success_percent
    • Ascending (lower is better):
      • mismatch_score
      • priming_ratio_sum
      • min_tm_cv
      • taxonomic_resolution

For metrics ranked ascending, primers with lower values are preferred. For example, a lower ‘mismatch_score’ is better because it means fewer mismatches. For metrics ranked descending, primers with higher values are preferred.
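The equal-weight ranking described above can be sketched as follows. This is an illustrative reading of the scheme, not mozaiko's implementation: the function names, the tie rule (tied values share the best available rank), and the choice to combine metrics by summing their ranks are assumptions. Only three metrics are used to keep the example short.

```python
def rank_values(values, higher_is_better):
    """Rank names by value; rank 1 is best, and tied values share the same rank."""
    ordered = sorted(values.values(), reverse=higher_is_better)
    return {name: ordered.index(value) + 1 for name, value in values.items()}


def final_rank(table, directions):
    """Sum the per-metric ranks with equal weight, then rank the totals (lower wins)."""
    totals = {name: 0 for name in table}
    for metric, higher_is_better in directions.items():
        column = {name: row[metric] for name, row in table.items()}
        for name, rank in rank_values(column, higher_is_better).items():
            totals[name] += rank
    return rank_values(totals, higher_is_better=False)


# True means "higher is better" (descending); False means "lower is better" (ascending).
directions = {
    "mismatch_score": False,
    "tm_score": True,
    "amplification_success_percent": True,
}

# Hypothetical metric values for three primer sets.
table = {
    "primer_set_1": {"mismatch_score": 4, "tm_score": 0.85, "amplification_success_percent": 90},
    "primer_set_2": {"mismatch_score": 2, "tm_score": 0.90, "amplification_success_percent": 95},
    "primer_set_3": {"mismatch_score": 6, "tm_score": 0.70, "amplification_success_percent": 70},
}

print(final_rank(table, directions))
# {'primer_set_1': 2, 'primer_set_2': 1, 'primer_set_3': 3}
```

Because ranks are computed only among the primer sets supplied, adding or removing a primer set can change the position of every other set.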

mozaiko Workflow

Primer rankings are always relative to a specific run: if a different set of primers is supplied, the results will change.

Contacts

In case of enquiry, please reach out to bu@biopolis.up.pt.
