Piecing together complete genetic coverage for biomonitoring.
mozaiko: Piecing Together Complete Genetic Coverage for Biomonitoring
mozaiko is a bioinformatics tool designed to help researchers select optimized sets of primers for complete coverage in biomonitoring studies. Taking inspiration from mosaic art, where small pieces fit together to form a whole, mozaiko supports comprehensive genetic marker analysis by suggesting a fitting combination of primers.
The name comes from the Esperanto word 'Mozaiko', reflecting the idea of bringing different elements together. With mozaiko, researchers can efficiently select primer sets for a range of applications, making biomonitoring and ecological studies more reliable and comparable.
Installation instructions
Prerequisites
- Python 3.x
- Conda (Miniconda or Anaconda)
- Git
Installation
- Clone the repository:
  git clone git@github.com:CIBIO-BU/mozaiko.git
  cd mozaiko
- Run the installation script:
  chmod +x conda_env_setup.sh
  ./conda_env_setup.sh
- Activate the environment:
  conda activate mozaiko
- Install mozaiko:
  pip install -e .
The installation script will:
- Check if Conda is installed;
- Create a new Conda environment named "mozaiko", if it does not yet exist;
- Activate the Conda environment;
- Clone the mozaiko repository, if not already cloned;
- Install the mozaiko package;
- Install required dependencies and tools.
mozaiko Metrics System
mozaiko evaluates and ranks primer sets using metrics grouped into three modules:
Module 1: Reference Database Quality
- barcoded_taxa_one_plus: percentage of taxa in OTL with more than one barcode. A barcode must include the target insert to be considered.
- ratio_barcoded_taxa: proportion of taxa in OTL with high barcode coverage (more than five barcodes) relative to taxa with minimal barcode coverage (at least one barcode). The value ranges from 0 to 1, with 1 representing the optimal scenario.
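The two Module 1 definitions can be illustrated with a minimal Python sketch. This is not mozaiko's own code: the function signatures, the `barcode_counts` mapping (taxon name to number of barcodes that include the target insert) and the example values are assumptions made for illustration. The thresholds follow the wording above and are exposed as parameters.

```python
def barcoded_taxa_one_plus(barcode_counts, threshold=1):
    """Percentage of taxa with more than `threshold` barcodes (hypothetical helper)."""
    if not barcode_counts:
        return 0.0
    hits = sum(1 for n in barcode_counts.values() if n > threshold)
    return 100.0 * hits / len(barcode_counts)


def ratio_barcoded_taxa(barcode_counts, high=5, minimal=1):
    """Taxa with more than `high` barcodes divided by taxa with at least `minimal`."""
    well_covered = sum(1 for n in barcode_counts.values() if n > high)
    covered = sum(1 for n in barcode_counts.values() if n >= minimal)
    # Assumption: return 0.0 when no taxon has any barcode, to avoid dividing by zero.
    return well_covered / covered if covered else 0.0


# Made-up barcode counts for four taxa in an OTL.
barcode_counts = {"Taxon A": 0, "Taxon B": 1, "Taxon C": 3, "Taxon D": 8}

one_plus = barcoded_taxa_one_plus(barcode_counts)  # taxa C and D qualify
ratio = ratio_barcoded_taxa(barcode_counts)        # D over B, C and D
print(one_plus, round(ratio, 3))
```

With these example counts, two of four taxa have more than one barcode, and one of the three barcoded taxa has more than five.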
Module 2: Binding
- mismatch_score: for each taxon, the maximum number of mismatches between the forward primer and its binding site and between the reverse primer and its binding site is recorded. These per-taxon maxima are then summed to give the score for the OTL. Lower values indicate fewer mismatches between primers and primer-binding sites, which facilitates amplification.
- priming_ratio_sum: sum of the priming ratio across taxa. The priming ratio is computed as the ratio of the maximum number of mismatches at the 3’ end of the primer binding site to the maximum number of mismatches across the entire primer binding site. Lower values indicate fewer mismatches at the 3’ end of the primer binding site, hence higher binding strength.
- gc_matches_across_taxon: sum of G-C matches at the 3’ end across all taxa present in the OTL. Higher values are preferable, as a content of 40-60% of GC matches promotes binding.
- min_tm_cv: The minimum melting temperature (Tm) between each pair of forward and reverse primers is calculated for each taxon. The coefficient of variation across taxa is then determined. Lower values indicate a more consistent thermal performance and are preferable.
- tm_score: proportion of taxa with a Tm variation of 2 °C or less. Higher values are preferable as they indicate a better thermal performance across taxa in the OTL.
- amplification_success_percent: the ratio of taxa that amplify to the total number of taxa with sequences containing primer binding sites, expressed as a percentage. Higher values represent higher amplification success across taxa.
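Four of the binding metrics above (mismatch_score, priming_ratio_sum, min_tm_cv and amplification_success_percent) can be sketched in Python as follows. This is an illustration under stated assumptions, not mozaiko's implementation: the `TaxonBinding` record and its fields are hypothetical, the mismatch counts and Tm values are assumed to be precomputed per taxon, a taxon with zero mismatches is given a priming ratio of 0, and the coefficient of variation uses the population standard deviation and is returned as a fraction.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class TaxonBinding:
    """Hypothetical per-taxon summary for one primer pair."""
    fwd_mismatches: int          # mismatches over the whole forward binding site
    rev_mismatches: int          # mismatches over the whole reverse binding site
    fwd_mismatches_3prime: int   # mismatches at the 3' end, forward
    rev_mismatches_3prime: int   # mismatches at the 3' end, reverse
    fwd_tm: float                # forward primer Tm in degrees Celsius
    rev_tm: float                # reverse primer Tm in degrees Celsius
    amplifies: bool              # whether the taxon is predicted to amplify


def mismatch_score(records):
    # Per taxon, keep the worse of the two primers, then sum across taxa.
    return sum(max(r.fwd_mismatches, r.rev_mismatches) for r in records)


def priming_ratio_sum(records):
    total = 0.0
    for r in records:
        whole = max(r.fwd_mismatches, r.rev_mismatches)
        three_prime = max(r.fwd_mismatches_3prime, r.rev_mismatches_3prime)
        # Assumption: a perfect match contributes a ratio of 0.
        total += three_prime / whole if whole else 0.0
    return total


def min_tm_cv(records):
    # Per taxon, take the lower Tm of the pair, then CV = stdev / mean.
    minima = [min(r.fwd_tm, r.rev_tm) for r in records]
    return pstdev(minima) / mean(minima)


def amplification_success_percent(records):
    # All records are assumed to contain primer binding sites.
    return 100.0 * sum(r.amplifies for r in records) / len(records)


# Made-up values for three taxa.
records = [
    TaxonBinding(2, 1, 1, 0, 58.0, 60.0, True),
    TaxonBinding(0, 0, 0, 0, 60.0, 62.0, True),
    TaxonBinding(4, 3, 2, 2, 56.0, 59.0, False),
]

print(mismatch_score(records), priming_ratio_sum(records))
print(round(min_tm_cv(records), 4), round(amplification_success_percent(records), 1))
```

The size of the 3’ window and the rule that decides whether a taxon amplifies are left outside the sketch, since they are not specified here.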
Module 3: Traits and Resolution
- taxonomic_resolution: percentage of taxa whose genetic divergence is higher than 2%. Higher values are preferable as they indicate an increased possibility of distinguishing between closely related taxa.
- final_rank: the final ranking position is determined based on the individual ranking scores for each metric, presented in the output file intermediate_ranks, with all metrics weighted equally. Each metric is ranked based on whether higher or lower values are more desirable:
- Descending (higher is better):
  - barcoded_taxa_one_plus
  - ratio_barcoded_taxa
  - gc_matches_across_taxon
  - tm_score
  - amplification_success_percent
  - taxonomic_resolution
- Ascending (lower is better):
  - mismatch_score
  - priming_ratio_sum
  - min_tm_cv
For metrics ranked ascending, primers with lower values are preferred. For example, a lower ‘mismatch_score’ is better because it means fewer mismatches. For metrics ranked descending, primers with higher values are preferred.
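The equal-weight ranking described above can be sketched in Python for three of the metrics. This is a minimal illustration, not mozaiko's code: the function names, the example primer names and values, the tie handling (tied primers share the best rank) and the aggregation (mean of the per-metric ranks) are all assumptions.

```python
# Ranking direction for the three metrics used in this sketch.
DIRECTION = {
    "mismatch_score": "ascending",                   # lower is better
    "tm_score": "descending",                        # higher is better
    "amplification_success_percent": "descending",   # higher is better
}


def rank_metric(values, direction):
    """Return {primer: rank}, where rank 1 is best and ties share a rank."""
    higher_is_better = direction == "descending"
    ranks = {}
    for primer, value in values.items():
        # Rank = 1 + number of primers with a strictly better value.
        if higher_is_better:
            better = sum(1 for other in values.values() if other > value)
        else:
            better = sum(1 for other in values.values() if other < value)
        ranks[primer] = better + 1
    return ranks


def final_rank(table):
    """Return (intermediate per-metric ranks, final ranks) for a metrics table."""
    primers = list(table)
    intermediate = {p: {} for p in primers}
    for metric, direction in DIRECTION.items():
        ranks = rank_metric({p: table[p][metric] for p in primers}, direction)
        for p in primers:
            intermediate[p][metric] = ranks[p]
    # Equal weighting: average the per-metric ranks, then rank the averages.
    mean_rank = {p: sum(intermediate[p].values()) / len(DIRECTION) for p in primers}
    return intermediate, rank_metric(mean_rank, "ascending")


# Made-up metric values for three hypothetical primer sets.
table = {
    "primer_A": {"mismatch_score": 6, "tm_score": 0.9, "amplification_success_percent": 80.0},
    "primer_B": {"mismatch_score": 3, "tm_score": 0.7, "amplification_success_percent": 95.0},
    "primer_C": {"mismatch_score": 10, "tm_score": 0.5, "amplification_success_percent": 60.0},
}

intermediate, final = final_rank(table)
print(final)
```

Because every rank is computed within the supplied table, adding or removing a primer set can shift the ranks of all the others.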
mozaiko Workflow
Primer rankings are always relative to a specific run: if a different set of primers is supplied, the results will vary.
Contacts
In case of enquiry, please reach out to bu@biopolis.up.pt.