floria-strainer: strains out genomes from floria.

Introduction

Given the output of the strain haplotyping software floria[^1], floria-strainer computes the allele frequency at each variable position of each haploset identified by floria, then uses a Gaussian Mixture Model to cluster the haplosets into the different strains composing the mixture.
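As a rough illustration of the clustering step only, here is a minimal pure-Python expectation-maximization (EM) fit of a one-dimensional Gaussian Mixture Model over per-haploset average allele frequencies. This is a sketch under stated assumptions, not floria-strainer's code: the function name fit_gmm_1d, the quantile-based initialization and the min_var variance floor are hypothetical choices, and the actual tool may use a library implementation and different inputs.

```python
import math
from typing import List, Tuple


def _log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def fit_gmm_1d(
    values: List[float], k: int, n_iter: int = 200, min_var: float = 1e-4
) -> Tuple[List[float], List[float], List[float], List[List[float]]]:
    """Fit a k-component 1-D Gaussian Mixture Model with EM (hypothetical helper).

    Returns (weights, means, variances, resp), where resp[i][j] is the
    posterior probability, from the last E-step, that values[i] belongs
    to component j.
    """
    n = len(values)
    if n < k:
        raise ValueError("need at least k values to fit k components")
    # Deterministic initialization: means at evenly spaced quantiles of the data,
    # every variance set to the overall variance, uniform weights.
    xs = sorted(values)
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(values) / n
    overall_var = max(min_var, sum((x - mu) ** 2 for x in values) / n)
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: responsibilities, normalized in log space to avoid underflow.
        resp = []
        for x in values:
            logs = [
                math.log(w) + _log_normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)
            ]
            top = max(logs)
            exps = [math.exp(lp - top) for lp in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = max(nj / n, 1e-12)  # keep log(weight) finite
            if nj < 1e-12:
                continue  # empty component: leave its mean and variance unchanged
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            variances[j] = max(
                min_var,
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj,
            )
    return weights, means, variances, resp
```

For example, fit_gmm_1d([0.1, 0.12, 0.08, 0.11, 0.9, 0.88, 0.92, 0.89], k=2) separates the low- and high-frequency haplosets into two components with means near 0.10 and 0.90. Working in log space keeps every responsibility row summing to 1 even for points far from all components.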

Install

pip install git+https://github.com/maxibor/floria-strainer.git

Quick start

Running floria-strainer on the provided test data

$ floria-strainer -b tests/data/test_short.bam tests/data/floria_out_dir
INFO - Writing the straining informations to floria_strained.strained.csv.
INFO - 2 strains were found: 0, 1
INFO - Writing the BAM file in tag mode to floria_strained.bam.

Visualizing the strain clustering method

  • In this example, short reads are aligned to a reference genome (fig 1). At each variable position, we can observe 2 different alleles, which, for this haploid organism, indicates a mixture of 2 different strains.

  • floria-strainer takes the output of floria, in which reads have been assigned to haplosets (fig 2) based on the MEC (minimum error correction) criterion.

    Floria's haplotyping process is conceptually similar to de novo assembly of reads into contigs, except that instead of contigs, it produces read haplosets, obtained by optimizing a flow (i.e., a least costly path) through a graph representing the reads. More details are in the floria article [^1].

  • Based on the average allele frequency of each haploset, reads are clustered into the different strains. In this example (fig 3), there are two strains, one at minor and one at major allele frequency, which floria-strainer clustered into strain 0 and strain 1.
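The README does not spell out how a haploset's "average allele frequency" is computed. One plausible reading, assumed here: at each variable position, take the frequency (over all reads covering it) of the allele the haploset carries, then average over the haploset's positions. The input shapes (a pileup dict of observed bases per position, a haploset as a position-to-allele dict) and both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def allele_frequencies(pileup: Dict[int, List[str]]) -> Dict[int, Dict[str, float]]:
    """Frequency of each allele at each variable position, over all reads covering it."""
    freqs: Dict[int, Dict[str, float]] = {}
    for pos, bases in pileup.items():
        if not bases:
            continue  # no coverage: no frequency can be computed
        counts = Counter(bases)
        depth = len(bases)
        freqs[pos] = {allele: c / depth for allele, c in counts.items()}
    return freqs


def haploset_mean_af(haploset: Dict[int, str], freqs: Dict[int, Dict[str, float]]) -> float:
    """Average, over the haploset's covered positions, of the frequency of its allele."""
    values = [
        freqs[pos].get(allele, 0.0)
        for pos, allele in haploset.items()
        if pos in freqs
    ]
    if not values:
        raise ValueError("haploset shares no covered variable position with the pileup")
    return sum(values) / len(values)
```

With a hypothetical 70/30 mixture, e.g. a pileup of {100: list("AAAAAAATTT"), 250: list("GGGCCCCCCC")}, a haploset carrying A and C averages 0.7 and one carrying T and G averages 0.3: these per-haploset averages are the one-dimensional values that a mixture model can then separate into a major and a minor strain.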

Reads that weren't assigned to any haploset by floria, or whose haploset does not cluster confidently enough, are not assigned to any strain. They are considered shared by the different strains present in the alignment.
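The rule above can be sketched as follows, assuming per-haploset posterior probabilities from the mixture model and reading "Minimum strain clustering probability threshold" (--sp-cut) as an inclusive cut on the most probable strain. The function names and data shapes are hypothetical; None marks a haploset or read that stays unassigned (shared).

```python
from typing import Dict, List, Optional


def assign_strains(resp: List[List[float]], sp_cut: float = 0.5) -> List[Optional[int]]:
    """Most probable strain per haploset, or None when its probability is below sp_cut."""
    labels: List[Optional[int]] = []
    for probs in resp:
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= sp_cut else None)
    return labels


def strain_of_read(
    read_haploset: Optional[str], haploset_strain: Dict[str, Optional[int]]
) -> Optional[int]:
    """Strain of a read via its haploset; None means the read is treated as shared."""
    if read_haploset is None:
        return None  # floria did not place the read in any haploset
    return haploset_strain.get(read_haploset)
```

Note that with exactly two strains the highest posterior is always at least 0.5, so under this reading the default cut only leaves haplosets unassigned when there are three or more strains or when a stricter --sp-cut is passed.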

Fig 1: Reads aligned to the reference genome, visualized in IGV. The top track represents the reference genome, with variants indicated in different colors.

Fig 2: Reads are grouped and colored by the HP (HaPloset) tag, as annotated by floria.

Fig 3: Reads are grouped and colored by the ST (STrain) tag, as annotated by floria-strainer.
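To inspect a tag-mode output BAM programmatically, reads can be grouped by their ST tag. This sketch assumes reads expose pysam's AlignedSegment interface (query_name, has_tag, get_tag); the value type of ST (integer or string) depends on how floria-strainer writes the tag, which the README does not state. Reads without an ST tag correspond to the unassigned, shared reads described above. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional


def group_reads_by_strain(reads: Iterable) -> Dict[Optional[Hashable], List[str]]:
    """Group read names by their ST tag; reads without an ST tag are keyed by None."""
    groups: Dict[Optional[Hashable], List[str]] = defaultdict(list)
    for read in reads:
        # has_tag/get_tag/query_name follow pysam.AlignedSegment's interface.
        strain = read.get_tag("ST") if read.has_tag("ST") else None
        groups[strain].append(read.query_name)
    return dict(groups)


# Example with pysam (not imported here, so the sketch runs without it):
#   import pysam
#   with pysam.AlignmentFile("floria_strained.bam", "rb") as bam:
#       groups = group_reads_by_strain(bam)
```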

Help

$ floria-strainer --help

 Usage: floria-strainer [OPTIONS] FLORIA_OUTDIR

 Strain the haplotypes in the floria output directory.
 Author: Maxime Borry

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│    --version                      Show the version and exit.                                                                                                             │
│    --nb-strains  -n  INTEGER      Number of strains to keep. If 0, the number of strains will be determined by the mean floria average strain count with HAPQ > 15.      │
│                                   [default: 0]                                                                                                                           │
│    --hapq-cut    -h  INTEGER      Minimum HAPQ threshold [default: 15]                                                                                                   │
│    --sp-cut      -s  FLOAT        Minimum strain clustering probability threshold [default: 0.5]                                                                         │
│ *  --bam         -b  PATH         Input BAM file [required]                                                                                                              │
│    --mode        -m  [tag|split]  BAM output mode. Tag: add ST (strain) tags to the reads. Split: split the reads in different BAM files per strain. [default: tag]      │
│ *  --basename    -o  TEXT         Output file basename [default: floria_strained] [required]                                                                             │
│    --help                         Show this message and exit.                                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
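The --hapq-cut and --nb-strains options interact: when --nb-strains is 0, the help text says the strain count comes from "the mean floria average strain count with HAPQ > 15". A minimal sketch of that default, assuming floria already provides one average strain count per contig (a hypothetical input) and that the mean is rounded half up; note the help text says both "Minimum" (inclusive) and "> 15" (exclusive), so the inclusive filter below is also an assumption. Both function names are hypothetical.

```python
from typing import Dict, Iterable


def filter_by_hapq(hapq_by_haploset: Dict[str, int], hapq_cut: int = 15) -> Dict[str, int]:
    """Keep haplosets whose HAPQ reaches the cut (inclusive; an assumption)."""
    return {name: q for name, q in hapq_by_haploset.items() if q >= hapq_cut}


def default_nb_strains(avg_strain_counts: Iterable[float]) -> int:
    """Hypothetical default for --nb-strains 0: mean of per-contig average
    strain counts, rounded half up, and never below 1."""
    counts = list(avg_strain_counts)
    if not counts:
        raise ValueError("no strain counts available")
    mean = sum(counts) / len(counts)
    return max(1, int(mean + 0.5))  # int() truncates; counts are non-negative
```

For instance, per-contig averages of 1.8, 2.1 and 2.4 give a mean of about 2.1 and therefore 2 strains. Rounding half up, rather than Python's round(), avoids banker's rounding sending 2.5 down to 2.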

Tests

$ pip install poetry pytest
$ poetry run pytest -vv

[^1]: Floria: fast and accurate strain haplotyping in metagenomes

Download files

Source Distribution

floria_strainer-0.2.0.tar.gz (20.0 kB)

Built Distribution

floria_strainer-0.2.0-py3-none-any.whl (20.4 kB)

File details

Details for the file floria_strainer-0.2.0.tar.gz.

File metadata

  • Download URL: floria_strainer-0.2.0.tar.gz
  • Size: 20.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.12 Linux/5.4.0-72-generic

File hashes

Hashes for floria_strainer-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        a546fd382f5b97b229f1b26e68fdf05b72d234bb0d6201a23c55566ec8b9d5e8
MD5           00800e11f8b73c351edaea9f6f789282
BLAKE2b-256   346afc6e11e4db62f683409415fb18b6433f8e5b0781d7abbdf8b0d3ff1e1144

File details

Details for the file floria_strainer-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: floria_strainer-0.2.0-py3-none-any.whl
  • Size: 20.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.10.12 Linux/5.4.0-72-generic

File hashes

Hashes for floria_strainer-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        df197d8fc80341e58e607df4d8604fa15fba6173bf92681e4ff5a5fffe105be9
MD5           117dc581b27f0fda5ee2c2a66d04f847
BLAKE2b-256   60a8c3e8ad03ea35beedbb6be7ff88a2bd7d3266f7bd91aee8fbc49ed63eeca2
