A multi-sample identification of ribosomal content

Supported Python versions: 3.9, 3.10, 3.11, 3.12

This is the ribofinder pipeline from the Sequana project.

Overview:

Simple parallel workflow to detect and report ribosomal content

Input:

FastQ files

Output:

HTML reports

Status:

production

Citation:

Cokelaer et al. (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Installation

Install the package with pip:

pip install sequana_ribofinder --upgrade

The --upgrade option makes sure you get the latest version.

Usage

This pipeline scans the input fastq.gz files found in the local directory and identifies the proportion of ribosomal content.
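Before launching the pipeline, it can help to check which files would be picked up. The following is a minimal sketch, not the pipeline's own discovery code: the function name find_fastq_files and the *.fastq.gz glob pattern are assumptions made for illustration, and the pattern the pipeline actually uses may differ.

```python
from pathlib import Path


def find_fastq_files(directory="."):
    """Return the sorted *.fastq.gz files found directly in `directory`.

    Hypothetical helper: the pattern is an assumption, not the pipeline's
    actual input pattern.
    """
    return sorted(Path(directory).glob("*.fastq.gz"))


if __name__ == "__main__":
    # List the candidate input files in the current directory.
    for path in find_fastq_files("."):
        print(path.name)
```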

For help, please type:

sequana_ribofinder --help

The following command searches for input files in DATAPATH. The user provides a list of rRNA sequences in FastA format in test.fasta. The command creates a directory called ribofinder/ from which the snakemake pipeline can be run:

sequana_ribofinder --input-directory DATAPATH --rRNA-file test.fasta

You will then need to execute the pipeline:

cd ribofinder
sh ribofinder.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters:

snakemake -s ribofinder.rules --configfile config.yaml --cores 4 --wrapper-prefix git+file:////home/user/sequana_wrappers

Alternatively, use the Sequanix interface.

Requirements

This pipeline requires the following executable(s):

  • bowtie2 >= 2.4.0

  • bwa

  • sambamba

  • bedtools

  • samtools

  • pigz

The aligner is selectable at the command line with --aligner bowtie2 (default) or --aligner bwa. You only need to install the one(s) you intend to use.
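A quick way to see whether the executables listed above are available is to look them up on the PATH. This is a minimal sketch using only the Python standard library; the helper name missing_tools is hypothetical, and the check does not verify versions (for instance, it does not confirm that bowtie2 is at least 2.4.0).

```python
import shutil

# Executables from the Requirements list; only the chosen aligner is checked.
COMMON_TOOLS = ["sambamba", "bedtools", "samtools", "pigz"]
ALIGNERS = ["bowtie2", "bwa"]


def missing_tools(aligner="bowtie2", which=shutil.which):
    """Return the required executables that are not found on the PATH.

    Hypothetical helper; `which` can be replaced for testing.
    """
    if aligner not in ALIGNERS:
        raise ValueError(f"unknown aligner: {aligner!r}")
    return [tool for tool in [aligner] + COMMON_TOOLS if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools("bowtie2")
    print("missing:", ", ".join(missing) if missing else "none")
```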

Pipeline DAG: https://raw.githubusercontent.com/sequana/ribofinder/master/sequana_pipelines/ribofinder/dag.png

Details

This pipeline runs ribofinder in parallel on the input fastq files. A brief sequana summary report is also produced.

You can start from a reference file and its GFF annotation file. By default, the pipeline searches the GFF file for the feature called rRNA:

sequana_ribofinder --input-directory . --reference-file genome.fasta --gff-file genome.gff

If the default feature rRNA is not found, no error is raised for now. If you know the name of the expected feature, you can provide it:

sequana_ribofinder --input-directory . --reference-file genome.fasta --gff-file genome.gff --rRNA-feature gene_rRNA
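Since a missing feature does not raise an error, it may be worth listing the feature types present in the GFF file before choosing a value for --rRNA-feature. The sketch below is not part of the pipeline; it assumes a tab-separated GFF file with the feature type in the third column, and the function name count_gff_features is hypothetical.

```python
from collections import Counter


def count_gff_features(lines):
    """Count the feature types (third column) found in GFF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            # GFF3 files may embed sequences after this directive; stop here.
            break
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration; pass an open file handle instead
    # to inspect a real annotation.
    demo = [
        "##gff-version 3\n",
        "chr1\tdemo\tgene\t1\t1500\t.\t+\t.\tID=gene1\n",
        "chr1\tdemo\trRNA\t1\t1500\t.\t+\t.\tID=rrna1\n",
    ]
    for feature, n in count_gff_features(demo).most_common():
        print(f"{feature}\t{n}")
```

If rRNA does not appear in the listing, the name that does (for example a project-specific label) is the one to pass to --rRNA-feature.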

If you already have an rRNA file, or a custom one, you can use it as follows. In that case, no reference file is required:

sequana_ribofinder --input-directory . --rRNA-file ribo.fasta
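A custom rRNA file can be sanity-checked before the run by listing its sequence names and lengths. This is a minimal, standard-library sketch under the assumption that the file is plain (uncompressed) FastA; the function name fasta_lengths is hypothetical and is not part of the pipeline. Note that a repeated header overwrites the earlier entry.

```python
def fasta_lengths(lines):
    """Map each FastA header (without the leading '>') to its sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Made-up sequences for illustration; pass an open file handle instead
    # to inspect a real file such as ribo.fasta.
    demo = [">16S\n", "ACGTACGT\n", "ACGT\n", ">23S\n", "GGGCCC\n"]
    for name, length in fasta_lengths(demo).items():
        print(f"{name}\t{length}")
```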

Rules and configuration details

Here is the latest documented configuration file to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog

Version

Description

1.2.0

  • Replace bowtie1 with bowtie2 (default) and bwa as aligner options for rRNA mapping; the aligner is now user-selectable via --aligner bowtie2|bwa.

  • Drop the bowtie1-specific fix_bowtie1_log workaround and the standalone bam_indexing / samtools_faidx rules; the new sequana-wrappers already produce sorted+indexed BAMs and a faidx’ed reference.

  • For bwa, feed multiqc with samtools stats output (bwa has no native multiqc module); the mapping-rate plot in the HTML report is built from multiqc_samtools_stats.txt.

  • For bowtie2, the HTML summary uses sequana.multiqc.plots.Bowtie2 which handles both single-end and paired-end outputs.

  • Rewrite the HTML summary introduction to explain what the pipeline does, how it works step-by-step, and how to interpret each of the three plots (mapping rate, per-sequence proportions, per-sequence RPKM).

  • Tool changes: bowtie2, bwa, and sambamba replace bowtie and bamtools.

1.1.1

  • hotfix for running on HPC (slurm)

1.1.0

  • Uses click (refactoring of sequana_pipetools)

1.0.1

  • add sequana_wrappers in the config/pipeline

1.0.0

  • use graphviz apptainer and latest wrappers

0.13.0

  • add final apptainers and update CI actions

0.12.0

  • set singularity containers

0.11.1

  • Fix config file (removing hard-coded path)

0.11.0

  • Fix multiqc plot using same fix as in sequana_rnaseq pipelines

  • add utility plot to check rate of ribosomal per sequence and also the corresponding RPKM.

0.10.2

  • Fix the bowtie1 rule (all samples were named bowtie1)

0.10.1

  • add additional test and fix bug in pipeline (regression bug)

0.10.0

  • Update to use sequana-wrappers. Remove multiqc. summary.html is self-contained

0.9.3

  • fix logger

0.9.2

First release.
