VaRaPS: Variants Ratios from Pooled Sequencing

Introduction

VaRaPS (Variants Ratios from Pooled Sequencing) is a Python package originally designed for calculating the proportions of SARS-CoV-2 variants from sequencing data. It supports the BAM and CRAM file formats and re-implements methods such as Freyja [1], LCS [2], and VirPool [3]. VaRaPS provides three modes of operation to cover different analysis needs.

Table of Contents

  1. Installation
  2. Features
  3. Quick Start
  4. Usage
  5. Understanding the Output of Mode 1
  6. Troubleshooting
  7. Contributing
  8. License
  9. Contact
  10. Citation

Installation

Ensure that Python 3.8 or later is installed on your system before installing VaRaPS.

pip install VaRaPS

Features

  • Implements multiple methods for variant proportion calculations from sequencing data.
  • Offers three deconvolution methods (co-occurrence-based, count-based, and frequency-based) for flexible analysis requirements.
  • Interactive mode prompts users through the analysis process.
  • Supports both BAM and CRAM file formats.

Quick Start

For a quick start, run VaRaPS in interactive mode, which guides you through the process:

varaps

Follow the on-screen prompts to input your data and choose the analysis parameters.

Usage

VaRaPS is designed to be flexible and user-friendly, offering several modes and parameters to fit your analysis needs. Below are detailed explanations of how to use each mode and what each parameter means.

General Command Structure

All commands in VaRaPS follow a basic structure:

varaps --mode <mode_number> [options]

Replace <mode_number> with the mode you wish to use (1, 2, or 3), and [options] with the various options available for that mode, detailed below.

Mode 1: Retrieve Mutations

This mode extracts mutations from the reads in BAM/CRAM files by performing variant calling on each read.

varaps --mode 1 --path <path_to_bam_cram_files> --ref <path_to_reference_fasta> [--output <output_directory>] [--percentage <filter_percentage>] [--number <filter_number>]
  • --path <path_to_bam_cram_files>: Specify the directory containing your BAM/CRAM files.
  • --ref <path_to_reference_fasta>: Indicate the path to your reference genome file in FASTA format.
  • --output <output_directory>: (Optional) Designate where you want the results to be saved. By default, results are saved in the current directory.
  • --percentage <filter_percentage>: (Optional) Set the minimum percentage of reads that must contain a mutation for it to be considered significant. The default is 0.0, which means no filtering is applied based on percentage.
  • --number <filter_number>: (Optional) Define the minimum number of reads that must contain a mutation for it to be recognized. The default is 0, which means no filtering is applied based on read count.
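
When Mode 1 is launched from a script rather than typed by hand, the command can be assembled from the flags documented above. The following is a minimal sketch: the helper name `build_mode1_command` and the example paths are hypothetical, and only the flags listed in this section are used.

```python
import shlex

def build_mode1_command(bam_dir, ref_fasta, output_dir=None, percentage=None, number=None):
    """Assemble the argument list for `varaps --mode 1` from the documented flags."""
    cmd = ["varaps", "--mode", "1", "--path", bam_dir, "--ref", ref_fasta]
    # Optional flags are only added when a value is given, so VaRaPS defaults apply otherwise.
    if output_dir is not None:
        cmd += ["--output", output_dir]
    if percentage is not None:
        cmd += ["--percentage", str(percentage)]
    if number is not None:
        cmd += ["--number", str(number)]
    return cmd

# Hypothetical paths, for illustration only.
cmd = build_mode1_command("data/bams", "data/reference.fasta", output_dir="results", number=5)
print(shlex.join(cmd))

# To actually run the command (requires VaRaPS to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.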

Mode 2: Calculate Variant Proportions

In this mode, VaRaPS calculates the proportion of each variant using the output from Mode 1.

varaps --mode 2 --deconv_method <method_number> --path <path_to_data> --M <path_to_variant_matrix> [--output <output_directory>] [--NbBootstraps <number_of_bootstraps>] [--optibyAlpha <optimize_by_alpha>] [--alphaInit <initial_alpha_value>]
  • --deconv_method <method_number>: Choose the deconvolution method to use. The number corresponds to the specific implementation:

    • 1 - Co-occurrence-based method [3]
    • 2 - Count-based method [2]
    • 3 - Frequency-based method [1]
  • --path <path_to_data>: Specify the path to the input data, which can be the output directory from Mode 1.

  • --M <path_to_variant_matrix>: Provide the path to the variant/mutation profile matrix, a CSV file with rows representing variants and columns representing mutations [Example file for the variant/mutation profile matrix].

  • --output <output_directory>: (Optional) Indicate the output directory for the results.

  • --NbBootstraps <number_of_bootstraps>: (Optional) Set the number of bootstrap iterations for estimating uncertainty.

  • --optibyAlpha <optimize_by_alpha>: (Optional) Boolean value (True or False) to determine if the algorithm should optimize by the sequencing error rate.

  • --alphaInit <initial_alpha_value>: (Optional) Provide the initial value for the error rate parameter.
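
To make the role of the variant/mutation profile matrix concrete, here is a minimal sketch that loads a toy matrix and computes the mutation frequencies one would expect in a pool with known variant proportions. Deconvolution solves the reverse problem: it estimates the proportions from the observed data. The variant names, the 0/1 entries, and the exact CSV layout (a header row of mutation names plus a first column of variant names) are assumptions made for illustration; this does not reproduce the VaRaPS implementation.

```python
import csv
import io

# Hypothetical profile matrix: rows are variants, columns are mutations.
# Names, values, and layout are made up for illustration.
MATRIX_CSV = """variant,C9A,A11G,A16G
VariantA,1,0,1
VariantB,0,1,1
"""

def load_profile_matrix(text):
    """Return (variant_names, mutation_names, rows) from a variants-by-mutations CSV."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    mutations = header[1:]
    variants, rows = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        variants.append(record[0])
        rows.append([float(value) for value in record[1:]])
    return variants, mutations, rows

def expected_frequencies(proportions, rows):
    """Expected frequency of each mutation: sum over variants of proportion * profile entry."""
    n_mutations = len(rows[0])
    return [sum(p * row[j] for p, row in zip(proportions, rows)) for j in range(n_mutations)]

variants, mutations, rows = load_profile_matrix(MATRIX_CSV)
freqs = expected_frequencies([0.25, 0.75], rows)  # 25% VariantA, 75% VariantB
print(dict(zip(mutations, freqs)))
```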

Mode 3: Direct Calculation from Files

Mode 3 combines the functionality of Modes 1 and 2 for a direct calculation of variant proportions from BAM/CRAM files without the intermediate step.

varaps --mode 3 --path <path_to_bam_cram_files> --ref <path_to_reference_fasta> --deconv_method <method_number> [--other_options]
  • The parameters for Mode 3 are a combination of those from Modes 1 and 2.
  • Use the same --path, --ref, --output, and --deconv_method parameters as described above.
  • Include any other optional parameters as needed to refine your analysis.

Understanding the Output

VaRaPS generates detailed output files that encapsulate the results of the mutation and variant analysis. Below are the explanations of the files along with examples to help you understand their structure and content.

mutations_index File

  • Filename: mutations_index_<input_file_name>_<options>.csv
  • Contents: Lists all mutations found in the input files that passed the filters. It serves as an index for the mutations referenced in the Xsparse file.
  • Example:
Mutations
T6TC
C9A
A11G
A11T
AAA14A
A16G
A16AG
...
...
  • Interpretation:
    • Each line represents a unique mutation, identified by the reference base(s), the position in the reference sequence, and the alternate base(s).
    • This file acts as a legend for the mutation indices used in the Xsparse file. Indices start at 0, so the mutation at index 4 is AAA14A.

Mutation Encoding

  • Format: [reference base(s)][position][alternate base(s)]
  • Examples:
    • C9A indicates a substitution at position 9 where 'C' has been replaced by 'A'.
    • AAA14A indicates a deletion at position 14 where 'AAA' has been shortened to 'A'.
    • A16AG indicates an insertion at position 16 where 'G' has been added after 'A'.
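
The encoding above can be split back into its parts with a short regular expression. This is a minimal sketch, not part of VaRaPS: the function name `parse_mutation` is hypothetical, and classifying a mutation by comparing the lengths of the reference and alternate strings is an assumption that matches the examples given here.

```python
import re

# Letters (reference), digits (position), letters (alternate).
MUTATION_RE = re.compile(r"^([A-Za-z]+)(\d+)([A-Za-z]+)$")

def parse_mutation(code):
    """Split a code such as 'AAA14A' into (reference, position, alternate, kind)."""
    match = MUTATION_RE.match(code)
    if match is None:
        raise ValueError(f"not a valid mutation code: {code!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    # Assumption: the kind follows from the relative lengths of ref and alt.
    if len(ref) == len(alt):
        kind = "substitution"
    elif len(ref) > len(alt):
        kind = "deletion"
    else:
        kind = "insertion"
    return ref, pos, alt, kind

for code in ["C9A", "AAA14A", "A16AG"]:
    print(code, parse_mutation(code))
```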

Xsparse File

  • Filename: Xsparse_<input_file_name>_<options>.csv
  • Contents: The Xsparse file lists the unique reads and the mutations each one contains, in a sparse matrix format. It is the most important output file because it holds the actual data. Note: the number of occurrences of each read in the BAM/CRAM file is stored in the Wsparse file (see below).
  • Example:

startIdx_position,endIdx_position,muts
0,4,
0,44,"0, 2"
0,22,"3,"
1,150,"1, 4"
2,275,"2, 5, 6"
...
...

  • Interpretation:
    • The columns startIdx_position and endIdx_position define the range of positions covered by a read.
    • The muts column lists the indices of the mutations present in the read within that range.
    • Reads are numbered from 0 in the order they appear. For instance:
      • Read 0 covers the region from position 0 (inclusive) to position 4 (exclusive). It has no mutations.
      • Read 4 covers the region from position 2 (inclusive) to position 275 (exclusive). It contains mutations 2, 5, and 6.
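
As an illustration of how the Xsparse and mutations_index files fit together, the sketch below parses the example rows shown above with Python's standard `csv` module and translates the mutation indices back into mutation codes. The data is copied from the examples in this document; the function name `parse_xsparse` is hypothetical, and tolerating a trailing comma in the muts column (as in `"3,"`) is an assumption based on the example.

```python
import csv
import io

# Example mutations_index content (index 0 is the first entry).
MUTATIONS = ["T6TC", "C9A", "A11G", "A11T", "AAA14A", "A16G", "A16AG"]

# Example Xsparse content, as shown above.
XSPARSE_CSV = '''startIdx_position,endIdx_position,muts
0,4,
0,44,"0, 2"
0,22,"3,"
1,150,"1, 4"
2,275,"2, 5, 6"
'''

def parse_xsparse(text):
    """Return a list of (start, end, mutation_indices), one tuple per unique read."""
    reads = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["muts"] or ""
        # Ignore empty tokens so that "" and "3," both parse cleanly.
        indices = [int(token) for token in raw.split(",") if token.strip()]
        reads.append((int(row["startIdx_position"]), int(row["endIdx_position"]), indices))
    return reads

reads = parse_xsparse(XSPARSE_CSV)
for read_id, (start, end, indices) in enumerate(reads):
    names = [MUTATIONS[i] for i in indices]
    print(f"read {read_id}: [{start}, {end}) mutations={names}")
```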

Wsparse File

  • Filename: Wsparse_<input_file_name>_<options>.csv
  • Contents: This file associates each read with its frequency in the dataset to optimize data storage.
  • Example:
Counts
2
1
1
1
5
...
...

  • Interpretation:
    • Each line corresponds to one read, in the same order as the reads are listed in the Xsparse file.
    • The Counts column indicates how many times each read appears in the dataset. For example, read 4 occurs 5 times in the data.
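
Because Xsparse stores each distinct read only once, any statistic computed from it has to be weighted by the counts in Wsparse. The sketch below shows this with the example data from this document, counting how many sequenced reads carry each mutation index. The function name `weighted_mutation_counts` is hypothetical, and this is an illustration rather than the computation VaRaPS performs internally.

```python
from collections import Counter

# Mutation indices per unique read (from the Xsparse example).
read_mutations = [[], [0, 2], [3], [1, 4], [2, 5, 6]]
# Occurrences of each unique read (from the Wsparse example).
read_counts = [2, 1, 1, 1, 5]

def weighted_mutation_counts(read_mutations, read_counts):
    """Count the sequenced reads carrying each mutation, weighting unique reads by their counts."""
    totals = Counter()
    for indices, count in zip(read_mutations, read_counts):
        for index in indices:
            totals[index] += count
    return totals

totals = weighted_mutation_counts(read_mutations, read_counts)
total_reads = sum(read_counts)
print("total reads:", total_reads)
for index in sorted(totals):
    print(f"mutation {index}: {totals[index]} reads")
```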

Troubleshooting

If you encounter any issues while using VaRaPS, please contact us at djaout[at]lpsm.paris

Contributing

Contributions to VaRaPS are welcome. If you have suggestions or improvements, feel free to email me at djaout[at]lpsm.paris

License

GNU General Public License v3 or later (GPLv3+)

Contact

For any questions or feedback regarding VaRaPS, feel free to reach out by email at djaout[at]lpsm.paris

Citation

To cite the PyPI package 'VaRaPS' in publications, use:

Djaout, E.H. (2024). VaRaPS: Variants Ratios from Pooled Sequencing. PyPI package.

A BibTeX entry for LaTeX users is:

@Manual{,
title = {VaRaPS: Variants Ratios from Pooled Sequencing},
author = {El Hacene Djaout},
year = {2024},
note = {PyPI package},
}

References

[1] S. Karthikeyan et al. “Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission”. In: Nature 609.7925 (2022), pp. 101–108.

[2] R. Valieris et al. “A mixture model for determining SARS-CoV-2 variant composition in pooled samples”. In: Bioinformatics 38.7 (2022), pp. 1809–1815.

[3] A. Gafurov et al. “VirPool: Model-based estimation of SARS-CoV-2 variant proportions in wastewater samples”. In: medRxiv (2022).
