VaRaPS: Variants Ratios from Pooled Sequencing
Introduction
VaRaPS (Variants Ratios from Pooled Sequencing) is a Python package originally designed for calculating the proportions of SARS-CoV-2 variants from sequencing data. It supports BAM and CRAM file formats and re-implements methods such as Freyja [1], LCS [2], and VirPool [3]. VaRaPS offers three modes of operation to cover different analysis needs.
Installation
Ensure that Python 3.8 or later is installed on your system before installing VaRaPS.
pip install VaRaPS
Features
- Implements multiple methods for variant proportion calculations from sequencing data.
- Offers three deconvolution methods (co-occurrence-based, count-based, and frequency-based) for flexible analysis requirements.
- Interactive mode prompts users through the analysis process.
- Supports both BAM and CRAM file formats.
Quick Start
For a quick start, run VaRaPS in interactive mode, which guides you through the process:
varaps
Follow the on-screen prompts to input your data and choose the analysis parameters.
Usage
VaRaPS is designed to be flexible and user-friendly, offering several modes and parameters to fit your analysis needs. Below are detailed explanations of how to use each mode and what each parameter means.
General Command Structure
All commands in VaRaPS follow a basic structure:
varaps --mode <mode_number> [options]
Replace <mode_number> with the mode you wish to use (1, 2, or 3), and [options] with the various options available for that mode, detailed below.
Mode 1: Retrieve Mutations
This mode extracts mutations from the reads in BAM/CRAM files by performing variant calling on each read.
varaps --mode 1 --path <path_to_bam_cram_files> --ref <path_to_reference_fasta> [--output <output_directory>] [--percentage <filter_percentage>] [--number <filter_number>]
- --path <path_to_bam_cram_files>: Specify the directory containing your BAM/CRAM files.
- --ref <path_to_reference_fasta>: Indicate the path to your reference genome file in FASTA format.
- --output <output_directory>: (Optional) Designate where you want the results to be saved. By default, results are saved in the current directory.
- --percentage <filter_percentage>: (Optional) Set the minimum percentage of reads that must contain a mutation for it to be considered significant. The default is 0.0, which means no filtering is applied based on percentage.
- --number <filter_number>: (Optional) Define the minimum number of reads that must contain a mutation for it to be recognized. The default is 0, which means no filtering is applied based on read count.
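The combined effect of the two filters can be sketched as follows. This is a minimal illustration and not VaRaPS's actual code: the function name `filter_mutations`, the input dictionary, the 0 to 100 percentage scale, and the choice of all reads in the file as the denominator are assumptions made for the example.

```python
def filter_mutations(mutation_counts, total_reads, min_percentage=0.0, min_number=0):
    """Keep mutations supported by enough reads (hypothetical helper).

    mutation_counts: dict mapping a mutation label to the number of reads carrying it.
    total_reads: total number of reads (assumed denominator for the percentage).
    """
    kept = {}
    for mutation, count in mutation_counts.items():
        # Percentage of reads carrying the mutation, on a 0-100 scale (assumption).
        percentage = 100.0 * count / total_reads if total_reads else 0.0
        if percentage >= min_percentage and count >= min_number:
            kept[mutation] = count
    return kept


# Made-up counts for three mutations observed in 200 reads.
counts = {"C9A": 120, "A11G": 3, "A16AG": 1}
print(filter_mutations(counts, total_reads=200))                      # defaults: nothing is filtered
print(filter_mutations(counts, total_reads=200, min_percentage=1.0))  # at least 1% of reads
print(filter_mutations(counts, total_reads=200, min_number=2))        # at least 2 reads
```

With the default thresholds every mutation is kept, which matches the "no filtering" behaviour described for the defaults.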
Mode 2: Calculate Variant Proportions
In this mode, VaRaPS calculates the proportion of each variant using the output from Mode 1.
varaps --mode 2 --deconv_method <method_number> --path <path_to_data> --M <path_to_variant_matrix> [--output <output_directory>] [--NbBootstraps <number_of_bootstraps>] [--optibyAlpha <optimize_by_alpha>] [--alphaInit <initial_alpha_value>]
- --deconv_method <method_number>: Choose the deconvolution method to use. The number selects the specific implementation.
- --path <path_to_data>: Specify the path to the input data, which can be the output directory from Mode 1.
- --M <path_to_variant_matrix>: Provide the path to the variant/mutation profile matrix, a CSV file with rows representing variants and columns representing mutations [Example file for the variant/mutation profile matrix].
- --output <output_directory>: (Optional) Indicate the output directory for the results.
- --NbBootstraps <number_of_bootstraps>: (Optional) Set the number of bootstrap iterations for estimating uncertainty.
- --optibyAlpha <optimize_by_alpha>: (Optional) Boolean value (True or False) that determines whether the algorithm should optimize over the sequencing error rate.
- --alphaInit <initial_alpha_value>: (Optional) Provide the initial value for the error rate parameter.
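To make the role of the variant/mutation profile matrix more concrete, the sketch below loads a tiny matrix and computes the mutation frequencies one would expect for a given mixture of variants. Everything in it is hypothetical: the variant names, the 0/1 entries, the CSV layout (a header row of mutation labels and a first column of variant names), and the helper functions. The exact file layout VaRaPS expects may differ.

```python
import csv
import io

# Hypothetical profile matrix: rows are variants, columns are mutations,
# 1 means the variant carries the mutation.
MATRIX_CSV = """variant,C9A,A11G,A16AG
VariantA,1,0,1
VariantB,0,1,1
"""


def load_profile_matrix(text):
    """Return (mutation_labels, {variant: [0/1 value per mutation]})."""
    rows = list(csv.reader(io.StringIO(text)))
    mutations = rows[0][1:]
    profiles = {row[0]: [float(v) for v in row[1:]] for row in rows[1:] if row}
    return mutations, profiles


def expected_frequencies(profiles, proportions):
    """Mutation frequencies expected for a mixture with the given variant proportions."""
    n_mutations = len(next(iter(profiles.values())))
    return [
        sum(proportions[v] * profiles[v][j] for v in profiles)
        for j in range(n_mutations)
    ]


mutations, profiles = load_profile_matrix(MATRIX_CSV)
freqs = expected_frequencies(profiles, {"VariantA": 0.75, "VariantB": 0.25})
for label, freq in zip(mutations, freqs):
    print(label, freq)
```

Deconvolution works in the opposite direction: given observed mutation data and the profile matrix, it estimates the variant proportions. The sketch only shows how the matrix links proportions to mutation frequencies, not how any of the three methods is implemented.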
Mode 3: Direct Calculation from Files
Mode 3 combines the functionality of Modes 1 and 2 for a direct calculation of variant proportions from BAM/CRAM files without the intermediate step.
varaps --mode 3 --path <path_to_bam_cram_files> --ref <path_to_reference_fasta> --deconv_method <method_number> [--other_options]
- The parameters for Mode 3 are a combination of those from Modes 1 and 2.
- Use the same --path, --ref, --output, and --deconv_method parameters as described above.
- Include any other optional parameters as needed to refine your analysis.
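When Mode 3 is launched from a script, the command can be assembled as a list of arguments. The sketch below uses only the options documented above; the helper name `build_mode3_command`, the paths, and the method number are placeholders chosen for illustration.

```python
import shlex


def build_mode3_command(path, ref, deconv_method, output=None):
    """Assemble a Mode 3 command line from the documented options (hypothetical helper)."""
    cmd = ["varaps", "--mode", "3", "--path", path, "--ref", ref,
           "--deconv_method", str(deconv_method)]
    if output is not None:
        cmd += ["--output", output]
    return cmd


# Placeholder paths and method number; replace them with your own values.
cmd = build_mode3_command("data/bams", "ref/reference.fasta", 1, output="results")
print(shlex.join(cmd))

# To actually run it (requires VaRaPS to be installed):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.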
Understanding the Output
VaRaPS generates detailed output files that encapsulate the results of the mutation and variant analysis. Below are the explanations of the files along with examples to help you understand their structure and content.
mutations_index File
- Filename: mutations_index_<input_file_name>_<options>.csv
- Contents: Lists all mutations found in the input files that passed the filter, serving as an index for the mutations referenced in the Xsparse file.
- Example:
Mutations
T6TC
C9A
A11G
A11T
AAA14A
A16G
A16AG
...
...
- Interpretation:
  - Each line represents a unique mutation, identified by a combination of the reference base, the position in the reference sequence, and the alternate base.
  - This file acts as a legend for the mutation indices used in the Xsparse file. For example, counting from 0, the mutation at index 4 is AAA14A.
Mutation Encoding
- Format: [reference base][position][alternate base]
- Example:
  - C9A indicates a substitution at position 9 where 'C' has been replaced by 'A'.
  - AAA14A indicates a deletion at position 14 where 'AAA' has been shortened to 'A'.
  - A16AG describes an insertion at position 16 where 'G' has been added after 'A'.
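The encoding can be split back into its parts with a regular expression. The following is a minimal sketch, not part of VaRaPS: the function name `parse_mutation`, the restriction to the letters A, C, G, T, and N, and the rule of classifying a mutation by comparing the lengths of the reference and alternate strings are assumptions made for the example.

```python
import re

# Reference bases, then the position, then alternate bases.
MUTATION_RE = re.compile(r"^([ACGTN]+)(\d+)([ACGTN]+)$")


def parse_mutation(label):
    """Split a label such as 'AAA14A' into (ref, position, alt, kind)."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    if len(ref) == len(alt):
        kind = "substitution"
    elif len(ref) > len(alt):
        kind = "deletion"
    else:
        kind = "insertion"
    return ref, position, alt, kind


for label in ["C9A", "AAA14A", "A16AG"]:
    print(label, parse_mutation(label))
```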
Xsparse File
- Filename: Xsparse_<input_file_name>_<options>.csv
- Contents: The Xsparse file contains a list of unique reads and the mutations they contain, represented in a sparse matrix format. It is the most important output file, as it contains the actual data. Note: the number of occurrences of each read in the BAM/CRAM file is stored in the Wsparse file (see below).
- Example:
startIdx_position,endIdx_position,muts
0,4,
0,44,"0, 2"
0,22,"3,"
1,150,"1, 4"
2,275,"2, 5, 6"
...
...
Interpretation:
- The columns startIdx_position and endIdx_position define the range of positions covered by a read.
- The muts column lists the indices of the mutations present in the read within the defined range.
- For instance:
  - Read 0 covers the region from position 0 (inclusive) to position 4 (exclusive). It has no mutations.
  - Read 4 covers the region from position 2 (inclusive) to position 275 (exclusive). Mutations 2, 5, and 6 are found in this read.
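Reading this layout back into Python can be sketched with the standard csv module. The snippet parses the example rows shown above; the function name `read_xsparse` is hypothetical, and the handling of an empty muts field or a trailing comma is an assumption based on those example rows.

```python
import csv
import io

# The example rows shown above.
XSPARSE_CSV = '''startIdx_position,endIdx_position,muts
0,4,
0,44,"0, 2"
0,22,"3,"
1,150,"1, 4"
2,275,"2, 5, 6"
'''


def read_xsparse(text):
    """Return a list of (start, end, [mutation indices]) tuples, one per unique read."""
    reads = []
    for row in csv.DictReader(io.StringIO(text)):
        # Skip empty tokens so that "" and "3," are handled gracefully.
        muts = [int(tok) for tok in row["muts"].split(",") if tok.strip()]
        reads.append((int(row["startIdx_position"]), int(row["endIdx_position"]), muts))
    return reads


reads = read_xsparse(XSPARSE_CSV)
print(reads[0])
print(reads[4])
```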
Wsparse File
- Filename: Wsparse_<input_file_name>_<options>.csv
- Contents: This file associates each read with its frequency in the dataset to optimize data storage.
- Example:
Counts
2
1
1
1
5
...
...
Interpretation:
- Each line corresponds to a read, in the same order as the reads are listed in the Xsparse file.
- The Counts column indicates how many times each respective read appears in the dataset. For example, read 4 occurs 5 times in the data.
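Because the Xsparse and Wsparse files are aligned line by line, they can be combined to count how many sequenced reads carry each mutation. The sketch below hard-codes the values from the Xsparse and Wsparse examples in this section; the function name `mutation_support` is hypothetical and the snippet is an illustration, not VaRaPS's own code.

```python
from collections import Counter

# Unique reads from the Xsparse example (start, end, mutation indices)
# and their counts from the Wsparse example, in the same order.
reads = [(0, 4, []), (0, 44, [0, 2]), (0, 22, [3]), (1, 150, [1, 4]), (2, 275, [2, 5, 6])]
counts = [2, 1, 1, 1, 5]


def mutation_support(reads, counts):
    """Number of sequenced reads carrying each mutation index."""
    support = Counter()
    for (_start, _end, muts), weight in zip(reads, counts):
        for index in muts:
            support[index] += weight
    return support


support = mutation_support(reads, counts)
total_reads = sum(counts)
print(total_reads)   # 10 reads in total
print(support[2])    # mutation 2 is carried by 1 + 5 = 6 reads
```

Mutation indices refer to lines of the mutations_index file, so index 2 corresponds to A11G in the example above.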
Troubleshooting
If you encounter any issues while using VaRaPS, please contact us at djaout [at] lpsm.paris
Contributing
Contributions to VaRaPS are welcome. If you have suggestions or improvements, feel free to mail me at djaout[at]lpsm.paris
License
GNU General Public License v3 or later (GPLv3+)
Contact
For any questions or feedback regarding VaRaPS, feel free to reach out by email at djaout [at] lpsm.paris
Citation
To cite the PyPI package 'VaRaPS' in publications, use:
Djaout, E.H. (2024). VaRaPS: Variants Ratios from Pooled Sequencing. PyPI package.
A BibTeX entry for LaTeX users is:
@Manual{,
title = {VaRaPS: Variants Ratios from Pooled Sequencing},
author = {El Hacene Djaout},
year = {2024},
note = {PyPI package},
}
References
[1] S. Karthikeyan et al. “Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission”. In: Nature 609.7925 (2022), pp. 101–108.
[2] R. Valieris et al. “A mixture model for determining SARS-CoV-2 variant composition in pooled samples”. In: Bioinformatics 38.7 (2022), pp. 1809–1815.
[3] A. Gafurov et al. “VirPool: Model-based estimation of SARS-CoV-2 variant proportions in wastewater samples”. In: medRxiv (2022).