CLASV is a pipeline designed for rapidly predicting Lassa virus lineages using a Random Forest model.
CLASV
Overview
Lassa virus lineage prediction based on random forest.
Information on the research can be found here: https://www.biorxiv.org/content/10.1101/2024.07.31.605963v2
Project Repositories
- Data and Processing: LASV_ML_Manuscript_Data
- Lassa Virus Lineage Prediction: CLASV_GITHUB
Jupyter Notebooks on Google Colab
- General Preprocessing: Notebook Link
- Lassa Virus Lineage Prediction Training: Notebook Link
Prediction Pipeline Overview
Running the Pipeline
Python 3.11 is recommended; any version from 3.6 to 3.11 should work.
Using a virtual environment is highly recommended:
python3.11 -m venv myenv # myenv can be any name of your choice
source myenv/bin/activate # activates the virtual environment
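Before installing, it can help to confirm that the interpreter inside the virtual environment falls in the recommended range. The following is a minimal sketch, not part of CLASV: the function name `is_supported` and the two constants are hypothetical, and the bounds are simply the 3.6 to 3.11 range quoted above.

```python
import sys

# Recommended range as stated in this README (3.6 through 3.11, inclusive).
MIN_VERSION = (3, 6)
MAX_VERSION = (3, 11)


def is_supported(version_info=None):
    """Return True if the (major, minor) version lies in the recommended range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported():
        print("Python " + current + " is within the recommended range.")
    else:
        print("Python " + current + " is outside the recommended 3.6 to 3.11 range.")
```

Run it with the interpreter from the activated environment so that the check reflects the Python that pip will install into.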
Install CLASV using pip:
pip install clasv
CLASV relies on Nextclade for gene extraction and alignment, and on the Snakemake workflow engine to run the pipeline. Both are installed automatically with CLASV. More information about Nextclade is available in the Nextstrain project's installation guide.
Run the pipeline on a folder of FASTA files:
clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --cores 4 #default
Find FASTA files in the input directory and its subdirectories recursively:
clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --recursive --cores 4 #default
Force a rerun:
clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --force --cores 4 #default
Upon completion, go to the pipeline's 'visuals' folder and open the HTML files in a browser.
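When the pipeline is launched from a script rather than typed by hand, the command line can be assembled programmatically. The sketch below is an illustration, not part of CLASV: `build_find_lassa_command` and `run_find_lassa` are hypothetical helper names, and they use only the flags shown in this README (`--input`, `--output`, `--recursive`, `--force`, `--cores`).

```python
import subprocess


def build_find_lassa_command(input_dir, output_dir, cores=4, recursive=False, force=False):
    """Assemble the argument list for `clasv find-lassa` from the documented flags."""
    command = ["clasv", "find-lassa", "--input", str(input_dir), "--output", str(output_dir)]
    if recursive:
        command.append("--recursive")
    if force:
        command.append("--force")
    command += ["--cores", str(cores)]
    return command


def run_find_lassa(input_dir, output_dir, **options):
    """Run the pipeline in a subprocess.

    Raises FileNotFoundError if `clasv` is not on PATH, and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    command = build_find_lassa_command(input_dir, output_dir, **options)
    return subprocess.run(command, check=True)
```

For example, `run_find_lassa("myinputfolderpath", "mychosenfolderpath", recursive=True)` would issue the same command as the recursive invocation shown earlier. Passing the arguments as a list avoids shell quoting problems with folder names that contain spaces.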
Customization
The pipeline can quickly process multiple FASTA files, each containing multiple sequences. Concatenating multiple FASTA files into one is recommended but not required, especially if the files belong to different projects. By default, the pipeline finds all files with the .fasta extension in your input folder and searches them for LASV GPC sequences.
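Concatenating FASTA files can be done with a few lines of Python. This is a minimal sketch using only the standard library; the function name `concatenate_fasta` and its parameters are hypothetical and not part of CLASV. It adds a trailing newline where a file lacks one, so that the header of the next file does not end up glued to the previous sequence.

```python
from pathlib import Path


def concatenate_fasta(input_dir, combined_path, recursive=False):
    """Write every .fasta file under input_dir into combined_path.

    Returns the list of source files, in the order they were written.
    """
    input_dir = Path(input_dir)
    combined_path = Path(combined_path)
    pattern = "**/*.fasta" if recursive else "*.fasta"
    # Collect sources first and skip the output file in case it lives in input_dir.
    sources = sorted(
        p for p in input_dir.glob(pattern) if p.resolve() != combined_path.resolve()
    )
    with combined_path.open("w") as combined:
        for source in sources:
            text = source.read_text()
            combined.write(text)
            # Keep records separated when a file has no final newline.
            if text and not text.endswith("\n"):
                combined.write("\n")
    return sources
```

Writing the combined file to a folder of its own and passing that folder as `--input` keeps the pipeline from also processing the original, unconcatenated files.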
So that Snakemake can keep track of which files have already been checked, intermediary files are created for every file checked, even if it contains no GPC sequences. In that case, the intermediary files are empty.
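One way to see which intermediary files came out empty is to scan the output folder for zero-byte files. The sketch below rests on two assumptions that this README does not confirm: that the intermediary files are written somewhere under the output folder, and that "empty" means a size of zero bytes. The function name `find_empty_files` is hypothetical.

```python
from pathlib import Path


def find_empty_files(output_dir):
    """Return every zero-byte regular file below output_dir, sorted by path."""
    root = Path(output_dir)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0)
```

For example, `find_empty_files("mychosenfolderpath")` would list the candidates after a run.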
Important Outputs
At the end of the run, the predictions folder contains CSV files with the predictions per sample. Visualizations of the predictions are in the visuals folder; open the HTML files in a browser. The images are high quality and interactive: hover over them to see more information.
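The outputs can also be collected from Python for downstream analysis. This is a minimal sketch under stated assumptions: it assumes the `predictions` and `visuals` folders sit directly inside the output folder, and it makes no assumption about the CSV column names, returning each row as a dictionary keyed by whatever header the file has. The helper names `load_predictions` and `list_visuals` are hypothetical.

```python
import csv
from pathlib import Path


def load_predictions(output_dir):
    """Return {csv_file_name: list_of_row_dicts} for each CSV in <output_dir>/predictions."""
    predictions_dir = Path(output_dir) / "predictions"
    tables = {}
    for csv_path in sorted(predictions_dir.glob("*.csv")):
        with csv_path.open(newline="") as handle:
            tables[csv_path.name] = list(csv.DictReader(handle))
    return tables


def list_visuals(output_dir):
    """Return the HTML files found in <output_dir>/visuals, sorted by name."""
    return sorted((Path(output_dir) / "visuals").glob("*.html"))
```

If either folder is missing, the corresponding function returns an empty result rather than raising an error, which makes it easy to call right after a run.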
For further details, please refer to the respective notebooks and repositories linked above. You can also leave a comment for help regarding the pipeline.
Model training
Learn how the data was preprocessed here: LASV_ML_Manuscript_Data. The training process is documented here: Notebook Link.