
CLASV is a pipeline designed for rapidly predicting Lassa virus lineages using a Random Forest model.


CLASV

Overview

Lassa virus lineage prediction based on random forest.

Information on the research can be found here: https://www.biorxiv.org/content/10.1101/2024.07.31.605963v2

Project Repositories

Jupyter Notebooks on Google Colab

Prediction Pipeline Overview

[Figure: CLASV]

Running the Pipeline

Python 3.11 is recommended (or at least a version between 3.6 and 3.11).
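A quick way to confirm that an interpreter falls inside the recommended range is sketched below. The helper name `python_version_supported` is hypothetical and not part of CLASV; the bounds are taken from the recommendation above.

```python
import sys


def python_version_supported(version_info=sys.version_info, low=(3, 6), high=(3, 11)):
    """Return True if the major.minor version lies within [low, high] inclusive."""
    major_minor = (version_info[0], version_info[1])
    return low <= major_minor <= high


# Report on the interpreter that is running this script.
status = "inside" if python_version_supported() else "outside"
print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status} the 3.6 to 3.11 range")
```

Running this with the interpreter you intend to use for the virtual environment shows whether it is within the range before you install anything.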

Using a virtual environment is highly recommended:

python3.11 -m venv myenv  # myenv can be any name of your choice

source myenv/bin/activate  # activates the virtual environment

Install CLASV using pip

pip install clasv

CLASV relies on Nextclade for gene extraction and alignment and uses Snakemake as its workflow engine. Both are installed automatically. More information about Nextclade, which is part of the Nextstrain project, can be found in its installation guide.

Run the pipeline:

clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --cores 4 #default --minlength 500

Find Fasta files in the input directory and subdirectories recursively:

clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --recursive --cores 4 #default

Force rerun:

clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --force --cores 4 #default
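If you want to launch the pipeline from a Python script instead of typing the commands above, the following is a minimal sketch. The functions `build_clasv_command` and `run_clasv` are hypothetical wrappers, not part of CLASV. They use only the flags shown in the commands above; `--minlength` appears there only inside a comment, so its use here is an assumption, as is the idea that `--recursive` and `--force` can be combined in one call.

```python
import subprocess


def build_clasv_command(input_dir, output_dir, cores=4, recursive=False,
                        force=False, minlength=None):
    """Assemble the argument list for `clasv find-lassa` (hypothetical helper)."""
    command = [
        "clasv", "find-lassa",
        "--input", str(input_dir),
        "--output", str(output_dir),
        "--cores", str(cores),
    ]
    if recursive:
        command.append("--recursive")  # also search subdirectories
    if force:
        command.append("--force")      # force a rerun
    if minlength is not None:
        # Assumption: --minlength takes an integer value.
        command += ["--minlength", str(minlength)]
    return command


def run_clasv(**options):
    """Run the pipeline; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_clasv_command(**options), check=True)


# Example (requires clasv to be installed in the active environment):
# run_clasv(input_dir="myinputfolderpath", output_dir="mychosenfolderpath", recursive=True)
```

Passing the arguments as a list avoids shell quoting problems when folder names contain spaces.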

Upon completion, go to the pipeline's 'visuals' folder and open the HTML files in a browser.
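Opening the reports can also be scripted. The sketch below assumes the 'visuals' folder sits inside the output folder you passed to `--output`; that location and the helper name `html_report_uris` are assumptions, not something this page confirms.

```python
from pathlib import Path


def html_report_uris(visuals_dir):
    """Return file:// URIs for every .html file directly inside visuals_dir."""
    return [path.resolve().as_uri()
            for path in sorted(Path(visuals_dir).glob("*.html"))]


# Example: open each report in the default browser.
# import webbrowser
# for uri in html_report_uris("mychosenfolderpath/visuals"):
#     webbrowser.open(uri)
```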

Customization

The pipeline can quickly process multiple FASTA files, each containing multiple sequences. Concatenating multiple FASTA files into one is recommended but not compulsory, especially if the files belong to different projects. By default, the pipeline finds all files with the .fasta extension in your input folder and searches them for LASV GPC sequences.
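Concatenating FASTA files needs nothing beyond the standard library. The sketch below is one possible approach, with a hypothetical helper name `concatenate_fasta`; it adds a trailing newline where a file lacks one so that the last sequence of one file does not run into the first header of the next.

```python
from pathlib import Path


def concatenate_fasta(input_dir, output_file, recursive=False):
    """Write every .fasta file under input_dir into output_file; return the sources used."""
    input_dir = Path(input_dir)
    output_file = Path(output_file)
    pattern = "**/*.fasta" if recursive else "*.fasta"
    # Collect sources before opening the output, and never read the output itself.
    sources = sorted(
        path for path in input_dir.glob(pattern)
        if path.is_file() and path.resolve() != output_file.resolve()
    )
    with output_file.open("w") as combined:
        for source in sources:
            text = source.read_text()
            if not text:
                continue  # skip empty files
            combined.write(text if text.endswith("\n") else text + "\n")
    return sources


# Example:
# concatenate_fasta("myinputfolderpath", "combined/all_sequences.fasta")
```

Writing the combined file to a folder other than the input folder keeps the pipeline from processing the same sequences twice.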

So that Snakemake keeps a record of which files have been checked, intermediate files are created for every file checked, even those that contain no GPC sequences. In that case, the intermediate files are empty.
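Because empty intermediate files mark inputs in which no GPC sequence was found, listing them is a quick way to see which inputs yielded nothing. The sketch below makes no assumption about how CLASV names its intermediate folders; you supply the folder path yourself, and `split_by_content` is a hypothetical helper.

```python
from pathlib import Path


def split_by_content(folder):
    """Return (empty_files, non_empty_files) for all files found under folder."""
    empty, non_empty = [], []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            target = empty if path.stat().st_size == 0 else non_empty
            target.append(path)
    return empty, non_empty


# Example (the folder path is a placeholder):
# empty, non_empty = split_by_content("path/to/intermediate/folder")
# print(f"{len(empty)} empty, {len(non_empty)} with content")
```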

Important Outputs

At the end of the run, check the predictions folder for the CSV files containing the per-sample predictions. Visualizations of the predictions are in the visuals folder; open the HTML files in a browser. The images are high quality and interactive, allowing you to hover over them to see more information.
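The prediction CSV files can be loaded for further analysis with the standard library. The sketch below does not assume any particular column names, since they are not listed here; it reads whatever header each file has. The helper name `load_predictions` and the example path are hypothetical.

```python
import csv
from pathlib import Path


def load_predictions(predictions_dir):
    """Map each CSV file name in predictions_dir to its rows (one dict per row)."""
    tables = {}
    for csv_path in sorted(Path(predictions_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            tables[csv_path.name] = list(csv.DictReader(handle))
    return tables


# Example: print the row count and column names of each file.
# for name, rows in load_predictions("mychosenfolderpath/predictions").items():
#     columns = list(rows[0].keys()) if rows else []
#     print(name, len(rows), columns)
```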

For further details, please refer to the respective notebooks and repositories linked above. You can also leave a comment for help regarding the pipeline.

Model training

To learn how the data was preprocessed, see LASV_ML_Manuscript_Data. The training process is documented in the linked notebook.
