CLASV is a pipeline designed for rapidly predicting Lassa virus lineages using a Random Forest model.

CLASV

Overview

Lassa virus lineage prediction based on random forest.

Information on the research can be found here: https://github.com/JoiRichi/CLASV

Project Repositories

Data and Processing: LASV_ML_Manuscript_Data
Lassa Virus Lineage Prediction: CLASV_GITHUB

Jupyter Notebooks on Google Colab

General Preprocessing: Notebook Link
Lassa Virus Lineage Prediction Training: Notebook Link

Prediction Pipeline Overview

[Figure: CLASV prediction pipeline overview]

Installation Guide

Step 1: Install Python 3.11

CLASV requires Python 3.11 for optimal compatibility.

macOS/Linux: Download from Python.org or use a package manager:

# macOS with Homebrew
brew install python@3.11

# Ubuntu/Debian
sudo apt update
sudo apt install python3.11 python3.11-venv python3.11-dev

Windows: Download and run the installer from Python.org. Be sure to check "Add Python 3.11 to PATH" during installation.

Step 2: Verify Python Installation

Confirm that Python 3.11 is installed correctly:

python3.11 --version
# Should output: Python 3.11.x
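
The same check can be done from inside Python, which is handy when it is unclear which interpreter a script or notebook is actually using. This is a minimal sketch; the helper name is_supported is illustrative and not part of CLASV.

```python
import sys

REQUIRED = (3, 11)  # the version this guide asks for


def is_supported(version_info=sys.version_info):
    """Return True when the interpreter's major.minor matches REQUIRED."""
    return tuple(version_info[:2]) == REQUIRED


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor}:",
    "OK" if is_supported() else "expected 3.11",
)
```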

Step 3: Create a Virtual Environment

# Create a dedicated directory for your project (optional)
mkdir clasv_project
cd clasv_project

# Create a virtual environment
python3.11 -m venv clasv_env

# Activate the virtual environment
# On macOS/Linux:
source clasv_env/bin/activate
# On Windows:
# clasv_env\Scripts\activate

# Confirm you're using Python 3.11 in the virtual environment
python --version
# Should output: Python 3.11.x

Step 4: Install CLASV

With your virtual environment activated:

# Update pip to the latest version
pip install --upgrade pip

# Install CLASV
pip install clasv

Step 5: Verify Installation

# Check that CLASV is installed
clasv --version

# Test the help command
clasv -h
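
If the clasv command is not found, it can help to check separately whether the package is installed in the active environment and whether its executable is on PATH. The sketch below uses only the standard library; the function names are illustrative, and the distribution name "clasv" is taken from the pip command above.

```python
import shutil
from importlib import metadata


def installed_version(dist_name="clasv"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def on_path(executable="clasv"):
    """Return True when the executable can be found on PATH."""
    return shutil.which(executable) is not None


print("clasv version:", installed_version() or "not installed")
print("clasv on PATH:", on_path())
```

A version that prints correctly while the PATH check fails usually means the virtual environment is not activated in the current shell.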

Troubleshooting Installation

If you encounter errors with dependencies, try installing in a fresh virtual environment.
If Nextclade fails to install automatically, run your CLASV command again, as it attempts installation on first use.
For Snakemake-related errors, make sure you are using Python 3.11; other versions may have compatibility issues.

Running the Pipeline

The main command for CLASV is find-lassa. Run it as follows:

clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --cores 4 --minlength 500  # 500 is the default minimum length

To find FASTA files in the input directory and its subdirectories recursively:

# Add the --recursive flag
clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --cores 4 --recursive

To force a rerun:

# Add the --force flag
clasv find-lassa --input myinputfolderpath --output mychosenfolderpath --cores 4 --force

Upon completion, go to the 'visuals' folder in your output directory and open the HTML files in a browser.
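
When the pipeline is launched from a script rather than typed by hand, the command can be assembled from the flags shown above. This is a sketch under assumptions: build_command and run_clasv are hypothetical helper names, not part of CLASV, and the sketch assumes clasv reports failure through a non-zero exit code.

```python
import subprocess


def build_command(input_dir, output_dir, cores=4, minlength=500,
                  recursive=False, force=False):
    """Assemble a 'clasv find-lassa' command line from the documented flags."""
    cmd = [
        "clasv", "find-lassa",
        "--input", str(input_dir),
        "--output", str(output_dir),
        "--cores", str(cores),
        "--minlength", str(minlength),
    ]
    if recursive:
        cmd.append("--recursive")
    if force:
        cmd.append("--force")
    return cmd


def run_clasv(*args, **kwargs):
    """Run the pipeline; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


# Example (not executed here):
# run_clasv("myinputfolderpath", "mychosenfolderpath", recursive=True)
```

Passing the arguments as a list avoids shell quoting problems with folder names that contain spaces.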

Customization

The pipeline can quickly process multiple FASTA files, each containing multiple sequences. Concatenating multiple FASTA files into one is recommended but not required, especially when the files belong to different projects. By default, the pipeline finds all files with the .fasta extension in your input folder and searches them for LASV GPC sequences.
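
For readers who want to follow the concatenation recommendation, a minimal sketch is given here. The function name concatenate_fasta is illustrative; the only assumption taken from this guide is that input files end in .fasta.

```python
from pathlib import Path


def concatenate_fasta(input_dir, combined_path):
    """Write every *.fasta file under input_dir into one combined file.

    Returns the number of source files that were merged.
    """
    combined_path = Path(combined_path)
    # Collect sources first and skip the output file itself, in case it
    # lives inside input_dir.
    sources = sorted(
        p for p in Path(input_dir).rglob("*.fasta")
        if p.resolve() != combined_path.resolve()
    )
    with combined_path.open("w") as out:
        for src in sources:
            text = src.read_text()
            # Guarantee a newline between files so headers never fuse
            # with the last sequence line of the previous file.
            if text and not text.endswith("\n"):
                text += "\n"
            out.write(text)
    return len(sources)
```

Writing the combined file to a separate folder and using that folder as --input avoids processing the same sequences twice.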

So that Snakemake keeps a record of which files have been checked, intermediate files are created for every input file, even those that contain no GPC sequences. In that case, the intermediate files are empty.

Important Outputs

At the end of the run, the predictions folder contains CSV files with the predictions per sample. Visualizations of the predictions are in the visuals folder; open the HTML files in a browser. The plots are high quality and interactive: hover over them to see more information.
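
The prediction CSV files can also be read programmatically. The sketch below loads every CSV in the predictions folder with the standard library. The column names are not documented here, so the code makes no assumption about them and simply returns each row as a dictionary keyed by the file's own header; load_predictions is an illustrative name.

```python
import csv
from pathlib import Path


def load_predictions(output_dir):
    """Map each CSV file name in <output_dir>/predictions to its rows."""
    tables = {}
    for csv_path in sorted((Path(output_dir) / "predictions").glob("*.csv")):
        with csv_path.open(newline="") as handle:
            tables[csv_path.name] = list(csv.DictReader(handle))
    return tables


# Example (not executed here):
# for name, rows in load_predictions("mychosenfolderpath").items():
#     print(name, len(rows), "rows; columns:", list(rows[0]) if rows else [])
```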

For further details, please refer to the respective notebooks and repositories linked above. You can also leave a comment for help regarding the pipeline.

Technical Documentation

Command Line Interface Options

The CLASV tool provides the following command-line options:

clasv find-lassa [options]

Options:

--input: Path to the input folder containing FASTA files (required)
--output: Path to the output folder for results (required)
--recursive: Search input folder recursively for FASTA files
--cores: Number of CPU cores to use (default: 4)
--force: Force rerun of all pipeline steps
--minlength: Minimum length of GPC sequences to consider (default: 500)
--version: Show the version number and exit

Pipeline Workflow

CLASV executes the following sequence of operations:

Dependency Check: Verifies that Nextclade and SeqKit are installed, or installs them automatically
Preprocessing: Collects and prepares input FASTA files
Alignment & Extraction: Uses Nextclade to align sequences and extract GPC regions
Translation: Translates nucleotide sequences to amino acids
Encoding: One-hot encodes amino acid sequences
Prediction: Applies Random Forest model to predict Lassa virus lineages
Visualization: Generates plots and visualizations of prediction results
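
To make the Translation and Encoding steps concrete, here is a small standalone sketch of both ideas. It is not CLASV's implementation: the amino acid alphabet order, the handling of stop codons and ambiguous bases, and the function names are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed alphabet: the 20 standard amino acids in alphabetical order.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def translate(nucleotides):
    """Translate in frame 1; unknown codons become 'X', stops become '*'."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def one_hot(protein):
    """One row per residue, one column per amino acid in ALPHABET.

    Residues outside ALPHABET (such as 'X' or '*') give an all-zero row.
    """
    return [[1 if residue == aa else 0 for aa in ALPHABET] for residue in protein]


protein = translate("ATGGGTTAA")
print(protein, one_hot(protein)[0])
```

Each sequence of length n becomes an n by 20 matrix of zeros and ones, which can be flattened into the fixed-width feature row that a Random Forest classifier expects.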

Directory Structure

After running the pipeline, the output folder will contain:

results/: Intermediate files from the pipeline
- preprocessed/: Preprocessed input files
- *_extracted_GPC_sequences.fasta: Extracted GPC sequences
- *_extracted_GPC_sequences_aa.fasta: Translated amino acid sequences
- *_extracted_GPC_sequences_aa_encoded.csv: Encoded sequences
predictions/: CSV files containing lineage predictions
visuals/: HTML visualization files
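
The layout above can be checked after a run. This sketch reports which of the three top-level folders are missing and lists extracted-GPC files that are empty, which this guide describes as the outcome for inputs with no GPC sequences. The helper names are illustrative, and the recursive search under results/ is an assumption about where those files sit.

```python
from pathlib import Path

EXPECTED_FOLDERS = ("results", "predictions", "visuals")


def missing_folders(output_dir):
    """Return the expected top-level folders that do not exist."""
    return [name for name in EXPECTED_FOLDERS
            if not (Path(output_dir) / name).is_dir()]


def inputs_without_gpc(output_dir):
    """Return names of extracted-GPC FASTA files under results/ that are empty."""
    results = Path(output_dir) / "results"
    return sorted(
        p.name
        for p in results.rglob("*_extracted_GPC_sequences.fasta")
        if p.stat().st_size == 0
    )
```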

Troubleshooting

Common Issues

Python Version Issues: If you encounter errors during installation or runtime, ensure you are using Python 3.11.
```
python --version
```
Nextclade Installation Failure: If Nextclade fails to install automatically:
- Ensure you have appropriate permissions
- Try running the CLASV command again
- Consider installing Nextclade manually
Snakemake Compatibility: If you encounter Snakemake errors, try reinstalling CLASV in a fresh virtual environment with Python 3.11.
Memory Issues: For large datasets, increase available memory or process files in smaller batches.
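
One way to process a large dataset in smaller batches is to split a big FASTA file into several smaller ones before running the pipeline. The sketch below streams records so the whole file is never held in memory. The function names, the batch file naming, and the default batch size are all illustrative choices, not CLASV settings.

```python
from itertools import islice
from pathlib import Path


def read_records(fasta_path):
    """Yield each FASTA record (header plus sequence lines) as one string."""
    record = []
    with open(fasta_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
    if record:
        yield "".join(record)


def split_fasta(fasta_path, batch_dir, records_per_file=1000):
    """Write consecutive groups of records to batch_001.fasta, batch_002.fasta, ..."""
    batch_dir = Path(batch_dir)
    batch_dir.mkdir(parents=True, exist_ok=True)
    records = read_records(fasta_path)
    written = []
    while True:
        chunk = list(islice(records, records_per_file))
        if not chunk:
            break
        part = batch_dir / f"batch_{len(written) + 1:03d}.fasta"
        part.write_text("".join(chunk))
        written.append(part)
    return written
```

Each batch file could then be placed in its own input folder and run as a separate clasv find-lassa invocation.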

Model training

Learn how the data was preprocessed here: LASV_ML_Manuscript_Data. The training process is documented here: Notebook Link.

Release history

1.0.6: Aug 9, 2025
1.0.5: Aug 9, 2025
1.0.4: Aug 9, 2025
1.0.2 (this version): May 30, 2025
1.0.1: May 9, 2025
1.0.0: May 9, 2025
0.1.16: Dec 15, 2024
0.1.15: Dec 15, 2024
0.1.14: Dec 11, 2024

Download files


Source Distribution

clasv-1.0.2.tar.gz (1.9 MB)

Uploaded May 30, 2025 Source

Built Distribution


clasv-1.0.2-py3-none-any.whl (2.4 MB)

Uploaded May 30, 2025 Python 3

File details

Details for the file clasv-1.0.2.tar.gz.

File metadata

Download URL: clasv-1.0.2.tar.gz
Upload date: May 30, 2025
Size: 1.9 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for clasv-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`9d6109c24056d47cf964a41e49c01170d0184da84e53ce61cdc03d8a82789c96`
MD5	`dcc4c69acc9069d7766ca9c0eb89d65b`
BLAKE2b-256	`d88bfa027f178c8e86e464814497c55bb250f9e751d5281ae8d0f8b628c53c21`
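
A downloaded file can be compared against the SHA256 digest listed above with a few lines of standard-library Python. This is a generic sketch; the function names are illustrative, and the local file path in the example is a placeholder for wherever the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True when the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (not executed here), using the digest published for the sdist:
# matches_digest(
#     "clasv-1.0.2.tar.gz",
#     "9d6109c24056d47cf964a41e49c01170d0184da84e53ce61cdc03d8a82789c96",
# )
```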


File details

Details for the file clasv-1.0.2-py3-none-any.whl.

File metadata

Download URL: clasv-1.0.2-py3-none-any.whl
Upload date: May 30, 2025
Size: 2.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for clasv-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9c0dfb9e1a39d563e7b4ede8a0b594910657704ec07f50559e10d8bf7b4a99cf`
MD5	`735e61dd9c06234a19f90468c360cc78`
BLAKE2b-256	`aedbd58479423cbcd8bfc12fd34b84f8e98911e6e5585b5be6e7ad9139295b29`


CLASV 1.0.2
