A tool for metagenomic taxonomic profiling and abundance matrix generation
This project has been archived by its maintainers. No new releases are expected.
toxolib
A Python package for metagenomic taxonomic profiling and abundance matrix generation.
Installation
Using pip
pip install toxolib
Install directly from GitHub
pip install git+https://github.com/dhruvac29/toxolib.git
Using conda
We recommend using conda to install all dependencies. An environment file (environment.yml) is included in the repository:
# Clone the repository
git clone https://github.com/dhruvac29/toxolib.git
cd toxolib
# Create and activate the conda environment
conda env create -f environment.yml
conda activate taxonomy_env
# Install the package
pip install -e .
Requirements
This package requires the following external tools to be installed and available in your PATH:
- Kraken2
- Bracken
- Krona (for visualization)
- fastp (for preprocessing)
- bowtie2 (for host removal)
- samtools
All these dependencies are included in the conda environment file.
You'll also need to set the environment variable KRAKEN2_DB_DIR to the path of your Kraken2 database:
export KRAKEN2_DB_DIR=/path/to/kraken2/database
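Before running the pipeline, it can help to confirm that the external tools and KRAKEN2_DB_DIR are actually visible to Python. The following is a minimal sketch, not part of toxolib: the executable names (in particular ktImportTaxonomy for Krona) are assumptions based on the tool list above, and check_environment is a hypothetical helper.

```python
import os
import shutil

# Executable names are assumptions derived from the requirements list;
# toxolib itself may invoke different commands.
REQUIRED_TOOLS = ["kraken2", "bracken", "ktImportTaxonomy", "fastp", "bowtie2", "samtools"]


def check_environment(which=shutil.which, environ=os.environ):
    """Return a list of problems found; an empty list means the setup looks usable."""
    problems = [f"{tool} not found on PATH" for tool in REQUIRED_TOOLS if which(tool) is None]
    db_dir = environ.get("KRAKEN2_DB_DIR")
    if not db_dir:
        problems.append("KRAKEN2_DB_DIR is not set")
    elif not os.path.isdir(db_dir):
        problems.append(f"KRAKEN2_DB_DIR does not point to a directory: {db_dir}")
    return problems
```

Calling check_environment() with no arguments inspects the current PATH and environment. The which and environ parameters exist only so the check can be exercised without the tools installed.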
Usage
Local Usage
Generate abundance matrix from raw data
toxolib abundance -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o output_directory
This will:
- Run Kraken2 on the raw data
- Run Bracken on the Kraken2 results
- Generate an abundance matrix from the Bracken results
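For orientation, the first two steps roughly correspond to one Kraken2 call followed by one Bracken call per sample. The sketch below only builds the command lines and does not run them. It is an illustration under assumptions: the output file names, the read length of 150, and the species level ("S") are hypothetical choices, and toxolib's real invocation may differ.

```python
import os


def build_commands(r1, r2, out_dir, db, sample="sample", threads=4, read_len=150):
    """Build (but do not run) Kraken2 and Bracken command lines for one paired-end sample."""
    report = os.path.join(out_dir, f"{sample}.kreport")
    kraken_cmd = [
        "kraken2", "--db", db, "--threads", str(threads),
        "--paired", "--gzip-compressed",
        "--report", report,  # per-taxon report that Bracken consumes
        "--output", os.path.join(out_dir, f"{sample}.kraken"),
        r1, r2,
    ]
    bracken_cmd = [
        "bracken", "-d", db, "-i", report,
        "-o", os.path.join(out_dir, f"{sample}_species.bracken"),
        "-r", str(read_len),  # read length the Bracken k-mer distribution was built for
        "-l", "S",            # estimate abundance at species level
    ]
    return kraken_cmd, bracken_cmd
```

Each returned list can be passed to subprocess.run(cmd, check=True) on a machine where the tools and database are installed.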
Create abundance matrix from existing Bracken files
toxolib matrix -i sample1_species.bracken sample2_species.bracken -o abundance_matrix.csv
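To make the idea of an abundance matrix concrete, here is a small sketch that merges Bracken outputs into a taxa-by-samples table. It assumes the standard tab-separated Bracken output with name and new_est_reads columns. Whether toxolib uses this column, this orientation, or zero-filling for absent taxa is not stated here, so treat those as assumptions; read_bracken and build_matrix are hypothetical names.

```python
import csv
import io


def read_bracken(text, column="new_est_reads"):
    """Map taxon name -> value of `column` from the text of one Bracken output file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["name"]: float(row[column]) for row in reader}


def build_matrix(samples, column="new_est_reads"):
    """Combine {sample_name: bracken_text} into (header, rows) with taxa as rows."""
    per_sample = {name: read_bracken(text, column) for name, text in samples.items()}
    taxa = sorted({taxon for counts in per_sample.values() for taxon in counts})
    header = ["taxon"] + list(per_sample)
    # Taxa missing from a sample are filled with 0.0.
    rows = [[taxon] + [per_sample[s].get(taxon, 0.0) for s in per_sample] for taxon in taxa]
    return header, rows
```

The result can be written out with csv.writer(f).writerows([header] + rows). Passing column="fraction_total_reads" would give relative abundances instead of estimated read counts.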
HPC Usage
Toxolib can run the analysis pipeline on an HPC cluster using SLURM for job scheduling.
1. Set up HPC connection
toxolib hpc-setup --hostname your-hpc-server.edu --username your-username --key-file ~/.ssh/id_rsa
This will save your HPC connection details to ~/.toxolib/hpc_config.yaml.
2. Run the pipeline on HPC
toxolib hpc -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o /path/on/hpc/output_dir \
--kraken-db /path/on/hpc/kraken2_db \
--corn-db /path/on/hpc/corn_db \
--partition normal --threads 32 --memory 200 --time 144:00:00
This will:
- Upload your raw data files to the HPC
- Create a Snakemake workflow file
- Upload an environment.yml file to the HPC
- Submit a SLURM job to run the analysis
- Return a job ID for tracking
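The resource flags map naturally onto SLURM batch directives. The following sketch shows what a generated job script might look like. It is not toxolib's actual template: it assumes --memory is given in gigabytes, and slurm_script and the log file name are hypothetical.

```python
def slurm_script(partition, threads, memory_gb, time_limit, output_dir, command):
    """Render a minimal SLURM batch script as a string."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --cpus-per-task={threads}",
        f"#SBATCH --mem={memory_gb}G",        # assumes the value is in gigabytes
        f"#SBATCH --time={time_limit}",       # hours:minutes:seconds
        f"#SBATCH --output={output_dir}/slurm-%j.out",  # %j expands to the job ID
        "",
        command,
    ]
    return "\n".join(lines) + "\n"
```

For the example above this would be slurm_script("normal", 32, 200, "144:00:00", "/path/on/hpc/output_dir", "snakemake --cores 32"), and the resulting file would be submitted with sbatch.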
Automatic Conda Environment Creation
When submitting a job to the HPC, toxolib will automatically:
- Upload a conda environment.yml file to the HPC
- Create a conda environment in the output directory if it doesn't exist
- Activate the environment before running the analysis
This ensures all required dependencies are available on the HPC without requiring manual environment setup.
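The create-if-missing behavior can be sketched as a few shell lines prepended to the job script. This is only an illustration: the environment directory name conda_env is a hypothetical choice, and toxolib may lay out or name the environment differently.

```python
import os
import shlex


def conda_setup_lines(output_dir, env_file="environment.yml", env_dirname="conda_env"):
    """Shell lines that create the environment only if it is missing, then activate it."""
    prefix = shlex.quote(os.path.join(output_dir, env_dirname))
    return [
        f"if [ ! -d {prefix} ]; then",
        f"    conda env create --prefix {prefix} --file {shlex.quote(env_file)}",
        "fi",
        # Make `conda activate` usable in a non-interactive batch shell.
        'eval "$(conda shell.bash hook)"',
        f"conda activate {prefix}",
    ]
```

Creating the environment under a path prefix inside the output directory keeps it separate from any environments the user already has on the cluster.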
3. Check job status
toxolib hpc-status --job-id your_job_id
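On the cluster itself, the same information is available from SLURM directly. A minimal sketch of such a check, assuming squeue is on the PATH of the machine where it runs, is shown below. How toxolib queries the scheduler over SSH is not documented here, and both function names are hypothetical.

```python
import subprocess


def parse_squeue_state(stdout):
    """Return the job state from `squeue -h -o %T` output, or None if nothing is listed."""
    lines = [line.strip() for line in stdout.splitlines() if line.strip()]
    return lines[0] if lines else None


def job_state(job_id):
    """Ask SLURM for the state of a queued or running job."""
    result = subprocess.run(
        ["squeue", "-h", "-j", str(job_id), "-o", "%T"],
        capture_output=True, text=True, check=False,
    )
    return parse_squeue_state(result.stdout)
```

A return value of None usually means the job has left the queue. In that case sacct -j with the job ID reports the final state, provided job accounting is enabled on the cluster.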
4. Download results when complete
toxolib hpc-download --job-id your_job_id --output-dir ./local_results
Database Setup
Kraken2 Database
You can download the standard Kraken2 database from: https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
mkdir -p /path/to/kraken2/database
tar -xzf k2_standard_20240112.tar.gz -C /path/to/kraken2/database
export KRAKEN2_DB_DIR=/path/to/kraken2/database
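After extraction, a quick sanity check is to confirm that the directory contains the three files Kraken2 needs to classify reads (hash.k2d, opts.k2d, and taxo.k2d). The helper names below are hypothetical, and this sketch does not check the additional k-mer distribution files that Bracken reads.

```python
import os

KRAKEN2_DB_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_db_files(present):
    """Given the file names found in a directory, list the required Kraken2 files that are absent."""
    present = set(present)
    return [name for name in KRAKEN2_DB_FILES if name not in present]


def check_db_dir(db_dir):
    """Return the required Kraken2 files missing from db_dir (all of them if it is not a directory)."""
    if not os.path.isdir(db_dir):
        return list(KRAKEN2_DB_FILES)
    return missing_db_files(os.listdir(db_dir))
```

An empty list from check_db_dir(os.environ["KRAKEN2_DB_DIR"]) suggests the archive was extracted into the right place.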
Corn Genome Database
For host removal, you can download the corn genome reference from: https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip
wget https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip
unzip corn_db.zip -d /path/to/corn_db
Setting up databases on HPC
When using the HPC functionality, you'll need to upload and extract these databases on your HPC system:
# On your local machine, download the databases
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
wget https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip
# Upload to HPC (using scp)
scp k2_standard_20240112.tar.gz your-username@your-hpc-server.edu:/path/on/hpc/
scp corn_db.zip your-username@your-hpc-server.edu:/path/on/hpc/
# SSH into HPC and extract
ssh your-username@your-hpc-server.edu
mkdir -p /path/on/hpc/kraken2_db
tar -xzf /path/on/hpc/k2_standard_20240112.tar.gz -C /path/on/hpc/kraken2_db
mkdir -p /path/on/hpc/corn_db
unzip /path/on/hpc/corn_db.zip -d /path/on/hpc/corn_db
Then, when running toxolib, pass these paths with --kraken-db and --corn-db:
toxolib hpc -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o /path/on/hpc/output_dir \
--kraken-db /path/on/hpc/kraken2_db \
--corn-db /path/on/hpc/corn_db
License
MIT
Download files
File details
Details for the file toxolib-0.1.2.tar.gz.
File metadata
- Download URL: toxolib-0.1.2.tar.gz
- Upload date:
- Size: 18.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea45760c597f7afe10c028f71a795f7504138c8f79efa8239b15a9b10e679865 |
| MD5 | bb52497041250b0ab429a63763c24a13 |
| BLAKE2b-256 | 3bc8110bedfc693e19425f4e0394e90236635f8cd93ce6cf1983a506e51e6109 |
File details
Details for the file toxolib-0.1.2-py3-none-any.whl.
File metadata
- Download URL: toxolib-0.1.2-py3-none-any.whl
- Upload date:
- Size: 18.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67fc964886a5d1f36d730ce977632ac3226425200e27e2154de25db65bcccb82 |
| MD5 | ca9f372efe0e41b1c363ba8afe97ca41 |
| BLAKE2b-256 | 5be2b246928a79c80466e091ce4e50d6d92ddcc0b44478c991f5d7ee780df66f |