A tool for metagenomic taxonomic profiling and abundance matrix generation

This project has been archived by its maintainers. No new releases are expected.

toxolib

A Python package for metagenomic taxonomic profiling and abundance matrix generation.

Installation

Using pip

pip install toxolib

Install directly from GitHub

pip install git+https://github.com/dhruvac29/toxolib.git

Using conda

We recommend using conda to install all dependencies. An environment file (environment.yml) is included in the repository:

# Clone the repository
git clone https://github.com/dhruvac29/toxolib.git
cd toxolib

# Create and activate the conda environment
conda env create -f environment.yml
conda activate taxonomy_env

# Install the package
pip install -e .

Requirements

This package requires the following external tools to be installed and available in your PATH:

  • Kraken2
  • Bracken
  • Krona (for visualization)
  • fastp (for preprocessing)
  • bowtie2 (for host removal)
  • samtools

All of these dependencies are included in the conda environment file.

You'll also need to set the environment variable KRAKEN2_DB_DIR to the path of your Kraken2 database:

export KRAKEN2_DB_DIR=/path/to/kraken2/database
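
Before running the pipeline, it can help to confirm that the tools above are on your PATH and that KRAKEN2_DB_DIR points to a directory. The following is a minimal sketch, not part of toxolib. The executable names are assumptions based on how these tools are usually installed (for example, Krona's text importer is commonly installed as ktImportText), and check_environment is a hypothetical helper.

```python
import os
import shutil

# Assumed executable names for the tools listed above; adjust them if your
# installation uses different names.
REQUIRED_TOOLS = ["kraken2", "bracken", "ktImportText", "fastp", "bowtie2", "samtools"]


def check_environment(tools=REQUIRED_TOOLS, env=None):
    """Return a list of problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    for tool in tools:
        # shutil.which searches the given PATH string for an executable.
        if shutil.which(tool, path=env.get("PATH")) is None:
            problems.append(f"{tool} not found on PATH")
    db_dir = env.get("KRAKEN2_DB_DIR")
    if not db_dir:
        problems.append("KRAKEN2_DB_DIR is not set")
    elif not os.path.isdir(db_dir):
        problems.append(f"KRAKEN2_DB_DIR is not a directory: {db_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the script prints one warning per missing tool or misconfigured variable, and prints nothing when everything is in place.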

Usage

Local Usage

Generate abundance matrix from raw data

toxolib abundance -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o output_directory

This will:

  1. Run Kraken2 on the raw data
  2. Run Bracken on the Kraken2 results
  3. Generate an abundance matrix from the Bracken results

Create abundance matrix from existing Bracken files

toxolib matrix -i sample1_species.bracken sample2_species.bracken -o abundance_matrix.csv
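
To illustrate what an abundance matrix built from Bracken files can look like, here is a minimal sketch that merges several Bracken outputs into one taxon-by-sample table. It is not toxolib's implementation: the layout (taxa as rows, samples as columns, new_est_reads as the value, 0 for absent taxa) is an assumption, and read_bracken, build_matrix, and write_matrix are hypothetical names. It relies only on the standard tab-separated Bracken output columns name and new_est_reads.

```python
import csv
from pathlib import Path


def read_bracken(lines):
    """Map taxon name -> estimated read count from the lines of one Bracken file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["name"]: int(row["new_est_reads"]) for row in reader}


def build_matrix(samples):
    """Turn {sample_name: {taxon: count}} into (header, rows), taxa sorted by name."""
    names = list(samples)
    taxa = sorted(set().union(*(counts.keys() for counts in samples.values())))
    header = ["taxon"] + names
    # A taxon that is absent from a sample gets a count of 0.
    rows = [[taxon] + [samples[n].get(taxon, 0) for n in names] for taxon in taxa]
    return header, rows


def write_matrix(bracken_paths, out_csv):
    """Read each Bracken file, then write the combined matrix as CSV."""
    samples = {}
    for path in bracken_paths:
        with open(path, newline="") as handle:
            # The file name without its extension is used as the sample name.
            samples[Path(path).stem] = read_bracken(handle)
    header, rows = build_matrix(samples)
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, write_matrix(["sample1_species.bracken", "sample2_species.bracken"], "abundance_matrix.csv") would write one row per taxon and one column per input file.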

HPC Usage

Toxolib can run the analysis pipeline on an HPC cluster using SLURM for job scheduling.

1. Set up HPC connection

toxolib hpc-setup --hostname your-hpc-server.edu --username your-username --key-file ~/.ssh/id_rsa

This will save your HPC connection details to ~/.toxolib/hpc_config.yaml.

2. Run the pipeline on HPC

toxolib hpc -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o /path/on/hpc/output_dir \
    --kraken-db /path/on/hpc/kraken2_db \
    --corn-db /path/on/hpc/corn_db

This will:

  1. Upload your raw data files to the HPC
  2. Create a Snakemake workflow file
  3. Submit a SLURM job to run the analysis
  4. Return a job ID for tracking

3. Check job status

toxolib hpc-status --job-id your_job_id
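
If you are logged in to the cluster, you can also query SLURM directly. The sketch below is not how toxolib hpc-status is implemented (that is not documented here). It assumes SLURM accounting is enabled so that sacct is available, and parse_sacct_states and job_state are hypothetical helper names.

```python
import subprocess


def parse_sacct_states(output):
    """Parse `sacct --parsable2 --noheader --format=JobID,State` output into {job_id: state}."""
    states = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        job_id, state = line.split("|", 1)
        # States such as "CANCELLED by 1234" carry a suffix; keep the first word.
        states[job_id] = state.split()[0] if state.strip() else "UNKNOWN"
    return states


def job_state(job_id):
    """Query SLURM accounting for one job; must run where `sacct` is on PATH."""
    result = subprocess.run(
        ["sacct", "-j", str(job_id), "--parsable2", "--noheader", "--format=JobID,State"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sacct_states(result.stdout).get(str(job_id), "UNKNOWN")
```

Calling job_state with the job ID returned by toxolib hpc gives the state SLURM reports for that job, or "UNKNOWN" if sacct has no record of it.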

4. Download results when complete

toxolib hpc-download --job-id your_job_id --output-dir ./local_results

Database Setup

Kraken2 Database

You can download the standard Kraken2 database from: https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz

wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
# tar -C requires the target directory to exist
mkdir -p /path/to/kraken2/database
tar -xzf k2_standard_20240112.tar.gz -C /path/to/kraken2/database
export KRAKEN2_DB_DIR=/path/to/kraken2/database
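
After extraction, a quick check that the directory contains the files Kraken2 and Bracken expect can catch an incomplete download early. This is a minimal sketch, not part of toxolib: missing_db_files is a hypothetical helper, and the optional Bracken check assumes the usual database<read length>mers.kmer_distrib naming.

```python
from pathlib import Path

# The three index files that make up a Kraken2 database.
KRAKEN2_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def missing_db_files(db_dir, read_length=None):
    """Return the expected database files that are absent from db_dir."""
    db = Path(db_dir)
    expected = list(KRAKEN2_FILES)
    if read_length is not None:
        # Bracken needs a k-mer distribution file matching the read length.
        expected.append(f"database{read_length}mers.kmer_distrib")
    return [name for name in expected if not (db / name).is_file()]
```

For example, missing_db_files("/path/to/kraken2/database", read_length=150) returns an empty list when the Kraken2 index and the 150-mer Bracken distribution are both present.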

Corn Genome Database

For host removal, you can download the corn genome reference from: https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip

wget https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip
unzip corn_db.zip -d /path/to/corn_db

Setting up databases on HPC

When using the HPC functionality, you'll need to upload and extract these databases on your HPC system:

# On your local machine, download the databases
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
wget https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip

# Upload to HPC (using scp)
scp k2_standard_20240112.tar.gz your-username@your-hpc-server.edu:/path/on/hpc/
scp corn_db.zip your-username@your-hpc-server.edu:/path/on/hpc/

# SSH into the HPC, then run the remaining commands there to extract
ssh your-username@your-hpc-server.edu
mkdir -p /path/on/hpc/kraken2_db
tar -xzf /path/on/hpc/k2_standard_20240112.tar.gz -C /path/on/hpc/kraken2_db
mkdir -p /path/on/hpc/corn_db
unzip /path/on/hpc/corn_db.zip -d /path/on/hpc/corn_db

Then, when running toxolib, specify these paths:

toxolib hpc -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o /path/on/hpc/output_dir \
    --kraken-db /path/on/hpc/kraken2_db \
    --corn-db /path/on/hpc/corn_db

License

MIT
