A tool for metagenomic taxonomic profiling and abundance matrix generation

These details have not been verified by PyPI

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

toxolib

A Python package for metagenomic taxonomic profiling and abundance matrix generation.

Installation

Using pip

pip install toxolib

Install directly from GitHub

pip install git+https://github.com/dhruvac29/toxolib.git

Using conda

We recommend using conda to install all dependencies. An environment file (environment.yml) is included in the repository:

# Clone the repository
git clone https://github.com/dhruvac29/toxolib.git
cd toxolib

# Create and activate the conda environment
conda env create -f environment.yml
conda activate taxonomy_env

# Install the package
pip install -e .

Requirements

This package requires the following external tools to be installed and available in your PATH:

Kraken2
Bracken
Krona (for visualization)
fastp (for preprocessing)
bowtie2 (for host removal)
samtools

All these dependencies are included in the conda environment file.

You'll also need to set the environment variable KRAKEN2_DB_DIR to the path of your Kraken2 database:

export KRAKEN2_DB_DIR=/path/to/kraken2/database
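
The requirements above can be checked before a run. The following is a minimal sketch and is not part of toxolib. The executable names are assumptions based on how each tool is usually installed (for example, ktImportTaxonomy is one of the scripts shipped with Krona), and the helper names missing_tools and kraken_db_configured are hypothetical.

```python
import os
import shutil

# Assumed executable names; adjust them if your installation differs.
REQUIRED_TOOLS = [
    "kraken2",
    "bracken",
    "ktImportTaxonomy",  # Krona script, assumed to be the one needed
    "fastp",
    "bowtie2",
    "samtools",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def kraken_db_configured(env=None):
    """Return True if KRAKEN2_DB_DIR is set and names an existing directory."""
    env = os.environ if env is None else env
    db_dir = env.get("KRAKEN2_DB_DIR", "")
    return bool(db_dir) and os.path.isdir(db_dir)
```

Calling missing_tools() returns an empty list when every tool is found, and kraken_db_configured() reads the current environment when called without arguments.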

Usage

Local Usage

Generate abundance matrix from raw data

toxolib abundance -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o output_directory

This will:

Run Kraken2 on the raw data
Run Bracken on the Kraken2 results
Generate an abundance matrix from the Bracken results
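
The first two steps above correspond roughly to one Kraken2 call and one Bracken call. The sketch below only builds the command lines. It is an illustration, not toxolib's implementation: the flags that toxolib actually passes are not documented here, and the output file names, the thread count, the read length of 150, and the function names are all assumptions.

```python
from pathlib import Path


def kraken2_command(db_dir, reads_1, reads_2, out_dir, sample="sample", threads=4):
    """Build a kraken2 command for one paired-end sample (assumed layout)."""
    out = Path(out_dir)
    cmd = ["kraken2", "--db", str(db_dir), "--threads", str(threads), "--paired"]
    if str(reads_1).endswith(".gz"):
        cmd.append("--gzip-compressed")
    cmd += [
        "--report", str(out / f"{sample}.kreport"),  # hypothetical file name
        "--output", str(out / f"{sample}.kraken"),   # hypothetical file name
        str(reads_1),
        str(reads_2),
    ]
    return cmd


def bracken_command(db_dir, out_dir, sample="sample", read_length=150, level="S"):
    """Build a bracken command that re-estimates abundance at `level` (S = species)."""
    out = Path(out_dir)
    return [
        "bracken",
        "-d", str(db_dir),
        "-i", str(out / f"{sample}.kreport"),
        "-o", str(out / f"{sample}_species.bracken"),
        "-r", str(read_length),
        "-l", level,
    ]
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the tools and the database are in place.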

Create abundance matrix from existing Bracken files

toxolib matrix -i sample1_species.bracken sample2_species.bracken -o abundance_matrix.csv
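
To make the matrix step concrete, here is one way several Bracken output files could be merged into a taxon-by-sample table. This is a sketch under stated assumptions, not the code toxolib runs: it assumes the standard tab-separated Bracken output with name and new_est_reads columns, uses the file stem as the sample name, and fills missing taxa with 0. The function names are hypothetical.

```python
import csv
from pathlib import Path


def read_bracken(path, value_column="new_est_reads"):
    """Map taxon name to a numeric value for one Bracken output file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row["name"]: float(row[value_column]) for row in reader}


def build_matrix(paths, value_column="new_est_reads"):
    """Return (samples, rows); rows maps each taxon to one value per sample."""
    samples = [Path(p).stem for p in paths]
    per_sample = [read_bracken(p, value_column) for p in paths]
    taxa = sorted(set().union(*per_sample))
    # A taxon absent from a sample gets 0.0 in that sample's column.
    rows = {
        taxon: [counts.get(taxon, 0.0) for counts in per_sample]
        for taxon in taxa
    }
    return samples, rows


def write_matrix(samples, rows, out_path):
    """Write the matrix as CSV with taxa as rows and samples as columns."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["taxon", *samples])
        for taxon, values in rows.items():
            writer.writerow([taxon, *values])
```

Passing value_column="fraction_total_reads" would produce relative abundances instead of estimated read counts.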

HPC Usage

Toxolib can run the analysis pipeline on an HPC cluster using SLURM for job scheduling.

1. Set up HPC connection

toxolib hpc-setup --hostname your-hpc-server.edu --username your-username --key-file ~/.ssh/id_rsa

This will save your HPC connection details to ~/.toxolib/hpc_config.yaml.

2. Run the pipeline on HPC

toxolib hpc -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o /path/on/hpc/output_dir \
    --kraken-db /path/on/hpc/kraken2_db \
    --corn-db /path/on/hpc/corn_db

This will:

Upload your raw data files to the HPC
Create a Snakemake workflow file
Submit a SLURM job to run the analysis
Return a job ID for tracking

3. Check job status

toxolib hpc-status --job-id your_job_id

4. Download results when complete

toxolib hpc-download --job-id your_job_id --output-dir ./local_results

Database Setup

Kraken2 Database

You can download the standard Kraken2 database from: https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz

wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
mkdir -p /path/to/kraken2/database
tar -xzf k2_standard_20240112.tar.gz -C /path/to/kraken2/database
export KRAKEN2_DB_DIR=/path/to/kraken2/database
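
After extraction it can be worth confirming that the directory looks like a usable database. The sketch below is not part of toxolib. It relies on the usual layout of a Kraken2 database (hash.k2d, opts.k2d, and taxo.k2d) and on Bracken's databaseNNNmers.kmer_distrib files, where NNN is a read length. The function name check_kraken2_db is hypothetical.

```python
from pathlib import Path

# Files Kraken2 needs in order to classify reads.
KRAKEN2_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def check_kraken2_db(db_dir):
    """Return (missing_files, bracken_read_lengths) for a database directory."""
    db = Path(db_dir)
    missing = [name for name in KRAKEN2_FILES if not (db / name).is_file()]

    prefix, suffix = "database", "mers.kmer_distrib"
    read_lengths = []
    for path in db.glob(f"{prefix}*{suffix}"):
        middle = path.name[len(prefix):-len(suffix)]
        if middle.isdigit():
            read_lengths.append(int(middle))
    return missing, sorted(read_lengths)
```

An empty first element means the Kraken2 files are present. The second element lists the read lengths for which Bracken has a k-mer distribution available.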

Corn Genome Database

For host removal, you can download the corn genome reference from: https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip

wget https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip
unzip corn_db.zip -d /path/to/corn_db

Setting up databases on HPC

When using the HPC functionality, you'll need to upload and extract these databases on your HPC system:

# On your local machine, download the databases
wget https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20240112.tar.gz
wget https://glwasoilmetagenome.s3.us-east-1.amazonaws.com/corn_db.zip

# Upload to HPC (using scp)
scp k2_standard_20240112.tar.gz your-username@your-hpc-server.edu:/path/on/hpc/
scp corn_db.zip your-username@your-hpc-server.edu:/path/on/hpc/

# SSH into HPC and extract
ssh your-username@your-hpc-server.edu
mkdir -p /path/on/hpc/kraken2_db
tar -xzf /path/on/hpc/k2_standard_20240112.tar.gz -C /path/on/hpc/kraken2_db
mkdir -p /path/on/hpc/corn_db
unzip /path/on/hpc/corn_db.zip -d /path/on/hpc/corn_db

Then, when running toxolib, specify these paths:

toxolib hpc -r raw_data_1.fastq.gz raw_data_2.fastq.gz -o /path/on/hpc/output_dir \
    --kraken-db /path/on/hpc/kraken2_db \
    --corn-db /path/on/hpc/corn_db

License

MIT


Release history

0.1.42 (Apr 13, 2025)
0.1.41 (Apr 13, 2025)
0.1.40 (Apr 13, 2025)
0.1.39 (Apr 13, 2025)
0.1.38 (Apr 13, 2025)
0.1.37 (Apr 13, 2025)
0.1.36 (Apr 13, 2025)
0.1.35 (Apr 13, 2025)
0.1.34 (Apr 12, 2025)
0.1.33 (Apr 12, 2025)
0.1.32 (Apr 12, 2025)
0.1.31 (Apr 12, 2025)
0.1.30 (Apr 12, 2025)
0.1.29 (Apr 12, 2025)
0.1.28 (Apr 12, 2025)
0.1.27 (Apr 12, 2025)
0.1.26 (Apr 12, 2025)
0.1.24 (Apr 12, 2025)
0.1.23 (Apr 12, 2025)
0.1.22 (Apr 12, 2025)
0.1.21 (Apr 12, 2025)
0.1.20 (Apr 12, 2025)
0.1.19 (Apr 12, 2025)
0.1.18 (Apr 12, 2025)
0.1.17 (Apr 12, 2025)
0.1.16 (Apr 12, 2025)
0.1.15 (Apr 12, 2025)
0.1.14 (Apr 12, 2025)
0.1.12 (Apr 12, 2025)
0.1.11 (Apr 11, 2025)
0.1.10 (Apr 11, 2025)
0.1.9 (Apr 11, 2025)
0.1.8 (Apr 11, 2025)
0.1.7 (Apr 11, 2025)
0.1.6 (Apr 10, 2025)
0.1.5 (Apr 8, 2025)
0.1.4 (Apr 8, 2025)
0.1.3 (Apr 8, 2025)
0.1.2 (Apr 8, 2025)
0.1.1 (Apr 8, 2025)
0.1.0 (Apr 8, 2025), this version

Download files

Source Distribution

toxolib-0.1.0.tar.gz (17.3 kB)

Uploaded Apr 8, 2025 Source

Built Distribution

toxolib-0.1.0-py3-none-any.whl (17.8 kB)

Uploaded Apr 8, 2025 Python 3

File details

Details for the file toxolib-0.1.0.tar.gz.

File metadata

Download URL: toxolib-0.1.0.tar.gz
Upload date: Apr 8, 2025
Size: 17.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for toxolib-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`464fd4b582ca168d81712fa3d5d1642d6f8ade6c611960cb728764ceaedcdec4`
MD5	`327c340c7d911598630a580a1915067e`
BLAKE2b-256	`7d1ddffbe6e60fa62574f27d1f0aee1ebec5b8fa934f0f5a7c941f3cfbbdc420`
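
A downloaded file can be compared against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the expected digest is the one published for toxolib-0.1.0.tar.gz.

```python
import hashlib

# SHA256 digest published for toxolib-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "464fd4b582ca168d81712fa3d5d1642d6f8ade6c611960cb728764ceaedcdec4"
)


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.strip().lower()
```

For example, matches_digest("toxolib-0.1.0.tar.gz", EXPECTED_SDIST_SHA256) should return True for an intact download of the source distribution.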


File details

Details for the file toxolib-0.1.0-py3-none-any.whl.

File metadata

Download URL: toxolib-0.1.0-py3-none-any.whl
Upload date: Apr 8, 2025
Size: 17.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for toxolib-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`951fdac8a78a8f294fae20c31de7602797a5e21381fd5764f3079202f7fb83aa`
MD5	`3eecb3e2ddc39db81a251cbe798616d8`
BLAKE2b-256	`ece5135c28bb89816427e063bf28bf1af03f391b7c8fed68d19dbd0beb0f89f0`
