Skip to main content

A sketch-based surveillance platform

Project description

Build Status PyPI release

Mashpit

Create a database of mash signatures and find the most similar genomes to a target sample

Installation

Install sratoolkit for Linux

wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.10.8/sratoolkit.2.10.8-centos_linux64.tar.gz -O /tmp/sratoolkit.tar.gz
tar -xvf /tmp/sratoolkit.tar.gz
export PATH=$PATH:$PWD/sratoolkit.2.10.8-centos_linux64/bin

Install mashpit using pip:

pip install mashpit

Or manually install after git clone:

git clone https://github.com/tongzhouxu/mashpit.git
python setup.py install 

Ngstool is needed to build mashpit database on Raspberry Pi:

  1. Build and install ngs. Follow instructions at https://github.com/ncbi/ngs/wiki/Building-and-Installing-from-Source
  2. Install ncbi-vdb. Follow instructions at https://github.com/ncbi/ncbi-vdb/wiki/Building-and-Installing-from-Source
  3. Build and install ngstools from source. Follow instructions at https://github.com/ncbi/ngs-tools

Dependencies

  • Python >= 3.8
  • Sratoolkit 2.10.8

Mashpit Database

A mashpit database is a directory containing:

  • $DB_NAME.db
  • $DB_NAME.sig

Two types of Mashpit database:

  1. Standard Database (Pathogen Detection Database)

    A standard database is a collection of representative genomes from each cluster on Pathogen Detection. By default mashpit will download the latest version of a specified species and find the centroid of each SNP cluter (SNP tree).

  2. Custom Database

    A custom database is a collection of genomes based on a proveded biosample accesion list or a keyword.

Usage

0. Set up Entrez email and API key (For custom database only)

usage: mashpit config [-h] [-k KEY] email

Add Entrez email and key to environment variables

positional arguments:
  email              Entrez email address

optional arguments:
  -h, --help         show this help message and exit
  -k KEY, --key KEY  Entrez api key
  • Example command
mashpit config example@example.com -k your_password

More information about Entrez API key can be found on this page.

1. Build the database

usage: mashpit build [-h] [-s SPECIES] [-l LIST] [-t TERM]
                     {standard,biosample_list,keyword} name

Build mashpit database

positional arguments:
  {standard,biosample_list,keyword}
                        Database type: standard or custom
  name                  Mashpit database name

optional arguments:
  -h, --help            show this help message and exit
  -s SPECIES, --species SPECIES
                        Query keyword
  -l LIST, --list LIST  File containing a list of biosample accessions
  -t TERM, --term TERM  Query keyword
  • Example command
mashpit build standard salmonella -s Salmonella
mashpit build biosample_list db_name -l file.list
mashpit build keyword db_name -t salmonella_enteritidis

2. Sketch the genomes

usage: mashpit sketch [-h] name

Build sketches for the records in the database

positional arguments:
  name        Mashpit database name

optional arguments:
  -h, --help  show this help message and exit
  • Example command
mashpit sketch db_name

3. Run a query

usage: mashpit query [-h] [-n NUMBER] sample database

Find the most similar assemblies to the target sample

positional arguments:
  sample                file name of the query sample
  database              name of the database

optional arguments:
  -h, --help            show this help message and exit
  • Example command
mashpit query sample.fasta db_name

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mashpit-0.9.2.tar.gz (16.8 kB view hashes)

Uploaded Source

Built Distribution

mashpit-0.9.2-py3-none-any.whl (18.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page