A sketch-based surveillance platform

Mashpit

Create a database of mash signatures and find the most similar genomes to a target sample

Dependencies

  • Python >= 3.8
  • NCBI datasets

Installation

Install NCBI datasets

curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v2/linux-amd64/datasets'
chmod +x datasets
export PATH=$PATH:$PWD
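
Before moving on, it can help to confirm that the downloaded `datasets` binary is actually visible on `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `missing_tools` is made up for illustration and is not part of mashpit.

```python
import shutil


def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    # shutil.which returns the executable's full path, or None if not found.
    return [name for name in names if shutil.which(name) is None]


# Report whether the NCBI datasets binary is reachable from this shell.
missing = missing_tools(["datasets"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found on PATH.")
```

If `datasets` is reported as missing, re-run the `export PATH` line in the directory where the binary was downloaded.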

Install mashpit using pip:

pip install mashpit

Or clone the repository from GitHub and install from source:

git clone https://github.com/tongzhouxu/mashpit.git
cd mashpit
pip install . 

Mashpit Database

A mashpit database is a directory containing:

  • $DB_NAME.db
  • $DB_NAME.sig
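
The layout above can be checked with a few lines of Python. This is only a sketch: the function name `missing_db_files` is hypothetical, and it assumes nothing beyond the two file names listed above.

```python
from pathlib import Path


def missing_db_files(db_dir, db_name):
    """Return the expected mashpit database files that are absent from db_dir."""
    db_dir = Path(db_dir)
    # A mashpit database directory should hold <name>.db and <name>.sig.
    expected = [db_dir / f"{db_name}.db", db_dir / f"{db_name}.sig"]
    return [path.name for path in expected if not path.is_file()]
```

For example, `missing_db_files("path/to/database", "salmonella")` returns an empty list when both files are present.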

A mashpit database can be built from either of the following:

  1. A taxonomic name
    A standard database is a collection of representative genomes from each SNP cluster on NCBI Pathogen Detection. By default, mashpit downloads the latest Pathogen Detection version for the specified species and finds the centroid of each SNP cluster (SNP tree).
  2. BioSample accessions
    A custom database is a collection of genomes based on a provided list of BioSample accessions.

Usage

1. Build a mashpit database

usage: mashpit build [-h] [--quiet] [--number NUMBER] [--ksize KSIZE] [--species SPECIES] [--email EMAIL] [--key KEY] [--pd_version PD_VERSION] [--list LIST] {taxon,accession} name

positional arguments:
  {taxon,accession}     mashpit database type.
  name                  mashpit database name

optional arguments:
  -h, --help            show this help message and exit
  --quiet               disable logs
  --number NUMBER       maximum number of hashes for sourmash, default is 1000
  --ksize KSIZE         kmer size for sourmash, default is 31
  --species SPECIES     species name
  --email EMAIL         Entrez email
  --key KEY             Entrez api key
  --pd_version PD_VERSION
                        a specified Pathogen Detection version (PDG accession). Default is the latest.
  --list LIST           Path to a list of NCBI BioSample accessions
  • Example command
mashpit build taxon salmonella --species Salmonella

Note: Supported species names can be found in this list
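
To give a feel for what `--number` and `--ksize` control, here is a simplified bottom-k MinHash sketch in pure Python. It is an illustration, not mashpit's or sourmash's implementation: it uses SHA-1 from the standard library as a stand-in hash and ignores reverse complements, and the function names are made up for this example.

```python
import hashlib


def kmers(seq, ksize):
    """Yield every overlapping k-mer of length ksize from seq."""
    seq = seq.upper()
    for i in range(len(seq) - ksize + 1):
        yield seq[i:i + ksize]


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (stand-in for the real hash function)."""
    digest = hashlib.sha1(kmer.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "big")


def sketch(seq, ksize=31, number=1000):
    """Bottom-k MinHash sketch: the `number` smallest distinct k-mer hashes."""
    hashes = {hash_kmer(kmer) for kmer in kmers(seq, ksize)}
    return sorted(hashes)[:number]


def jaccard_estimate(sketch_a, sketch_b, number=1000):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    # The smallest hashes of the merged sketches form a sketch of the union.
    merged = sorted(set_a | set_b)[:number]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)
```

A larger `number` keeps more hashes per genome and gives a tighter similarity estimate at the cost of a bigger signature file; a larger `ksize` makes matches more specific.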

2. Query against a mashpit database

usage: mashpit query [-h] [--number NUMBER] [--threshold THRESHOLD] [--annotation ANNOTATION] sample database

positional arguments:
  sample                path to query sample
  database              path to the database folder

optional arguments:
  -h, --help            show this help message and exit
  --number NUMBER       number of isolates in the query output, default is 200
  --threshold THRESHOLD
                        minimum jaccard similarity for mashtree, default is 0.85
  --annotation ANNOTATION
                        mashtree tip annotation, default is none
  • Example command
mashpit query sample.fasta path/to/database
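
The roles of `--number` and `--threshold` can be pictured with the sketch below. It assumes, without confirmation from the mashpit source, that hits are ranked by Jaccard similarity, that the top `number` are reported, and that only reported hits at or above `threshold` go into the tree. The function name and the isolate labels are hypothetical.

```python
def select_hits(similarities, number=200, threshold=0.85):
    """Rank hits by similarity and return (reported, tree_members).

    Assumed behavior for illustration only; mashpit may differ.
    """
    # Highest similarity first.
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    # Keep at most `number` isolates for the query output.
    reported = ranked[:number]
    # Of those, only sufficiently similar isolates are passed on to the tree.
    tree_members = [name for name, score in reported if score >= threshold]
    return reported, tree_members


# Hypothetical isolate labels and similarity scores.
hits = {"isolate_A": 0.99, "isolate_B": 0.80, "isolate_C": 0.90, "isolate_D": 0.50}
reported, tree_members = select_hits(hits, number=3, threshold=0.85)
```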

Optional: Update the database

usage: mashpit update [-h] [--metadata METADATA] [--quiet] database name

positional arguments:
  database             path for the database folder
  name                 database name

optional arguments:
  -h, --help           show this help message and exit
  --metadata METADATA  metadata file in csv format
  --quiet              disable logs
  • Example command
mashpit update path/to/database salmonella
