A sketch-based surveillance platform

Mashpit

License: GPL v2

Create a database of mash signatures and find the most similar genomes to a target sample

Dependencies

  • Python >= 3.8
  • NCBI datasets

Installation

Install NCBI datasets

curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v2/linux-amd64/datasets'
chmod +x datasets
export PATH=$PATH:$PWD
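
Before installing mashpit, it can help to confirm that both dependencies listed above are in place. The following is a minimal sketch, not part of mashpit: the function name `check_dependencies` is hypothetical, and it only checks the Python version and whether an executable named `datasets` is reachable on PATH.

```python
import shutil
import sys


def check_dependencies(executables=("datasets",), min_python=(3, 8)):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Compare the running interpreter against the minimum stated in Dependencies.
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which searches PATH the same way the shell would.
    for exe in executables:
        if shutil.which(exe) is None:
            problems.append(f"'{exe}' not found on PATH")
    return problems


for problem in check_dependencies():
    print("WARNING:", problem)
```

If a warning about `datasets` appears, the `export PATH` line above was probably run in a different shell session or from a different directory.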

Install mashpit using pip:

pip install mashpit

Or clone the repository from GitHub and install from source:

git clone https://github.com/tongzhouxu/mashpit.git
cd mashpit
pip install .

Mashpit Database

A mashpit database is a directory containing:

  • $DB_NAME.db
  • $DB_NAME.sig
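
The layout above can be verified before running a query. This is a small sketch under the assumption that the two files sit directly inside the database directory; the helper name `missing_db_files` is hypothetical and is not provided by mashpit.

```python
from pathlib import Path


def missing_db_files(db_dir, db_name):
    """Return the expected database files that are absent from db_dir."""
    # The two files named in the description: $DB_NAME.db and $DB_NAME.sig.
    expected = [f"{db_name}.db", f"{db_name}.sig"]
    return [name for name in expected if not (Path(db_dir) / name).is_file()]
```

For example, `missing_db_files("path/to/database", "salmonella")` returns an empty list when both `salmonella.db` and `salmonella.sig` are present.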

A mashpit database can be built from:

  1. A taxonomic name
    A standard database is a collection of representative genomes from each cluster on Pathogen Detection. By default, mashpit downloads the latest Pathogen Detection version for the specified species and finds the centroid of each SNP cluster (SNP tree).
  2. BioSample accessions
    A custom database is a collection of genomes based on a provided BioSample accession list.
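
For a custom database, it may be worth cleaning the accession list before handing it to mashpit. The sketch below assumes one accession per line (the exact file format mashpit expects is not stated here) and uses the general INSDC BioSample pattern (`SAMN…`, `SAMEA…`, `SAMD…`). The function name `clean_accessions` is hypothetical.

```python
import re

# General INSDC BioSample accession pattern, e.g. SAMN12345678 or SAMEA104567.
BIOSAMPLE_RE = re.compile(r"^SAM[NED][A-Z]?\d+$")


def clean_accessions(lines):
    """Split raw lines into (unique valid accessions, rejected entries)."""
    seen = set()
    valid, rejected = [], []
    for raw in lines:
        acc = raw.strip()
        if not acc or acc.startswith("#"):
            continue  # skip blank lines and comments
        if not BIOSAMPLE_RE.match(acc):
            rejected.append(acc)
        elif acc not in seen:
            seen.add(acc)  # keep the first occurrence only, preserving order
            valid.append(acc)
    return valid, rejected
```

Calling `clean_accessions(open("accessions.txt"))` returns the deduplicated accessions along with any entries that do not look like BioSample accessions, such as assembly or run accessions pasted in by mistake.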

Usage

1. Build a mashpit database

usage: mashpit build [-h] [--quiet] [--number NUMBER] [--ksize KSIZE] [--species SPECIES] [--email EMAIL] [--key KEY] [--pd_version PD_VERSION] [--list LIST] {taxon,accession} name

positional arguments:
  {taxon,accession}     mashpit database type.
  name                  mashpit database name

optional arguments:
  -h, --help            show this help message and exit
  --quiet               disable logs
  --number NUMBER       maximum number of hashes for sourmash, default is 1000
  --ksize KSIZE         kmer size for sourmash, default is 31
  --species SPECIES     species name
  --email EMAIL         Entrez email
  --key KEY             Entrez api key
  --pd_version PD_VERSION
                        a specified Pathogen Detection version (PDG accession). Default is the latest.
  --list LIST           Path to a list of NCBI BioSample accessions
  • Example command
mashpit build taxon salmonella --species Salmonella

Note: Supported species names can be found in this list
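
When builds are scripted, the command line can be assembled from the options in the usage text above. This is a sketch only: `build_command` is a hypothetical helper, it uses just the flags shown in the usage message (omitting `--email` and `--key` for brevity), and it does not run mashpit itself.

```python
import shlex

DB_TYPES = ("taxon", "accession")


def build_command(db_type, name, species=None, list_path=None,
                  number=None, ksize=None, pd_version=None, quiet=False):
    """Return the argv list for a `mashpit build` invocation."""
    if db_type not in DB_TYPES:
        raise ValueError(f"db_type must be one of {DB_TYPES}, got {db_type!r}")
    argv = ["mashpit", "build", db_type, name]
    optional = [
        ("--species", species),
        ("--list", list_path),
        ("--number", number),
        ("--ksize", ksize),
        ("--pd_version", pd_version),
    ]
    # Only emit flags that were explicitly set, so mashpit's defaults apply otherwise.
    for flag, value in optional:
        if value is not None:
            argv += [flag, str(value)]
    if quiet:
        argv.append("--quiet")
    return argv


print(shlex.join(build_command("taxon", "salmonella", species="Salmonella")))
```

The returned list can be passed to `subprocess.run(argv, check=True)` once mashpit is installed.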

2. Query against a mashpit database

usage: mashpit query [-h] [--number NUMBER] [--threshold THRESHOLD] [--annotation ANNOTATION] sample database

positional arguments:
  sample                path to query sample
  database              path to the database folder

optional arguments:
  -h, --help            show this help message and exit
  --number NUMBER       number of isolates in the query output, default is 200
  --threshold THRESHOLD
                        minimum jaccard similarity for mashtree, default is 0.85
  --annotation ANNOTATION
                        mashtree tip annotation, default is none
  • Example command
mashpit query sample.fasta path/to/database
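
To give a sense of what the build options `--ksize` and `--number` and the query option `--threshold` control, the sketch below estimates Jaccard similarity from bottom-k MinHash sketches. It is an illustration of the general technique, not sourmash's or mashpit's implementation: the hash function (BLAKE2b) is an arbitrary stand-in, reverse complements are ignored, and all function names are hypothetical.

```python
import hashlib


def kmers(seq, ksize):
    """Return the set of all substrings of length ksize."""
    seq = seq.upper()
    return {seq[i:i + ksize] for i in range(len(seq) - ksize + 1)}


def sketch(seq, ksize=31, number=1000):
    """Keep the `number` smallest k-mer hashes (a bottom-k MinHash sketch)."""
    hashes = {
        int.from_bytes(hashlib.blake2b(k.encode(), digest_size=8).digest(), "big")
        for k in kmers(seq, ksize)
    }
    return set(sorted(hashes)[:number])


def jaccard_estimate(a, b, number=1000):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    # Take the `number` smallest hashes of the union and count those in both sketches.
    union_bottom = sorted(a | b)[:number]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)


s1 = sketch("AAAACCCC", ksize=4)
s2 = sketch("AAAACCCG", ksize=4)
print(jaccard_estimate(s1, s2))
```

In this toy case each sequence has fewer k-mers than the sketch size, so the estimate equals the exact Jaccard similarity of the two k-mer sets (4 shared out of 6 total). With real genomes the sketch holds only a sample of the hashes, and a larger `--number` trades storage for a tighter estimate.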

Optional: Update the database

usage: mashpit update [-h] [--metadata METADATA] [--quiet] database name

positional arguments:
  database             path for the database folder
  name                 database name

optional arguments:
  -h, --help           show this help message and exit
  --metadata METADATA  metadata file in csv format
  --quiet              disable logs
  • Example command
mashpit update path/to/database salmonella
