A sketch-based surveillance platform
Mashpit
Create a database of mash signatures and find the most similar genomes to a target sample
Dependencies
- Python >= 3.8
- NCBI datasets
Installation
Install the NCBI datasets command-line tool (the Linux amd64 binary is shown here):
curl -o datasets 'https://ftp.ncbi.nlm.nih.gov/pub/datasets/command-line/v2/linux-amd64/datasets'
chmod +x datasets
export PATH=$PATH:$PWD
Install mashpit using pip:
pip install mashpit
Or clone the repository from GitHub and install from source:
git clone https://github.com/tongzhouxu/mashpit.git
cd mashpit
pip install .
Mashpit Database
A mashpit database is a directory containing:
$DB_NAME.db
$DB_NAME.sig
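Since a database is a directory holding these two files, a quick sanity check before querying can be sketched in Python. This is only an illustration: the helper name `is_mashpit_database` is hypothetical and not part of mashpit, and it assumes the database name defaults to the directory name unless `db_name` is passed explicitly.

```python
from pathlib import Path
from typing import Optional


def is_mashpit_database(folder: str, db_name: Optional[str] = None) -> bool:
    """Return True if `folder` contains both <db_name>.db and <db_name>.sig.

    Hypothetical helper, not part of mashpit. If db_name is omitted, the
    folder's own name is used as the database name (an assumption).
    """
    path = Path(folder)
    name = db_name if db_name is not None else path.resolve().name
    return (path / f"{name}.db").is_file() and (path / f"{name}.sig").is_file()
```

The check only looks at file names; it does not open or validate the contents of either file.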
A mashpit database can be built from either:
- A taxonomic name
A standard database is a collection of representative genomes from each cluster on Pathogen Detection. By default, mashpit downloads the latest Pathogen Detection version for the specified species and finds the centroid of each SNP cluster (SNP tree).
- BioSample accessions
A custom database is a collection of genomes based on a provided BioSample accession list.
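For a custom database, the accession list can be prepared with a few lines of Python. The sketch below rests on two assumptions that are not stated above: that the list file holds one accession per line, and that BioSample accessions consist of the prefix SAMN, SAMEA, or SAMD followed by digits. The function name `write_accession_list` is hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Assumed accession shape: SAMN (NCBI), SAMEA (EBI) or SAMD (DDBJ) plus digits.
BIOSAMPLE_RE = re.compile(r"^SAM(N|EA|D)\d+$")


def write_accession_list(accessions: Iterable[str], out_path: str) -> List[str]:
    """Validate, de-duplicate, and write accessions one per line.

    Returns the accessions that were written, in first-seen order.
    Raises ValueError on anything that does not match the assumed pattern.
    """
    seen: List[str] = []
    for raw in accessions:
        acc = raw.strip().upper()
        if not acc:
            continue  # skip blank entries
        if not BIOSAMPLE_RE.match(acc):
            raise ValueError(f"not a BioSample accession: {raw!r}")
        if acc not in seen:
            seen.append(acc)
    Path(out_path).write_text("\n".join(seen) + "\n")
    return seen
```

The resulting file is the kind of input the `--list` option of `mashpit build` expects; the Entrez options (`--email`, `--key`) may also be needed when building from accessions.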
Usage
1. Build a mashpit database
usage: mashpit build [-h] [--quiet] [--number NUMBER] [--ksize KSIZE] [--species SPECIES] [--email EMAIL] [--key KEY] [--pd_version PD_VERSION] [--list LIST] {taxon,accession} name
positional arguments:
{taxon,accession} mashpit database type.
name mashpit database name
optional arguments:
-h, --help show this help message and exit
--quiet disable logs
--number NUMBER maximum number of hashes for sourmash, default is 1000
--ksize KSIZE kmer size for sourmash, default is 31
--species SPECIES species name
--email EMAIL Entrez email
--key KEY Entrez api key
--pd_version PD_VERSION
a specified Pathogen Detection version (PDG accession). Default is the latest.
--list LIST Path to a list of NCBI BioSample accessions
- Example command
mashpit build taxon salmonella --species Salmonella
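When builds are scripted, the command line can be assembled from Python. The following is a minimal sketch that uses only the long option names listed in the usage text above; the function `build_command` and its keyword names are hypothetical, and actually running the command requires mashpit to be installed.

```python
import shlex
from typing import List, Optional


def build_command(db_type: str, name: str, *, species: Optional[str] = None,
                  accession_list: Optional[str] = None,
                  email: Optional[str] = None, key: Optional[str] = None,
                  pd_version: Optional[str] = None,
                  number: Optional[int] = None, ksize: Optional[int] = None,
                  quiet: bool = False) -> List[str]:
    """Assemble a `mashpit build` argument list from the documented options."""
    if db_type not in ("taxon", "accession"):
        raise ValueError("db_type must be 'taxon' or 'accession'")
    cmd = ["mashpit", "build"]
    if quiet:
        cmd.append("--quiet")
    options = [("--number", number), ("--ksize", ksize), ("--species", species),
               ("--email", email), ("--key", key), ("--pd_version", pd_version),
               ("--list", accession_list)]
    for flag, value in options:
        if value is not None:  # only pass options that were explicitly set
            cmd += [flag, str(value)]
    cmd += [db_type, name]  # positional arguments go last
    return cmd


cmd = build_command("taxon", "salmonella", species="Salmonella")
print(shlex.join(cmd))
# To run it for real (requires mashpit on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with names that contain spaces.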
Note: Supported species names can be found in this list
2. Query against a mashpit database
usage: mashpit query [-h] [--number NUMBER] [--threshold THRESHOLD] [--annotation ANNOTATION] sample database
positional arguments:
sample path to query sample
database path to the database folder
optional arguments:
-h, --help show this help message and exit
--number NUMBER number of isolates in the query output, default is 200
--threshold THRESHOLD
minimum jaccard similarity for mashtree, default is 0.85
--annotation ANNOTATION
mashtree tip annotation, default is none
- Example command
mashpit query sample.fasta path/to/database
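The `--threshold` option is expressed as a Jaccard similarity: the size of the intersection of two sets divided by the size of their union. The toy example below illustrates the idea on complete k-mer sets in pure Python. It is not how mashpit or sourmash compute similarity (they compare subsampled hash sketches), and the function names and short sequences are made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size over union size; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hits_above(query: str, refs: Dict[str, str], k: int,
               threshold: float) -> List[Tuple[str, float]]:
    """Score each reference against the query and keep those >= threshold,
    most similar first."""
    q = kmers(query, k)
    scored = [(name, jaccard(q, kmers(seq, k))) for name, seq in refs.items()]
    kept = [item for item in scored if item[1] >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


refs = {"same": "ACGTACGTGA", "near": "ACGTACGTGC", "far": "TTTTTTTTTT"}
for name, score in hits_above("ACGTACGTGA", refs, k=4, threshold=0.7):
    print(f"{name}\t{score:.3f}")
```

Raising the threshold keeps only closer matches: with `threshold=0.85`, only the identical sequence remains in this example.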
3. Update a mashpit database (optional)
usage: mashpit update [-h] [--metadata METADATA] [--quiet] database name
positional arguments:
database path for the database folder
name database name
optional arguments:
-h, --help show this help message and exit
--metadata METADATA metadata file in csv format
--quiet disable logs
- Example command
mashpit update path/to/database salmonella