A sketch-based surveillance platform
Mashpit
Create a database of mash signatures and find the most similar genomes to a target sample
Installation
- Sra-tools for Linux:
wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.10.8/sratoolkit.2.10.8-centos_linux64.tar.gz -O /tmp/sratoolkit.tar.gz
tar -xvf /tmp/sratoolkit.tar.gz
Add sratoolkit to the environment:
export PATH=$PATH:$PWD/sratoolkit.2.10.8-centos_linux64/bin
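To confirm the export took effect, a quick check like the one below could be run in the same shell. This is only a sketch: `check_tools` is a hypothetical helper, and although `fastq-dump` and `prefetch` are sra-tools binaries, which of them mashpit actually calls is not stated here, so the tool list is an assumption.

```bash
# Report whether each named tool can be found on PATH.
# Returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumed tool list; adjust it to the binaries your workflow needs.
check_tools fastq-dump prefetch || echo "sra-tools is not fully on PATH" >&2
```

Note that the `export` only lasts for the current shell session; adding the same line to a shell startup file makes it persistent.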
- Mashpit can be installed using pip or conda:
pip install mashpit
- ngs-tools is needed to run mashpit on Raspberry Pi:
  - Build and install ngs. Follow the instructions at https://github.com/ncbi/ngs/wiki/Building-and-Installing-from-Source
  - Install ncbi-vdb. Follow the instructions at https://github.com/ncbi/ncbi-vdb/wiki/Building-and-Installing-from-Source
  - Build and install ngs-tools from source. Follow the instructions at https://github.com/ncbi/ngs-tools
Dependencies
- Python 3.7 or 3.8
- Sra-tools 2.10.8
Usage
1. Create the database
usage: mashpit create [-h] database
Create new mashpit database
positional arguments:
database Name for the database.
optional arguments:
-h, --help show this help message and exit
- Example command
mashpit create reading
2. Set up Entrez email and API key
usage: mashpit config [-h] [-k KEY] email
Add Entrez email and key to environment variables
positional arguments:
email Entrez email address
optional arguments:
-h, --help show this help message and exit
-k KEY, --key KEY Entrez api key
- Example command
mashpit config email@email.com -k 'p@$$word'
More information about the Entrez API key can be found on this page.
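Keys and passwords may contain characters the shell treats specially. The placeholder `p@$$word` is a good illustration: unquoted, `$$` expands to the shell's process ID before mashpit ever sees the argument. A minimal sketch of the difference (the variable names are illustrative only):

```bash
# Placeholder key from the example command; not a real credential.
unquoted=p@$$word      # $$ is replaced by the shell's process ID
quoted='p@$$word'      # single quotes keep the text literal

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

Wrapping the key in single quotes on the command line avoids the problem.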
3. Collect the metadata
usage: mashpit metadata [-h] [-l LIST] [-t TERM]
database {bioproject_list,biosample_list,keyword}
Collect metadata from NCBI based on bioproject/biosample accession or keywords
positional arguments:
database Name of the database
{bioproject_list,biosample_list,keyword}
Metadata collecting method. Available options:
bioproject_list, biosample_list, keyword
optional arguments:
-h, --help show this help message and exit
-l LIST, --list LIST File name of a list of bioproject or biosample
-t TERM, --term TERM Query keyword
- Example command
- Using BioProject list
mashpit metadata reading bioproject_list -l list_file
- Using BioSample list
mashpit metadata reading biosample_list -l list_file
- Using keyword
mashpit metadata reading keyword -t salmonella_reading
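The `-l` option takes a file listing accessions. The file format is not described here, so the sketch below assumes one accession per line; the accessions shown are placeholders, not real BioProjects.

```bash
# Write one accession per line (assumed format) to a list file.
# PRJNA000001 and PRJNA000002 are placeholders, not real records.
list_file=${TMPDIR:-/tmp}/bioproject_list.txt
printf '%s\n' PRJNA000001 PRJNA000002 > "$list_file"

# Show how many accessions would be passed to mashpit.
count=$(wc -l < "$list_file" | tr -d ' ')
echo "$count accessions in $list_file"
```

The resulting file could then be passed as `mashpit metadata reading bioproject_list -l "$list_file"`.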
4. Get the assembly and the signature file for all the entries in the database
usage: mashpit sketch [-h] [-n NUMBER] database
Build sketches for the records in the database
positional arguments:
database Name of the database
optional arguments:
-h, --help show this help message and exit
-n NUMBER, --number NUMBER
Number of genomes in a batch to be downloaded and
sketched. Default is 1000.
- Example command
mashpit sketch reading
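Since genomes are downloaded and sketched `-n` at a time, the number of batches is the record count divided by the batch size, rounded up. A small worked example, assuming a hypothetical database of 2500 records (the figure is for illustration only, and `batch_count` is not part of mashpit):

```bash
# Ceiling division: how many batches are needed for a given record count.
batch_count() {
    records=$1
    batch_size=$2
    echo $(( (records + batch_size - 1) / batch_size ))
}

batch_count 2500 1000   # hypothetical 2500 records, default batch size
batch_count 2500 500    # same records with -n 500
```

The two calls print 3 and 5 respectively; the last batch in each case is only partly full.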
5. Split the large signature file into separate files to speed up the query
usage: mashpit split [-h] [-n NUMBER] database
Split large signature file to speed up the query
positional arguments:
database Name of the database
optional arguments:
-h, --help show this help message and exit
-n NUMBER, --number NUMBER
Number of files to split into. Default is 16.
- Example command
mashpit split reading -n 16
6. Query the database
usage: mashpit query [-h] [-n NUMBER] [-f] sample database
Find the most similar assemblies to the target sample
positional arguments:
sample target sample file path
database name of the database
optional arguments:
-h, --help show this help message and exit
-n NUMBER, --number NUMBER
number of separate signature files
-f, --force overwrite query record if query table exists
- Example command
mashpit query sample_file reading -n 16
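The steps above can be chained in a small script. The sketch below uses only the commands shown in this section and assumes that `mashpit config` has already been run, and that the `-n` given to `query` should match the `-n` given to `split` (as in the examples). The `run` wrapper and the `DRY_RUN` variable are illustrative, not part of mashpit; by default the commands are only printed.

```bash
# Chain create, metadata, sketch, split and query.
# Commands are printed by default; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}
DB=reading            # database name used in the examples
PARTS=16              # number of signature files; shared by split and query
SAMPLE=sample_file    # path to the target sample
LIST=list_file        # file listing BioProject accessions

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Each step runs only if the previous one succeeded.
pipeline() {
    run mashpit create "$DB" &&
    run mashpit metadata "$DB" bioproject_list -l "$LIST" &&
    run mashpit sketch "$DB" &&
    run mashpit split "$DB" -n "$PARTS" &&
    run mashpit query "$SAMPLE" "$DB" -n "$PARTS"
}

pipeline
```

Keeping the file count in a single variable avoids passing different values to `split` and `query` by mistake.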