A sketch-based surveillance platform

Mashpit

Create a database of mash signatures and find the most similar genomes to a target sample

Installation

Dependencies

  • Python 3.7 or 3.8
  • SRA Toolkit (sra-tools) 2.10.8
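
A quick pre-flight check for the dependencies listed above might look like the following sketch. The helper names are hypothetical, and probing for `fasterq-dump` is an assumption: the README does not say which SRA Toolkit programs Mashpit actually calls, so substitute whichever one applies.

```python
import shutil
import sys

# (major, minor) versions listed under Dependencies
SUPPORTED_PYTHON = {(3, 7), (3, 8)}


def python_supported(version_info=None):
    """Return True if the (major, minor) version is one the README lists."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def sra_tools_on_path(program="fasterq-dump"):
    """Return True if the given program is found on PATH.

    Assumption: `fasterq-dump` is used only as a convenient probe for an
    SRA Toolkit installation; Mashpit may rely on a different program.
    """
    return shutil.which(program) is not None
```

Calling `python_supported()` with no argument checks the running interpreter. Note that `shutil.which` only confirms that a program exists; it does not verify that the installed version is 2.10.8.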

Usage

1. Create the database

usage: mashpit create [-h] database

Create new mashpit database

positional arguments:
  database    Name for the database.

optional arguments:
  -h, --help  show this help message and exit
  • Example command
mashpit create reading

2. Set up Entrez email and API key

usage: mashpit config [-h] [-k KEY] email

Add Entrez email and key to environment variables

positional arguments:
  email              Entrez email address

optional arguments:
  -h, --help         show this help message and exit
  -k KEY, --key KEY  Entrez api key
  • Example command
mashpit config email@email.com -k 'p@$$word'

More information about the Entrez API key is available from NCBI.
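
A key or email containing shell metacharacters needs quoting, because a POSIX shell expands an unquoted `$$` to its own process ID before `mashpit` ever sees the argument. The sketch below builds a shell-safe command line with `shlex.quote` from the standard library; `config_command` is a hypothetical helper, not part of Mashpit.

```python
import shlex


def config_command(email, api_key=None):
    """Build a shell-safe `mashpit config` command line as a single string."""
    args = ["mashpit", "config", email]
    if api_key is not None:
        args += ["-k", api_key]
    # shlex.quote wraps any argument containing shell metacharacters
    # in single quotes and leaves safe arguments unchanged.
    return " ".join(shlex.quote(arg) for arg in args)


print(config_command("email@email.com", "p@$$word"))
# prints: mashpit config email@email.com -k 'p@$$word'
```

The explicit `join` is used instead of `shlex.join` because the latter is only available from Python 3.8, while the dependencies also list Python 3.7.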

3. Collect the metadata

usage: mashpit metadata [-h] [-l LIST] [-t TERM]
                        database {bioproject_list,biosample_list,keyword}

Collect metadata from NCBI based on bioproject/biosample accession or keywords

positional arguments:
  database              Name of the database
  {bioproject_list,biosample_list,keyword}
                        Metadata collecting method. Available options:
                        bioproject_list, biosample_list, keyword

optional arguments:
  -h, --help            show this help message and exit
  -l LIST, --list LIST  File name of a list of bioproject or biosample
  -t TERM, --term TERM  Query keyword
  • Example command
    • Using BioProject list
    mashpit metadata reading bioproject_list -l list_file
    
    • Using BioSample list
    mashpit metadata reading biosample_list -l list_file
    
    • Using keyword
    mashpit metadata reading keyword -t salmonella_reading
    
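The three collection methods pair with different options in the examples above: the two list methods are used with `-l`, and `keyword` is used with `-t`. A minimal sketch of a helper that builds the argument list accordingly is shown below. The function name is hypothetical, and treating the matching option as required is an assumption drawn from the examples, not from the usage text.

```python
METHODS = ("bioproject_list", "biosample_list", "keyword")


def metadata_command(database, method, list_file=None, term=None):
    """Return the argv list for a `mashpit metadata` call."""
    if method not in METHODS:
        raise ValueError("unknown method: %r" % (method,))
    args = ["mashpit", "metadata", database, method]
    if method == "keyword":
        # Assumption: keyword searches need a query term (-t).
        if term is None:
            raise ValueError("keyword method needs a term")
        args += ["-t", term]
    else:
        # Assumption: list-based methods need a list file (-l).
        if list_file is None:
            raise ValueError("%s method needs a list file" % method)
        args += ["-l", list_file]
    return args
```

Once Mashpit is installed, the returned list can be handed to `subprocess.run(args, check=True)`, which avoids shell quoting issues entirely.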

4. Get the assembly and the signature file for all the entries in the database

usage: mashpit sketch [-h] [-n NUMBER] database

Build sketches for the records in the database

positional arguments:
  database              Name of the database

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        Number of genomes in a batch to be downloaded and
                        sketched. Default is 1000.
  • Example command
mashpit sketch reading
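
To get a feel for the `-n` batch size, the sketch below estimates how many download-and-sketch rounds a database of a given size would take. It assumes batches are filled in order and that the last batch may be partial; the README does not describe how Mashpit schedules batches internally, and `batch_count` is a hypothetical helper.

```python
import math


def batch_count(total_records, batch_size=1000):
    """Estimate the number of batches for a database of total_records entries.

    The default of 1000 mirrors the documented default of `mashpit sketch -n`.
    """
    if total_records < 0 or batch_size < 1:
        raise ValueError("total_records must be >= 0 and batch_size >= 1")
    return math.ceil(total_records / batch_size)


print(batch_count(2500))       # prints: 3
print(batch_count(2500, 500))  # prints: 5
```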

5. Split the large signature file into separate ones to speed up the query

usage: mashpit split [-h] [-n NUMBER] database

Split large signature file to speed up the query

positional arguments:
  database              Name of the database

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        Number of files to be splited into. Default is 16.
  • Example command
mashpit split reading -n 16

6. Query the database

usage: mashpit query [-h] [-n NUMBER] [-f] sample database

Find the most similar assemblies to the target sample

positional arguments:
  sample                target sample file path
  database              name of the database

optional arguments:
  -h, --help            show this help message and exit
  -n NUMBER, --number NUMBER
                        number of separated signature file
  -f, --force           overwrite query record if query table exists
  • Example command
mashpit query sample_file reading -n 16
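
The six steps above can be collected into one ordered list of commands, as in the sketch below. `pipeline_commands` is a hypothetical helper. Passing the same `-n` value to both `split` and `query` is an assumption based on the examples, which use 16 in both places.

```python
def pipeline_commands(database, email, sample, term, split_files=16, api_key=None):
    """Return the argv lists for steps 1 to 6, in order.

    Step 3 uses the keyword method here; the list-based methods
    would take `-l list_file` instead of `-t term`.
    """
    config = ["mashpit", "config", email]
    if api_key is not None:
        config += ["-k", api_key]
    n = str(split_files)
    return [
        ["mashpit", "create", database],                          # 1. create
        config,                                                   # 2. config
        ["mashpit", "metadata", database, "keyword", "-t", term], # 3. metadata
        ["mashpit", "sketch", database],                          # 4. sketch
        ["mashpit", "split", database, "-n", n],                  # 5. split
        # Assumption: query -n matches the number of split files.
        ["mashpit", "query", sample, database, "-n", n],          # 6. query
    ]


for cmd in pipeline_commands("reading", "email@email.com",
                             "sample_file", "salmonella_reading"):
    print(" ".join(cmd))
```

With Mashpit installed, each list can be run in turn with `subprocess.run(cmd, check=True)` so that a failing step stops the sequence.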
