A Python package to update and manage the MLST database for the MLST tool.

mlstdb

mlstdb is a Python package to update and manage the MLST database for the mlst tool using the PubMLST and BIGSdb Pasteur APIs. It handles the OAuth2 authentication process that is required to access the up-to-date MLST schemes available on these databases. The tool lets users fetch MLST schemes, filter them, and update the MLST database for the mlst tool.


Prerequisites

Install mlst before using this tool.

Installation

Recommended installation method:

First, create a conda environment with mlst installed:

conda create -n mlst -c bioconda mlst
conda activate mlst

Then install mlstdb using pip:

pip install mlstdb

Alternative installation methods:

From bioconda (note: include conda-forge channel to resolve dependencies):

conda install -c conda-forge -c bioconda mlstdb

Or install both tools together:

conda create -n mlst -c conda-forge -c bioconda mlst mlstdb

From PyPI only:

pip install mlstdb

Note: If you encounter dependency errors when installing from bioconda (e.g., nothing provides rauth >=0.7.3), ensure you include the -c conda-forge channel in your installation command, or use the recommended pip installation method instead.

⚠️ Disclaimer / Caution

Please read before using mlstdb:

  • Backup your original MLST databases before running any updates to avoid accidental overwrites or deletions.

  • Do not blindly update all the schemes obtained from mlstdb fetch. Not all downloaded schemes are suitable or validated for the mlst tool.

  • Carefully curate your list of schemes before running mlstdb update. Overwriting core MLST data with unverified schemes may cause downstream issues with tools like mlst.
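The backup step above can be sketched in Python. This is a minimal illustration, not part of mlstdb: the function name backup_directory and the timestamped naming scheme are assumptions made for the example, and the paths you pass in depend on where your mlst installation keeps its database.

```python
import shutil
import time
from pathlib import Path


def backup_directory(db_dir: str, backup_root: str) -> Path:
    """Copy db_dir into backup_root under a timestamped name and return the copy.

    Hypothetical helper for illustration; mlstdb does not provide it.
    """
    source = Path(db_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Not a directory: {source}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-backup-{stamp}"
    # copytree raises if the target already exists, so older backups are never overwritten.
    shutil.copytree(source, target)
    return target
```

Calling backup_directory on both the scheme directory and the BLAST directory before mlstdb update gives you a copy to restore from if an update goes wrong.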

Usage

mlstdb uses a simple two-step process to update the MLST database for the mlst tool. It has two main subcommands: fetch and update.

  1. Fetch MLST schemes
mlstdb fetch --help
Usage: mlstdb fetch [OPTIONS]

  BIGSdb Scheme Fetcher Tool

  This tool downloads MLST scheme information from BIGSdb databases. It will
  automatically handle authentication and save the results.

Options:
  -h, --help                  Show this message and exit.
  -d, --db [pubmlst|pasteur]  Database to use (pubmlst or pasteur)
  -e, --exclude TEXT          Scheme name must not include provided term
                              (default: cgMLST)
  -m, --match TEXT            Scheme name must include provided term (default:
                              MLST)
  -s, --scheme-uris TEXT      Optional: Path to custom scheme_uris.tab file
  -f, --filter TEXT           Filter species or schemes using a wildcard
                              pattern
  -r, --resume                Resume processing from where it stopped
  -v, --verbose               Enable verbose logging for debugging

Use the fetch command to download MLST scheme information from the BIGSdb databases. --db selects the database (pubmlst or pasteur). --match and --exclude filter schemes by name, --filter restricts species or schemes with a wildcard pattern, and --scheme-uris points to a custom scheme URIs file. --resume continues an interrupted run, and --verbose enables debug logging. The command writes the selected schemes to a mlst_schemes_<db>.tab file.
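The interaction between --match and --exclude is easier to see in code. The sketch below assumes a plain, case-sensitive substring test, which may not be exactly what mlstdb does internally; keep_scheme is a hypothetical name used only for this example.

```python
def keep_scheme(description: str, match: str = "MLST", exclude: str = "cgMLST") -> bool:
    """Return True when the scheme name contains `match` and does not contain `exclude`.

    Assumption: case-sensitive substring matching. The real tool may differ.
    """
    return match in description and exclude not in description


# Illustrative scheme names, not taken from a real fetch run.
names = ["MLST", "cgMLST v1.0", "MLST (Achtman)", "Plasmid typing"]
kept = [name for name in names if keep_scheme(name)]
```

Because "cgMLST" itself contains "MLST", the default exclude term is what keeps core-genome schemes out of the result.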

Running mlstdb fetch without options is enough to download the MLST schemes: the command prompts for the database to fetch (pubmlst or pasteur). If no client credentials are registered yet, it prompts the user to register them and saves them to the ~/.config/mlstdb directory.

In cases where the tool does not find an appropriate scheme name, it will prompt the user to either set the missing schemes as 'missing' or auto-generate them. The user can choose the appropriate option as they are prompted.

Auto extraction of scheme?🤔

First, the script tries to extract the scheme names from the dbases.sh file. If a scheme name is not found, it prompts the user to either print missing in the output file or automatically create a scheme name based on the URL. For example, for the URL https://rest.pubmlst.org/db/pubmlst_borrelia_seqdef/schemes/1, the scheme name will be borrelia. If a database has multiple schemes, the scheme number is appended to the name. For example, for the URLs https://rest.pubmlst.org/db/pubmlst_chlamydiales_seqdef/schemes/38 and https://rest.pubmlst.org/db/pubmlst_chlamydiales_seqdef/schemes/41, the scheme names will be chlamydiales_38 and chlamydiales_41 respectively.
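The URL-based naming rule can be approximated as follows. This is a sketch reconstructed from the two examples above, not the code mlstdb ships: the regular expression and the function name auto_scheme_names are assumptions, and URLs that do not follow the pubmlst_<name>_seqdef pattern are rejected rather than guessed at.

```python
import re
from collections import Counter

# Assumed URL shape: .../db/<prefix>_<name>_seqdef/schemes/<id>
_SCHEME_URL = re.compile(r"/db/[^/_]+_(?P<name>[^/]+?)_seqdef/schemes/(?P<id>\d+)/?$")


def auto_scheme_names(urls):
    """Map each scheme URL to a generated scheme name.

    A database with a single scheme gets the bare name; a database with
    several schemes gets <name>_<scheme id> for each of them.
    """
    parsed = []
    for url in urls:
        found = _SCHEME_URL.search(url)
        if found is None:
            raise ValueError(f"Unrecognised scheme URL: {url}")
        parsed.append((url, found["name"], found["id"]))
    per_name = Counter(name for _, name, _ in parsed)
    return {
        url: name if per_name[name] == 1 else f"{name}_{scheme_id}"
        for url, name, scheme_id in parsed
    }


example = auto_scheme_names([
    "https://rest.pubmlst.org/db/pubmlst_borrelia_seqdef/schemes/1",
    "https://rest.pubmlst.org/db/pubmlst_chlamydiales_seqdef/schemes/38",
    "https://rest.pubmlst.org/db/pubmlst_chlamydiales_seqdef/schemes/41",
])
```

For the three URLs above, the sketch yields borrelia, chlamydiales_38, and chlamydiales_41, matching the names described in the text.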

The script can filter for particular species or schemes. Running with the filter option is recommended, so that only the required schemes are downloaded and the existing databases and schemes are left untouched.
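To illustrate what a wildcard filter does, here is a small sketch using Python's fnmatch module. It assumes shell-style wildcards applied case-insensitively to the species and scheme columns; the actual matching rules of --filter are not documented here and may differ. The rows and the function name filter_rows are made up for the example.

```python
from fnmatch import fnmatchcase


def filter_rows(rows, pattern):
    """Keep rows whose species or scheme matches a shell-style wildcard pattern.

    Assumption: case-insensitive matching on the 'species' and 'scheme' keys.
    """
    needle = pattern.lower()
    return [
        row
        for row in rows
        if any(fnmatchcase(row.get(key, "").lower(), needle) for key in ("species", "scheme"))
    ]


# Illustrative rows only; real rows come from the fetched schemes file.
rows = [
    {"species": "Borrelia spp.", "scheme": "borrelia"},
    {"species": "Chlamydiales spp.", "scheme": "chlamydiales_38"},
    {"species": "Chlamydiales spp.", "scheme": "chlamydiales_41"},
]
borrelia_only = filter_rows(rows, "borrel*")
```

A pattern such as "borrel*" keeps only the Borrelia row, which is the kind of narrow selection the recommendation above is aiming for.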

📝 Important: The mlst tool is designed for typing bacterial species only. Make sure to filter out non-bacterial schemes from your schemes file.

  2. Update MLST database
mlstdb update --help
Usage: mlstdb update [OPTIONS]

  Update MLST schemes and create BLAST database.

  Downloads MLST schemes from the specified input file and creates a BLAST
  database from the downloaded sequences. Authentication tokens should be set
  up using fetch.py.

Options:
  -h, --help                  Show this message and exit.
  -i, --input TEXT            Path to mlst_schemes_<db>.tab containing MLST
                              scheme URLs  [required]
  -d, --directory TEXT        Directory to save the downloaded MLST schemes
                              (default: pubmlst)
  -b, --blast-directory TEXT  Directory for BLAST database (default: blast)
  -v, --verbose               Enable verbose logging for debugging

Use the update command to update the MLST database and create a BLAST database. The --input argument specifies the path to the mlst_schemes_<db>.tab file containing MLST scheme URLs. The --directory argument specifies the directory to save the downloaded MLST schemes. The --blast-directory argument specifies the directory for the BLAST database. The --verbose flag can be used to enable verbose logging for debugging.

You can prepare a custom mlst_schemes_<db>.tab file with the tab-separated headers database, species, scheme_description, scheme, and URI, then run mlstdb update to update the MLST database for selected species and schemes. This automatically creates a BLAST database from the downloaded sequences.
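A custom schemes file can be written with Python's csv module. The header names below come from the text above; everything else is an assumption for illustration, including the function name write_schemes_file and the example row, whose database, species, and scheme_description values are placeholders rather than values copied from a real fetch.

```python
import csv

# Column order as listed in the text above.
HEADERS = ["database", "species", "scheme_description", "scheme", "URI"]


def write_schemes_file(path, rows):
    """Write rows (dicts keyed by HEADERS) to a tab-separated schemes file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=HEADERS, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


# Placeholder row: only the URI is taken from the document's own example.
example_rows = [
    {
        "database": "pubmlst",
        "species": "Borrelia spp.",
        "scheme_description": "MLST",
        "scheme": "borrelia",
        "URI": "https://rest.pubmlst.org/db/pubmlst_borrelia_seqdef/schemes/1",
    }
]
```

In practice it is safer to start from the file produced by mlstdb fetch and delete the rows you do not want, so that the column values stay in the form the update command expects.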

Final Steps

After running both commands, verify the database setup by running the mlst tool with the updated database:

mlst --blastdb <path_to_blast/mlst.fa> --datadir <path_to_pubmlst_dir>
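If you want to script this check, the command can be assembled and run from Python. This is a sketch under stated assumptions: the helper names are hypothetical, the paths are placeholders for your own directories, and passing genome files as trailing arguments reflects the usual way mlst is invoked rather than anything specific to mlstdb.

```python
import shutil
import subprocess


def build_mlst_command(blast_fasta, data_dir, inputs=()):
    """Return the argument list for running mlst against an updated database."""
    return ["mlst", "--blastdb", str(blast_fasta), "--datadir", str(data_dir), *map(str, inputs)]


def run_mlst(blast_fasta, data_dir, inputs=()):
    """Run mlst and return its standard output; raises if mlst fails or is missing."""
    if shutil.which("mlst") is None:
        raise RuntimeError("mlst was not found on PATH; activate the environment it is installed in")
    result = subprocess.run(
        build_mlst_command(blast_fasta, data_dir, inputs),
        check=True,          # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Running run_mlst on a genome whose sequence type you already know is a quick sanity check that the updated schemes are being picked up.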

Acknowledgements

This tool was inspired by and builds upon the work of:

  • BIGSdb_downloader by Keith Jolley - The original OAuth-based downloader for BIGSdb databases
  • pyMLST - Python implementation for MLST with database management

License

mlstdb was previously licensed under MIT. As of version 0.1.7, it is licensed under GPL v3. Original MIT‑licensed code is preserved and attributed according to MIT terms.

For additional support, please raise an issue.
