Classification through yAML Heuristic Mapping Protocol

camlhmp

๐Ÿช camlhmp ๐Ÿช - Classification through yAML Heuristic Mapping Protocol

camlhmp is a tool for generating organism typing tools from YAML schemas. Through discussions with Tim Read, we identified a need for a straightforward method to define and manage typing schemas for organisms of interest. YAML was chosen for its simplicity and readability.

Full documentation for camlhmp can be found at https://rpetit3.github.io/camlhmp/.

Purpose

The primary purpose of camlhmp is to provide a framework that enables researchers to independently define typing schemas for their organisms of interest using YAML. This approach facilitates the management and analysis of biological data for researchers at any level of experience.

camlhmp does not supply pre-defined typing schemas. Instead, it equips researchers with the necessary tools to create and maintain their own schemas, ensuring these schemas can easily remain up to date with the latest scientific developments.

Finally, the development of camlhmp was driven by a practical need to streamline maintenance of multiple organism typing tools. Managing these tools separately is time-consuming and challenging. camlhmp simplifies this by providing a single framework shared by every tool.

Quick Start

To quickly get started with camlhmp, you can install it through Bioconda and run the command-line interface:

# Install camlhmp through Bioconda
conda create -n camlhmp -c conda-forge -c bioconda camlhmp
conda activate camlhmp
camlhmp --help

# Example usage of camlhmp-blast-alleles
# Acquire test data
wget https://raw.githubusercontent.com/rpetit3/camlhmp/refs/heads/main/tests/data/blast/alleles/spn-pbptype.yaml
wget https://raw.githubusercontent.com/rpetit3/camlhmp/refs/heads/main/tests/data/blast/alleles/spn-pbptype.fasta
wget https://github.com/rpetit3/camlhmp/raw/refs/heads/main/tests/data/blast/alleles/SRR2912551.fna.gz

# Run camlhmp-blast-alleles
camlhmp-blast-alleles \
    --yaml spn-pbptype.yaml \
    --targets spn-pbptype.fasta \
    --input SRR2912551.fna.gz

Running camlhmp-blast-alleless with following parameters:
    --input SRR2912551.fna.gz
    --yaml spn-pbptype.yaml
    --targets spn-pbptype.fasta
    --outdir ./
    --prefix camlhmp
    --min-pident 95
    --min-coverage 95

Starting camlhmp for S. pneumoniae PBP typing...
Running tblastn...
Processing hits...
Final Results...
                               S. pneumoniae PBP typing
โ”โ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”โ”ณโ”โ”โ”โ”ณโ”โ”โ”โ”โ”“
โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ โ€ฆ โ”ƒ 1โ€ฆ โ”ƒ โ€ฆ โ”ƒ 2โ€ฆ โ”ƒ โ€ฆ โ”ƒ 2โ€ฆ โ”ƒ โ€ฆ โ”ƒ 2โ€ฆ โ”ƒ โ€ฆ โ”ƒ 2โ€ฆ โ”ƒ โ€ฆ โ”ƒ 2โ€ฆ โ”ƒ
โ”กโ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ”โ•‡โ”โ”โ”โ•‡โ”โ”โ”โ”โ”ฉ
โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚ โ€ฆ โ”‚    โ”‚ 0 โ”‚ 1โ€ฆ โ”‚ โ€ฆ โ”‚ 5โ€ฆ โ”‚   โ”‚ 2  โ”‚ โ€ฆ โ”‚ 1โ€ฆ โ”‚ โ€ฆ โ”‚    โ”‚
โ””โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”˜
Writing outputs...
Final predicted type written to ./camlhmp.tsv
tblastn results written to ./camlhmp.tblastn.tsv
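
The table printed to the terminal appears truncated to fit the window, so the full result is easier to read from the TSV file. The helper below is a minimal sketch: show_tsv is a hypothetical name, not a camlhmp command, and it assumes the first line of the file is a header row, which is not confirmed here. It prints each record as one "column: value" line per field.

```shell
# show_tsv is a hypothetical helper, not part of camlhmp.
# Assumption: the first line of the TSV is a header row.
show_tsv() {
    awk -F'\t' '
        # Remember the column names from the header line
        NR == 1 { for (i = 1; i <= NF; i++) header[i] = $i; next }
        # Print every field of each record next to its column name
        {
            for (i = 1; i <= NF; i++) printf "%s: %s\n", header[i], $i
            print ""
        }
    ' "$1"
}

# Print the Quick Start result if it exists in the current directory
if [ -f camlhmp.tsv ]; then
    show_tsv camlhmp.tsv
fi
```

The same function should work on ./camlhmp.tblastn.tsv, provided that file also starts with a header line.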

For more example commands and outputs, see the documentation for each command at https://rpetit3.github.io/camlhmp/.

Installation

camlhmp is available through PyPI and Bioconda. While you can install it through PyPI, installing through Bioconda is recommended so that the non-Python dependencies are also installed.

System Requirements

camlhmp has been developed and tested on x86-64 Linux and macOS systems.

OS       Architecture  Supported?
Linux    x86-64        ✅
Linux    aarch64       ❌ (missing dependencies)
macOS    x86-64        ✅
macOS    arm64         ❌ (missing dependencies)
Windows  x86-64        ❌ (consider using WSL2)

[!TIP] Docker containers are available from biocontainers/camlhmp which can be used with the --platform flag to run on Apple Silicon and ARM-based Linux systems.
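
As a rough sketch of the tip above, the wrapper below runs a command inside the container under x86-64 emulation. The function name camlhmp_docker and the CAMLHMP_IMAGE variable are hypothetical, and the image reference is deliberately left for you to fill in with a tag that actually exists for biocontainers/camlhmp. The linux/amd64 platform value is an assumption based on the supported architectures listed in System Requirements.

```shell
# camlhmp_docker and CAMLHMP_IMAGE are hypothetical names used for illustration.
camlhmp_docker() {
    # Fail early if no image reference was provided
    : "${CAMLHMP_IMAGE:?set CAMLHMP_IMAGE to a biocontainers/camlhmp image reference}"
    # Mount the current directory at /data so inputs and outputs are visible
    docker run --rm --platform linux/amd64 \
        -v "$PWD":/data -w /data \
        "$CAMLHMP_IMAGE" "$@"
}

# Usage (not run here):
#   export CAMLHMP_IMAGE=<image reference>
#   camlhmp_docker camlhmp-blast-alleles --help
```

Emulated containers are usually slower than native ones, so this is best treated as a workaround until native packages are available.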

Dependencies

camlhmp relies on the following dependencies:

dependencies:
  python:
    - biopython >=1.83
    - pyyaml >=6.0.1
    - executor >=23.2
    - rich >=13.7.1,<14
    - rich-click >=1.6.0
  non_python:
    - blast >=2.15.0
    - pigz

Bioconda Installation

conda create -n camlhmp -c conda-forge -c bioconda camlhmp
conda activate camlhmp
camlhmp
๐Ÿช camlhmp ๐Ÿช - Classification through YAML Heuristic Mapping Protocol

Available camlhmp commands
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ command               โ”ƒ description                                                          โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ camlhmp-blast-alleles โ”‚ Classify assemblies using BLAST against alleles of a set of genes    โ”‚
โ”‚ camlhmp-blast-regions โ”‚ Classify assemblies using BLAST against larger genomic regions       โ”‚
โ”‚ camlhmp-blast-targets โ”‚ Classify assemblies using BLAST against individual genes or proteins โ”‚
โ”‚ camlhmp-extract       โ”‚ Extract typing targets from a set of reference sequences             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

PyPI Installation

To install camlhmp through PyPI, you can use pip:

pip install camlhmp
camlhmp
๐Ÿช camlhmp ๐Ÿช - Classification through YAML Heuristic Mapping Protocol

Available camlhmp commands
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ command               โ”ƒ description                                                          โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ camlhmp-blast-alleles โ”‚ Classify assemblies using BLAST against alleles of a set of genes    โ”‚
โ”‚ camlhmp-blast-regions โ”‚ Classify assemblies using BLAST against larger genomic regions       โ”‚
โ”‚ camlhmp-blast-targets โ”‚ Classify assemblies using BLAST against individual genes or proteins โ”‚
โ”‚ camlhmp-extract       โ”‚ Extract typing targets from a set of reference sequences             โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

[!WARNING] Installing through PyPI will not install non-Python dependencies. You will need to ensure these are installed manually.
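
A quick way to confirm the non-Python dependencies are present after a pip install is to look for their executables on PATH. This is a minimal sketch: check_deps is a hypothetical helper, and the executable names are assumptions (tblastn appears in the Quick Start output, blastn ships in the same BLAST+ package, and pigz is listed under Dependencies). It does not check versions.

```shell
# check_deps is a hypothetical helper, not part of camlhmp.
# It reports which of the given executables are missing from PATH
# and returns non-zero if any are missing.
check_deps() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names: blastn and tblastn from BLAST+, plus pigz
check_deps blastn tblastn pigz || echo "Install the missing tools before running camlhmp."
```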

Citing camlhmp

If you make use of camlhmp in your analysis, please cite the following:

Naming

If I'm being honest, I really wanted to name a tool with "camel" in it because they are my wife's favorite animal 🐪 and they also remind me of my friends in Oman!

Once it was decided YAML was going to be the format for defining schemas, I quickly stumbled on "Classification through YAML" and just as quickly found out I wasn't the only one who had thought of "CAML". But, no matter, it was decided it would be something with "CAML". Then Tim Read came in with the save and suggested "Heuristic Mapping Protocol". So, here we are - camlhmp!

License

I'm not a lawyer and MIT has always been my go-to license. So, MIT it is!

Artificial Intelligence Disclaimer

As of v1.1.3, camlhmp has been developed with minimal assistance of Artificial Intelligence (AI). GitHub Copilot was used for auto-completion, but otherwise all code was written and reviewed by the author.

Funding

Support for this project came (in part) from the Wyoming Public Health Division, and the Center for Applied Pathogen Epidemiology and Outbreak Control (CAPE).

Download files

Source Distribution

camlhmp-1.1.4.tar.gz (20.6 kB)

Uploaded Source

Built Distribution

camlhmp-1.1.4-py3-none-any.whl (30.0 kB)

Uploaded Python 3

File details

Details for the file camlhmp-1.1.4.tar.gz.

File metadata

  • Download URL: camlhmp-1.1.4.tar.gz
  • Upload date:
  • Size: 20.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for camlhmp-1.1.4.tar.gz
Algorithm Hash digest
SHA256 b7be9b3cedc86024d73b9eb1cf4369a6deb5145f9079c6268722b761e656e3c4
MD5 61324965e9d14c2c9c408661596264ba
BLAKE2b-256 5b5cfb7253d3c5ed3baae9f9d2e79c0d35ed1d8c6dfc8336da80085fa64a92a6
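
The SHA256 digest listed above can be checked against a downloaded copy of the file. The sketch below assumes GNU coreutils' sha256sum is available (on macOS, shasum -a 256 is the usual substitute); verify_sha256 is a hypothetical helper name.

```shell
# verify_sha256 is a hypothetical helper.
# Usage: verify_sha256 <file> <expected hex digest>
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <file>"; keep only the digest
    actual=$(sha256sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Check the source distribution if it has been downloaded to the current directory
if [ -f camlhmp-1.1.4.tar.gz ]; then
    verify_sha256 camlhmp-1.1.4.tar.gz \
        b7be9b3cedc86024d73b9eb1cf4369a6deb5145f9079c6268722b761e656e3c4
fi
```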

Provenance

The following attestation bundles were made for camlhmp-1.1.4.tar.gz:

Publisher: releases.yml on rpetit3/camlhmp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file camlhmp-1.1.4-py3-none-any.whl.

File metadata

  • Download URL: camlhmp-1.1.4-py3-none-any.whl
  • Upload date:
  • Size: 30.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for camlhmp-1.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 3b0d5b388dfb11cd447967263fe101115c931d64b2c6c1b46181a266e9972d39
MD5 8a030aef1bd9ad55df5f148fc44b22a2
BLAKE2b-256 772429f0b95e8a2521e448766e68ed5a8e13649787e6b1dd8d25da059d2287f1

Provenance

The following attestation bundles were made for camlhmp-1.1.4-py3-none-any.whl:

Publisher: releases.yml on rpetit3/camlhmp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
