Machine learning-based classification of microbial symbiotic lifestyles

Symclatron

License: BSD-3-Clause; Python 3.9+

symclatron is a tool that classifies microbial genomes into three symbiotic lifestyle categories:

  • Free-living
  • Symbiont; Host-associated
  • Symbiont; Obligate-intracellular

Installation

From PyPI (recommended)

pip install symclatron

From source

git clone https://github.com/NeLLi-team/symclatron.git
cd symclatron/
pip install .

Development installation

git clone https://github.com/NeLLi-team/symclatron.git
cd symclatron/
pip install -e ".[dev]"

System Requirements

symclatron requires HMMER to be installed and available in your PATH:

Ubuntu/Debian:

sudo apt-get install hmmer

macOS (with Homebrew):

brew install hmmer

Conda/Mamba:

conda install -c bioconda hmmer
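
Before running symclatron, it can help to confirm that HMMER is reachable from the current environment. The following is a minimal sketch and not part of symclatron: it looks for the standard HMMER executables hmmsearch and hmmscan on PATH. Which HMMER programs symclatron invokes internally is an assumption here, and the helper name missing_tools is hypothetical.

```python
import shutil

# Standard HMMER executables. Which of these symclatron actually calls
# is an assumption made for this sketch.
HMMER_TOOLS = ("hmmsearch", "hmmscan")


def missing_tools(tools=HMMER_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("HMMER executables found.")
```

An empty result suggests the installation step above succeeded; a non-empty one usually means HMMER was installed into an environment that is not currently active.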

Quick Start

  1. Set up data (required). Download the required database files (only needed once):
     symclatron setup
  2. Test with sample genomes:
     symclatron test
  3. Classify your genomes:
     symclatron classify --genome-dir /path/to/genomes/ --output-dir results/

Usage

Getting help

# Main help
symclatron --help

# Command-specific help
symclatron classify --help
symclatron setup --help

# Show version and information
symclatron --version
symclatron info

Classification command

The main classification command with all options:

symclatron classify [OPTIONS]

Options:

  • --genome-dir, -i: Directory containing protein FASTA files (.faa), one per genome [default: input_genomes]
  • --output-dir, -o: Output directory for results [default: output_symclatron]
  • --keep-tmp: Keep temporary files for debugging
  • --threads, -t: Number of threads for HMMER searches [default: 2]
  • --quiet, -q: Suppress progress messages
  • --verbose: Show detailed progress information

Examples:

# Basic usage
symclatron classify --genome-dir genomes/ --output-dir results/

# With more threads and keeping temporary files
symclatron classify -i genomes/ -o results/ --threads 8 --keep-tmp

# Quiet mode
symclatron classify --genome-dir genomes/ --quiet

# Verbose mode with detailed progress
symclatron classify --genome-dir genomes/ --verbose
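
For pipelines that call symclatron from Python, the documented options can be assembled into an argument list. This is a sketch under stated assumptions, not an official API: the function names build_classify_command and run_classify are hypothetical, only flags listed above are used, and it assumes symclatron exits with a non-zero status when a run fails.

```python
import subprocess


def build_classify_command(genome_dir, output_dir, threads=2, keep_tmp=False):
    """Assemble a `symclatron classify` argument list from the documented options."""
    cmd = [
        "symclatron", "classify",
        "--genome-dir", str(genome_dir),
        "--output-dir", str(output_dir),
        "--threads", str(threads),
    ]
    if keep_tmp:
        cmd.append("--keep-tmp")
    return cmd


def run_classify(genome_dir, output_dir, **options):
    """Run the command; raises CalledProcessError on a non-zero exit (assumed failure signal)."""
    subprocess.run(build_classify_command(genome_dir, output_dir, **options), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when directory names contain spaces.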

Results

The classification results are saved in the specified output directory:

Main output files

  1. symclatron_results.tsv - Main classification results with columns:

    • taxon_oid - Genome identifier
    • completeness_UNI56 - Completeness metric based on universal marker genes
    • confidence - Overall confidence score for the classification
    • classification - Final classification label:
      • Free-living
      • Symbiont;Host-associated
      • Symbiont;Obligate-intracellular
  2. classification_summary.txt - Summary report with statistics

  3. Log files - Detailed execution logs with timestamps
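
As an illustration of working with the main results file, the sketch below tallies genomes per classification label. It assumes symclatron_results.tsv is tab-delimited with a header row that uses the column names listed above; the function name count_classifications is hypothetical.

```python
import csv
from collections import Counter


def count_classifications(results_path):
    """Count genomes per label in the `classification` column of the results TSV."""
    with open(results_path, newline="") as handle:
        # Assumption: tab-separated values with a header row.
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["classification"] for row in reader)
```

For example, count_classifications("results/symclatron_results.tsv") returns a Counter keyed by labels such as Free-living, which can be compared against the figures in classification_summary.txt.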

Debug files

When using --keep-tmp, intermediate files are preserved in the tmp/ directory for analysis.

Performance

symclatron is designed for efficiency:

  • ~2 minutes per genome on consumer-level laptops
  • Most recent benchmark: 306 genomes in ~162 minutes (~0.5 min/genome, or about 1.9 genomes/min)
  • Memory efficient - suitable for standard workstations

Input requirements

  • File format: Protein FASTA files (.faa, .fasta, .fa)
  • Content: Predicted protein sequences from genomes
  • Quality: Complete or near-complete genomes are recommended, but good performance is expected for medium-quality (MQ) metagenome-assembled genomes (MAGs)
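
A quick pre-flight check of the input directory can catch empty or misnamed files before a long run. The sketch below is illustrative only: it uses the extensions listed above and checks that each file begins with a FASTA header line. It does not verify that sequences are protein rather than nucleotide, and whether symclatron itself picks up all three extensions is not confirmed here. The name find_protein_fastas is hypothetical.

```python
from pathlib import Path

# Extensions listed under "Input requirements".
PROTEIN_FASTA_SUFFIXES = {".faa", ".fasta", ".fa"}


def find_protein_fastas(genome_dir):
    """Return sorted FASTA paths whose first non-blank line starts with '>'."""
    found = []
    for path in sorted(Path(genome_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in PROTEIN_FASTA_SUFFIXES:
            continue
        with path.open() as handle:
            # Empty files yield "" and are skipped below.
            first = next((line for line in handle if line.strip()), "")
        if first.startswith(">"):
            found.append(path)
    return found
```

Comparing the number of returned paths with the number of rows in the results file is a simple way to spot genomes that were silently skipped.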

Container usage

Docker

Pull the latest container:

docker pull docker.io/jvillada/symclatron:latest

Test with sample genomes:

# Quote the path so directories containing spaces are handled correctly
my_test_dir="$PWD/test_output_symclatron"
mkdir -p "$my_test_dir"
docker run --rm \
    -v "$my_test_dir":/usr/src/symclatron/output \
    docker.io/jvillada/symclatron:latest \
    symclatron test --output-dir output

Classify your genomes:

my_genomes_dir="/path/to/genome/faa_files/"
my_output_dir="/path/to/output/directory/"
mkdir -p "$my_output_dir"
# Mount the input and output directories into the container
docker run --rm \
    -v "$my_genomes_dir":/usr/src/symclatron/input_genomes \
    -v "$my_output_dir":/usr/src/symclatron/output \
    docker.io/jvillada/symclatron:latest \
    symclatron classify --genome-dir input_genomes/ --output-dir output

Apptainer/Singularity

Pull the latest container:

apptainer pull docker://docker.io/jvillada/symclatron:latest

Test with sample genomes:

# Quote the path so directories containing spaces are handled correctly
my_test_dir="$PWD/test_output_symclatron"
mkdir -p "$my_test_dir"
apptainer run \
    --pwd /usr/src/symclatron \
    --bind "$my_test_dir":/usr/src/symclatron/output \
    docker://docker.io/jvillada/symclatron:latest \
    symclatron test --output-dir output

Classify your genomes:

my_genomes_dir="/path/to/genome/faa_files/"
my_output_dir="/path/to/output/directory/"
mkdir -p "$my_output_dir"
# Bind the input and output directories into the container
apptainer run \
    --pwd /usr/src/symclatron \
    --bind "$my_genomes_dir":/usr/src/symclatron/input_genomes \
    --bind "$my_output_dir":/usr/src/symclatron/output \
    docker://docker.io/jvillada/symclatron:latest \
    symclatron classify --genome-dir input_genomes/ --output-dir output

Citation

If you use symclatron in your research, please cite:

A genomic catalog of Earth's bacterial and archaeal symbionts. Juan C. Villada, Yumary M. Vasquez, Gitta Szabo, Ewan Whittaker-Walker, Miguel F. Romero, Sarina Qin, Neha Varghese, Emiley A. Eloe-Fadrosh, Nikos C. Kyrpides, SymGs data consortium, Axel Visel, Tanja Woyke, Frederik Schulz bioRxiv 2025.05.29.656868; doi: https://doi.org/10.1101/2025.05.29.656868

License

This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.
