symclatron: symbiont classifier

ML-based classification of microbial symbiotic lifestyles

symclatron is a tool that classifies microbial genomes into three symbiotic lifestyle categories:

  • Free-living
  • Symbiont; Host-associated
  • Symbiont; Obligate-intracellular

Installation

Clone the symclatron repository:

git clone https://github.com/NeLLi-team/symclatron.git
cd symclatron/
chmod u+x symclatron

Using pixi (recommended)

Pixi is a fast, multi-platform package manager that provides the best experience with symclatron.

  1. Install pixi by following the instructions at https://pixi.sh/

  2. Install dependencies and set up the environment:

pixi install

Quick start with pixi:

# Before using symclatron, you need to download the required database files. This only needs to be done once.
pixi run setup

# Get help
pixi run help

# Run test with sample genomes
pixi run test

# Show detailed information
pixi run info

Using conda/mamba

If you prefer conda/mamba, create the environment:

mamba create -c conda-forge -c bioconda --name symclatron --file requirements.txt
mamba activate symclatron

Setup data (required)

Before using symclatron, you need to download the required database files. This only needs to be done once.

Using conda/mamba for setup

mamba activate symclatron
./symclatron setup

Manual setup

If the automated setup fails, you can extract the data manually:

tar -xzf data.tar.gz

Classify your genomes

# Using pixi
pixi run -- ./symclatron classify --genome-dir /path/to/genomes/ --output-dir results/

# Using conda/mamba
mamba activate symclatron
./symclatron classify --genome-dir /path/to/genomes/ --output-dir results/

Usage guide

Getting help

# Main help
./symclatron --help

# Command-specific help
./symclatron classify --help
./symclatron setup --help

# Show version and information
./symclatron --version
./symclatron info

Classification command

The general form of the classification command is:

./symclatron classify [OPTIONS]

Options:

  • --genome-dir, -i: Directory containing protein FASTA files, one per genome (.faa) [default: input_genomes]
  • --output-dir, -o: Output directory for results [default: output_symclatron]
  • --keep-tmp: Keep temporary files for debugging
  • --threads, -t: Number of threads for HMMER searches [default: 2]
  • --quiet, -q: Suppress progress messages
  • --verbose: Show detailed progress information

Examples:

# Basic usage
./symclatron classify --genome-dir genomes/ --output-dir results/

# With more threads and keeping temporary files
./symclatron classify -i genomes/ -o results/ --threads 8 --keep-tmp

# Quiet mode
./symclatron classify --genome-dir genomes/ --quiet

# Verbose mode with detailed progress
./symclatron classify --genome-dir genomes/ --verbose
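
As a rough sketch of how the options above combine, the helper below prints a classify command line and defaults --threads to the number of online CPUs when no value is given. The function name build_classify_cmd and the dry-run approach are illustrative assumptions, not part of symclatron; only flags documented in the options list are used.

```shell
#!/bin/sh
# Hypothetical helper (not part of symclatron): print a classify command line
# instead of running it, so it can be reviewed first.
build_classify_cmd() (
    genome_dir="$1"
    output_dir="$2"
    threads="${3:-}"
    # Default to the number of online CPUs; fall back to the documented default of 2.
    if [ -z "$threads" ]; then
        threads="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)"
    fi
    printf './symclatron classify --genome-dir %s --output-dir %s --threads %s\n' \
        "$genome_dir" "$output_dir" "$threads"
)
```

Calling `build_classify_cmd genomes/ results/ 8` prints the command for review. Paths containing spaces would need extra quoting before the printed line is executed.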

Pixi tasks

Pixi provides pre-configured tasks for common operations:

# Setup and basic operations
pixi run setup              # Download and set up data
pixi run help               # Show help
pixi run info               # Show detailed information

# Testing and quick runs
pixi run test               # Run with test genomes
pixi run test-keep-tmp      # Run test keeping temporary files

# Custom classification examples
pixi run classify-custom    # Example with custom settings
pixi run classify-verbose   # Example with verbose output

# Direct command access
pixi run -- ./symclatron classify --genome-dir my_genomes/ --output-dir results/

Results

The classification results are saved in the specified output directory:

Main output files

  1. symclatron_results.tsv - Main classification results with columns:

    • taxon_oid - Genome identifier
    • completeness_UNI56 - Completeness metric based on universal marker genes
    • confidence - Overall confidence score for the classification
    • classification - Final classification label:
      • Free-living
      • Symbiont;Host-associated
      • Symbiont;Obligate-intracellular
  2. classification_summary.txt - Summary report with statistics

  3. Log files - Detailed execution logs with timestamps
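
To get a quick overview of a finished run, the results table can be tallied per label. The sketch below assumes that symclatron_results.tsv is tab-separated and starts with a header row using the column names listed above; the column order is not assumed, since the classification column is located by name. The function name count_classifications is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count genomes per classification label.
# Assumes a tab-separated file whose header row contains "classification".
count_classifications() (
    results_tsv="$1"
    awk -F '\t' '
        NR == 1 {
            # Find the classification column by its header name.
            for (i = 1; i <= NF; i++) if ($i == "classification") col = i
            if (!col) { print "no classification column found" > "/dev/stderr"; exit 1 }
            next
        }
        { counts[$col]++ }
        END { for (label in counts) printf "%s\t%d\n", label, counts[label] }
    ' "$results_tsv" | LC_ALL=C sort
)
```

Running `count_classifications results/symclatron_results.tsv` prints one line per label with its genome count, sorted by label.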

Debug files

When using --keep-tmp, intermediate files are preserved in the tmp/ directory for analysis.

Performance

symclatron is designed for efficiency:

  • ~2 minutes per genome on consumer-level laptops
  • Most recent benchmark: 306 genomes in ~162 minutes (1.9 min/genome)
  • Memory efficient - suitable for standard workstations

Container usage

Apptainer/Singularity

Pull the latest container:

apptainer pull docker://docker.io/jvillada/symclatron:latest

Test with sample genomes:

my_test_dir="$PWD/test_output_symclatron"
mkdir -p "$my_test_dir"
apptainer run \
    --pwd /usr/src/symclatron \
    --bind "$my_test_dir":/usr/src/symclatron/output \
    docker://docker.io/jvillada/symclatron:latest \
    pixi run test --output-dir output

Classify your genomes:

my_genomes_dir="/path/to/genome/faa_files/"
my_output_dir="/path/to/output/directory/"
mkdir -p "$my_output_dir"
apptainer run \
    --pwd /usr/src/symclatron \
    --bind "$my_genomes_dir":/usr/src/symclatron/input_genomes \
    --bind "$my_output_dir":/usr/src/symclatron/output \
    docker://docker.io/jvillada/symclatron:latest \
    pixi run -- ./symclatron classify --genome-dir input_genomes/ --output-dir output

Advanced options

Input requirements

  • File format: Protein FASTA files (.faa, .fasta, .fa)
  • Content: Predicted protein sequences from genomes
  • Quality: Complete or near-complete genomes are recommended, but good performance is also expected for medium-quality (MQ) metagenome-assembled genomes (MAGs)
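
Because the input must be protein rather than nucleotide FASTA, a pre-flight check can catch a wrong file before a long run. The sketch below is a simple heuristic and not symclatron's own validation: it requires a header on the first line and at least one sequence character that is not a nucleotide code (A, C, G, T, U, N). The function name looks_like_protein_fasta is hypothetical.

```shell
#!/bin/sh
# Hypothetical heuristic (not part of symclatron): succeed if the file looks
# like a protein FASTA, fail if it looks like nucleotide or non-FASTA input.
looks_like_protein_fasta() (
    fasta="$1"
    # The first line must be a FASTA header.
    head -n 1 "$fasta" | grep -q '^>' || exit 1
    # Among sequence lines, look for any letter that is not a nucleotide code.
    grep -v '^>' "$fasta" \
        | tr '[:lower:]' '[:upper:]' \
        | grep -q '[^ACGTUN*[:space:]-]'
)
```

A loop such as `for f in genomes/*.faa; do looks_like_protein_fasta "$f" || echo "check $f"; done` would list files worth inspecting. Very short peptides made up only of A, C, G, T, or N residues would be misjudged by this heuristic.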

Citation

If you use symclatron in your research, please cite:

A genomic catalog of Earth’s bacterial and archaeal symbionts. Juan C. Villada, Yumary M. Vasquez, Gitta Szabo, Ewan Whittaker-Walker, Miguel F. Romero, Sarina Qin, Neha Varghese, Emiley A. Eloe-Fadrosh, Nikos C. Kyrpides, SymGs data consortium, Axel Visel, Tanja Woyke, Frederik Schulz bioRxiv 2025.05.29.656868; doi: https://doi.org/10.1101/2025.05.29.656868
