symclatron: symbiont classifier

ML-based classification of microbial symbiotic lifestyles

symclatron is a tool that classifies microbial genomes, supplied as protein FASTA files (.faa), into three symbiotic lifestyle categories:

  • Free-living
  • Symbiont; Host-associated
  • Symbiont; Obligate-intracellular

Installation and quick start

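# Install pixi
curl -fsSL https://pixi.sh/install.sh | sh
# Install Python with pixi
pixi global install python==3.13.5
# Create and activate a virtual environment
python -m venv symclatron_env
source symclatron_env/bin/activate
# Install symclatron, download the required data, and run the test
pip install symclatron
symclatron setup
symclatron test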

Setup data (required)

Before using symclatron, you need to download the required database files. This only needs to be done once.

symclatron setup

Input file requirements

  • Input file format: Protein FASTA files (.faa)
  • Quality: Complete or near-complete genomes are recommended, but good performance is also expected for medium-quality (MQ) metagenome-assembled genomes (MAGs)
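
Before running a classification, it can help to confirm that the input directory really contains protein FASTA files rather than nucleotide sequences. The following is a minimal sketch of such a pre-check. The helper names (looks_like_protein_fasta, check_genome_dir) are hypothetical and not part of symclatron, and the test is only a heuristic: it is not the validation symclatron itself performs.

```python
from pathlib import Path

# Characters accepted in protein sequences: the 20 standard amino acids,
# ambiguity/rare codes (B, J, O, U, X, Z), stop (*) and gap (-).
PROTEIN_CHARS = set("ACDEFGHIKLMNPQRSTVWYBJOUXZ*-")
# If every sampled residue is one of these, the file is probably nucleotide.
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_protein_fasta(path, max_lines=200):
    """Heuristic: starts with a FASTA header and sampled residues look like protein."""
    residues = set()
    with open(path) as handle:
        first = handle.readline()
        if not first.startswith(">"):
            return False
        for i, line in enumerate(handle):
            if i >= max_lines:
                break
            line = line.strip()
            if line and not line.startswith(">"):
                residues.update(line.upper())
    if not residues or not residues <= PROTEIN_CHARS:
        return False
    return not residues <= NUCLEOTIDE_CHARS


def check_genome_dir(genome_dir):
    """Return (valid, suspicious) lists of the .faa paths found in genome_dir."""
    valid, suspicious = [], []
    for path in sorted(Path(genome_dir).glob("*.faa")):
        (valid if looks_like_protein_fasta(path) else suspicious).append(path)
    return valid, suspicious
```

Calling check_genome_dir("genomes/") returns two lists of paths. Files in the second list deserve a manual look before they are passed to symclatron classify.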

Classify your genomes

symclatron classify --genome-dir /path/to/genomes/ --output-dir results/

Getting help

symclatron --help

# Command-specific help
symclatron classify --help
symclatron setup --help

# Show version and information
symclatron --version

Classification command

The main classification command with all options:

symclatron classify [OPTIONS]

Options:

  • --genome-dir, -i: Directory containing genome FASTA files (.faa) [default: input_genomes]
  • --output-dir, -o: Output directory for results [default: output_symclatron]
  • --keep-tmp: Keep temporary files for debugging
  • --threads, -t: Number of threads for HMMER searches [default: 2]
  • --quiet, -q: Suppress progress messages
  • --verbose: Show detailed progress information

Examples:

# Basic usage
symclatron classify --genome-dir genomes/ --output-dir results/

# With more threads and keeping temporary files
symclatron classify -i genomes/ -o results/ --threads 8 --keep-tmp

# Quiet mode
symclatron classify --genome-dir genomes/ --quiet

# Verbose mode with detailed progress
symclatron classify --genome-dir genomes/ --verbose
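
When symclatron is called from a Python pipeline, the command line can be assembled from the options listed above. This is a sketch only: build_classify_command and run_classify are hypothetical wrapper names, not part of the symclatron package, and they use only the flags documented in this section. The sketch assumes the symclatron executable is on PATH.

```python
import subprocess


def build_classify_command(genome_dir="input_genomes",
                           output_dir="output_symclatron",
                           threads=2, keep_tmp=False,
                           quiet=False, verbose=False):
    """Build the argument list for `symclatron classify` from documented options."""
    cmd = [
        "symclatron", "classify",
        "--genome-dir", str(genome_dir),
        "--output-dir", str(output_dir),
        "--threads", str(threads),
    ]
    if keep_tmp:
        cmd.append("--keep-tmp")
    if quiet:
        cmd.append("--quiet")
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_classify(**options):
    """Run the classification; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_classify_command(**options), check=True)
```

For example, run_classify(genome_dir="genomes/", output_dir="results/", threads=8, keep_tmp=True) corresponds to the second example above. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.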

Results

The classification results are saved in the specified output directory:

Main output files

  1. symclatron_results.tsv - Main classification results with columns:

    • taxon_oid - Genome identifier
    • completeness_UNI56 - Completeness metric based on universal marker genes
    • confidence - Overall confidence score for the classification
    • classification - Final classification label:
      • Free-living
      • Symbiont;Host-associated
      • Symbiont;Obligate-intracellular
  2. classification_summary.txt - Summary report with statistics

  3. Log files - Detailed execution logs with timestamps
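
The main results table can be summarized with a few lines of Python. The sketch below assumes that symclatron_results.tsv is tab-separated, has a header row with the column names listed above, and stores confidence as a number. The scale of the confidence score is not stated here, so any threshold is in the file's own units. The function names are illustrative, not part of symclatron.

```python
import csv
from collections import Counter


def load_results(tsv_path):
    """Read the results table into a list of dicts keyed by column name."""
    with open(tsv_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def summarize(rows, min_confidence=0.0):
    """Count genomes per classification label, keeping rows at or above min_confidence."""
    kept = [row for row in rows if float(row["confidence"]) >= min_confidence]
    return Counter(row["classification"] for row in kept)
```

For instance, summarize(load_results("results/symclatron_results.tsv")) gives the number of genomes assigned to each of the three labels. Raising min_confidence restricts the count to higher-confidence calls.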

Debug files

When --keep-tmp is used, intermediate files are preserved in the tmp/ directory for analysis.

Performance

symclatron is designed for efficiency:

  • <2 minutes per genome on consumer-level laptops
  • Most recent benchmark: 306 genomes in ~162 minutes (~0.5 min/genome, or about 1.9 genomes/min)
  • Memory efficient - suitable for standard workstations

Container usage

Apptainer/Singularity

Pull the latest container:

apptainer pull docker://docker.io/jvillada/symclatron:latest

Test with sample genomes:

my_test_dir="$PWD/test_output_symclatron"
mkdir -p "$my_test_dir"
apptainer run \
    --pwd /usr/src/symclatron \
    --bind "$my_test_dir":/usr/src/symclatron/output \
    docker://docker.io/jvillada/symclatron:latest \
    pixi run test --output-dir output

Classify your genomes:

my_genomes_dir="/path/to/genome/faa_files/"
my_output_dir="/path/to/output/directory/"
mkdir -p "$my_output_dir"
apptainer run \
    --pwd /usr/src/symclatron \
    --bind "$my_genomes_dir":/usr/src/symclatron/input_genomes \
    --bind "$my_output_dir":/usr/src/symclatron/output \
    docker://docker.io/jvillada/symclatron:latest \
    pixi run -- ./symclatron classify --genome-dir input_genomes/ --output-dir output

Citation

If you use symclatron in your research, please cite:

A genomic catalog of Earth’s bacterial and archaeal symbionts. Juan C. Villada, Yumary M. Vasquez, Gitta Szabo, Ewan Whittaker-Walker, Miguel F. Romero, Sarina Qin, Neha Varghese, Emiley A. Eloe-Fadrosh, Nikos C. Kyrpides, SymGs data consortium, Axel Visel, Tanja Woyke, Frederik Schulz. bioRxiv 2025.05.29.656868; doi: https://doi.org/10.1101/2025.05.29.656868
