symclatron: symbiont classifier

Figure 1

ML-based classification of microbial symbiotic lifestyles

symclatron is a tool that classifies microbial genomes (protein FASTA, or nucleotide FASTA with automatic protein prediction) into three lifestyle categories:

  • Free-living
  • Symbiont; Host-associated
  • Symbiont; Obligate-intracellular

Installation and quick start

Install with Pixi (recommended) or with Mamba/Conda.

Option 1: Pixi (recommended)

Install pixi:

curl -fsSL https://pixi.sh/install.sh | sh

More information about pixi can be found in the pixi documentation.

Install, setup, and test:

pixi global install -c conda-forge -c bioconda -c https://repo.prefix.dev/astrogenomics symclatron
symclatron setup --force
symclatron test
# Outputs are written under `output_test_Symclatron_<DATETIME>/faa` and `output_test_Symclatron_<DATETIME>/fna`
# (or under `--output-dir` if provided).

Option 2: Mamba or Conda

mamba create -n symclatron-0.9.10 -c conda-forge -c bioconda -c https://repo.prefix.dev/astrogenomics symclatron
mamba run -n symclatron-0.9.10 symclatron setup
mamba run -n symclatron-0.9.10 symclatron test
# Outputs are written under `output_test_Symclatron_<DATETIME>/faa` and `output_test_Symclatron_<DATETIME>/fna`
# (or under `--output-dir` if provided).

Note: symcla is a short alias for symclatron. Any command works with either name (for example, symcla test).

Setup data (required)

Before using symclatron for the first time, you need to download the required database files. This only needs to be done once.

symclatron setup

Input file requirements

  • Input: a directory with one genome per file
  • Supported FASTA types:
    • Proteins (recommended): .faa (also accepts common protein FASTA suffixes, optionally gzipped)
    • Nucleotide contigs/assemblies: .fa, .fna, .fasta (proteins predicted with pyrodigal)
    • Nucleotide genes/CDS: .ffn, .fnn (translated in-frame)
  • Quality: Complete or near-complete genomes are recommended, but good performance is also expected for medium-quality (MQ) MAGs

symclatron auto-detects whether each input file contains proteins, genes, or contigs and converts nucleotide inputs to proteins before running the standard workflow. If your nucleotide file extensions are ambiguous, you can override detection with --input-kind contigs or --input-kind genes.
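To illustrate the general idea of telling protein FASTA from nucleotide FASTA, here is a minimal sketch. It is not symclatron's actual detection logic, which is not described here. The function name `guess_fasta_kind`, the letter set, and the 0.9 threshold are all assumptions made for this example, and the sketch does not attempt to separate genes from contigs.

```python
from typing import Iterable

# Letters treated as nucleotide characters in this sketch (bases plus N).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_fasta_kind(lines: Iterable[str], threshold: float = 0.9) -> str:
    """Return 'nucleotide' or 'protein' for the lines of a FASTA file.

    Hypothetical heuristic: if at least `threshold` of the residues are
    A/C/G/T/U/N, call the file nucleotide; otherwise call it protein.
    """
    total = 0
    nucleotide = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA header lines
        for char in line.upper():
            if char.isalpha():
                total += 1
                if char in NUCLEOTIDE_LETTERS:
                    nucleotide += 1
    if total == 0:
        raise ValueError("no sequence residues found")
    return "nucleotide" if nucleotide / total >= threshold else "protein"


# Made-up example records, for illustration only.
protein_example = [">seq1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"]
contig_example = [">contig1", "ATGCGTACGTTAGCNNATGC"]
print(guess_fasta_kind(protein_example))
print(guess_fasta_kind(contig_example))
```

A composition check like this can misjudge very short or low-complexity sequences, which is one reason an explicit override such as --input-kind is useful.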

Classify your genomes

# Protein FASTA input
symclatron classify --genome-dir /path/to/genomes/ --output-dir results/

# Nucleotide contigs/assemblies input (auto protein prediction)
symclatron classify --genome-dir /path/to/contigs/ --output-dir results/

# Ambiguous nucleotide files: force contig mode and only use .fna files
symclatron classify --genome-dir /path/to/inputs/ --input-kind contigs --input-ext .fna --output-dir results/

Getting help

symclatron --help

# Command-specific help
symclatron classify --help
symclatron setup --help

# Show version and information
symclatron --version

Classification command

The main classification command with all options:

symclatron classify [OPTIONS]

Options:

  • --genome-dir, -i: Directory (or FASTA file) containing genome inputs (.faa/.fa/.fna/.fasta/.ffn/.fnn) [default: input_genomes]
  • --input-kind: Force input kind: auto, proteins, genes, contigs [default: auto]
  • --input-ext: Only include files with these extensions (repeatable), e.g. --input-ext .fna (also matches .fna.gz)
  • --output-dir, -o: Output directory for results [default: output_Symclatron_]
  • --keep-tmp: Keep temporary files for debugging
  • --threads, -t: Number of threads for HMMER searches [default: 2]
  • --quiet, -q: Suppress progress messages
  • --verbose: Show detailed progress information
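The --input-ext behaviour described above (an extension such as .fna also matching .fna.gz) can be pictured with a small sketch. This is only an illustration of the matching rule, not symclatron's code. The helper names `matches_extension` and `select_inputs` are hypothetical, and the case-insensitive comparison is an assumption.

```python
from pathlib import Path
from typing import Iterable, List


def matches_extension(path: Path, extensions: Iterable[str]) -> bool:
    """True if the file name ends with one of the extensions, optionally followed by .gz."""
    name = path.name.lower()
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # ignore a trailing .gz before comparing
    return any(name.endswith(ext.lower()) for ext in extensions)


def select_inputs(paths: Iterable[Path], extensions: Iterable[str]) -> List[Path]:
    """Keep only the paths whose extension matches, in sorted order."""
    exts = list(extensions)
    return sorted(p for p in paths if matches_extension(p, exts))


# Made-up file names, for illustration only.
candidates = [Path("a.fna"), Path("b.fna.gz"), Path("c.faa"), Path("notes.txt")]
selected = select_inputs(candidates, [".fna"])
print([p.name for p in selected])
```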

Examples:

# Basic usage
symclatron classify --genome-dir genomes/ --output-dir results/

# With more threads and keeping temporary files
symclatron classify -i genomes/ -o results/ --threads 8 --keep-tmp

# Quiet mode
symclatron classify --genome-dir genomes/ --quiet

# Verbose mode with detailed progress
symclatron classify --genome-dir genomes/ --verbose
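If you drive symclatron from a Python script, one option is to assemble the command line as a list and pass it to `subprocess`. The sketch below uses only flags listed in the options above. The function `build_classify_command` is a hypothetical helper written for this example and is not part of symclatron.

```python
import shlex
from typing import List, Optional, Sequence


def build_classify_command(
    genome_dir: str,
    output_dir: Optional[str] = None,
    threads: Optional[int] = None,
    input_kind: Optional[str] = None,
    input_exts: Sequence[str] = (),
    keep_tmp: bool = False,
) -> List[str]:
    """Assemble a `symclatron classify` argument list from documented flags."""
    cmd = ["symclatron", "classify", "--genome-dir", genome_dir]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if input_kind is not None:
        cmd += ["--input-kind", input_kind]
    for ext in input_exts:
        cmd += ["--input-ext", ext]  # the flag is repeatable
    if keep_tmp:
        cmd.append("--keep-tmp")
    return cmd


cmd = build_classify_command("genomes/", output_dir="results/", threads=8, keep_tmp=True)
print(shlex.join(cmd))
# To actually run it (requires symclatron to be installed and set up):
# import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.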

Results

The classification results are saved in the specified output directory:

Main output files

  1. symclatron_results.tsv - Main classification results with columns:

    • taxon_oid - Genome identifier
    • completeness_UNI56 - Completeness metric based on universal marker genes
    • confidence - Overall confidence score for the classification
    • classification - Final classification label:
      • Free-living
      • Symbiont;Host-associated
      • Symbiont;Obligate-intracellular
  2. classification_summary.txt - Summary report with statistics

  3. Log files - Detailed execution logs with timestamps
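For downstream analysis, the results table can be read with Python's standard `csv` module. The sketch below assumes that symclatron_results.tsv is tab-delimited with a header row using the column names listed above, and that the confidence column parses as a number. The sample rows and the 0.5 cut-off are made up for illustration and do not come from a real run.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO


def read_results(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-delimited results table into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def count_labels(rows: List[Dict[str, str]], min_confidence: float = 0.0) -> Counter:
    """Count classification labels among rows at or above a confidence cut-off."""
    return Counter(
        row["classification"]
        for row in rows
        if float(row["confidence"]) >= min_confidence
    )


# Made-up sample data; replace with open("results/symclatron_results.tsv").
sample = (
    "taxon_oid\tcompleteness_UNI56\tconfidence\tclassification\n"
    "genome_A\t98.2\t0.95\tFree-living\n"
    "genome_B\t91.0\t0.80\tSymbiont;Host-associated\n"
    "genome_C\t75.5\t0.40\tSymbiont;Obligate-intracellular\n"
)
rows = read_results(io.StringIO(sample))
counts = count_labels(rows, min_confidence=0.5)
for label, n in counts.items():
    print(label, n)
```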

Debug files

When using --keep-tmp, intermediate files are preserved in the tmp/ directory for analysis.

Performance

symclatron is designed for efficiency:

  • <2 minutes per genome on consumer-level laptops
  • Most recent benchmark: 306 genomes in ~162 minutes (~0.53 min/genome, or about 1.9 genomes per minute)
  • Memory efficient - suitable for standard workstations

Citation

If you use symclatron in your research, please cite:

A genomic catalog of Earth’s bacterial and archaeal symbionts. Juan C. Villada, Yumary M. Vasquez, Gitta Szabo, Ewan Whittaker-Walker, Miguel F. Romero, Sarina Qin, Neha Varghese, Emiley A. Eloe-Fadrosh, Nikos C. Kyrpides, SymGs data consortium, Axel Visel, Tanja Woyke, Frederik Schulz. bioRxiv 2025.05.29.656868; doi: https://doi.org/10.1101/2025.05.29.656868
