Machine learning-based classification of microbial symbiotic lifestyles
Symclatron
symclatron is a tool that classifies microbial genomes into three symbiotic lifestyle categories:
- Free-living
- Symbiont; Host-associated
- Symbiont; Obligate-intracellular
Installation
From PyPI (recommended)
pip install symclatron
From source
git clone https://github.com/NeLLi-team/symclatron.git
cd symclatron/
pip install .
Development installation
git clone https://github.com/NeLLi-team/symclatron.git
cd symclatron/
pip install -e ".[dev]"
System Requirements
symclatron requires HMMER to be installed and available in your PATH:
Ubuntu/Debian:
sudo apt-get install hmmer
macOS (with Homebrew):
brew install hmmer
Conda/Mamba:
conda install -c bioconda hmmer
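Before running symclatron, it can help to confirm that HMMER is actually visible on your PATH. The following is a minimal sketch, not part of symclatron itself: the helper name find_hmmer is hypothetical, and the assumption that hmmsearch and hmmscan are the relevant executables is ours (they are standard HMMER 3 programs, but this README does not say which ones symclatron calls).

```python
import shutil

# Assumption: these are standard HMMER 3 executables. Which of them
# symclatron invokes is not stated in this README.
HMMER_BINARIES = ("hmmsearch", "hmmscan")


def find_hmmer(binaries=HMMER_BINARIES):
    """Map each binary name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in binaries}


if __name__ == "__main__":
    for name, path in find_hmmer().items():
        print(f"{name}: {path or 'NOT FOUND on PATH'}")
```

If any entry is reported as not found, install HMMER with one of the commands above and open a new shell so that PATH is refreshed.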
Quick Start
- Set up data (required): download the required database files (only needed once):
symclatron setup
- Test with sample genomes:
symclatron test
- Classify your genomes:
symclatron classify --genome-dir /path/to/genomes/ --output-dir results/
Usage
Getting help
# Main help
symclatron --help
# Command-specific help
symclatron classify --help
symclatron setup --help
# Show version and information
symclatron --version
symclatron info
Classification command
The main classification command with all options:
symclatron classify [OPTIONS]
Options:
- --genome-dir, -i: Directory containing genome FASTA files (.faa) [default: input_genomes]
- --output-dir, -o: Output directory for results [default: output_symclatron]
- --keep-tmp: Keep temporary files for debugging
- --threads, -t: Number of threads for HMMER searches [default: 2]
- --quiet, -q: Suppress progress messages
- --verbose: Show detailed progress information
Examples:
# Basic usage
symclatron classify --genome-dir genomes/ --output-dir results/
# With more threads and keeping temporary files
symclatron classify -i genomes/ -o results/ --threads 8 --keep-tmp
# Quiet mode
symclatron classify --genome-dir genomes/ --quiet
# Verbose mode with detailed progress
symclatron classify --genome-dir genomes/ --verbose
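If you drive symclatron from a Python pipeline, the same invocations can be assembled programmatically. This is a sketch under stated assumptions: build_classify_command and run_classify are hypothetical helpers (symclatron is not documented here as exposing a Python API), and they only use the command-line flags listed above.

```python
import shlex
import subprocess


def build_classify_command(genome_dir, output_dir, threads=2,
                           keep_tmp=False, quiet=False):
    """Build the argument list for `symclatron classify` from documented flags."""
    cmd = [
        "symclatron", "classify",
        "--genome-dir", str(genome_dir),
        "--output-dir", str(output_dir),
        "--threads", str(threads),
    ]
    if keep_tmp:
        cmd.append("--keep-tmp")
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_classify(genome_dir, output_dir, **options):
    """Run the classification; raises CalledProcessError on a non-zero exit."""
    cmd = build_classify_command(genome_dir, output_dir, **options)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_classify(...) to execute it.
    print(shlex.join(build_classify_command("genomes/", "results/", threads=8)))
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.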
Results
The classification results are saved in the specified output directory:
Main output files
- symclatron_results.tsv: Main classification results with columns:
  - taxon_oid: Genome identifier
  - completeness_UNI56: Completeness metric based on universal marker genes
  - confidence: Overall confidence score for the classification
  - classification: Final classification label (Free-living, Symbiont;Host-associated, or Symbiont;Obligate-intracellular)
- classification_summary.txt: Summary report with statistics
- Log files: Detailed execution logs with timestamps
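The results table can be post-processed with a few lines of Python. The sketch below assumes that symclatron_results.tsv is tab-separated and has a header row with the column names listed above; the function name summarize_results is hypothetical and not part of symclatron.

```python
import csv
from collections import Counter


def summarize_results(tsv_path):
    """Return (label counts, {taxon_oid: classification}) from a results TSV.

    Assumes a tab-separated file whose header includes the
    'taxon_oid' and 'classification' columns.
    """
    counts = Counter()
    labels = {}
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            labels[row["taxon_oid"]] = row["classification"]
            counts[row["classification"]] += 1
    return counts, labels


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py results/symclatron_results.tsv
    counts, _ = summarize_results(sys.argv[1])
    for label, n in counts.most_common():
        print(f"{label}\t{n}")
```

The script counts whatever labels appear in the classification column rather than hard-coding them, so it does not depend on the exact label spelling.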
Debug files
When --keep-tmp is used, intermediate files are preserved in the tmp/ directory for analysis.
Performance
symclatron is designed for efficiency:
- ~2 minutes per genome on consumer-level laptops
- Most recent benchmark: 306 genomes in ~162 minutes (1.9 min/genome)
- Memory efficient - suitable for standard workstations
Input requirements
- File format: Protein FASTA files (.faa, .fasta, .fa)
- Content: Predicted protein sequences from genomes
- Quality: Complete or near-complete genomes are recommended, but good performance is also expected for medium-quality (MQ) metagenome-assembled genomes (MAGs)
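A quick pre-flight check of the input directory can catch misnamed or empty files before a long run. This is a minimal sketch: list_protein_fastas and starts_with_fasta_header are hypothetical helpers, the extension list is taken from the requirements above, and the header check only confirms that a file begins with a FASTA record, not that it contains protein sequences.

```python
from pathlib import Path

# Extensions listed under "Input requirements"; matching is case-sensitive here.
PROTEIN_FASTA_SUFFIXES = {".faa", ".fasta", ".fa"}


def list_protein_fastas(genome_dir):
    """Return the sorted files in genome_dir that have an accepted extension."""
    return sorted(
        p for p in Path(genome_dir).iterdir()
        if p.is_file() and p.suffix in PROTEIN_FASTA_SUFFIXES
    )


def starts_with_fasta_header(path):
    """True if the first non-blank line starts with '>' (False for empty files)."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                return line.startswith(">")
    return False


if __name__ == "__main__":
    import sys

    # Usage: python check_inputs.py /path/to/genomes/
    for fasta in list_protein_fastas(sys.argv[1]):
        status = "ok" if starts_with_fasta_header(fasta) else "no FASTA header"
        print(f"{fasta.name}: {status}")
```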
Container usage
Docker
Pull the latest container:
docker pull docker.io/jvillada/symclatron:latest
Test with sample genomes:
# Quote the variables so that paths containing spaces still work
my_test_dir="$PWD/test_output_symclatron"
mkdir -p "$my_test_dir"
docker run --rm \
  -v "$my_test_dir":/usr/src/symclatron/output \
  docker.io/jvillada/symclatron:latest \
  symclatron test --output-dir output
Classify your genomes:
# Quote the variables so that paths containing spaces still work
my_genomes_dir="/path/to/genome/faa_files/"
my_output_dir="/path/to/output/directory/"
mkdir -p "$my_output_dir"
docker run --rm \
  -v "$my_genomes_dir":/usr/src/symclatron/input_genomes \
  -v "$my_output_dir":/usr/src/symclatron/output \
  docker.io/jvillada/symclatron:latest \
  symclatron classify --genome-dir input_genomes/ --output-dir output
Apptainer/Singularity
Pull the latest container:
apptainer pull docker://docker.io/jvillada/symclatron:latest
Test with sample genomes:
# Quote the variables so that paths containing spaces still work
my_test_dir="$PWD/test_output_symclatron"
mkdir -p "$my_test_dir"
apptainer run \
  --pwd /usr/src/symclatron \
  --bind "$my_test_dir":/usr/src/symclatron/output \
  docker://docker.io/jvillada/symclatron:latest \
  symclatron test --output-dir output
Classify your genomes:
# Quote the variables so that paths containing spaces still work
my_genomes_dir="/path/to/genome/faa_files/"
my_output_dir="/path/to/output/directory/"
mkdir -p "$my_output_dir"
apptainer run \
  --pwd /usr/src/symclatron \
  --bind "$my_genomes_dir":/usr/src/symclatron/input_genomes \
  --bind "$my_output_dir":/usr/src/symclatron/output \
  docker://docker.io/jvillada/symclatron:latest \
  symclatron classify --genome-dir input_genomes/ --output-dir output
Citation
If you use symclatron in your research, please cite:
Juan C. Villada, Yumary M. Vasquez, Gitta Szabo, Ewan Whittaker-Walker, Miguel F. Romero, Sarina Qin, Neha Varghese, Emiley A. Eloe-Fadrosh, Nikos C. Kyrpides, SymGs data consortium, Axel Visel, Tanja Woyke, Frederik Schulz. A genomic catalog of Earth's bacterial and archaeal symbionts. bioRxiv 2025.05.29.656868. doi: https://doi.org/10.1101/2025.05.29.656868
Support
- Repository: https://github.com/NeLLi-team/symclatron
- Issues: https://github.com/NeLLi-team/symclatron/issues
- Author: Juan C. Villada (jvillada@lbl.gov)
License
This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.