Genome-level organic-acid and SCFA pathway profiling for lactobacilli and related bacteria.
LactoSCFA
LactoSCFA is a lightweight command-line tool for genome-level prediction of organic-acid and short-chain fatty acid (SCFA) pathway potential from bacterial genomes or protein FASTA files.
The tool maps protein evidence to curated gene families, scores acid-producing pathway modules, and reports interpretable genome-level calls such as complete, near_complete, partial, and absent. It is designed for fast screening, comparative genomics, and thesis-level analysis where transparent pathway evidence is more useful than a black-box phenotype label.
LactoSCFA reports genomic potential. It does not directly measure metabolite concentration, growth-condition-dependent flux, or in vivo acid output.
Validation
LactoSCFA was tested against an external phenotype-plus-genome dataset from a human gut Bacteroidales culture collection:
- Article: Zhang et al., 2024, Cell Host & Microbe, "Comprehensive analyses of a large human gut Bacteroidales culture collection reveal species- and strain-level diversity and evolution"
- DOI: 10.1016/j.chom.2024.08.016
- Public data/code repository: DFI-Bioinformatics/DFI_Bacteroidales
- Phenotype source used here: data/metab.quant.matrix.csv, described by the data repository as SCFA production or consumption in mM for 111 isolates measured by quantitative metabolomics.
- Dataset reconstructed for LactoSCFA validation: 111 genomes and 444 measured acid records covering acetate, propionate, butyrate, and succinate.
- Main validation results: butyrate strict prediction reached 6 TP and 105 TN with balanced accuracy 1.00; propionate potential prediction reached 103 TP, 1 FP, 7 TN, and 0 FN with accuracy 0.991.
- Interpretation: the benchmark supports LactoSCFA for strict butyrate prediction and propionate-potential screening. Succinate underperformance is treated as a database/module improvement target rather than a negative biological conclusion.
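The reported metrics can be recomputed from the confusion counts listed above. The following is a minimal sketch, not part of LactoSCFA; the function name `confusion_metrics` is hypothetical, and the butyrate FP and FN counts are inferred to be zero because 6 TP + 105 TN already account for all 111 isolates.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity, and balanced accuracy."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Butyrate (strict): 6 TP and 105 TN are reported; FP = FN = 0 is inferred
# here because those two counts already sum to 111 isolates.
butyrate = confusion_metrics(tp=6, fp=0, tn=105, fn=0)

# Propionate (potential): all four counts are reported in the list above.
propionate = confusion_metrics(tp=103, fp=1, tn=7, fn=0)

print(f"butyrate balanced accuracy: {butyrate['balanced_accuracy']:.2f}")  # 1.00
print(f"propionate accuracy: {propionate['accuracy']:.3f}")                # 0.991
```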
The article page mainly exposes this phenotype information visually as a heatmap. For the LactoSCFA benchmark, the machine-readable validation table was reconstructed from the public repository matrix rather than copied from a supplementary phenotype table. The construction steps were:
- Read metab.quant.matrix.csv, whose rows are isolate IDs and whose acid columns are Acetate, Propionate, Butyrate, and Succinate.
- Convert the wide matrix to a long table: 111 isolates x 4 acids = 444 records.
- Classify measured phenotypes as producer when delta mM > 0.1, consumer when delta mM < -0.1, and neutral otherwise.
- Match isolate IDs to public genome assemblies from BioProjects PRJNA737800 and PRJNA792599, download protein FASTA files, run LactoSCFA db_v2 profile mode, and compare acid-level calls with the reconstructed phenotype classes.
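The wide-to-long conversion and the threshold classification in the steps above can be sketched as follows. This is an illustration rather than the script used for the benchmark: the ID column name `isolate`, the function names, and the example values are all assumptions, and only the four acid column names and the ±0.1 mM thresholds come from the description.

```python
import csv
import io

ACIDS = ("Acetate", "Propionate", "Butyrate", "Succinate")
THRESHOLD_MM = 0.1


def classify_delta(delta_mm, threshold=THRESHOLD_MM):
    """Map a delta concentration (mM) to producer, consumer, or neutral."""
    if delta_mm > threshold:
        return "producer"
    if delta_mm < -threshold:
        return "consumer"
    return "neutral"


def wide_to_long(handle, id_column="isolate"):
    """Turn one row per isolate into one record per isolate-acid pair."""
    records = []
    for row in csv.DictReader(handle):
        for acid in ACIDS:
            delta = float(row[acid])
            records.append(
                {
                    "isolate": row[id_column],  # assumed ID column name
                    "acid": acid.lower(),
                    "delta_mM": delta,
                    "phenotype": classify_delta(delta),
                }
            )
    return records


# Made-up values for two hypothetical isolates, not real measurements.
example = io.StringIO(
    "isolate,Acetate,Propionate,Butyrate,Succinate\n"
    "ISO_A,12.5,3.2,0.0,-0.4\n"
    "ISO_B,0.05,-0.1,0.3,1.8\n"
)
long_table = wide_to_long(example)
print(len(long_table))  # 2 isolates x 4 acids = 8 records
```

Values exactly at the threshold (0.1 or -0.1) fall into the neutral class, because both comparisons are strict.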
Installation
Python 3.11 or later is required.
Recommended Linux server installation with conda, DIAMOND, and Prodigal:
bash install_lactoscfa_linux.sh
./lactoscfa_lab.sh check-env
The installer creates or updates a conda environment named lactoscfa, installs diamond and prodigal from Bioconda, installs LactoSCFA, and writes a wrapper script that activates the environment before running the command.
Install from PyPI:
python -m pip install lactoscfa
lactoscfa --help
For offline Linux servers, install the wheel:
python3 -m pip install --user lactoscfa-0.4.1-py3-none-any.whl --no-deps
python3 -m lactoscfa.cli validate-db
If user-site installation is blocked:
python3 -m pip install --target ./lactoscfa_py --no-index --no-deps lactoscfa-0.4.1-py3-none-any.whl
PYTHONPATH="$PWD/lactoscfa_py" python3 -m lactoscfa.cli validate-db
For source installation:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install .
lactoscfa validate-db
If source or .tar.gz installation fails with BackendUnavailable: Cannot import 'setuptools.build_meta', install the wheel instead or install setuptools in that Python environment.
Use
DIAMOND is required for protein FASTA input. DIAMOND and Prodigal are required for genome FASTA input.
lactoscfa check-env
lactoscfa validate-db
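Before launching a large batch, the external tools can also be checked from a script. The sketch below only tests whether the `diamond` and `prodigal` executables are on PATH; it is not a description of what `lactoscfa check-env` does internally, and the function name `missing_tools` is hypothetical.

```python
import shutil

# External tools needed per input type, as stated above.
REQUIRED_TOOLS = {
    "proteins": ["diamond"],
    "genomes": ["diamond", "prodigal"],
}


def missing_tools(input_kind):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in REQUIRED_TOOLS[input_kind] if shutil.which(tool) is None]


for kind in REQUIRED_TOOLS:
    missing = missing_tools(kind)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(f"{kind}: {status}")
```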
Run protein FASTA profiling:
lactoscfa profile \
--proteins path/to/protein_faa_dir \
--mode strict \
--threads 8 \
--out results/lactoscfa_profile
Run genome FASTA profiling:
lactoscfa profile \
--genomes path/to/genome_fasta_dir \
--mode strict \
--threads 8 \
--out results/lactoscfa_genomes
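When the same run has to be repeated over several input directories, the command line can be assembled in Python. This is a minimal sketch that uses only the options shown in the two commands above; the helper name `build_profile_command` is hypothetical, and the actual call is left commented out.

```python
import shlex
import subprocess


def build_profile_command(input_dir, out_dir, input_kind="proteins",
                          mode="strict", threads=8):
    """Assemble a `lactoscfa profile` argument list from the documented options."""
    if input_kind not in ("proteins", "genomes"):
        raise ValueError("input_kind must be 'proteins' or 'genomes'")
    return [
        "lactoscfa", "profile",
        f"--{input_kind}", str(input_dir),
        "--mode", mode,
        "--threads", str(threads),
        "--out", str(out_dir),
    ]


cmd = build_profile_command("path/to/genome_fasta_dir",
                            "results/lactoscfa_genomes",
                            input_kind="genomes")
print(shlex.join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to execute the run
```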
Explain one genome:
lactoscfa explain \
--strain STRAIN_ID \
--result results/lactoscfa_profile \
--acid-set core,crossfeed \
--text-summary \
--out results/STRAIN_ID_explain
The command-line help provides the full option list:
lactoscfa --help
lactoscfa profile --help
lactoscfa explain --help
Example pathway_summary.txt
lactoscfa explain --text-summary writes a compact, readable pathway-evidence summary for one genome. Example excerpt:
Genome-level SCFA potential
ATCC_334 | db_v2 | profile mode
acid status
acetate complete
propionate absent
butyrate absent
lactate complete
succinate absent
formate complete
Butyrate pathway evidence
P1 acetyl-CoA route absent
metabolites: Acetyl-CoA -> acetoacetyl-CoA -> 3-hydroxybutyryl-CoA -> crotonyl-CoA -> butyryl-CoA -> butyrate
reaction genes:
1. Acetyl-CoA --[thl detected]--> acetoacetyl-CoA
2. acetoacetyl-CoA --[hbd missing]--> 3-hydroxybutyryl-CoA
3. 3-hydroxybutyryl-CoA --[crt missing]--> crotonyl-CoA
4. crotonyl-CoA --[bcd/etfAB missing]--> butyryl-CoA
5. butyryl-CoA --[but/ptb-buk/ctfAB detected]--> butyrate
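The reaction lines in the excerpt follow a regular pattern, so they can be parsed for downstream tallies. The sketch below is an assumption based only on the five lines shown: the pattern, the function name `parse_step`, and the restriction to the `detected` and `missing` labels are not taken from LactoSCFA itself, and other status labels may exist.

```python
import re

# Matches lines such as "1. Acetyl-CoA --[thl detected]--> acetoacetyl-CoA".
STEP_PATTERN = re.compile(
    r"^\s*(?P<step>\d+)\.\s+(?P<substrate>.+?)\s+"
    r"--\[(?P<genes>\S+)\s+(?P<status>detected|missing)\]-->\s+"
    r"(?P<product>.+?)\s*$"
)


def parse_step(line):
    """Return the fields of one reaction line, or None if it does not match."""
    match = STEP_PATTERN.match(line)
    return match.groupdict() if match else None


excerpt = [
    "1. Acetyl-CoA --[thl detected]--> acetoacetyl-CoA",
    "2. acetoacetyl-CoA --[hbd missing]--> 3-hydroxybutyryl-CoA",
    "3. 3-hydroxybutyryl-CoA --[crt missing]--> crotonyl-CoA",
    "4. crotonyl-CoA --[bcd/etfAB missing]--> butyryl-CoA",
    "5. butyryl-CoA --[but/ptb-buk/ctfAB detected]--> butyrate",
]
steps = [parse_step(line) for line in excerpt]
detected = sum(1 for s in steps if s["status"] == "detected")
missing = sum(1 for s in steps if s["status"] == "missing")
print(f"detected: {detected}, missing: {missing}")  # detected: 2, missing: 3
```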
Output Files
Default profile and score runs write a compact result set:
db_manifest.json
report.md
run_manifest.json
tables/acid_details.tsv
tables/acid_summary.tsv
tables/gene_hits.filtered.tsv
tables/pathway_details.tsv
tables/pathway_summary.tsv
tables/strain_summary.tsv
Use --full-output only when matrix tables and SVG summary figures are needed.
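For downstream analysis, the TSV tables can be loaded with the standard library. This sketch assumes each table starts with a header row and makes no assumption about column names, which are not documented here; the helper names `read_tsv` and `load_result_tables` are hypothetical.

```python
import csv
from pathlib import Path


def read_tsv(path):
    """Read one tab-separated table into a list of dicts keyed by header."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def load_result_tables(result_dir):
    """Load every tables/*.tsv file under a result directory.

    Keys are file names without the .tsv suffix, e.g. "acid_summary".
    """
    tables_dir = Path(result_dir) / "tables"
    return {path.stem: read_tsv(path) for path in sorted(tables_dir.glob("*.tsv"))}
```

Calling `load_result_tables("results/lactoscfa_profile")` after a run would return one entry per table listed above.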
Repository Notes
- lactoscfa/: Python package and command-line implementation.
- lactoscfa/data/db_v2/: packaged curated database used by default.
- db_v2/: source-tree copy of the curated database.
- examples/: minimal example files.
- tests/: regression tests.
- scripts/: development and analysis utilities.
For publication or thesis use, report the LactoSCFA version, database version, search mode, thresholds, and validation scope.
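Part of this reporting information can be gathered programmatically. The sketch below reads the installed package version through `importlib.metadata` and loads the two manifest files from a result directory as plain JSON; the function name `collect_provenance` is hypothetical, and the manifest contents are returned as-is because their keys are not documented here.

```python
import json
from importlib import metadata
from pathlib import Path


def collect_provenance(result_dir):
    """Collect the installed LactoSCFA version and the run/database manifests."""
    result_dir = Path(result_dir)
    try:
        version = metadata.version("lactoscfa")
    except metadata.PackageNotFoundError:
        version = None  # package not installed in this environment

    provenance = {"lactoscfa_version": version}
    for name in ("run_manifest.json", "db_manifest.json"):
        path = result_dir / name
        # Missing manifests are recorded as None instead of raising.
        provenance[name] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else None
        )
    return provenance
```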