salmopredict
AutoGluon-based Incidence predictor for Salmonella virulence-factor gene-frequency features, with a command-line interface and a Streamlit GUI.
Given a feature table (rows = samples, columns = virulence-factor genes),
salmopredict aligns the columns to the features a pre-trained AutoGluon
TabularPredictor expects, runs the WeightedEnsemble_L2 model, and writes a
single prediction file. It reproduces the alignment used by the original
predict_autogluon.py: column names are normalised R-make.names-style
(/ and - become .), genes the model expects but the input lacks are filled
with 0 (a missing gene means frequency 0), and extra input columns are ignored.
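The alignment rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's code: the helper names `normalise_name` and `align_row` and the gene names are hypothetical, and only the two substitutions named above (`/` and `-` to `.`) are applied rather than the full R `make.names` rule set.

```python
def normalise_name(name: str) -> str:
    """Replace '/' and '-' with '.', the two substitutions described above."""
    return name.replace("/", ".").replace("-", ".")


def align_row(row: dict[str, float], expected: list[str]) -> dict[str, float]:
    """Return one sample's features in the order the model expects.

    Genes missing from the input are filled with 0.0; input columns the
    model does not know are dropped.
    """
    normalised = {normalise_name(k): v for k, v in row.items()}
    return {gene: normalised.get(gene, 0.0) for gene in expected}


# Hypothetical feature names, for illustration only.
expected_features = ["sopE", "spvC", "pef.A"]
sample = {"sopE": 0.8, "pef-A": 0.25, "unknownGene": 1.0}

aligned = align_row(sample, expected_features)
print(aligned)  # {'sopE': 0.8, 'spvC': 0.0, 'pef.A': 0.25}
```

Here `spvC` is absent from the input and becomes 0.0, `pef-A` is matched after normalisation, and `unknownGene` is ignored.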
Input/output contract — one input CSV in, one output CSV out. The output
columns depend on whether the input has a Sample column:
| Input | Output columns |
|---|---|
| No Sample column (features only) | Incidence(%) |
| Has a Sample column | Sample, Incidence(%) |
| Has a Sample column and --attach meta.csv | Sample, Incidence(%), + the metadata's other columns |
Metadata is joined on the Sample key (the metadata CSV must also have a
Sample column), so attaching metadata requires a Sample column in the input.
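The three cases of the contract can be sketched as one small function. This is a hedged illustration, not the tool's implementation: `build_output` is a hypothetical name, rows are plain dicts, and the join is assumed to keep every input row even when the metadata has no matching Sample (the source does not say which join type is used).

```python
def build_output(rows, predictions, metadata=None):
    """Assemble output records following the three cases described above.

    rows: input rows as dicts; predictions: one value per row;
    metadata: optional list of dicts, each carrying a 'Sample' key.
    """
    has_sample = bool(rows) and "Sample" in rows[0]
    if metadata is not None and not has_sample:
        raise ValueError("attaching metadata requires a Sample column in the input")

    # Index metadata by the join key (assumption: Sample values are unique).
    meta_by_sample = {m["Sample"]: m for m in (metadata or [])}

    out = []
    for row, pred in zip(rows, predictions):
        record = {}
        if has_sample:
            record["Sample"] = row["Sample"]
        record["Incidence(%)"] = pred
        if metadata is not None:
            extra = meta_by_sample.get(row["Sample"], {})
            record.update({k: v for k, v in extra.items() if k != "Sample"})
        out.append(record)
    return out


# Made-up values, for illustration only.
rows = [{"Sample": "S1", "geneA": 0.5}, {"Sample": "S2", "geneA": 0.1}]
preds = [12.5, 3.0]
meta = [{"Sample": "S1", "Source": "poultry"}]

result = build_output(rows, preds, meta)
print(result[0])  # {'Sample': 'S1', 'Incidence(%)': 12.5, 'Source': 'poultry'}
```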
Install
salmopredict runs on Python 3.10 and loads its model with AutoGluon 1.1.1 — both are hard requirements, because the model is pickled with that exact stack.
From PyPI (recommended). In a Python 3.10 environment:
pip install salmopredict
This pulls in AutoGluon 1.1.1, the Streamlit GUI, and the bundled prediction model, so both interfaces work out of the box:
salmopredict run -i features.csv -o results/ # command line
salmopredict gui # browser GUI
No Python 3.10 environment yet? Create one first, for example with conda:
conda create -n salmopredict python=3.10 && conda activate salmopredict
Reproducible environment (from a clone). Pins Python 3.10 and installs AutoGluon via pip inside the env (conda-installed AutoGluon does not resolve cleanly for this project):
conda env create -f environment.yml
conda activate salmopredict
Editable / development install (from a clone).
pip install -e . # installs the CLI and the Streamlit GUI
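Because Python 3.10 and AutoGluon 1.1.1 are hard requirements, it can help to verify the stack before running. The following is a minimal sketch of such a check, not what `salmopredict check` does internally; the function name `environment_problems` is hypothetical, and `autogluon.tabular` is an assumed distribution name for looking up the installed AutoGluon version.

```python
import sys
from importlib import metadata

REQUIRED_PYTHON = (3, 10)
REQUIRED_AUTOGLUON = "1.1.1"


def environment_problems(python_version=None, autogluon_version=None):
    """Return a list of human-readable problems; empty means the stack matches."""
    if python_version is None:
        python_version = sys.version_info[:2]
    if autogluon_version is None:
        try:
            # Assumed distribution name; adjust if AutoGluon is installed differently.
            autogluon_version = metadata.version("autogluon.tabular")
        except metadata.PackageNotFoundError:
            autogluon_version = "not installed"

    problems = []
    if tuple(python_version) != REQUIRED_PYTHON:
        found = f"{python_version[0]}.{python_version[1]}"
        problems.append(f"Python {found} found, 3.10 required")
    if autogluon_version != REQUIRED_AUTOGLUON:
        problems.append(
            f"AutoGluon {autogluon_version} found, {REQUIRED_AUTOGLUON} required"
        )
    return problems


print(environment_problems() or "environment OK")
```

The optional arguments exist only so the check can be exercised with explicit versions; called without arguments it inspects the running interpreter.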
The model
The prediction model is already bundled with salmopredict — both in this
repository and inside the PyPI wheel — at salmopredict/models/model_default, a
30 MB deployment. salmopredict uses it automatically, so the tool works out of
the box with no extra download or build step.
Model resolution order is --model, then $SALMOPREDICT_MODEL, then the single
directory under the package models/ folder; with nothing specified it uses the
bundled model_default. Pass --model /path/to/other to run a different
AutoGluon model.
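The resolution order above can be sketched as follows. This is an assumption-laden illustration rather than the package's code: `resolve_model` and its parameters are hypothetical, and the error raised when the models folder does not contain exactly one directory is a guess at reasonable behaviour.

```python
import os
import tempfile
from pathlib import Path


def resolve_model(cli_model=None, env=None, models_dir=None):
    """Pick a model directory: --model, then $SALMOPREDICT_MODEL, then bundled."""
    env = os.environ if env is None else env
    if cli_model:
        return Path(cli_model)
    if env.get("SALMOPREDICT_MODEL"):
        return Path(env["SALMOPREDICT_MODEL"])
    if models_dir is None:
        raise FileNotFoundError("no model specified and no models/ folder given")
    candidates = sorted(p for p in Path(models_dir).iterdir() if p.is_dir())
    if len(candidates) != 1:
        raise FileNotFoundError(
            f"expected exactly one model directory under {models_dir}, "
            f"found {len(candidates)}"
        )
    return candidates[0]


# Demo with a throwaway folder standing in for the package's models/ directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model_default").mkdir()
    bundled = resolve_model(env={}, models_dir=tmp)
    print(bundled.name)  # model_default

# An explicit --model value wins over the environment variable.
print(resolve_model(cli_model="/path/to/other", env={"SALMOPREDICT_MODEL": "/elsewhere"}))
```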
Usage
Ready-to-run inputs live in examples/ (see its README):
example_features.csv (Type 1, no Sample), example_with_sample.csv
(Type 2, with Sample), and example_meta.csv (metadata to attach). Features
are gene_frequency × log10(CFU dose), matching how the model was trained. Try
one immediately:
salmopredict run -i examples/example_features.csv -o results/
# Features only -> output has just Incidence(%)
salmopredict run -i features.csv -o results/ --model /path/to/model
# With a Sample column -> output has Sample, Incidence(%)
salmopredict run -i examples/example_with_sample.csv -o results/
# Attach metadata joined on the Sample key -> Sample, Incidence(%), + meta columns
salmopredict run -i examples/example_with_sample.csv -o results/ \
--attach examples/example_meta.csv
# Launch the GUI, or check the environment/model
salmopredict gui
salmopredict check --model /path/to/model
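The example inputs hold features of the form gene_frequency × log10(CFU dose). A minimal sketch of that scaling for one sample, assuming a single dose value per sample and a hypothetical helper name `dose_scaled_features` (how the real feature tables were prepared is not shown in this README):

```python
import math


def dose_scaled_features(gene_frequency: dict[str, float], cfu_dose: float) -> dict[str, float]:
    """Multiply each gene frequency by log10 of the CFU dose."""
    if cfu_dose <= 0:
        raise ValueError("CFU dose must be positive to take log10")
    scale = math.log10(cfu_dose)
    return {gene: freq * scale for gene, freq in gene_frequency.items()}


# Made-up frequencies and dose, for illustration only.
features = dose_scaled_features({"geneA": 0.5, "geneB": 0.0}, cfu_dose=1e6)
print(features)
```

With a dose of 10^6 CFU the scale factor is 6, so a frequency of 0.5 becomes a feature value of about 3, and an absent gene stays at 0.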
Each run writes one pred_<input-stem>.csv to the output directory; the
prediction column is Incidence(%). Features filled with 0 (genes the model
expects but the input lacks) are always reported, and a prominent warning
appears when more than --missing-warn-frac (default 0.3) of the model's
features are missing.
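The output naming and the missing-feature warning can be sketched like this. Both helpers (`prediction_path`, `missing_report`) are hypothetical names for illustration; the only facts taken from the text are the `pred_<input-stem>.csv` pattern and the "more than 0.3" threshold.

```python
from pathlib import Path


def prediction_path(input_csv: str, output_dir: str) -> Path:
    """Build pred_<input-stem>.csv inside the output directory."""
    return Path(output_dir) / f"pred_{Path(input_csv).stem}.csv"


def missing_report(expected: list[str], present: set[str], warn_frac: float = 0.3):
    """Return (missing genes, fraction missing, whether to warn prominently)."""
    missing = [g for g in expected if g not in present]
    frac = len(missing) / len(expected) if expected else 0.0
    # Strictly greater than the threshold, matching "more than" in the text.
    return missing, frac, frac > warn_frac


out = prediction_path("examples/example_features.csv", "results/")
print(out.name)  # pred_example_features.csv

# Hypothetical gene names: two of four expected features are absent.
missing, frac, warn = missing_report(["g1", "g2", "g3", "g4"], {"g1", "g2", "extra"})
print(missing, frac, warn)  # ['g3', 'g4'] 0.5 True
```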
License
Licensed under the PolyForm Noncommercial License 1.0.0: free to use, modify, and share for any noncommercial purpose — including research, teaching, and personal use, and by academic, government, public-health, and other nonprofit organizations — but commercial use is not permitted. Developed at the State Key Laboratory of Veterinary Public Health and Safety, China Agricultural University, in collaboration with the China National Center for Food Safety Risk Assessment (CFSA).