A collection of microbial datasets obtained from metabolic modeling for machine learning research
Welcome to the Friend or Foe repository!
FriendOrFoe is a collection of environmental datasets obtained from metabolic modeling of microbial communities drawn from the AGORA and CARVEME collections. FriendOrFoe gathers 64 tabular datasets (16 for AGORA with 100 additional compounds, 16 for AGORA with 50 additional compounds, 16 for CARVEME with 100 additional compounds, and 16 for CARVEME with 50 additional compounds), which were constructed by studying more than 10 000 pairs of microbes via Flux Balance Analysis. The collection can be studied with four machine learning frameworks. The code underlying the metabolic modeling process is available here. Running the Matlab code requires a Gurobi Academic License.
Repository structure
- examples: provides notebooks with examples on various tasks
- exp: stores .json files with final metrics
- models: contains code, environments and .json files for the experiments
Getting started
Download the data from our Hugging Face repo: https://huggingface.co/datasets/powidla/Friend-Or-Foe
from huggingface_hub import hf_hub_download
import pandas as pd
REPO_ID = "powidla/Friend-Or-Foe"
# File paths within the repo
X_train_ID = "Classification/AGORA/100/BC-I/X_train_BC-I-100.csv"
X_val_ID = "Classification/AGORA/100/BC-I/X_val_BC-I-100.csv"
X_test_ID = "Classification/AGORA/100/BC-I/X_test_BC-I-100.csv"
y_train_ID = "Classification/AGORA/100/BC-I/y_train_BC-I-100.csv"
y_val_ID = "Classification/AGORA/100/BC-I/y_val_BC-I-100.csv"
y_test_ID = "Classification/AGORA/100/BC-I/y_test_BC-I-100.csv"
# Download and load CSVs as pandas DataFrames
X_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_train_ID, repo_type="dataset"))
X_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_val_ID, repo_type="dataset"))
X_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_test_ID, repo_type="dataset"))
y_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_train_ID, repo_type="dataset"))
y_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_val_ID, repo_type="dataset"))
y_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_test_ID, repo_type="dataset"))
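The six paths above share one naming pattern, so they can be generated rather than typed out. The following is a minimal sketch, assuming the pattern `<Task>/<Collection>/<Group>/<Dataset>/<split>_<Dataset>-<Group>.csv` inferred from the BC-I example also holds for the other datasets; `build_split_paths` is a hypothetical helper and not part of the package.

```python
def build_split_paths(task, collection, group, dataset):
    """Return the in-repo CSV path for each split of one dataset.

    Assumption: every dataset follows the naming pattern of the
    Classification/AGORA/100/BC-I example; check the repo listing first.
    """
    base = f"{task}/{collection}/{group}/{dataset}"
    splits = ["X_train", "X_val", "X_test", "y_train", "y_val", "y_test"]
    return {split: f"{base}/{split}_{dataset}-{group}.csv" for split in splits}


paths = build_split_paths("Classification", "AGORA", "100", "BC-I")
print(paths["X_train"])
```

Each value can then be passed as `filename` to `hf_hub_download` exactly as in the calls above.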
Baseline Demo Notebooks
Quickstart notebook
We provide an end-to-end example of how to predict competitive and cooperative interactions with TabNet.
Examples
The notebooks contain a simple example of using baseline models for predicting microbial interactions.
Reproducing the results
To run the commands below for the supervised models, the data path should be organized as follows:
FOFdata/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv
For example,
FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv
The scripts below assume that this structure holds once the FOFdata folder has been created.
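One way to check that a local folder matches this layout before running the scripts is sketched here. The helper names `local_csv_path` and `find_dataset_dirs` are hypothetical and not part of the repository; the sketch relies only on the path template stated above.

```python
from pathlib import Path


def local_csv_path(root, task, collection, group, dataset, name):
    """Build <root>/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv."""
    return Path(root) / task / collection / group / dataset / "csv" / f"{name}.csv"


def find_dataset_dirs(root):
    """List every <Task>/<Collection>/<Group>/<Dataset> folder that has a csv/ subfolder."""
    return sorted(p.parent for p in Path(root).glob("*/*/*/*/csv") if p.is_dir())


example = local_csv_path("FOFdata", "Regression", "CARVEME", "50", "GR-III", "X_train_GR-III")
print(example.as_posix())  # FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv
```

Calling `find_dataset_dirs("FOFdata")` after downloading gives a quick view of which datasets are in place; an empty list suggests the folder nesting does not match the template.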
Supervised models
TabM
To train and test TabM we followed an example. We downloaded the data into the FOFdata folder.
mamba env create -f tabm.yaml
mkdir FOFdata
python main.py
FT-Transformer
To train and test FT-Transformer we followed an example.
mamba env create -f ft.yaml
mkdir FOFdata
python main.py
TabNet
To train and test TabNet we followed instructions from the package.
mamba env create -f tabnet.yaml
mkdir FOFdata
python main.py
GBDTs
We evaluate XGBoost, LightGBM and CatBoost as our baselines here.
mamba env create -f gbdts.yaml
mkdir FOFdata
python main.py
Unsupervised models
mamba env create -f uns.yaml
mkdir FOFdata
python main.py
Generative models
TVAE, CTGAN and TabDDPM
To test TVAE, CTGAN and TabDDPM we used the synthcity package and adapted the officially provided examples. We calculated $\alpha$-Precision and $\beta$-Recall using eval_statistical from synthcity.metrics.
mamba env create -f synthcity.yaml
cd FOFdata
python main.py --tvae
python main.py --ctgan
python main.py --ddpm
TabDiff
To train and test TabDiff we followed the guidelines. The example we used for the AGORA50 dataset is shown below:
git clone https://github.com/MinkaiXu/TabDiff
mamba env create -f tabdiff.yaml
cd data
mkdir GenAGORA50
python process_dataset.py --dataname GenAGORA50
python main.py --dataname GenAGORA50 --mode train --no_wandb --non_learnable_schedule --exp_name GenAGORA50
Alternatively, skip preprocessing by downloading the files from here.
To evaluate and calculate metrics:
mamba env create -f synthcity.yaml
cd Info
cp info.json
python main.py --dataname GenAGORA50 --mode test --report --no_wandb
License
FriendOrFoe is under the Apache 2.0 license for code found on the associated GitHub repo and for the data hosted on HuggingFace. The LICENSE file for the repo can be found in the top-level directory.
Citation Information
If you find this repository useful, please cite the following papers:
@article{Solowiej-Wedderburn2025-ar,
title = "Competition and cooperation: The plasticity of bacterial
interactions across environments",
author = "Solowiej-Wedderburn, Josephine and Pentz, Jennifer T and Lizana,
Ludvig and Schroeder, Bjoern O and Lind, Peter A and Libby, Eric",
journal = "PLoS Comput. Biol.",
publisher = "Public Library of Science (PLoS)",
volume = 21,
number = 7,
pages = "e1013213",
month = jul,
year = 2025,
copyright = "http://creativecommons.org/licenses/by/4.0/",
language = "en"
}