A collection of microbial datasets obtained from metabolic modeling for machine learning research

Welcome to the Friend or Foe repository!

FriendOrFoe is a collection of environmental datasets obtained from metabolic modeling of microbial communities from the AGORA and CARVEME collections. It gathers 64 tabular datasets (16 each for AGORA with 100 additional compounds, AGORA with 50 additional compounds, CARVEME with 100 additional compounds, and CARVEME with 50 additional compounds), constructed by studying more than 10 000 pairs of microbes via Flux Balance Analysis. The collection can be investigated with four machine learning frameworks. The code underlying the metabolic modeling process is available here. Running the MATLAB code requires a Gurobi Academic License.

Repository structure

  • examples: provides notebooks with examples on various tasks
  • exp: stores .json files with final metrics
  • models: contains code, environments and .json files for the experiments
  • friend_or_foe: contains source code

Getting started

Install the package from PyPI with pip, then install catboost:

pip install friend_or_foe
pip install catboost

A basic example that creates the loader and lists all datasets available in the collection:

# import FriendOrFoe loader
from friend_or_foe.data.loader import FriendOrFoeDataLoader

# create FriendOrFoe loader
loader = FriendOrFoeDataLoader(verbose=True)

# print available datasets
datasets = loader.list_available_datasets()
for name in list(datasets.keys()): 
    print(f"  --- {name}")

Loading a specific dataset, BC-I, from the AGORA collection:

from friend_or_foe.data.loader import FriendOrFoeDataLoader
loader = FriendOrFoeDataLoader()
data = loader.load_dataset('Classification', 'AGORA', '100', 'BC-I')
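Before training, it can help to check what the loader returned. The sketch below assumes that `data` behaves like a dictionary of splits (the training example indexes it with `'X_train'`, `'y_train'`, `'X_val'` and `'y_val'`). The helper `describe_splits` is hypothetical and not part of the package, and the toy dictionary only stands in for a real dataset.

```python
def describe_splits(data):
    """Return {split_name: shape or length} for a dict-like object of splits.

    Hypothetical helper, not part of friend_or_foe. Assumes each value is
    either array-like with a .shape attribute or a plain sequence.
    """
    summary = {}
    for name, value in data.items():
        shape = getattr(value, "shape", None)
        # Fall back to len() for plain Python sequences without .shape
        summary[name] = tuple(shape) if shape is not None else len(value)
    return summary


# Toy stand-in for the object returned by loader.load_dataset(...)
toy = {"X_train": [[0.0, 1.0], [1.0, 0.0]], "y_train": [0, 1]}
print(describe_splits(toy))
```

With a real dataset, `describe_splits(data)` would be called on the object returned by `loader.load_dataset(...)` instead of `toy`.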

Training a TabM model

from friend_or_foe.model.base import TabMModel
model = TabMModel(max_epochs=2, patience=1, batch_size=64, k=4, d_block=32)
task_type = 'classification'
model.fit(
    data['X_train'], 
    data['y_train'], 
    data['X_val'], 
    data['y_val'], 
    task_type=task_type
)

We also provide an example notebook with a basic CLI for more comprehensive analyses.

Alternatively, you can download the data directly from our Hugging Face repo, https://huggingface.co/datasets/powidla/Friend-Or-Foe, using the Hugging Face Hub loader:

from huggingface_hub import hf_hub_download
import pandas as pd

REPO_ID = "powidla/Friend-Or-Foe"

# File paths within the repo
X_train_ID = "Classification/AGORA/100/BC-I/X_train_BC-I-100.csv"
X_val_ID = "Classification/AGORA/100/BC-I/X_val_BC-I-100.csv"
X_test_ID = "Classification/AGORA/100/BC-I/X_test_BC-I-100.csv"

y_train_ID = "Classification/AGORA/100/BC-I/y_train_BC-I-100.csv"
y_val_ID = "Classification/AGORA/100/BC-I/y_val_BC-I-100.csv"
y_test_ID = "Classification/AGORA/100/BC-I/y_test_BC-I-100.csv"

# Download and load CSVs as pandas DataFrames
X_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_train_ID, repo_type="dataset"))
X_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_val_ID, repo_type="dataset"))
X_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_test_ID, repo_type="dataset"))

y_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_train_ID, repo_type="dataset"))
y_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_val_ID, repo_type="dataset"))
y_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_test_ID, repo_type="dataset"))
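The six downloads above follow one naming pattern, so they can be generated rather than typed out. The sketch below is a minimal version of that idea. The helpers `split_filenames` and `download_splits` are hypothetical and not part of the package. The pattern `<Task>/<Collection>/<Group>/<Dataset>/<kind>_<split>_<Dataset>-<Group>.csv` is inferred only from the BC-I example above, so it is an assumption that other datasets in the repo follow it.

```python
SPLITS = ("train", "val", "test")


def split_filenames(task, collection, group, dataset):
    """Build the six in-repo CSV paths for one dataset.

    Hypothetical helper. The naming pattern is inferred from the BC-I
    example and is assumed, not verified, for other datasets.
    """
    base = f"{task}/{collection}/{group}/{dataset}"
    return {
        f"{kind}_{split}": f"{base}/{kind}_{split}_{dataset}-{group}.csv"
        for kind in ("X", "y")
        for split in SPLITS
    }


def download_splits(repo_id, task, collection, group, dataset):
    """Download every split and return a dict of pandas DataFrames."""
    # Imported lazily so split_filenames works without these packages.
    import pandas as pd
    from huggingface_hub import hf_hub_download

    frames = {}
    for key, filename in split_filenames(task, collection, group, dataset).items():
        local_path = hf_hub_download(
            repo_id=repo_id, filename=filename, repo_type="dataset"
        )
        frames[key] = pd.read_csv(local_path)
    return frames


paths = split_filenames("Classification", "AGORA", "100", "BC-I")
print(paths["X_train"])  # Classification/AGORA/100/BC-I/X_train_BC-I-100.csv
```

Calling `download_splits("powidla/Friend-Or-Foe", "Classification", "AGORA", "100", "BC-I")` would then fetch the same six files as the explicit code above, keyed as `X_train`, `y_train`, and so on.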

Baseline Demo Notebooks

Quickstart notebook

We provide an end-to-end example of how to predict competitive and cooperative interactions with TabNet.

Examples

The notebooks contain a simple example of using baseline models for predicting microbial interactions.

Reproducing the results

To run the commands below for the supervised models, the data path should be organized as follows:

FOFdata/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv

For example,

FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv

The scripts below assume that, once the FOFdata folder has been created and populated, the structure above holds.
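The expected layout can be checked before launching a run. The sketch below builds a path from the template above and reports which files are missing. The helpers `fof_csv_path` and `missing_files` are hypothetical and not part of the repository, and the file name used is the one from the example path.

```python
from pathlib import Path


def fof_csv_path(root, task, collection, group, dataset, name):
    """Build <root>/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv."""
    return Path(root) / task / collection / group / dataset / "csv" / f"{name}.csv"


def missing_files(root, task, collection, group, dataset, names):
    """Return the expected CSV paths that do not exist on disk."""
    expected = [
        fof_csv_path(root, task, collection, group, dataset, name) for name in names
    ]
    return [path for path in expected if not path.is_file()]


example = fof_csv_path("FOFdata", "Regression", "CARVEME", "50", "GR-III", "X_train_GR-III")
print(example.as_posix())  # FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv
```

An empty list from `missing_files(...)` suggests the folder for that dataset is laid out as the scripts expect.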

Supervised models

TabM

To train and test TabM we followed an example. We downloaded the data into the FOFdata folder.

mamba env create -f tabm.yaml
mkdir FOFdata
python main.py 

FT-Transformer

To train and test FT-Transformer we followed an example.

mamba env create -f ft.yaml
mkdir FOFdata
python main.py 

TabNet

To train and test TabNet we followed instructions from the package.

mamba env create -f tabnet.yaml
mkdir FOFdata
python main.py 

GBDTs

We evaluate XGBoost, LightGBM and CatBoost as our baselines here.

mamba env create -f gbdts.yaml
mkdir FOFdata
python main.py 

Unsupervised models

mamba env create -f uns.yaml
mkdir FOFdata
python main.py 

Generative models

TVAE, CTGAN and TabDDPM

To test TVAE, CTGAN and TabDDPM we used the synthcity package and adapted the officially provided examples. We calculated $\alpha$-Precision and $\beta$-Recall using eval_statistical from synthcity.metrics.

mamba env create -f synthcity.yaml
cd FOFdata
python main.py --tvae
python main.py --ctgan
python main.py --ddpm

TabDiff

To train and test TabDiff we followed the guidelines. The example we used for the AGORA50 dataset is shown below:

git clone https://github.com/MinkaiXu/TabDiff
mamba env create -f tabdiff.yaml
cd data
mkdir GenAGORA50
python process_dataset.py --dataname GenAGORA50
python main.py --dataname GenAGORA50 --mode train --no_wandb --non_learnable_schedule --exp_name GenAGORA50

Alternatively, skip preprocessing by downloading the files from here.

To evaluate the model and calculate metrics:

mamba env create -f synthcity.yaml
cd Info
cp info.json
python main.py --dataname GenAGORA50 --mode test --report --no_wandb

License

The FriendOrFoe code on the associated GitHub repo is released under the Apache 2.0 license, and the dataset hosted on Hugging Face is released under CC-BY-4.0. The LICENSE file for the repo can be found in the top-level directory.

Citation Information

If you find this repository useful for your research, please cite the following papers:

@article{Solowiej-Wedderburn2025-ar,
  title     = "Competition and cooperation: The plasticity of bacterial
               interactions across environments",
  author    = "Solowiej-Wedderburn, Josephine and Pentz, Jennifer T and Lizana,
               Ludvig and Schroeder, Bjoern O and Lind, Peter A and Libby, Eric",
  journal   = "PLoS Comput. Biol.",
  publisher = "Public Library of Science (PLoS)",
  volume    =  21,
  number    =  7,
  pages     = "e1013213",
  month     =  jul,
  year      =  2025,
  copyright = "http://creativecommons.org/licenses/by/4.0/",
  language  = "en"
}

@misc{cherednichenko2025friendfoe,
  title={Friend or Foe}, 
  author={Oleksandr Cherednichenko and Josephine Solowiej-Wedderburn and Laura M. Carroll and Eric Libby},
  year={2025},
  eprint={2509.00123},
  archivePrefix={arXiv},
  primaryClass={q-bio.QM},
  url={https://arxiv.org/abs/2509.00123}, 
}
