
A collection of microbial datasets obtained from metabolic modeling for machine learning research


Welcome to the Friend or Foe repository!


FriendOrFoe is a collection of environmental datasets obtained from metabolic modeling of microbial communities, built from the AGORA and CARVEME collections. It gathers 64 tabular datasets (16 for AGORA with 100 additional compounds, 16 for AGORA with 50 additional compounds, 16 for CARVEME with 100 additional compounds, and 16 for CARVEME with 50 additional compounds), constructed by studying more than 10 000 pairs of microbes via Flux Balance Analysis. The collection can be explored with four machine learning frameworks. The code underlying the metabolic modeling process is available here. Running the MATLAB code requires a Gurobi academic license.

Repository structure

  • examples: provides notebooks with examples on various tasks
  • exp: stores .json files with final metrics
  • models: contains code, environments and .json files for the experiments
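
As a rough illustration of how the metric files in exp could be gathered for comparison, the sketch below loads every .json file under a directory into one dictionary keyed by relative path. The function name load_metrics is hypothetical, and the schema of the files is not specified in this README, so the sketch only assumes that each file contains valid JSON.

```python
import json
from pathlib import Path


def load_metrics(exp_dir):
    """Return {relative path: parsed JSON} for every .json file under exp_dir.

    Hypothetical helper: nothing is assumed about the keys inside each file.
    """
    root = Path(exp_dir)
    results = {}
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            # Key by path relative to exp_dir so files in subfolders stay distinct.
            results[path.relative_to(root).as_posix()] = json.load(fh)
    return results
```

Calling load_metrics("exp") from the repository root would then give one entry per metrics file, which can be inspected or passed on to pandas.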

Getting started

Download the data from our Hugging Face repo: https://huggingface.co/datasets/powidla/Friend-Or-Foe

from huggingface_hub import hf_hub_download
import pandas as pd

REPO_ID = "powidla/Friend-Or-Foe"

# File paths within the repo
X_train_ID = "Classification/AGORA/100/BC-I/X_train_BC-I-100.csv"
X_val_ID = "Classification/AGORA/100/BC-I/X_val_BC-I-100.csv"
X_test_ID = "Classification/AGORA/100/BC-I/X_test_BC-I-100.csv"

y_train_ID = "Classification/AGORA/100/BC-I/y_train_BC-I-100.csv"
y_val_ID = "Classification/AGORA/100/BC-I/y_val_BC-I-100.csv"
y_test_ID = "Classification/AGORA/100/BC-I/y_test_BC-I-100.csv"

# Download and load CSVs as pandas DataFrames
X_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_train_ID, repo_type="dataset"))
X_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_val_ID, repo_type="dataset"))
X_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=X_test_ID, repo_type="dataset"))

y_train = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_train_ID, repo_type="dataset"))
y_val = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_val_ID, repo_type="dataset"))
y_test = pd.read_csv(hf_hub_download(repo_id=REPO_ID, filename=y_test_ID, repo_type="dataset"))
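
To load a different dataset without retyping six paths, the in-repo file names can be generated from the task, collection, group and dataset name. The sketch below is a minimal helper that assumes every dataset follows the same naming pattern as the BC-I example above; hub_paths is a hypothetical name, and the pattern has only been checked against that one example.

```python
def hub_paths(task, collection, group, dataset):
    """Build the six in-repo CSV paths for one dataset.

    Assumption: files are named <X|y>_<split>_<dataset>-<group>.csv,
    as in the Classification/AGORA/100/BC-I example.
    """
    base = f"{task}/{collection}/{group}/{dataset}"
    return {
        f"{kind}_{split}": f"{base}/{kind}_{split}_{dataset}-{group}.csv"
        for kind in ("X", "y")
        for split in ("train", "val", "test")
    }


paths = hub_paths("Classification", "AGORA", "100", "BC-I")
print(paths["X_train"])  # Classification/AGORA/100/BC-I/X_train_BC-I-100.csv
```

Each value can be passed as the filename argument of hf_hub_download, exactly as in the snippet above.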

Baseline Demo Notebooks

Quickstart notebook

We provide an end-to-end example on how to predict competitive and cooperative interactions with TabNet.

Examples

The notebooks contain a simple example of using baseline models for predicting microbial interactions.

Reproducing the results

To run the commands below for the supervised models, the data path should be organized as follows:

FOFdata/<Task>/<Collection>/<Group>/<Dataset>/csv/<name>.csv

For example,

FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv

The scripts below assume that this structure holds after the FOFdata folder is created.
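
Before launching a training script, it can help to check that the expected files are in place. The sketch below builds paths following the layout above and lists any that are missing. The helper names and the assumption that each dataset has the six X/y train, val and test files (as in Getting started) are ours, not part of the repository's scripts. Note that the example file name here carries no group suffix, unlike the Hugging Face file names, so downloaded files may need renaming.

```python
from pathlib import Path

# Assumed split names, mirroring the files loaded in Getting started.
SPLITS = ("X_train", "X_val", "X_test", "y_train", "y_val", "y_test")


def local_csv_path(root, task, collection, group, dataset, split):
    """Path of one CSV under <root>/<Task>/<Collection>/<Group>/<Dataset>/csv/."""
    return Path(root) / task / collection / group / dataset / "csv" / f"{split}_{dataset}.csv"


def missing_files(root, task, collection, group, dataset):
    """Return the expected CSV paths that do not exist on disk."""
    expected = (
        local_csv_path(root, task, collection, group, dataset, split)
        for split in SPLITS
    )
    return [path for path in expected if not path.is_file()]


example = local_csv_path("FOFdata", "Regression", "CARVEME", "50", "GR-III", "X_train")
print(example.as_posix())  # FOFdata/Regression/CARVEME/50/GR-III/csv/X_train_GR-III.csv
```

An empty list from missing_files("FOFdata", "Regression", "CARVEME", "50", "GR-III") would indicate that this dataset is laid out as the scripts expect.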

Supervised models

TabM

To train and test TabM we followed an example. We downloaded the data into the FOFdata folder.

mamba env create -f tabm.yaml
mkdir FOFdata
python main.py 

FT-Transformer

To train and test FT-Transformer we followed an example.

mamba env create -f ft.yaml
mkdir FOFdata
python main.py 

TabNet

To train and test TabNet we followed instructions from the package.

mamba env create -f tabnet.yaml
mkdir FOFdata
python main.py 

GBDTs

We evaluate XGBoost, LightGBM and CatBoost as our baselines here.

mamba env create -f gbdts.yaml
mkdir FOFdata
python main.py 

Unsupervised models

mamba env create -f uns.yaml
mkdir FOFdata
python main.py 

Generative models

TVAE, CTGAN and TabDDPM

To test TVAE, CTGAN and TabDDPM we used the synthcity package and adapted the officially provided examples. We calculated $\alpha$-Precision and $\beta$-Recall using eval_statistical from synthcity.metrics.

mamba env create -f synthcity.yaml
cd FOFdata
python main.py --tvae
python main.py --ctgan
python main.py --ddpm

TabDiff

To train and test TabDiff we followed the guidelines. The example we used for the AGORA50 dataset is below:

git clone https://github.com/MinkaiXu/TabDiff
mamba env create -f tabdiff.yaml
cd data
mkdir GenAGORA50
python process_dataset.py --dataname GenAGORA50
python main.py --dataname GenAGORA50 --mode train --no_wandb --non_learnable_schedule --exp_name GenAGORA50

Alternatively, skip preprocessing by downloading the files from here.

To evaluate and calculate metrics:

mamba env create -f synthcity.yaml
cd Info
cp info.json
python main.py --dataname GenAGORA50 --mode test --report --no_wandb

License

FriendOrFoe is released under the Apache 2.0 license, which covers both the code in the associated GitHub repo and the data hosted on Hugging Face. The LICENSE file for the repo can be found in the top-level directory.

Citation Information

If you find this repository useful, please cite the following papers:

@article{Solowiej-Wedderburn2025-ar,
  title     = "Competition and cooperation: The plasticity of bacterial
               interactions across environments",
  author    = "Solowiej-Wedderburn, Josephine and Pentz, Jennifer T and Lizana,
               Ludvig and Schroeder, Bjoern O and Lind, Peter A and Libby, Eric",
  journal   = "PLoS Comput. Biol.",
  publisher = "Public Library of Science (PLoS)",
  volume    =  21,
  number    =  7,
  pages     = "e1013213",
  month     =  jul,
  year      =  2025,
  copyright = "http://creativecommons.org/licenses/by/4.0/",
  language  = "en"
}
