Installable matchclot package.
MatchCLOT
Matching single cells across modalities with contrastive learning and optimal transport
Required packages
- Python (tested with 3.8)
- Packages in requirements.txt (tested with Virtualenv and the exact versions listed there)
Dataset
If not already downloaded:
- Install the AWS CLI (requires Python 3.8+): https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html
cd /path/to/MatchCLOT
- Download the 7.9 GiB dataset into the datasets folder:
aws s3 sync s3://openproblems-bio/public/phase2-private-data/match_modality/ ./datasets/ --no-sign-request
This folder contains the phase 2 training data and the private test set data (with ground truth).
Training
- Activate the virtual environment with the packages from requirements.txt
cd /path/to/MatchCLOT
Run the following commands from the matchclot folder to train the model with default parameters (one command per validation fold):
- GEX2ADT
python train/train.py --VALID_FOLD=0 GEX2ADT
python train/train.py --VALID_FOLD=1 GEX2ADT
python train/train.py --VALID_FOLD=2 GEX2ADT
python train/train.py --VALID_FOLD=3 GEX2ADT
python train/train.py --VALID_FOLD=4 GEX2ADT
python train/train.py --VALID_FOLD=5 GEX2ADT
python train/train.py --VALID_FOLD=6 GEX2ADT
python train/train.py --VALID_FOLD=7 GEX2ADT
python train/train.py --VALID_FOLD=8 GEX2ADT
- GEX2ATAC
python train/train.py --VALID_FOLD=0 GEX2ATAC
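The per-fold commands above can also be generated in a loop. The following is a minimal sketch, not part of MatchCLOT itself: the helper names `training_commands` and `run_all` are hypothetical, and the fold counts (0 to 8 for GEX2ADT, 0 only for GEX2ATAC) are simply read off the commands listed above. It assumes it is started from the matchclot folder with the virtual environment active, and it only prints the commands unless `dry_run=False` is passed.

```python
import subprocess
import sys

# Fold counts copied from the command list above (an assumption of this sketch):
# GEX2ADT is trained on folds 0-8, GEX2ATAC on fold 0 only.
FOLDS = {"GEX2ADT": range(9), "GEX2ATAC": range(1)}


def training_commands(task):
    """Return one train.py command (as an argument list) per validation fold."""
    return [
        [sys.executable, "train/train.py", f"--VALID_FOLD={fold}", task]
        for fold in FOLDS[task]
    ]


def run_all(task, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in training_commands(task):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first fold whose training fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("GEX2ADT")
    run_all("GEX2ATAC")
```

Using `sys.executable` rather than a bare `python` keeps every fold on the interpreter of the active virtual environment.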
Evaluation
- GEX2ADT
python run/run.py --OUT="default" GEX2ADT
- GEX2ATAC
python run/run.py --OUT="default" GEX2ATAC
Inference with pretrained models
- Download the pretrained models from here
- Unzip the downloaded files in the matchclot folder
- Run the following command from the matchclot folder:
python run/run.py --OUT="pbmc1" --B=False --T=False --HA=False --P=pretrainNoHA --CUSTOM_DATASET_PATH=datasets/PBMC/glue_processed/ GEX2ATAC
This example command runs the pretrained model on the dataset in datasets/PBMC/glue_processed/.
The dataset should consist of two files, test_mod1.h5ad and test_mod2.h5ad, where test_mod1 is the GEX dataset and test_mod2 is the ATAC or ADT dataset.
The --B=False flag disables batch label matching; use it when the dataset has no batch labels or consists of a single batch.
The --T=False flag disables the transductive preprocessing steps; use it when testing on a dataset that was not available during training.
The --HA=False flag disables the Harmony batch effect correction step; use it whenever --B=False is set.
The --P=pretrainNoHA flag specifies the pretrained model to use.
The --CUSTOM_DATASET_PATH=datasets/PBMC/glue_processed/ flag specifies the dataset to use.
The GEX2ATAC argument specifies the task to run.
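Before running inference on a custom dataset, it can help to confirm that the folder has the expected layout. The sketch below is an illustration under stated assumptions: `missing_files` and `inference_command` are hypothetical helpers, not MatchCLOT functions, and the flags are copied verbatim from the example command above. It checks only that the two file names exist, not that their contents are valid.

```python
from pathlib import Path

# File names required by the custom-dataset layout described above.
REQUIRED_FILES = ("test_mod1.h5ad", "test_mod2.h5ad")


def missing_files(dataset_dir):
    """Return the required file names that are absent from dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def inference_command(dataset_dir, out, pretrained, task):
    """Build the run.py argument list for a single-batch, unseen dataset."""
    return [
        "python", "run/run.py",
        f"--OUT={out}",
        "--B=False", "--T=False", "--HA=False",
        f"--P={pretrained}",
        f"--CUSTOM_DATASET_PATH={dataset_dir}",
        task,
    ]


if __name__ == "__main__":
    path = "datasets/PBMC/glue_processed/"
    missing = missing_files(path)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        cmd = inference_command(path, "pbmc1", "pretrainNoHA", "GEX2ATAC")
        print(" ".join(cmd))
```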
Ablation study
No improved hyperparameters
- GEX2ADT
  - Train
python train/train.py --P=pretrainNoHY --HY=False --V=0 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=1 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=2 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=3 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=4 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=5 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=6 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=7 GEX2ADT
python train/train.py --P=pretrainNoHY --HY=False --V=8 GEX2ADT
  - Evaluate
python run/run.py --OUT="NoHY" --P=pretrainNoHY --HY=False GEX2ADT
- GEX2ATAC
  - Train
python train/train.py --P=pretrainNoHY --HY=False --V=0 GEX2ATAC
  - Evaluate
python run/run.py --OUT="NoHY" --P=pretrainNoHY --HY=False GEX2ATAC
No OT matching
Does not require retraining
- GEX2ADT
python run/run.py --OUT="NoOT" --P=pretrain --OT_M=False GEX2ADT
- GEX2ATAC
python run/run.py --OUT="NoOT" --P=pretrain --OT_M=False GEX2ATAC
No batch label matching
Does not require retraining
- GEX2ADT
python run/run.py --OUT="NoB" --P=pretrain --B=False GEX2ADT
- GEX2ATAC
python run/run.py --OUT="NoB" --P=pretrain --B=False GEX2ATAC
No entropic regularization for OT matching
Does not require retraining
- GEX2ADT
python run/run.py --OUT="NoE" --P=pretrain --OT_E=0.0 GEX2ADT
- GEX2ATAC
python run/run.py --OUT="NoE" --P=pretrain --OT_E=0.0 GEX2ATAC
No transductive preprocessing
- GEX2ADT
  - Train
python train/train.py --P=pretrainNoT --T=False --V=0 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=1 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=2 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=3 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=4 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=5 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=6 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=7 GEX2ADT
python train/train.py --P=pretrainNoT --T=False --V=8 GEX2ADT
  - Evaluate
python run/run.py --OUT="NoT" --P=pretrainNoT --T=False GEX2ADT
- GEX2ATAC
  - Train
python train/train.py --P=pretrainNoT --T=False --V=0 GEX2ATAC
  - Evaluate
python run/run.py --OUT="NoT" --P=pretrainNoT --T=False GEX2ATAC
No Harmony preprocessing
- GEX2ADT
  - Train
python train/train.py --P=pretrainNoHA --HA=False --V=0 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=1 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=2 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=3 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=4 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=5 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=6 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=7 GEX2ADT
python train/train.py --P=pretrainNoHA --HA=False --V=8 GEX2ADT
  - Evaluate
python run/run.py --OUT="NoHA" --P=pretrainNoHA --HA=False GEX2ADT
- GEX2ATAC
  - Train
python train/train.py --P=pretrainNoHA --HA=False --V=0 GEX2ATAC
  - Evaluate
python run/run.py --OUT="NoHA" --P=pretrainNoHA --HA=False GEX2ATAC
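The ablation commands above follow a regular pattern: each variant has an output name, a pretrained-model folder, one extra flag, and either needs retraining or not. A minimal sketch of that pattern as a lookup table is shown below. The table `ABLATIONS` and the helper `ablation_commands` are hypothetical names introduced for illustration; every flag and fold count is copied from the commands listed in this section, and nothing here is part of the matchclot package.

```python
# name -> (pretrained folder, extra flag, needs retraining), as listed above.
ABLATIONS = {
    "NoHY": ("pretrainNoHY", "--HY=False", True),
    "NoOT": ("pretrain", "--OT_M=False", False),
    "NoB": ("pretrain", "--B=False", False),
    "NoE": ("pretrain", "--OT_E=0.0", False),
    "NoT": ("pretrainNoT", "--T=False", True),
    "NoHA": ("pretrainNoHA", "--HA=False", True),
}

# Number of validation folds per task, read off the training commands above.
FOLDS = {"GEX2ADT": 9, "GEX2ATAC": 1}


def ablation_commands(name, task):
    """Return the command strings for one ablation: training (if any), then evaluation."""
    pretrained, flag, retrain = ABLATIONS[name]
    commands = []
    if retrain:
        for fold in range(FOLDS[task]):
            commands.append(
                f"python train/train.py --P={pretrained} {flag} --V={fold} {task}"
            )
    commands.append(
        f'python run/run.py --OUT="{name}" --P={pretrained} {flag} {task}'
    )
    return commands


if __name__ == "__main__":
    for name in ABLATIONS:
        for task in FOLDS:
            print("\n".join(ablation_commands(name, task)))
```

Variants that reuse the default `pretrain` folder produce a single evaluation command, which matches the "Does not require retraining" notes above.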
Built Distribution
Hashes for matchclot-0.1.0.dev1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 9cd47e3b2eba7d0a104e07be862bcc437af48ca7e479ae6ffd520deaf41ed97d
MD5 | 10670b2175133aebe2a69a01ae7d2d84
BLAKE2b-256 | 27860ecc15579c96f0f7407864455077062cd5f73eb8619018cecec89d9f0462
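A downloaded wheel can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `matches` are hypothetical, and the script assumes the wheel sits in the current directory under the file name shown above.

```python
import hashlib

# SHA256 digest copied from the hash table above.
EXPECTED_SHA256 = "9cd47e3b2eba7d0a104e07be862bcc437af48ca7e479ae6ffd520deaf41ed97d"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = "matchclot-0.1.0.dev1-py3-none-any.whl"  # assumed local file name
    try:
        print("digest OK" if matches(wheel) else "digest MISMATCH")
    except FileNotFoundError:
        print(f"{wheel} not found in the current directory")
```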