Collection of representation learning models, techniques, callbacks, and utilities used to create latent variable models of cell shape, morphology, and intracellular organization.

CytoDL

Description

As part of the Allen Institute for Cell Science's mission to understand the principles by which human induced pluripotent stem cells establish and maintain robust dynamic localization of cellular structure, CytoDL aims to unify deep learning approaches for understanding 2D and 3D biological data represented as images, point clouds, and tabular data.

The bulk of CytoDL's underlying structure is based on the lightning-hydra-template organization. We highly recommend that you familiarize yourself with its (short) docs for detailed instructions on running training, overrides, etc.

Our currently available code is roughly split into two domains: image-to-image transformations and representation learning. The image-to-image code (denoted im2im) contains configuration files detailing how to train and predict with models for resolution enhancement using conditional GANs (e.g. predicting 100x images from 20x images), semantic and instance segmentation, and label-free prediction. We also provide configs for Masked Autoencoder (MAE) and Joint-Embedding Predictive Architecture (JEPA) pretraining on 2D and 3D images using a Vision Transformer (ViT) backbone, and for training segmentation decoders on these pretrained features. The representation learning code includes a wide variety of Variational Autoencoder (VAE) architectures and contrastive learning methods such as VICReg. Due to dependency issues, equivariant autoencoders are not currently supported on Windows.

As we rely on recent versions of PyTorch, users wishing to train and run models on GPU hardware will need up-to-date NVIDIA drivers. Users with older GPUs should not expect the code to work out of the box. Similarly, we do not currently support training or prediction on Mac GPUs. In most cases, CPU-based training should work when GPU training fails.

For im2im models, we provide a handful of example 3D images for training the basic image-to-image transformation models, along with default model configuration files, so that users can become comfortable with the framework before training and applying these models on their own data. Note that these default models are very small and train on heavily downsampled data so that tests run efficiently; for best performance, increase the model size and remove downsampling from the data configuration.

How to run

Install dependencies. Dependencies are platform-specific: replace PLATFORM with your platform, either linux, windows, or mac.

# clone project
git clone https://github.com/AllenCellModeling/cyto-dl
cd cyto-dl

# [OPTIONAL] create conda environment
conda create -n myenv python=3.9
conda activate myenv

pip install -r requirements/PLATFORM/requirements.txt

# [OPTIONAL] install extra dependencies - equivariance related
pip install -r requirements/PLATFORM/equiv-requirements.txt

pip install -e .


# [OPTIONAL] if you want to use default experiments on example data
python scripts/download_test_data.py
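Because the requirements files live in a per-platform folder, a small Python sketch can pick the right one automatically. The helper `requirements_path` is hypothetical, and it assumes the folder names are exactly linux, windows, and mac as listed above.

```python
import sys
from pathlib import Path

# Hypothetical mapping from Python's sys.platform value to the PLATFORM folder
# names used in requirements/PLATFORM/ (assumed: linux, windows, mac).
PLATFORM_DIRS = {"linux": "linux", "win32": "windows", "darwin": "mac"}


def requirements_path(platform_name: str = sys.platform, equiv: bool = False) -> Path:
    """Return the requirements file for a sys.platform string."""
    # Match on the prefix, since sys.platform may carry a suffix on some systems.
    for prefix, folder in PLATFORM_DIRS.items():
        if platform_name.startswith(prefix):
            name = "equiv-requirements.txt" if equiv else "requirements.txt"
            return Path("requirements") / folder / name
    raise ValueError(f"unsupported platform: {platform_name!r}")


if __name__ == "__main__":
    # On Linux this prints requirements/linux/requirements.txt.
    print(requirements_path())
```

Pass the printed path to `pip install -r`. Per the description above, the equivariance extras are not currently supported on Windows.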

API

from cyto_dl.api import CytoDLModel

model = CytoDLModel()
model.download_example_data()
model.load_default_experiment("segmentation", output_dir="./output", overrides=["trainer=cpu"])
model.print_config()
model.train()

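# [OPTIONAL] async training, as an alternative to model.train() above
# (top-level await only works where an event loop is already running, e.g. Jupyter)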
await model.train(run_async=True)
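Top-level `await`, as in the last line above, only works where an event loop is already running, such as a Jupyter notebook. In a plain Python script, the awaitable has to be driven by an event loop, for example with `asyncio.run`. The sketch below shows that pattern with a hypothetical stand-in coroutine, `fake_train`, so it runs without cyto-dl installed; it is not CytoDL's implementation. With cyto-dl, you would instead await `model.train(run_async=True)` inside `main()`.

```python
import asyncio
import time


def blocking_step(epochs: int) -> str:
    # Hypothetical stand-in for a long, blocking training run.
    time.sleep(0.01 * epochs)
    return f"trained for {epochs} epochs"


async def fake_train(epochs: int) -> str:
    # Run the blocking work in a worker thread (Python 3.9+) so the event
    # loop stays free for other tasks while "training".
    return await asyncio.to_thread(blocking_step, epochs)


async def main() -> str:
    # In a real script, replace this with: return await model.train(run_async=True)
    return await fake_train(epochs=3)


if __name__ == "__main__":
    print(asyncio.run(main()))
```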

Most models work by passing data paths in the data config. For training or predicting on datasets that are already in memory, you can instead pass the data directly to the model. Note that this use case is intended primarily for programmatic use (e.g. in a workflow or a Jupyter notebook), not for the normal CLI. The im2im/segmentation_array experiment demonstrates a possible config setup for this use case. For training, data must be passed as a dictionary with keys "train" and "val", each containing a list of dictionaries whose keys correspond to the data config.

from cyto_dl.api import CytoDLModel
import numpy as np

model = CytoDLModel()
model.load_default_experiment("segmentation_array", output_dir="./output")
model.print_config()

# create CZYX dummy data
data = {
    "train": [{"raw": np.random.randn(1, 40, 256, 256), "seg": np.ones((1, 40, 256, 256))}],
    "val": [{"raw": np.random.randn(1, 40, 256, 256), "seg": np.ones((1, 40, 256, 256))}],
}
model.train(data=data)
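Before calling `model.train(data=data)`, it can help to check that the in-memory data has the layout described above: a dictionary with "train" and "val" keys, each holding a list of dictionaries whose keys match the data config. The checker below is a hypothetical helper, not part of cyto-dl. It assumes the keys are "raw" and "seg", as in the segmentation_array example, that every array is 4-D CZYX, and that raw and label images share the same ZYX size.

```python
import numpy as np


def check_training_data(data, keys=("raw", "seg"), ndim=4):
    """Hypothetical sanity check for in-memory training data.

    Raises ValueError describing the first problem found; returns None if the
    layout looks like {"train": [{key: CZYX array}, ...], "val": [...]}.
    """
    for split in ("train", "val"):
        samples = data.get(split)
        if not isinstance(samples, list) or not samples:
            raise ValueError(f"{split!r} must map to a non-empty list of dicts")
        for i, sample in enumerate(samples):
            missing = [k for k in keys if k not in sample]
            if missing:
                raise ValueError(f"{split}[{i}] is missing keys {missing}")
            shapes = {k: np.shape(sample[k]) for k in keys}
            for k, shape in shapes.items():
                if len(shape) != ndim:
                    raise ValueError(f"{split}[{i}][{k!r}] has shape {shape}; expected {ndim}-D CZYX")
            # Assumption: raw and label images share the same ZYX size
            # (channel counts may differ).
            if len({shape[1:] for shape in shapes.values()}) > 1:
                raise ValueError(f"{split}[{i}] has mismatched spatial shapes {shapes}")


# Small dummy data in the same layout as the example above.
data = {
    "train": [{"raw": np.random.randn(1, 8, 32, 32), "seg": np.ones((1, 8, 32, 32))}],
    "val": [{"raw": np.random.randn(1, 8, 32, 32), "seg": np.ones((1, 8, 32, 32))}],
}
check_training_data(data)  # passes silently
```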

For predicting, data must be passed as a list of numpy arrays. The resulting predictions are processed into a dictionary with one key for each task head in the model config; the corresponding values are in BC(Z)YX order.

from cyto_dl.api import CytoDLModel
import numpy as np
from cyto_dl.utils import extract_array_predictions

model = CytoDLModel()
model.load_default_experiment(
    "segmentation_array", output_dir="./output", overrides=["data=im2im/numpy_dataloader_predict"]
)
model.print_config()

# create CZYX dummy data
data = [np.random.rand(1, 32, 64, 64), np.random.rand(1, 32, 64, 64)]

_, _, output = model.predict(data=data)
preds = extract_array_predictions(output)
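Because each prediction value is batch-first (BC(Z)YX), a 5-D array holds 3D data and a 4-D array holds 2D data. The hypothetical helpers below split such an array back into one C(Z)YX array per input image. They assume only the axis order stated above and that each value is a single NumPy array; the task-head names depend on your model config.

```python
import numpy as np


def split_batch(pred: np.ndarray) -> list:
    """Split a BC(Z)YX prediction array into a list of per-image C(Z)YX arrays."""
    if pred.ndim not in (4, 5):
        raise ValueError(f"expected BCYX or BCZYX, got shape {pred.shape}")
    # Indexing the leading batch axis drops it, leaving C(Z)YX for each image.
    return [pred[i] for i in range(pred.shape[0])]


def is_3d(pred: np.ndarray) -> bool:
    """A 5-D batch-first array carries a Z axis; a 4-D one does not."""
    return pred.ndim == 5


# Dummy data shaped like two single-channel 32x64x64 volumes.
dummy = np.zeros((2, 1, 32, 64, 64))
per_image = split_batch(dummy)
print(len(per_image), per_image[0].shape, is_3d(dummy))  # 2 (1, 32, 64, 64) True
```

Applied to the output above, this could look like `{head: split_batch(arr) for head, arr in preds.items()}`.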

Train a model with an experiment configuration chosen from configs/experiment/

# gpu
python cyto_dl/train.py experiment=im2im/experiment_name.yaml trainer=gpu

# cpu
python cyto_dl/train.py experiment=im2im/experiment_name.yaml trainer=cpu
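Since the description notes that CPU training usually works when GPU training fails, a launcher script can choose the `trainer` override at run time. The sketch below builds the command shown above. The helper names are hypothetical, and it treats `torch.cuda.is_available()` as the test for a usable GPU, which does not guarantee that your drivers are recent enough. `experiment_name.yaml` remains a placeholder.

```python
import sys


def gpu_available() -> bool:
    # torch.cuda.is_available() reports whether PyTorch can see a CUDA device;
    # fall back to CPU if torch is not importable at all.
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def build_train_command(experiment: str, use_gpu: bool) -> list:
    """Hypothetical helper: assemble the cyto_dl/train.py command line."""
    trainer = "gpu" if use_gpu else "cpu"
    return [sys.executable, "cyto_dl/train.py", f"experiment={experiment}", f"trainer={trainer}"]


if __name__ == "__main__":
    cmd = build_train_command("im2im/experiment_name.yaml", gpu_available())
    print(" ".join(cmd))
    # To launch from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```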

You can override any parameter from the command line, for example:

python cyto_dl/train.py trainer.max_epochs=20 datamodule.batch_size=64
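When overrides are generated programmatically, for example in a parameter sweep or for the `overrides=[...]` argument of `load_default_experiment` shown in the API section, they can be built from a Python dictionary. This hypothetical helper formats values using Hydra's override syntax for common types (`true`/`false`, `null`, and `[a,b]` lists). It does not quote strings containing spaces or special characters, so consult Hydra's override grammar for those cases. The keys in the example are copied from the command above.

```python
def to_hydra_overrides(params: dict) -> list:
    """Hypothetical helper: turn {"a.b": value} into ["a.b=value", ...]."""

    def fmt(value) -> str:
        # Check bool before anything numeric: bool is a subclass of int.
        # Hydra's override grammar spells booleans and None in lowercase.
        if isinstance(value, bool):
            return "true" if value else "false"
        if value is None:
            return "null"
        if isinstance(value, (list, tuple)):
            return "[" + ",".join(fmt(v) for v in value) + "]"
        return str(value)

    return [f"{key}={fmt(value)}" for key, value in params.items()]


overrides = to_hydra_overrides({"trainer.max_epochs": 20, "datamodule.batch_size": 64})
print(" ".join(overrides))  # trainer.max_epochs=20 datamodule.batch_size=64
```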

Download files

Source Distribution

cyto-dl-0.4.2.tar.gz (427.9 kB)

Uploaded Source

Built Distribution

cyto_dl-0.4.2-py3-none-any.whl (296.5 kB)

Uploaded Python 3

File details

Details for the file cyto-dl-0.4.2.tar.gz.

File metadata

  • Download URL: cyto-dl-0.4.2.tar.gz
  • Size: 427.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for cyto-dl-0.4.2.tar.gz
Algorithm Hash digest
SHA256 e2719275c336c4c5814078f5ba22eb2edb56d8b1bab8e25b0676fce43734a31c
MD5 aafc85ad9d953f3ea36c5794142726f1
BLAKE2b-256 00b8c30edda5e0e3efc5b2b14dc22aa46f79e8642bcb9590cd8f6cd057708e70
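To confirm that a downloaded file matches the SHA256 digest listed above, its hash can be recomputed locally. This is a minimal sketch using Python's standard `hashlib`; the file name assumes the source distribution was saved to the current directory, and the digest is copied from the table.

```python
import hashlib
from pathlib import Path

# SHA256 digest of cyto-dl-0.4.2.tar.gz, copied from the table above.
EXPECTED_SHA256 = "e2719275c336c4c5814078f5ba22eb2edb56d8b1bab8e25b0676fce43734a31c"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    target = Path("cyto-dl-0.4.2.tar.gz")
    if target.exists():
        print("OK" if verify(target) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file).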

Provenance

The following attestation bundles were made for cyto-dl-0.4.2.tar.gz:

Publisher: publish.yml on AllenCellModeling/cyto-dl

File details

Details for the file cyto_dl-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: cyto_dl-0.4.2-py3-none-any.whl
  • Size: 296.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for cyto_dl-0.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 70eaf5d3637a048c3fe658aa7182aa6b77033005c97166d7e85c19c27c71ad8c
MD5 a9f0dd8d787b2661459d856822be478a
BLAKE2b-256 77ee2b4073aa6c1eb0077f00dadac091b13f46ede4c13106b5897774a733fda9

Provenance

The following attestation bundles were made for cyto_dl-0.4.2-py3-none-any.whl:

Publisher: publish.yml on AllenCellModeling/cyto-dl
