Heimdall: A Comprehensive Paradigm for Evaluating Single-Cell Representations within Foundational Models

Heimdall

Installation

# Clone repository
git clone https://github.com/gkrieg/Heimdall && cd Heimdall

# Create conda env
conda create --name heimdall python=3.10 && conda activate heimdall

# Install dependencies
pip install torch==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

# Install Heimdall (in editable `-e` mode)
pip install -e .

Quickstart

train.py provides a clear overview of the required inputs: it shows how to prepare the data, model, and optimizer, and how to run the trainer.

python train.py +experiments=cta_pancreas

Make sure to edit the global config file config/global_vars.yaml to match your setup.
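For reference, the only global variable this README mentions elsewhere is the preprocessed-dataset cache directory, so a minimal config/global_vars.yaml might look like the following (the path is a placeholder; adjust it for your machine):

```yaml
# config/global_vars.yaml
cache_preprocessed_dataset_dir: /path/to/preprocessed/cache  # set to null to disable caching
```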

Sweeps

scripts/create_sweep.py takes the arguments --experiment-name (the Hydra experiment file name), --project-name (the W&B project name), and --fg and --fc (the names of the Hydra configs). It is a short script that loads sweeps/base.yaml, updates it appropriately, and creates and returns a sweep. It can work in tandem with deploy_sweep.sh to submit multiple sweeps on SLURM systems.

python scripts/create_sweep.py --experiment-name cta_pancreas --project-name Pancreas-Celltype-Classification
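The core logic can be sketched as follows. This is a minimal illustration, not the actual script: the field names and the sweep-name format are hypothetical, and the real template lives in sweeps/base.yaml rather than in an inline dict.

```python
# Sketch of scripts/create_sweep.py's logic (hypothetical field names;
# the real base template lives in sweeps/base.yaml).
import copy

# Stand-in for the parsed contents of sweeps/base.yaml.
BASE_SWEEP = {
    "program": "train.py",
    "method": "grid",
    "parameters": {},
}

def create_sweep(experiment_name: str, project_name: str) -> dict:
    """Load the base template and specialize it for one experiment."""
    sweep = copy.deepcopy(BASE_SWEEP)
    sweep["name"] = f"{project_name}-{experiment_name}"
    # Every run in the sweep selects the same Hydra experiment.
    sweep["command"] = ["python", "train.py", f"+experiments={experiment_name}"]
    return sweep

sweep = create_sweep("cta_pancreas", "Pancreas-Celltype-Classification")
```

The returned dict would then be registered with W&B and picked up by deploy_sweep.sh on SLURM.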

Dev Notes

Dev installation

pip install -r requirements.txt

Once the pre-commit command-line tool is installed, every time you commit changes it will perform several code-style checks and automatically apply fixes where possible. When auto-fixes are applied, you need to re-stage and recommit those changes; note that this can take more than one round.

After you are done committing changes and are ready to push to the remote branch, run nox to perform a final quality check. Note that nox only lints and does not fix issues for you; address them manually based on the instructions it prints.

Cheatsheet

# Run cell type classification dev experiment with wandb disabled
WANDB_MODE=disabled python train.py +experiments=cta_pancreas

# Run cell type classification dev experiment with wandb offline mode
WANDB_MODE=offline python train.py +experiments=cta_pancreas

# Run cell type classification dev experiment with wandb disabled and overridden epochs
WANDB_MODE=disabled python train.py +experiments=cta_pancreas tasks.args.epochs=2

# Run cell type classification dev experiment with a user profile (dev has wandb disabled by default)
python train.py +experiments=cta_pancreas user=lane-remy-dev

Nox

Run code linting and unittests:

nox

Run the dev experiments test on a Lane compute node with CUDA (lane-shared-dev user profile):

nox -e test_experiments

Run fast dev experiments (only selected small datasets):

nox -e test_experiments -- quick_run

Run full dev experiments (including the larger datasets excluded from the quick run):

nox -e test_experiments -- full_run

Run dev experiments with a different user profile:

nox -e test_experiments -- user=box-remy-dev

Local tests

We use pytest to write local tests. New test suites can be added under tests/test_{suite_name}.py.

Run a particular test suite with:

python -m pytest tests/test_{suite_name}.py

Run all tests but the integration test:

python -m pytest -m 'not integration'
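As an illustration, a new suite might look like this. The file name and test bodies are hypothetical, and the `integration` marker is assumed to be registered in the project's pytest configuration:

```python
# tests/test_example.py -- hypothetical minimal test suite
import pytest

def test_addition():
    # Plain unit test: runs under `pytest -m 'not integration'`.
    assert 1 + 1 == 2

@pytest.mark.integration
def test_end_to_end():
    # Deselected by `pytest -m 'not integration'`;
    # running it requires HYDRA_USER to be set via a .env file.
    assert True
```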

Note: to run the integration test, you'll need to specify the Hydra user using a .env file. The contents of the file should be like so:

HYDRA_USER=test
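For example, you can create the file from the shell (the repo root is the assumed location for .env):

```shell
# Create the .env file with the Hydra user used by the integration test.
printf 'HYDRA_USER=test\n' > .env
```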

Turning off caching

To turn off dataset caching for dev purposes, set cache_preprocessed_dataset_dir: null in config/global_vars.yaml. Alternatively, pass cache_preprocessed_dataset_dir=null through the command line, e.g.,

python train.py +experiments=cta_pancreas cache_preprocessed_dataset_dir=null
