
Model architecture exploration for cryoET particle picking

OCTOPI 🐙🐙🐙

Object deteCTion Of ProteIns. A deep learning framework for Cryo-ET 3D particle picking with autonomous model exploration capabilities.

🚀 Introduction

octopi addresses a critical bottleneck in cryo-electron tomography (cryo-ET) research: the efficient identification and extraction of proteins within complex cellular environments. As advances in cryo-ET enable the collection of thousands of tomograms, the need for automated, accurate particle picking has become increasingly urgent.

Our deep learning-based pipeline streamlines the training and execution of 3D autoencoder models specifically designed for cryo-ET particle picking. Built on copick, a storage-agnostic API, octopi seamlessly accesses tomograms and segmentations across local and remote environments.

🧩 Features

octopi offers a modular, deep learning-driven pipeline for:

  • Training and evaluating custom 3D U-Net models for particle segmentation.
  • Automatically exploring model architectures using Bayesian optimization via Optuna.
  • Performing inference for both semantic segmentation and particle localization.

octopi helps researchers work through dense, crowded cryo-ET datasets precisely and efficiently, without manual trial and error in model design.

Getting Started

Installation

octopi is available on PyPI.

pip install octopi

📚 Usage

octopi provides a clean, scriptable command-line interface. Run the following command to view all available subcommands:

octopi --help

Each subcommand supports its own --help flag for detailed usage. To see practical examples of how to interface directly with the octopi API, explore the notebooks/ folder.

If you're running octopi on an HPC cluster, several SLURM-compatible submission commands are available. You can view them by running:

octopi-slurm --help

This provides utilities for submitting training, inference, and localization jobs in SLURM-based environments.

📥 Data Import & Preprocessing

To train or run inference with octopi, your tomograms must be organized inside a CoPick project. octopi supports two primary methods for data ingestion, both of which include optional Fourier cropping to reduce resolution and accelerate downstream processing.

If your tomograms are already processed and stored locally in .mrc format (e.g., from Warp, IMOD, or AreTomo), you can import them into a new or existing CoPick project using:

octopi import-mrc-volumes \
    --input-folder /path/to/mrc/files --config /path/to/config.json \
    --target-tomo-type denoised --input-voxel-size <input_voxel_size> --output-voxel-size 10

octopi can also process tomograms hosted on the CZ CryoET Data Portal. Downloading them to your own machine is especially useful if you want to downsample them to a lower resolution to save time and memory. You can download and process the tomograms using:

octopi download-dataportal \
    --config /path/to/config.json --datasetID 10445 --overlay-path path/to/saved/zarrs \
    --input-voxel-size 5 --output-voxel-size 10 \
    --dataportal-name wbp --target-tomotype wbp 
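
Both import routes above can Fourier-crop tomograms from an input voxel size to a coarser output voxel size. The sketch below illustrates the general idea with NumPy only; the function name fourier_crop and its arguments are hypothetical, and this is not necessarily how octopi implements the step.

```python
import numpy as np


def fourier_crop(volume, input_voxel_size, output_voxel_size):
    """Downsample a 3D volume by keeping only the low-frequency part of its spectrum."""
    if output_voxel_size < input_voxel_size:
        raise ValueError("Fourier cropping can only make the voxel size coarser")

    scale = input_voxel_size / output_voxel_size
    new_shape = tuple(max(1, int(round(n * scale))) for n in volume.shape)

    # Move the zero frequency to the center so low frequencies form a central box.
    spectrum = np.fft.fftshift(np.fft.fftn(volume))

    # Cut out the central box; the zero frequency ends up at index m // 2 on each axis.
    box = tuple(
        slice(n // 2 - m // 2, n // 2 - m // 2 + m)
        for n, m in zip(volume.shape, new_shape)
    )
    cropped = spectrum[box]

    # Back to real space; rescale so the mean intensity of the volume is preserved.
    result = np.fft.ifftn(np.fft.ifftshift(cropped)).real
    return result * (cropped.size / volume.size)
```

With an input voxel size of 5 and an output voxel size of 10, as in the download example, each axis of the volume is halved.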

📁 Training Labels Preparation

Use octopi create-targets to create semantic masks for proteins of interest from annotation metadata. In this example, let's generate target segmentations from the picks for dataset 10439 from the CZ CryoET Data Portal (this step only needs to be run once).

octopi create-targets \
    --config config.json \
    --target apoferritin --target beta-galactosidase,slabpick,1 \
    --target ribosome,pytom,0 --target virus-like-particle,pytom,0 \
    --seg-target membrane \
    --tomo-alg wbp --voxel-size 10 \
    --target-session-id 1 --target-segmentation-name remotetargets \
    --target-user-id train-octopi
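
To make the idea of turning picks into a semantic mask concrete, here is a minimal NumPy sketch that paints a labeled sphere around each pick. It assumes spherical targets; the function name paint_spheres, the (z, y, x) coordinate order, the radius, and the example picks are all hypothetical, and octopi may build its targets differently.

```python
import numpy as np


def paint_spheres(shape, centers_voxels, radius_voxels, label, out=None):
    """Write `label` into every voxel within `radius_voxels` of any center."""
    mask = np.zeros(shape, dtype=np.uint8) if out is None else out
    zz, yy, xx = np.indices(shape)
    for cz, cy, cx in centers_voxels:
        dist2 = (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2
        mask[dist2 <= radius_voxels ** 2] = label
    return mask


# Hypothetical picks in angstroms, ordered (z, y, x), for one protein class.
voxel_size = 10.0
picks_angstrom = [(100.0, 100.0, 100.0), (200.0, 150.0, 50.0)]
radius_angstrom = 30.0

# Convert physical coordinates to voxel units before painting.
centers = [tuple(c / voxel_size for c in pick) for pick in picks_angstrom]
targets = paint_spheres((32, 32, 32), centers, radius_angstrom / voxel_size, label=1)
```

Passing the returned array back in through out with a different label would add a second class to the same mask.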

🧠 Training a single 3D U-Net model

Train a 3D U-Net model on the prepared target segmentations. Tomograms can come from multiple copick projects, each passed as its own --config name,path pair.

octopi train-model \
    --config experiment,config1.json \
    --config simulation,config2.json \
    --voxel-size 10 --tomo-alg wbp --Nclass 8 \
    --tomo-batch-size 50 --num-epochs 100 --val-interval 10 \
    --target-info remotetargets,train-octopi,1
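
The commands in this README pass several values as comma-separated strings: --config takes either a path or a name,path pair, and --target-info (like --seg-info later on) takes name,user-id,session-id, matching the --target-segmentation-name, --target-user-id, and --target-session-id used in create-targets. The sketch below only illustrates that convention; the helper names are hypothetical and this is not octopi's own parser.

```python
from typing import NamedTuple, Optional


class CopickQuery(NamedTuple):
    """A name with an optional user ID and session ID."""
    name: str
    user_id: Optional[str] = None
    session_id: Optional[str] = None


def parse_query(value):
    """Parse 'name' or 'name,user_id,session_id'."""
    parts = [part.strip() for part in value.split(",")]
    if len(parts) == 1 and parts[0]:
        return CopickQuery(parts[0])
    if len(parts) == 3 and all(parts):
        return CopickQuery(*parts)
    raise ValueError(f"expected 'name' or 'name,user_id,session_id', got {value!r}")


def parse_config(value):
    """Parse 'path' or 'name,path' into a (name, path) pair; name may be None."""
    name, sep, path = value.partition(",")
    return (name, path) if sep else (None, value)
```

For example, parse_query("remotetargets,train-octopi,1") yields the segmentation name, user ID, and session ID written by the create-targets step above.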

Outputs will include model weights (.pth), logs, and training metrics.

🔍 Model exploration with Optuna

octopi🐙 supports automatic neural architecture search using Optuna, enabling efficient discovery of optimal 3D U-Net configurations through Bayesian optimization. This allows users to maximize segmentation accuracy without manual tuning.

To launch a model exploration job:

octopi model-explore \
    --config experiment,/mnt/dataportal/ml_challenge/config.json \
    --config simulation,/mnt/dataportal/synthetic_ml_challenge/config.json \
    --voxel-size 10 --tomo-alg wbp --Nclass 8 \
    --model-save-path train_results

Each trial evaluates a different architecture and logs:

  • Segmentation performance metrics
  • Model weights and configs
  • Training curves and validation loss

🔬 Trials are automatically tracked with MLflow and saved under the specified --model-save-path.
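
For readers new to Optuna, the sketch below shows the general shape of such a search: each trial samples a configuration, a score is computed, and the sampler uses past scores to propose the next configuration. The search space, parameter names, and scoring function here are hypothetical stand-ins (no model is trained), and they are not octopi's actual search space.

```python
def suggest_unet_config(trial):
    """Sample one hypothetical 3D U-Net configuration from the search space."""
    depth = trial.suggest_int("depth", 3, 5)
    base_channels = trial.suggest_categorical("base_channels", [16, 32, 48])
    return {
        # Double the channel count at each level of the encoder.
        "channels": [base_channels * 2 ** i for i in range(depth)],
        "num_res_units": trial.suggest_int("num_res_units", 1, 3),
        "dropout": trial.suggest_float("dropout", 0.0, 0.3),
        "learning_rate": trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True),
    }


def placeholder_score(config):
    """Stand-in for 'train the model and return a validation metric'."""
    # Toy score with no scientific meaning: prefers depth 4 and low dropout.
    return -abs(len(config["channels"]) - 4) - config["dropout"]


def run_search(n_trials=20, seed=0):
    """Run a small study with Optuna's TPE sampler and return it."""
    import optuna  # imported here so the helpers above work without Optuna installed

    sampler = optuna.samplers.TPESampler(seed=seed)
    study = optuna.create_study(direction="maximize", sampler=sampler)
    study.optimize(
        lambda trial: placeholder_score(suggest_unet_config(trial)),
        n_trials=n_trials,
    )
    return study
```

Calling run_search() returns the study; study.best_params and study.best_value then hold the best sampled configuration and its score. In a real run the placeholder score would be replaced by a full training and validation loop.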

Optuna Dashboard

To quickly assess the exploration results and see which trials produced the best architectures, Optuna provides a dashboard that summarizes all trials. Instructions for accessing the dashboard are available at https://optuna-dashboard.readthedocs.io/en/latest/getting-started.html. We recommend using either the VS Code extension or the CLI.

📊 MLflow experiment tracking

To use the CZI cloud MLflow tracker, add a .env file in the root directory as shown below. You can get a CZI MLflow access token from here (note that a new token is generated every time you open this site).

MLFLOW_TRACKING_USERNAME = <Your_CZ_email>
MLFLOW_TRACKING_PASSWORD = <Your_mlflow_access_token>
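
Before launching a run, it can help to confirm that the .env file really defines both variables. The following standard-library sketch does that check; the helper names are hypothetical, the parser handles only simple KEY = value lines, and it is not how octopi itself loads the file.

```python
REQUIRED_KEYS = ("MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def parse_env(text):
    """Parse simple KEY = value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_mlflow_credentials(text):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, missing_mlflow_credentials(open(".env").read()) returns an empty list when both entries are present and non-empty.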

octopi supports MLflow for logging and visualizing model training and hyperparameter search results, including:

  • Training loss/validation metrics over time
  • Model hyperparameters and architecture details
  • Trial comparison (e.g., best performing model)

You can use either a local MLflow instance, a remote (HPC) instance, or the CZI cloud server:

🧪 Local MLflow Dashboard

To inspect results locally, run mlflow ui and open http://localhost:5000 in your browser.

🖥️ HPC Cluster MLflow Access (Remote via SSH tunnel)

If you are running octopi on a remote cluster (e.g., Biohub Bruno), forward the MLflow port. On your local machine, run ssh -L 5000:localhost:5000 remote_username@remote_host (for Bruno, the remote host is login01.czbiohub.org).

Then, on the remote terminal (login node), run mlflow ui --host 0.0.0.0 --port 5000 to launch the MLflow dashboard, and open http://localhost:5000 in a local browser.

☁️ CZI CoreWeave cluster

For the CZI CoreWeave cluster, MLflow is already hosted. Go to the CZI MLflow server.

🔐 A .env file is required to authenticate (see the MLflow experiment tracking section above).

📁 Be sure to register your project name in MLflow before launching runs.

🔮 Segmentation

Generate segmentation prediction masks for tomograms in a given copick project.

octopi inference \
    --config config.json \
    --seg-info predict,unet,1 \
    --model-config train_results/best_model_config.yaml \
    --model-weights train_results/best_model.pth \
    --voxel-size 10 --tomo-alg wbp --tomo-batch-size 25

Output masks are saved to the corresponding copick project under the segmentation name, user ID, and session ID given by --seg-info.

📍 Localization

Convert the segmentation masks into particle coordinates.

octopi localize \
    --config config.json \
    --pick-session-id 1 --pick-user-id unet \
    --seg-info predict,unet,1
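
One common way to turn a segmentation mask into particle coordinates is to label connected components and take each component's centroid. The sketch below shows that approach with SciPy; the function name, the min_voxels size filter, and the (z, y, x) ordering are hypothetical choices, and octopi's localize command may use a different method.

```python
import numpy as np
from scipy import ndimage


def localize_centroids(segmentation, label, voxel_size, min_voxels=1):
    """Return centroids (in physical units) of connected regions with the given label."""
    binary = segmentation == label

    # Group touching voxels into components numbered 1..count.
    components, count = ndimage.label(binary)
    if count == 0:
        return np.empty((0, 3))

    index = np.arange(1, count + 1)
    # Component sizes in voxels; entry 0 of bincount is the background.
    sizes = np.bincount(components.ravel())[1:]
    centroids = np.array(ndimage.center_of_mass(binary, components, index))

    # Discard tiny components, then convert voxel indices to physical units.
    return centroids[sizes >= min_voxels] * voxel_size
```

The size filter is a simple guard against isolated false-positive voxels; a real pipeline would likely tie the threshold to the expected particle size.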

Contributing

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to opensource@chanzuckerberg.com.

Reporting Security Issues

Please note: If you believe you have found a security issue, please responsibly disclose by contacting us at security@chanzuckerberg.com.
