
3D shape analysis using deep learning


Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Cellshape logo by Matt De Vries

3D single-cell shape analysis of cancer cells using geometric deep learning

This is a Python package for learning 3D cell shape features and classes using deep learning. Please refer to our preprint on bioRxiv here.

cellshape is the main package which imports from sub-packages:

  • cellshape-helper: Facilitates point cloud generation from 3D binary masks.
  • cellshape-cloud: Implementations of graph-based autoencoders for shape representation learning on point cloud input data.
  • cellshape-voxel: Implementations of 3D convolutional autoencoders for shape representation learning on voxel input data.
  • cellshape-cluster: Implementation of deep embedded clustering to add to autoencoder models.

Installation and requirements

Dependencies

The software requires Python 3.7 or greater, PyTorch, pyntcloud, numpy, scikit-learn, tensorboard, and tqdm (the full list is given in the setup.py file). This repo makes extensive use of cellshape-cloud, cellshape-cluster, cellshape-helper, and cellshape-voxel. To reproduce the results in our paper, only cellshape-cloud and cellshape-cluster are needed.
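As a quick sanity check of the requirements above, the following is a minimal sketch that reports which of the listed dependencies cannot be found in the current environment. The import names (for example torch for PyTorch and sklearn for scikit-learn) and the helper names missing_modules and python_is_supported are assumptions made for illustration; they are not part of cellshape.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above
# (PyTorch is imported as "torch", scikit-learn as "sklearn").
REQUIRED_MODULES = ["torch", "pyntcloud", "numpy", "sklearn", "tensorboard", "tqdm"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Check an interpreter version against the stated minimum of Python 3.7."""
    return tuple(version_info[:2]) >= (3, 7)


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty list of missing modules only shows that the packages can be located; it does not check their versions.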

To install

  1. We recommend creating a new conda environment. In the terminal, run:
conda create --name cellshape-env python=3.8 -y
conda activate cellshape-env
pip install --upgrade pip
  2. Install cellshape with pip:
pip install cellshape

Installation should take about 5 minutes.

Hardware requirements

We have tested this software on Ubuntu 20.04 LTS and 18.04 LTS with 128 GB of RAM and an NVIDIA Quadro RTX 6000 GPU.

Data availability and structure

Data availability

Datasets to reproduce the results in our paper are available here.

  • SamplePointCloudData.zip contains a sample dataset of point clouds of cells for testing our code.
  • FullData.zip contains 3 plates of point cloud representations of cells for several treatments. This data can be used to reproduce our results.
  • Output.zip contains trained model weights and deep learning cell geometric features extracted using these trained models.
  • BinaryCellMasks.zip contains a sample set of binary masks of cells which can be used as input to cellshape-helper to test our point cloud generation code.

Data structure

We suggest testing our code on the data contained in SamplePointCloudData.zip. This data is structured in the following way:

cellshapeSamplePointCloudDataset/
    small_data.csv
    Plate1/
        stacked_pointcloud/
            Binimetinib/
                0010_0120_accelerator_20210315_bakal01_erk_main_21-03-15_12-37-27.ply
                ...
            Blebbistatin/
            ...
    Plate2/
        stacked_pointcloud/
    Plate3/
        stacked_pointcloud/

This directory structure is only required when using our data. If you would like to use your own dataset, you may structure it in any way, as long as the point cloud files have the .ply extension. When using your own directory structure, set the --dataset_type parameter to "Other".
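To confirm that an unzipped copy of the data matches the layout shown above, a short script can count the .ply files per plate and treatment. This is only a sketch: the function name count_point_clouds is hypothetical, and the glob pattern assumes the plate/stacked_pointcloud/treatment nesting from the tree above.

```python
from collections import Counter
from pathlib import Path


def count_point_clouds(dataset_root):
    """Count .ply files per (plate, treatment) pair under `dataset_root`."""
    counts = Counter()
    root = Path(dataset_root)
    # Assumed layout: <root>/<plate>/stacked_pointcloud/<treatment>/<file>.ply
    for ply in root.glob("*/stacked_pointcloud/*/*.ply"):
        treatment = ply.parent.name
        plate = ply.parent.parent.parent.name
        counts[(plate, treatment)] += 1
    return counts


if __name__ == "__main__":
    # Replace "." with the path to your unzipped dataset.
    for (plate, treatment), n in sorted(count_point_clouds(".").items()):
        print(plate, treatment, n)
```

Plates or treatments that do not appear in the output either contain no .ply files or do not follow the assumed nesting.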

Usage

The following steps assume that one already has point cloud representations of cells or nuclei. If you need to generate point clouds from 3D binary masks please go to cellshape-helper.

Downloading the dataset

We suggest testing our code on the data contained in SamplePointCloudData.zip. Please download the data and unzip the contents into a directory of your choice. We recommend using your ~/Documents/ folder. The resulting path is used as a parameter in the steps below, so please remember it. Downloading and unzipping the data can be done in the terminal:

  1. Download the data into the ~/Documents/ folder with wget
cd ~/Documents
wget https://sandbox.zenodo.org/record/1080300/files/SamplePointCloudDataset.zip
  2. Unzip the data with unzip:
unzip SamplePointCloudDataset.zip

This will create a directory called cellshapeSamplePointCloudDataset under your ~/Documents/ folder, i.e. /home/USER/Documents/cellshapeSamplePointCloudDataset/ (USER will be different for you).
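If wget or unzip is not available, the same two steps can be sketched with the Python standard library. The URL is copied from the wget command above; the helper names download and unzip are hypothetical and chosen only for this example.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL copied from the wget command above.
DATA_URL = "https://sandbox.zenodo.org/record/1080300/files/SamplePointCloudDataset.zip"


def download(url, dest_dir):
    """Download `url` into `dest_dir` and return the path of the saved file."""
    dest_dir = Path(dest_dir).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, target)
    return target


def unzip(zip_path, dest_dir):
    """Extract `zip_path` into `dest_dir` and return the top-level entry names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})


# Example usage (not executed here because it needs network access):
#   archive = download(DATA_URL, "~/Documents")
#   print(unzip(archive, archive.parent))
```

The list returned by unzip shows the name of the directory that was created, which is the path to pass to the training commands below.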

Training

The training procedure follows two steps:

  1. Training the dynamic graph convolutional foldingnet (DFN) autoencoder to automatically learn shape features.
  2. Adding the clustering layer to refine shape features and learn shape classes simultaneously.

Inference can be done after each step.

Our training functions are run through a command line interface with the command cellshape-train. For help on all command line options run the following in the terminal:

cellshape-train -h

1. Train DFN autoencoder

The first step trains the autoencoder without the additional clustering layer. Run the following in the terminal. Remember to change the --cloud_dataset_path, --dataframe_path, and --output_dir parameters to match your own directories. Usually, this only requires changing the word USER in these paths.

cellshape-train \
--model_type "cloud" \
--pretrain "True" \
--train_type "pretrain" \
--cloud_dataset_path "/home/USER/Documents/cellshapeSamplePointCloudDataset/" \
--dataset_type "SingleCell" \
--dataframe_path "/home/USER/Documents/cellshapeSamplePointCloudDataset/small_data.csv" \
--output_dir "/home/USER/Documents/cellshapeOutput/" \
--num_epochs_autoencoder 250 \
--encoder_type "dgcnn" \
--decoder_type "foldingnetbasic" \
--num_features 128

This step will create an output directory /home/USER/Documents/cellshapeOutput/ with the subfolders nets, reports, and runs, which contain the model weights, logged outputs, and tensorboard runs, respectively, for each experiment. Each experiment is named with the following convention: {encoder_type}_{decoder_type}_{num_features}_{train_type}_{xxx}, where {xxx} is a counter. For example, if this was the first experiment you have run, the trained model weights will be saved to: /home/USER/Documents/cellshapeOutput/nets/dgcnn_foldingnetbasic_128_pretrained_001.pt. This path will be used in the next step for the --pretrained_path parameter.
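Because the counter changes from run to run, it can help to look up the most recent weights file rather than typing the path by hand. The sketch below assumes that the counter is zero-padded to three digits (as in the 001 example) and that weights are stored as .pt files in the nets subfolder; the function names experiment_name and latest_weights are hypothetical. The train-type part of the name is passed in as a plain string, since the example file name above uses pretrained.

```python
from pathlib import Path


def experiment_name(encoder_type, decoder_type, num_features, train_type, counter):
    """Build an experiment name following the convention described above."""
    # Assumption: the counter is zero-padded to three digits, as in "001".
    return f"{encoder_type}_{decoder_type}_{num_features}_{train_type}_{counter:03d}"


def latest_weights(output_dir, prefix):
    """Return the highest-numbered .pt file in <output_dir>/nets, or None."""
    nets = Path(output_dir) / "nets"
    # Lexicographic order matches numeric order for zero-padded counters.
    candidates = sorted(nets.glob(f"{prefix}_*.pt"))
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    prefix = "dgcnn_foldingnetbasic_128_pretrained"
    print(latest_weights("/home/USER/Documents/cellshapeOutput/", prefix))
```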

2. Add clustering layer to refine shape features and learn shape classes simultaneously

The next step is to add the clustering layer to refine the model weights. As before, run the following in the terminal. Remember to change the --cloud_dataset_path, --dataframe_path, --output_dir, and --pretrained_path parameters to match your own directories. Usually, this only requires changing the word USER in these paths.

cellshape-train \
--model_type "cloud" \
--train_type "DEC" \
--pretrain False \
--cloud_dataset_path "/home/USER/Documents/cellshapeSamplePointCloudDataset/" \
--dataset_type "SingleCell" \
--dataframe_path "/home/USER/Documents/cellshapeSamplePointCloudDataset/small_data.csv" \
--output_dir "/home/USER/Documents/cellshapeOutput/" \
--num_features 128 \
--num_clusters 5 \
--pretrained_path "/home/USER/Documents/cellshapeOutput/nets/dgcnn_foldingnetbasic_128_pretrained_001.pt"

To monitor the training using Tensorboard, in the terminal run:

pip install tensorboard
tensorboard --logdir "/home/USER/Documents/cellshapeOutput/runs/"

Alternatively, the training steps can be run sequentially through one command line

To do this, set --pretrain to True and --train_type to "DEC" in the same command, so that the autoencoder is pretrained first and DEC training follows.

cellshape-train \
--model_type "cloud" \
--train_type "DEC" \
--pretrain True \
--cloud_dataset_path "/home/USER/Documents/cellshapeSamplePointCloudDataset/" \
--dataset_type "SingleCell" \
--dataframe_path "/home/USER/Documents/cellshapeSamplePointCloudDataset/small_data.csv" \
--output_dir "/home/USER/Documents/cellshapeOutput/" \
--num_features 128 \
--num_clusters 5
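When launching several runs from a script, the commands above can be assembled programmatically. The following is a sketch that only uses flags appearing in this README; the function name build_train_command and its default values are hypothetical, and the script prints the command rather than running it.

```python
import shlex


def build_train_command(dataset_path, dataframe_path, output_dir,
                        train_type="DEC", pretrain=True,
                        num_features=128, num_clusters=5,
                        pretrained_path=None):
    """Assemble a cellshape-train argument list from the flags shown above."""
    args = [
        "cellshape-train",
        "--model_type", "cloud",
        "--train_type", train_type,
        "--pretrain", str(pretrain),
        "--cloud_dataset_path", dataset_path,
        "--dataset_type", "SingleCell",
        "--dataframe_path", dataframe_path,
        "--output_dir", output_dir,
        "--num_features", str(num_features),
        "--num_clusters", str(num_clusters),
    ]
    # Only pass existing weights when skipping the pretraining step.
    if pretrained_path is not None:
        args += ["--pretrained_path", pretrained_path]
    return args


root = "/home/USER/Documents"
cmd = build_train_command(
    dataset_path=f"{root}/cellshapeSamplePointCloudDataset/",
    dataframe_path=f"{root}/cellshapeSamplePointCloudDataset/small_data.csv",
    output_dir=f"{root}/cellshapeOutput/",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually start training, the list can be passed to subprocess.run(cmd, check=True) once the paths point at real directories.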

Inference

Example inference notebooks can be found in the docs/notebooks/ folder.

For developers

  • Fork the repository
  • Clone your fork
git clone https://github.com/USERNAME/cellshape
  • Install an editable version (-e) with the development requirements (dev)
cd cellshape
pip install -e ".[dev]"
  • To install pre-commit hooks to ensure formatting is correct:
pre-commit install
  • To release a new version:

Firstly, update the version with bump2version (bump2version patch, bump2version minor or bump2version major). This will increment the package version (to a release candidate - e.g. 0.0.1rc0) and tag the commit. Push this tag to GitHub to run the deployment workflow:

git push --follow-tags

Once the release candidate has been tested, the release version can be created with:

bump2version release

References

[1] An Tao, 'Unsupervised Point Cloud Reconstruction for Classific Feature Learning', GitHub Repo, 2020
