Local Shape Descriptors.

Local Shape Descriptors (for Neuron Segmentation)

This repository contains code to compute Local Shape Descriptors (LSDs) from an instance segmentation. LSDs can then be used during training as an auxiliary target, which we found to improve boundary prediction and therefore segmentation quality. Read more about it in our paper and/or blog post.

Quick 2d Examples

Notebooks

Example networks & pipelines

Parallel processing


Cite:

@article{sheridan_local_2021,
	title = {Local Shape Descriptors for Neuron Segmentation},
	url = {https://www.biorxiv.org/content/10.1101/2021.01.18.427039v1},
	urldate = {2021-01-20},
	journal = {bioRxiv},
	author = {Sheridan, Arlo and Nguyen, Tri and Deb, Diptodip and Lee, Wei-Chung Allen and Saalfeld, Stephan and Turaga, Srinivas and Manor, Uri and Funke, Jan},
	year = {2021}
}

Notes:

  • Tested on Ubuntu 18.04 with Python 3.

  • This is not production-level software; it was developed in a pure research environment, so some scripts may not work out of the box. For example, all networks in the paper were originally written using now-deprecated TensorFlow/cuDNN versions and rely on an outdated Singularity container. Because of this, the Singularity image will not build from the current recipe. If you are replicating results with the current implementations, please reach out for the Singularity container (it is too large to upload here). Alternatively, consider reimplementing the networks in PyTorch (recommended; see Training).

  • Post-processing steps were designed for use with a specific cluster and will need to be tweaked for individual use cases. If need and usage increase, we will look into refactoring, packaging, and distributing them.

  • Several post-processing scripts (e.g. watershed) currently live inside this repo, which creates more dependencies than are needed just to use the LSDs. One foreseeable issue is that agglomeration requires networkx==2.2 for the MergeTree, and boost is required for funlib.segment. We have restructured the repo into lsd.train and lsd.post submodules. For just calculating the LSDs, it is sufficient to use lsd.train, e.g.:

from lsd.train import local_shape_descriptor
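To give a sense of what the descriptors contain, here is a minimal NumPy-only sketch of 2D local shape descriptors. It is not the library's implementation and rests on our own simplifying assumptions: a Gaussian window truncated at 3 sigma with zero padding, sigma given in pixels, statistics computed only inside each object, no downsampling or normalization, and a channel order (mean offset y/x, variance y/x, Pearson coefficient, size) chosen for illustration. Use lsd.train for real training targets.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth(image, sigma):
    """Separable Gaussian filter with zero padding at the borders."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2

    def filter_row(row):
        return np.convolve(np.pad(row, radius), kernel, mode="valid")

    out = np.apply_along_axis(filter_row, 0, image)
    return np.apply_along_axis(filter_row, 1, out)


def local_shape_descriptors_2d(labels, sigma, eps=1e-8):
    """Return a (6, H, W) array of simplified 2D shape descriptors.

    Channels: mean offset (y, x), variance (y, x), Pearson coefficient, size.
    Background (label 0) is left at zero.
    """
    labels = np.asarray(labels)
    h, w = labels.shape
    ys, xs = np.meshgrid(
        np.arange(h, dtype=float), np.arange(w, dtype=float), indexing="ij"
    )
    lsds = np.zeros((6, h, w))

    for label in np.unique(labels):
        if label == 0:
            continue
        mask = (labels == label).astype(float)
        inside = mask > 0
        # Gaussian-weighted object mass around each pixel (> 0 inside the object).
        mass = smooth(mask, sigma)
        safe_mass = np.where(inside, mass, 1.0)
        mean_y = smooth(mask * ys, sigma) / safe_mass
        mean_x = smooth(mask * xs, sigma) / safe_mass
        var_y = np.maximum(smooth(mask * ys * ys, sigma) / safe_mass - mean_y**2, 0.0)
        var_x = np.maximum(smooth(mask * xs * xs, sigma) / safe_mass - mean_x**2, 0.0)
        cov = smooth(mask * ys * xs, sigma) / safe_mass - mean_y * mean_x
        pearson = cov / np.sqrt(var_y * var_x + eps)

        descriptors = np.stack([mean_y - ys, mean_x - xs, var_y, var_x, pearson, mass])
        lsds[:, inside] = descriptors[:, inside]

    return lsds
```

The input must be a label image in which each object has a unique ID, for example local_shape_descriptors_2d(labels, sigma=5.0). Pixels near an object's border get mean offsets pointing toward the object's local center of mass, which is what makes the descriptors informative about boundaries.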

Quick 2d Examples

The following tutorial can be run in the browser using Google Colab. To replicate it locally, create a conda environment and install the relevant packages, e.g.:

  1. conda create -n lsd_test python=3
  2. conda activate lsd_test
  3. pip install matplotlib scikit-image gunpowder
  4. pip install git+https://github.com/funkelab/lsd.git

tutorial: Open In Colab


Notebooks

  • Example Colab notebooks are located here. You can download them or run them below (Ctrl + click "Open In Colab"). When running a notebook, you will probably see the message "Warning: This notebook was not authored by Google". This can be ignored; you can run the notebook anyway.

  • We uploaded ~1.7 TB of data (raw/labels/masks/RAGs etc.) to an S3 bucket. The following tutorial shows some examples of accessing and visualizing the data.

    • Data download: Open In Colab
  • If implementing the LSDs in your own training pipeline (i.e. pure PyTorch/TensorFlow), calculate the LSDs on a label array of unique objects and use them as the target for your network (see the quick 2D examples above for how to calculate them).

  • The following tutorials show how to set up 2D training/prediction pipelines using Gunpowder. It is recommended to follow them in order (skip the basic tutorial if you are familiar with Gunpowder). Note: Google Colab can be slow, especially due to data I/O. These notebooks will run much faster in a Jupyter notebook on a local GPU, but the Colab versions should provide a starting point.

    • Basic Gunpowder tutorial: Open In Colab

    • Train Affinities: Open In Colab

    • Train LSDs: Open In Colab

    • Train MTLSD: Open In Colab

    • Inference (using pretrained MTLSD checkpoint): Open In Colab

    • Watershed, agglomeration, segmentation: Open In Colab

  • Bonus notebooks:

    • Training using sparse ground truth (useful if you only have a subset of training data but still want dense predictions): Open In Colab

    • Ignore regions during training (useful if you want the network to learn to predict zeros in certain regions, e.g. glia IDs): Open In Colab

    • Train LSDs on non-EM data with PyTorch: Open In Colab
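For the sparse ground truth case listed above, one common approach (our assumption; the notebook may implement it differently) is to weight the loss with a mask that is 1 where ground truth exists and 0 elsewhere, so unlabeled voxels contribute nothing to training. A minimal NumPy sketch of a masked mean squared error between predicted and target LSDs, using the hypothetical name masked_mse:

```python
import numpy as np


def masked_mse(prediction, target, mask):
    """Mean squared error over voxels where mask is nonzero.

    prediction, target: arrays of shape (channels, *spatial)
    mask: array with the spatial shape, broadcast across channels
    """
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = np.broadcast_to(np.asarray(mask, dtype=float), prediction.shape)
    total_weight = weights.sum()
    if total_weight == 0:
        return 0.0  # no labeled voxels in this batch
    return float((weights * (prediction - target) ** 2).sum() / total_weight)
```

Normalizing by the mask sum rather than the voxel count keeps the loss scale independent of how much of each batch is labeled. The same expression translates directly to PyTorch tensors.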


Example networks & pipelines

  • There are some example networks and training/prediction pipelines from the fib25 dataset here.

Training

  • Since the networks in this paper were implemented in TensorFlow, training was a two-step process. First, the networks were created using the mknet.py files. These saved tensor placeholders and metadata in config files that were then used for both training and prediction. The mknet files used the now-deprecated mala repository to create the networks. If reimplementing in TensorFlow, consider migrating to funlib.learn.tensorflow.

  • If using PyTorch, the networks can be created directly inside the train scripts, since placeholders aren't required. For example, the logic from this TensorFlow mknet script and this TensorFlow train script can be condensed into this PyTorch train script.

  • For training an autocontext network (e.g. acrlsd), the current implementation learns the LSDs in a first pass. A saved checkpoint from that pass is then used when creating the second pass, which predicts LSDs before learning the affinities. One could modify this to use a single setup and remove the need to write the LSDs to disk.

Inference

  • By default, the predict scripts (example) contain the worker logic to be distributed by the scheduler during parallel processing (see below).

  • If you only need to process a relatively small volume, blockwise processing may not be necessary. In this case, it is recommended to use a Scan node and specify the input/output shapes plus context. An example can be found in the inference Colab notebook above.

  • Similar to training, the current autocontext implementations assume the predicted LSDs are written to a zarr/n5 container and then used as input to the second pass to predict affinities. This can also be changed to predict on the fly if needed.
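The relationship between input shape, output shape, and context can be sketched as plain arithmetic. This sketch assumes a network whose output is centered in its input (as with a valid-padded U-Net), shapes in voxels, and a volume that is an exact multiple of the output shape; the shapes used in the usage note are hypothetical. Each tile reads an input-sized region and writes an output-sized region, which is roughly what a Scan node or a blockwise scheduler arranges for you.

```python
import itertools


def context(input_shape, output_shape):
    """Context (per side) lost by a network mapping input_shape to output_shape."""
    ctx = []
    for i, o in zip(input_shape, output_shape):
        if i < o or (i - o) % 2:
            raise ValueError("input must exceed output by an even amount per axis")
        ctx.append((i - o) // 2)
    return tuple(ctx)


def tile_rois(total_shape, input_shape, output_shape):
    """List (read_roi, write_roi) pairs, each ROI given as (offset, shape) in voxels."""
    ctx = context(input_shape, output_shape)
    if any(t % o for t, o in zip(total_shape, output_shape)):
        raise ValueError("total shape must be a multiple of the output shape")
    starts = [range(0, t, o) for t, o in zip(total_shape, output_shape)]
    tiles = []
    for start in itertools.product(*starts):
        write_roi = (tuple(start), tuple(output_shape))
        # Read ROIs extend past the volume at the borders, so raw data must be
        # padded there or be larger than the output volume.
        read_offset = tuple(s - c for s, c in zip(start, ctx))
        tiles.append(((read_offset, tuple(input_shape)), write_roi))
    return tiles
```

For example, context((196, 196), (104, 104)) gives (46, 46), and tile_rois((208, 208), (196, 196), (104, 104)) yields four tiles whose write ROIs exactly cover the volume.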

Visualizations of example training/prediction pipelines

Vanilla affinities training:



Autocontext LSD and affinities prediction:


Parallel processing

  • If you are running on small data then this section may be irrelevant. See the Watershed, agglomeration, segmentation notebook above if you just want to get a sense of obtaining a segmentation from affinities.

  • Example processing scripts can be found here.

  • We create segmentations following the approach in this paper. Generally speaking, after training a network there are five steps to obtain a segmentation:

  1. Predict boundaries (this can involve the use of LSDs as an auxiliary task)
  2. Generate supervoxels (fragments) using seeded watershed. The fragment centers of mass are stored as region adjacency graph nodes.
  3. Generate edges between nodes using hierarchical agglomeration. The edges are weighted by the underlying affinities. Edges with lower scores are merged earlier.
  4. Cut the graph at a predefined threshold and relabel connected components. Store the node-to-component lookup tables.
  5. Use the lookup tables to relabel supervoxels and generate a segmentation.

  • Everything was done in parallel using daisy (github, docs), but one could use multiprocessing or dask instead.

  • For our experiments we used MongoDB for all storage (block checks, RAGs, scores, etc.) due to the size of the data. Depending on the use case, it might be better to read/write to files rather than MongoDB. See Watershed for further info.

  • The following examples were written for use with the Janelia LSF cluster and are only meant to be used as a guide. Users will likely need to customize them for their own setup (for example, if using a SLURM cluster).

  • You will need to install funlib.segment and funlib.evaluate if using or adapting the segmentation/evaluation scripts.
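Step 4 above can be illustrated with a small in-memory sketch: a union-find over RAG edges whose score is at or below the threshold (lower scores merge earlier), producing a fragment-to-segment lookup table. Whether the real scripts merge at score <= threshold or < threshold, and how they number segments, are assumptions here; the function name segment_lut is hypothetical.

```python
def segment_lut(fragment_ids, edges, scores, threshold):
    """Map each fragment ID to a segment ID by cutting the RAG at threshold.

    edges: list of (u, v) fragment ID pairs; scores: matching edge scores.
    Edges with score <= threshold are merged (assumed convention).
    """
    parent = {f: f for f in fragment_ids}

    def find(f):
        # Path halving keeps the union-find trees shallow.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for (u, v), score in zip(edges, scores):
        if score <= threshold:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                # The smaller ID becomes the root of the merged component.
                parent[max(root_u, root_v)] = min(root_u, root_v)

    # Number connected components 1..k in order of their smallest fragment ID.
    segment_of_root = {}
    lut = {}
    for f in sorted(fragment_ids):
        root = find(f)
        if root not in segment_of_root:
            segment_of_root[root] = len(segment_of_root) + 1
        lut[f] = segment_of_root[root]
    return lut
```

For a chain of fragments 1-2-3-4 with edge scores 0.1, 0.6 and 0.2, a threshold of 0.5 merges {1, 2} and {3, 4} into two segments, while a threshold of 0.7 merges everything into one.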

Inference

The worker logic is located in individual predict.py scripts (example). The master script distributes it using daisy.run_blockwise. MongoDB is only needed here for the block check function (to check which blocks have completed successfully). To remove the need for MongoDB, one could remove the check function (remember to also remove block_done_callback in predict.py) or replace it with a custom function (e.g. check chunk completion directly in the output container).

Example roi config
{
  "container": "hemi_roi_1.zarr",
  "offset": [140800, 205120, 198400],
  "size": [3000, 3000, 3000]
}
Example predict config
{
  "base_dir": "/path/to/base/directory",
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "raw_file": "predict_roi.json",
  "raw_dataset" : "volumes/raw",
  "out_base" : "output",
  "file_name": "foo.zarr",
  "num_workers": 5,
  "db_host": "mongodb client",
  "db_name": "foo",
  "queue": "gpu_rtx",
  "singularity_image": "/path/to/singularity/image"
}
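As a rough illustration of replacing both daisy and the MongoDB block check, here is a sketch that processes blocks with Python's concurrent.futures and records completion as empty marker files, so that a rerun skips finished blocks. The marker directory layout, file names, and function names are hypothetical. A thread pool is used for simplicity; ProcessPoolExecutor may suit CPU-bound work better.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def marker_path(done_dir, block_id):
    return os.path.join(done_dir, f"block_{block_id}.done")


def block_done(done_dir, block_id):
    """Check function: a block counts as done once its marker file exists."""
    return os.path.exists(marker_path(done_dir, block_id))


def run_blocks(block_ids, process_block, done_dir, num_workers=5):
    """Run process_block on every unfinished block; return the IDs processed now."""
    os.makedirs(done_dir, exist_ok=True)
    todo = [b for b in block_ids if not block_done(done_dir, b)]

    def worker(block_id):
        process_block(block_id)
        # Written only after process_block returns, so failed blocks are retried.
        open(marker_path(done_dir, block_id), "w").close()
        return block_id

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sorted(pool.map(worker, todo))
```

Writing the marker only after a block succeeds plays the role of block_done_callback: an interrupted run can simply be restarted.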

Watershed

The worker logic is located in a single script, which is then distributed by the master script. By default, the nodes are stored in MongoDB using a MongoDbGraphProvider. To write to file instead (i.e. compressed NumPy arrays), you can use the FileGraphProvider (inside the worker script).

Example watershed config
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "affs_file": "foo.zarr",
  "affs_dataset": "/volumes/affs",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "block_size": [1000, 1000, 1000],
  "context": [248, 248, 248],
  "db_host": "mongodb client",
  "db_name": "foo",
  "num_workers": 6,
  "fragments_in_xy": false,
  "epsilon_agglomerate": 0,
  "queue": "local"
}
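Fragment centers of mass are what get stored as RAG nodes. The following minimal NumPy sketch computes them for one block and writes them to a compressed .npz file. This is only a guess at what a file-based store might look like; the FileGraphProvider's actual on-disk format may differ, and the function names and the node_ids/centers keys are hypothetical.

```python
import numpy as np


def fragment_centers(fragments, block_offset, voxel_size):
    """Return (ids, centers) for nonzero fragment IDs; centers in world units."""
    ids, inverse = np.unique(fragments.ravel(), return_inverse=True)
    counts = np.bincount(inverse)
    coords = np.indices(fragments.shape).reshape(fragments.ndim, -1)
    # Mean voxel coordinate per fragment, computed one axis at a time.
    centers = np.stack(
        [np.bincount(inverse, weights=axis) / counts for axis in coords], axis=1
    )
    keep = ids != 0  # 0 is background, not a node
    centers = centers[keep] * np.asarray(voxel_size) + np.asarray(block_offset)
    return ids[keep], centers


def write_nodes(path, ids, centers):
    np.savez_compressed(path, node_ids=ids, centers=centers)


def read_nodes(path):
    with np.load(path) as data:
        return data["node_ids"], data["centers"]
```

Using np.bincount with weights avoids a Python loop over fragments, which matters when a block contains many thousands of supervoxels.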

Agglomerate

Same structure as watershed: a worker script distributed by a master script. Switch to the FileGraphProvider if needed.

Example agglomerate config
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "affs_file": "foo.zarr",
  "affs_dataset": "/volumes/affs",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "block_size": [1000, 1000, 1000],
  "context": [248, 248, 248],
  "db_host": "mongodb client",
  "db_name": "foo",
  "num_workers": 4,
  "queue": "local",
  "merge_function": "hist_quant_75"
}
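The merge_function value hist_quant_75 selects how edge scores are computed. Our assumption (check the agglomerate script and waterz for the exact definition) is that the score is one minus the 75th percentile of the affinities on the boundary between two fragments, with the percentile read from a 256-bin histogram. Under that reading, strongly connected fragments get low scores and are merged earlier, consistent with the five-step overview above. A small NumPy sketch of that scoring rule, with the hypothetical name hist_quantile_score:

```python
import numpy as np


def hist_quantile_score(affinities, quantile=75, bins=256):
    """One minus the given percentile of boundary affinities (assumed rule).

    Affinities are clipped to [0, 1] and quantized into `bins` histogram bins.
    """
    a = np.clip(np.asarray(affinities, dtype=float).ravel(), 0.0, 1.0)
    if a.size == 0:
        raise ValueError("need at least one boundary affinity")
    bin_index = np.rint(a * (bins - 1)).astype(int)
    cumulative = np.cumsum(np.bincount(bin_index, minlength=bins))
    # First bin whose cumulative count reaches the requested fraction of values.
    q_bin = int(np.searchsorted(cumulative, quantile / 100.0 * a.size))
    return 1.0 - q_bin / (bins - 1)
```

A histogram makes the quantile cheap to update when two regions merge (histograms simply add), which is presumably why a quantized variant is used rather than an exact percentile.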

Find segments

In contrast to the three steps above, creating LUTs only requires enough RAM to hold the RAG in memory. The only thing done in parallel is reading the graph (graph_provider.read_blockwise()). The script could be adapted to use multiprocessing/dask to distribute the connected-components computation for each threshold, but if the RAG is too large there will be pickling errors when passing the nodes/edges. Daisy isn't needed for scheduling here, since nothing is written to containers.

Example find segments config
{
  "db_host": "mongodb client",
  "db_name": "foo",
  "fragments_file": "foo.zarr",
  "edges_collection": "edges_hist_quant_75",
  "thresholds_minmax": [0, 1],
  "thresholds_step": 0.02,
  "block_size": [1000, 1000, 1000],
  "num_workers": 5,
  "fragments_dataset": "/volumes/fragments",
  "run_type": "test"
}
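With thresholds_minmax [0, 1] and thresholds_step 0.02, a series of thresholds is evaluated; assuming the upper bound is excluded, as with np.arange, that is 50 thresholds (0.0, 0.02, ..., 0.98). Because a lower threshold merges a subset of the edges merged by a higher one, all thresholds can be handled in a single sweep over edges sorted by score, reusing one union-find. This sketch illustrates that idea and is not necessarily how the script works; it reports the number of segments per threshold, and the function names are hypothetical.

```python
import math


def thresholds(minmax, step):
    """Thresholds from minmax[0] up to, but excluding, minmax[1]."""
    lo, hi = minmax
    n = math.ceil((hi - lo) / step - 1e-9)
    return [round(lo + i * step, 10) for i in range(n)]


def segments_per_threshold(fragment_ids, edges, scores, threshold_list):
    """Number of segments at each threshold, from one sweep over sorted edges."""
    parent = {f: f for f in fragment_ids}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    ordered = sorted(zip(scores, edges))
    num_segments = len(parent)
    results = {}
    i = 0
    for t in sorted(threshold_list):
        # Merge every edge whose score now falls within the threshold.
        while i < len(ordered) and ordered[i][0] <= t:
            u, v = ordered[i][1]
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v
                num_segments -= 1
            i += 1
        results[t] = num_segments
    return results
```

Each edge is merged at most once across the whole sweep, so the cost is dominated by sorting the edges rather than by the number of thresholds.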

Extract segmentation

This script does use daisy to write the segmentation to file, but doesn't necessarily require bsub/sbatch to distribute (you can run locally).

Example extract segmentation config
{
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "edges_collection": "edges_hist_quant_75",
  "threshold": 0.4,
  "block_size": [1000, 1000, 1000],
  "out_file": "foo.zarr",
  "out_dataset": "volumes/segmentation_40",
  "num_workers": 3,
  "run_type": "test"
}
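Step 5, applying a fragment-to-segment lookup table to a block of fragments, can be vectorized with np.searchsorted instead of a per-voxel dictionary lookup. In this sketch (hypothetical function name), the LUT is a plain dict, and fragment IDs missing from it, including background 0, are mapped to 0; that fallback is our assumption.

```python
import numpy as np


def relabel_fragments(fragments, lut):
    """Map fragment IDs to segment IDs via a dict LUT; unknown IDs become 0."""
    if not lut:
        return np.zeros(fragments.shape, dtype=np.uint64)
    items = sorted(lut.items())
    keys = np.array([k for k, _ in items], dtype=np.uint64)
    values = np.array([v for _, v in items], dtype=np.uint64)
    flat = fragments.ravel().astype(np.uint64)
    # Position of each fragment ID within the sorted keys (clipped to stay in range).
    idx = np.minimum(np.searchsorted(keys, flat), len(keys) - 1)
    found = keys[idx] == flat
    segments = np.where(found, values[idx], np.uint64(0))
    return segments.reshape(fragments.shape)
```

In a blockwise setting, each worker would read its fragments block, call a function like this, and write the result to the output dataset.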

Evaluate volumes

Evaluates VOI scores. Assumes dense voxel ground truth (not skeletons). It also assumes the ground truth (and segmentation) fit into memory, which was fine for the hemi and fib25 volumes given ~750 GB of RAM. The script should probably be refactored to run blockwise.

Example evaluate volumes config
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "gt_file": "hemi_roi_1.zarr",
  "gt_dataset": "volumes/labels/neuron_ids",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "db_host": "mongodb client",
  "rag_db_name": "foo",
  "edges_collection": "edges_hist_quant_75",
  "scores_db_name": "scores",
  "thresholds_minmax": [0, 1],
  "thresholds_step": 0.02,
  "num_workers": 4,
  "method": "vanilla",
  "run_type": "test"
}
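For reference, VOI decomposes into a split term and a merge term. Here is a minimal NumPy sketch on small in-memory arrays. It assumes the common convention that VOI split is the conditional entropy H(segmentation | ground truth) and VOI merge is H(ground truth | segmentation), measured in bits, with ground-truth label 0 ignored; funlib.evaluate may differ in details such as log base or background handling. The dense contingency table only scales to small label counts; large volumes need a sparse one.

```python
import numpy as np


def _entropy(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def voi(segmentation, ground_truth, ignore_gt_labels=(0,)):
    """Return (voi_split, voi_merge) in bits."""
    seg = np.asarray(segmentation).ravel()
    gt = np.asarray(ground_truth).ravel()
    keep = ~np.isin(gt, ignore_gt_labels)
    seg, gt = seg[keep], gt[keep]
    if seg.size == 0:
        return 0.0, 0.0
    _, seg_idx = np.unique(seg, return_inverse=True)
    _, gt_idx = np.unique(gt, return_inverse=True)
    # Joint distribution of (segment, ground-truth object) over voxels.
    joint = np.zeros((seg_idx.max() + 1, gt_idx.max() + 1))
    np.add.at(joint, (seg_idx, gt_idx), 1.0)
    joint /= seg.size
    h_joint = _entropy(joint.ravel())
    h_seg = _entropy(joint.sum(axis=1))
    h_gt = _entropy(joint.sum(axis=0))
    voi_split = h_joint - h_gt  # H(seg | gt): over-segmentation
    voi_merge = h_joint - h_seg  # H(gt | seg): under-segmentation
    return voi_split, voi_merge
```

Splitting one ground-truth object evenly into two segments costs exactly 1 bit of VOI split, and merging two equal-sized objects costs 1 bit of VOI merge.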

Evaluate annotations

For the zebrafinch dataset, ground-truth skeletons were used due to the size of the dataset. These skeletons were cropped, masked, and relabelled for the sub-ROIs tested in the paper. We evaluated VOI, ERL, and the mincut metric on the consolidated skeletons. The current implementation could be refactored and made more modular. It also uses node_collections, which are now deprecated in daisy. To use it with the current implementation, check out daisy commit 39723ca.

Example evaluate annotations config
{
  "experiment": "zebrafinch",
  "setup": "setup01",
  "iteration": 400000,
  "config_slab": "mtlsd",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "edges_db_host": "mongodb client",
  "edges_db_name": "foo",
  "edges_collection": "edges_hist_quant_75",
  "scores_db_name": "scores",
  "annotations_db_host": "mongo client",
  "annotations_db_name": "foo",
  "annotations_skeletons_collection_name": "zebrafinch",
  "node_components": "zebrafinch_components",
  "node_mask": "zebrafinch_mask",
  "roi_offset": [50800, 43200, 44100],
  "roi_shape": [10800, 10800, 10800],
  "thresholds_minmax": [0.5, 1],
  "thresholds_step": 1,
  "run_type": "11_micron_roi_masked"
}
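ERL (expected run length) is the expected length of the error-free skeleton run containing a point drawn uniformly along all skeletons, i.e. the sum of squared run lengths divided by the total skeleton length. The sketch below is a simplified, stdlib-only version with hypothetical names. It splits runs wherever adjacent skeleton nodes fall in different segments and treats nodes in segment 0 as unsegmented; unlike fuller ERL definitions, it does not penalize merge errors between skeletons.

```python
from collections import defaultdict


def _find(parent, node):
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node


def expected_run_length(skeletons, segment_of_node):
    """skeletons: dict skeleton ID -> list of (u, v, length) edges.
    segment_of_node: dict node ID -> segment ID (0 = unsegmented)."""
    total_length = 0.0
    sum_squared_runs = 0.0
    for edges in skeletons.values():
        parent = {}
        # An edge is error-free if both ends lie in the same nonzero segment.
        correct = [segment_of_node[u] == segment_of_node[v] != 0 for u, v, _ in edges]
        # Join nodes along error-free edges into runs.
        for (u, v, _), ok in zip(edges, correct):
            if ok:
                root_u, root_v = _find(parent, u), _find(parent, v)
                if root_u != root_v:
                    parent[root_u] = root_v
        run_lengths = defaultdict(float)
        for (u, v, length), ok in zip(edges, correct):
            total_length += length
            if ok:
                run_lengths[_find(parent, u)] += length
        sum_squared_runs += sum(r * r for r in run_lengths.values())
    return sum_squared_runs / total_length if total_length > 0 else 0.0
```

Squaring run lengths rewards long uninterrupted runs: a single split in the middle of a skeleton can cut its contribution roughly in half.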

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

lsds-0.1.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (284.4 kB)

Uploaded for CPython 3.10; manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

lsds-0.1.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (286.1 kB)

Uploaded for CPython 3.9; manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

lsds-0.1.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (285.5 kB)

Uploaded for CPython 3.8; manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

lsds-0.1.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (249.5 kB)

Uploaded for CPython 3.7m; manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

lsds-0.1.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (250.0 kB)

Uploaded for CPython 3.6m; manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64

File details

Hashes for lsds-0.1.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm Hash digest
SHA256 582e3c91705d0a8270e88e8192f08a83e62691af8a26c5c0efaf633c10293721
MD5 1e51b886e646ab4a6e8aecceb7bd2c66
BLAKE2b-256 ae06cdb0fc11544c8e3ccd29386a728f6612cda41329d2f438ded96bf72baf0a

File details

Hashes for lsds-0.1.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm Hash digest
SHA256 616388c19728b2af779caf23f4cd0b49739ca7344ee3ffcddb88c478a3922510
MD5 9dd6f39e05413bfb81b0f428d9e4429b
BLAKE2b-256 3c15aa85d4603a194f67a000ed4943afc57d787682c320940cc79478d14dfc86

File details

Hashes for lsds-0.1.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm Hash digest
SHA256 40c7280748213a82c9a9301974d15b3c669978e4689dd76557b5bd9dfa548780
MD5 de38c9d95f62a8785b92ed5152c5e7d7
BLAKE2b-256 fbeae35625e14b7913190bf223c4df63055db3a946c4ce3da84212fd8c352cdc

File details

Hashes for lsds-0.1.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm Hash digest
SHA256 166230b2f10bbd3be816d9c5c942c7ef26828fa5d438c046ca551e0c8f51f988
MD5 7dcb9881e282984781e3079b39b7afc0
BLAKE2b-256 8598383230ed65145d78fe9602b1e5902fac9d96029ddbaf5ee135e7e55fd885

File details

Hashes for lsds-0.1.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl

Algorithm Hash digest
SHA256 fe18508931991bcc030b0b58be5dc2ea00a48bebe421e241743fbbd2432787e7
MD5 8330943461d88a088436c6b0faba2460
BLAKE2b-256 39db34811c70e1e5fad91f21ff6499a8be8cec412eefef75dddce7f1cf9df4c0