
Local Shape Descriptors (for Neuron Segmentation)

This repository contains code to compute Local Shape Descriptors (LSDs) from an instance segmentation. LSDs can then be used during training as an auxiliary target, which we found to improve boundary prediction and therefore segmentation quality. Read more about it in our paper and/or blog post.


Contents:

  • Quick 2d Examples
  • Notebooks
  • Example networks & pipelines
  • Parallel processing


Cite:

@article{sheridan_local_2021,
	title = {Local Shape Descriptors for Neuron Segmentation},
	url = {https://www.biorxiv.org/content/10.1101/2021.01.18.427039v1},
	urldate = {2021-01-20},
	journal = {bioRxiv},
	author = {Sheridan, Arlo and Nguyen, Tri and Deb, Diptodip and Lee, Wei-Chung Allen and Saalfeld, Stephan and Turaga, Srinivas and Manor, Uri and Funke, Jan},
	year = {2021}
}

Notes:

  • Tested on Ubuntu 18.04 with Python 3.

  • This is not production-level software and was developed in a pure research environment; therefore some scripts may not work out of the box. For example, all paper networks were originally written using now-deprecated tensorflow/cudnn versions and rely on an outdated singularity container. Because of this, the singularity image will not build from the current recipe - if replicating with the current implementations, please reach out for the singularity container (it is too large to upload here). Alternatively, consider reimplementing networks in pytorch (recommended - see Training).

  • Post-processing steps were designed for use with a specific cluster and will need to be tweaked for individual use cases. If the need / use increases then we will look into refactoring, packaging and distributing.

  • Currently, several post-processing scripts (e.g. watershed) are located inside this repo, which creates more dependencies than needed for using the LSDs. One foreseeable issue is that agglomeration requires networkx==2.2 for the MergeTree, and boost is required for funlib.segment. We have restructured the repo to use lsd.train and lsd.post submodules. For just calculating the LSDs, it is sufficient to use lsd.train, e.g.:

from lsd.train import local_shape_descriptor
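
For instance, a minimal sketch of computing LSDs from a toy label array (the sigma and voxel_size values here are placeholders in world units, and the exact signature should be checked against lsd.train.local_shape_descriptor):

import numpy as np
from lsd.train import local_shape_descriptor

# toy 2d label array with two objects (0 = background)
labels = np.zeros((100, 100), dtype=np.uint64)
labels[10:40, 10:40] = 1
labels[50:90, 50:90] = 2

# sigma controls the size of the local neighborhood (world units);
# voxel_size maps voxels to world units - both are placeholder values
lsds = local_shape_descriptor.get_local_shape_descriptors(
    segmentation=labels,
    sigma=(10.0, 10.0),
    voxel_size=(1, 1))

print(lsds.shape)  # (6, 100, 100): 6 descriptor channels per voxel in 2d (10 in 3d)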

Quick 2d Examples

The following tutorial can be run in the browser using Google Colab. To replicate the tutorial locally, create a conda environment and install the relevant packages, e.g.:

  1. conda create -n lsd_test python=3
  2. conda activate lsd_test
  3. pip install matplotlib scikit-image gunpowder
  4. pip install git+https://github.com/funkelab/lsd.git

tutorial: Open In Colab


Notebooks

  • Example colab notebooks are located here. You can download them or run them below (control + click Open in Colab). When running a notebook, you will probably get the message: "Warning: This notebook was not authored by Google". This can be ignored; you can run anyway.

  • We uploaded ~1.7 TB of data (raw/labels/masks/rags etc.) to an S3 bucket. The following tutorial shows some examples for accessing and visualizing the data.

    • Data download: Open In Colab
  • If implementing the LSDs in your own training pipeline (i.e. pure pytorch/tensorflow), calculate the LSDs on a label array of unique objects and use them as the target for your network (see the quick 2d examples above for calculating).

  • The following tutorials show how to set up 2D training/prediction pipelines using Gunpowder. It is recommended to follow them in order (skip the basic tutorial if familiar with gunpowder). Note: Google Colab can sometimes be slow, especially due to data I/O. These notebooks will run much faster in a jupyter notebook on a local gpu, but the Colab versions should provide a starting point. A minimal pipeline sketch is also shown after this list.

    • Basic Gunpowder tutorial: Open In Colab

    • Train Affinities: Open In Colab

    • Train LSDs: Open In Colab

    • Train MTLSD: Open In Colab

    • Inference (using pretrained MTLSD checkpoint): Open In Colab

    • Watershed, agglomeration, segmentation: Open In Colab

  • Bonus notebooks:

    • Training using sparse ground truth (useful if you only have a subset of training data but still want dense predictions): Open In Colab

    • Ignore regions during training (useful if you want the network to learn to predict zeros in certain regions, e.g. glia ids): Open In Colab

    • Train lsds on non-em data with pytorch: Open In Colab
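
For orientation, here is a minimal sketch of the kind of Gunpowder pipeline the notebooks build, using the AddLocalShapeDescriptor node from lsd.train.gp. The container name, dataset paths, sigma, and request sizes are placeholders, and the node signature follows the tutorials - see the notebooks for complete, tested pipelines:

import gunpowder as gp
from lsd.train.gp import AddLocalShapeDescriptor

raw = gp.ArrayKey('RAW')
labels = gp.ArrayKey('LABELS')
gt_lsds = gp.ArrayKey('GT_LSDS')

# hypothetical 2d training container with 'raw' and 'labels' datasets
source = gp.ZarrSource(
    'training_data.zarr',
    {raw: 'raw', labels: 'labels'},
    {raw: gp.ArraySpec(interpolatable=True),
     labels: gp.ArraySpec(interpolatable=False)})

pipeline = (
    source
    + gp.Normalize(raw)
    + gp.RandomLocation()
    # compute ground truth LSDs from the labels on the fly;
    # sigma is in world units and is dataset dependent
    + AddLocalShapeDescriptor(labels, gt_lsds, sigma=80))

request = gp.BatchRequest()
request.add(raw, gp.Coordinate((200, 200)))     # sizes in world units
request.add(labels, gp.Coordinate((200, 200)))
request.add(gt_lsds, gp.Coordinate((200, 200)))

with gp.build(pipeline):
    batch = pipeline.request_batch(request)
    print(batch[gt_lsds].data.shape)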


Example networks & pipelines

  • There are some example networks and training/prediction pipelines from the fib25 dataset here.

Training

  • Since networks in this paper were implemented in Tensorflow, there was a two-step process for training. First the networks were created using the mknet.py files. This saved tensor placeholders and metadata in config files that were then used for both training and prediction. The mknet files used the now deprecated mala repository to create the networks. If reimplementing in Tensorflow, consider migrating to funlib.learn.tensorflow.

  • If using Pytorch, the networks can just be created directly inside the train scripts since placeholders aren't required. For example, the logic from this tensorflow mknet script and this tensorflow train script can be condensed to this pytorch train script. A rough sketch of this pattern is shown after this list.

  • For training an autocontext network (e.g. acrlsd), the current implementation learns the LSDs in a first pass. A saved checkpoint is then used when creating the second pass in order to predict LSDs prior to learning the affinities. One could modify this to use a single setup and remove the need for writing the LSDs to disk.
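
As a rough sketch of the pytorch route mentioned above (model built directly in the train script, supervising both LSDs and affinities as in an MTLSD-style setup): the toy network, channel counts, and loss weighting below are illustrative placeholders, not the paper's architecture.

import torch

class ToyMtlsdNet(torch.nn.Module):
    # placeholder model: any network mapping raw -> (lsds, affs) works;
    # the paper used a U-Net, this tiny conv stack is just for illustration
    def __init__(self, num_lsds=6, num_affs=2):
        super().__init__()
        self.features = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, 3, padding=1), torch.nn.ReLU(),
            torch.nn.Conv2d(32, 32, 3, padding=1), torch.nn.ReLU())
        self.lsd_head = torch.nn.Conv2d(32, num_lsds, 1)
        self.aff_head = torch.nn.Conv2d(32, num_affs, 1)

    def forward(self, raw):
        f = self.features(raw)
        return torch.sigmoid(self.lsd_head(f)), torch.sigmoid(self.aff_head(f))

model = ToyMtlsdNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.MSELoss()

# one training step on random stand-ins for batches coming out of a
# gunpowder pipeline (raw, ground truth LSDs, ground truth affinities)
raw = torch.rand(1, 1, 196, 196)
gt_lsds = torch.rand(1, 6, 196, 196)
gt_affs = torch.rand(1, 2, 196, 196)

pred_lsds, pred_affs = model(raw)
loss = loss_fn(pred_lsds, gt_lsds) + loss_fn(pred_affs, gt_affs)
optimizer.zero_grad()
loss.backward()
optimizer.step()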

Inference

  • By default, the predict scripts (example) contain the worker logic to be distributed by the scheduler during parallel processing (see below).

  • If you just need to process a relatively small volume, it is sometimes not necessary to use blockwise processing. In this case, it is recommended to use a scan node and specify input/output shapes + context. An example can be found in the inference colab notebook above, and a minimal sketch is shown after this list.

  • Similar to training, the current autocontext implementations assume the predicted LSDs are written to a zarr/n5 container and then used as input to the second pass to predict affinities. This can also be changed to predict on the fly if needed.
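
To give a feel for the scan-node approach: gp.Scan takes a reference request describing one network-sized chunk and tiles a larger request into such chunks. The container name, dataset path, and sizes below are placeholders, and the prediction node itself is elided; see the inference notebook for a complete pipeline.

import gunpowder as gp

raw = gp.ArrayKey('RAW')

# one chunk of the reference request = one network pass;
# placeholder size in world units (network input shape + context)
scan_request = gp.BatchRequest()
scan_request.add(raw, gp.Coordinate((196, 196)))

source = gp.ZarrSource(
    'predict_data.zarr',              # hypothetical container
    {raw: 'volumes/raw'},
    {raw: gp.ArraySpec(interpolatable=True)})

pipeline = (
    source
    + gp.Normalize(raw)
    # a prediction node (e.g. gunpowder.torch.Predict with a trained
    # checkpoint) would normally sit here, writing its outputs to
    # additional array keys that are also added to scan_request
    + gp.Scan(scan_request))

# request the full region of interest in one go; Scan splits it into
# reference-request-sized chunks and assembles the result
request = gp.BatchRequest()
request[raw] = gp.ArraySpec(roi=gp.Roi((0, 0), (2000, 2000)))

with gp.build(pipeline):
    batch = pipeline.request_batch(request)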

Visualizations of example training/prediction pipelines

Vanilla affinities training: (pipeline diagram not rendered here)

Autocontext LSD and affinities prediction: (pipeline diagram not rendered here)

Parallel processing

  • If you are running on small data then this section may be irrelevant. See the Watershed, agglomeration, segmentation notebook above if you just want to get a sense of obtaining a segmentation from affinities.

  • Example processing scripts can be found here.

  • We create segmentations following the approach in this paper. Generally speaking, after training a network there are five steps to obtain a segmentation:

  1. Predict boundaries (this can involve the use of LSDs as an auxiliary task)
  2. Generate supervoxels (fragments) using seeded watershed. The fragment centers of mass are stored as region adjacency graph (RAG) nodes (a minimal watershed sketch is shown after this list).
  3. Generate edges between nodes using hierarchical agglomeration. The edges are weighted by the underlying affinities. Edges with lower scores are merged earlier.
  4. Cut the graph at a predefined threshold and relabel connected components. Store the node-to-component lookup tables.
  5. Use the lookup tables to relabel supervoxels and generate a segmentation.

  • Everything was done in parallel using daisy (github, docs), but one could use multiprocessing or dask instead.

  • For our experiments we used MongoDB for all storage (block checks, RAGs, scores, etc.) due to the size of the data. Depending on use case, it might be better to read/write to file rather than mongo. See watershed for further info.

  • The following examples were written for use with the Janelia LSF cluster and are just meant to be used as a guide. Users will likely need to customize for their own specs (for example if using a SLURM cluster).

  • You will need to install funlib.segment and funlib.evaluate if using/adapting the segmentation/evaluation scripts.
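
To make step 2 above concrete, here is a minimal, single-block sketch of seeded watershed on predicted affinities using scipy/scikit-image. It is a simplified stand-in for the repository's watershed scripts; the threshold and seed-finding heuristic are placeholders, and the affinities are random data just to make the sketch runnable.

import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

# affs: predicted affinities, shape (2, h, w) in 2d or (3, d, h, w) in 3d,
# values in [0, 1]; random data here as a stand-in
affs = np.random.rand(2, 200, 200).astype(np.float32)

boundary_pred = affs.mean(axis=0)      # high inside objects, low at boundaries
inside_mask = boundary_pred > 0.5      # placeholder threshold

# distance transform of the inside mask; seeds at its local maxima
distance = ndimage.distance_transform_edt(inside_mask)
maxima = (ndimage.maximum_filter(distance, size=10) == distance) & inside_mask
seeds, num_seeds = ndimage.label(maxima)

# fragments (supervoxels): watershed of the inverted distance, restricted to
# the inside mask; their centers of mass become the RAG nodes (step 2)
fragments = watershed(-distance, seeds, mask=inside_mask)
centers = ndimage.center_of_mass(inside_mask, fragments, range(1, num_seeds + 1))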

Inference

The worker logic is located in individual predict.py scripts (example). The master script distributes them using daisy.run_blockwise. The only need for MongoDB here is the block check function (to check which blocks have successfully completed). To remove the need for mongo, one could remove the check function (remember to also remove block_done_callback in predict.py) or replace it with a custom function (e.g. check chunk completion directly in the output container, as sketched below).
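
A sketch of that "check the output container directly" option (a hypothetical helper, not part of the repository): open the output zarr and test whether a block's write region has been written.

import numpy as np
import zarr

def block_done(container, dataset, voxel_offset, voxel_shape):
    """Hypothetical check: a block counts as done if its write region in the
    output dataset contains any non-zero prediction. Offsets/shapes are in
    voxels of the output dataset."""
    data = zarr.open(container, mode='r')[dataset]
    slices = tuple(slice(o, o + s) for o, s in zip(voxel_offset, voxel_shape))
    # assumes a channel-first dataset, e.g. (channels, z, y, x)
    return bool(np.any(data[(slice(None),) + slices]))

# e.g. block_done('foo.zarr', 'volumes/affs', (0, 0, 0), (100, 100, 100))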

Example roi config
{
  "container": "hemi_roi_1.zarr",
  "offset": [140800, 205120, 198400],
  "size": [3000, 3000, 3000]
}
Example predict config
 {
  "base_dir": "/path/to/base/directory",
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "raw_file": "predict_roi.json",
  "raw_dataset" : "volumes/raw",
  "out_base" : "output",
  "file_name": "foo.zarr",
  "num_workers": 5,
  "db_host": "mongodb client",
  "db_name": "foo",
  "queue": "gpu_rtx",
  "singularity_image": "/path/to/singularity/image"
}

Watershed

The worker logic is located in a single script which is then distributed by the master script. By default the nodes are stored in mongo using a MongoDbGraphProvider. To write to file (i.e. compressed numpy arrays), you can use the FileGraphProvider instead (inside the worker script).

Example watershed config
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "affs_file": "foo.zarr",
  "affs_dataset": "/volumes/affs",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "block_size": [1000, 1000, 1000],
  "context": [248, 248, 248],
  "db_host": "mongodb client",
  "db_name": "foo",
  "num_workers": 6,
  "fragments_in_xy": false,
  "epsilon_agglomerate": 0,
  "queue": "local"
}

Agglomerate

Same structure as watershed: worker script, master script. Change to FileGraphProvider if needed.

Example agglomerate config
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "affs_file": "foo.zarr",
  "affs_dataset": "/volumes/affs",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "block_size": [1000, 1000, 1000],
  "context": [248, 248, 248],
  "db_host": "mongodb client",
  "db_name": "foo",
  "num_workers": 4,
  "queue": "local",
  "merge_function": "hist_quant_75"
}

Find segments

In contrast to the above three methods, when creating LUTs there just needs to be enough RAM to hold the RAG in memory. The only thing done in parallel is reading the graph (graph_provider.read_blockwise()). It could be adapted to use multiprocessing/dask for distributing the connected components for each threshold, but if the RAG is too large there will be pickling errors when passing the nodes/edges. Daisy doesn't need to be used for scheduling here since nothing is written to containers.
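
Decoupled from MongoDB, the core of this step looks roughly like the following sketch: threshold the RAG edges by their merge score and relabel connected components into a fragment-to-segment lookup table. The node and edge lists here are hypothetical stand-ins for what would be read from the RAG.

import networkx as nx

# fragment ids and scored edges as they might be read from the RAG;
# edge scores come from hierarchical agglomeration (lower = merged earlier)
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2, 0.1), (2, 3, 0.3), (4, 5, 0.9)]

def lut_for_threshold(nodes, edges, threshold):
    g = nx.Graph()
    g.add_nodes_from(nodes)
    # keep only edges merged at or below the threshold
    g.add_edges_from((u, v) for u, v, score in edges if score <= threshold)
    lut = {}
    for segment_id, component in enumerate(nx.connected_components(g), start=1):
        for fragment_id in component:
            lut[fragment_id] = segment_id
    return lut

print(lut_for_threshold(nodes, edges, 0.4))
# e.g. {1: 1, 2: 1, 3: 1, 4: 2, 5: 3}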

Example find segments config
{
  "db_host": "mongodb client",
  "db_name": "foo",
  "fragments_file": "foo.zarr",
  "edges_collection": "edges_hist_quant_75",
  "thresholds_minmax": [0, 1],
  "thresholds_step": 0.02,
  "block_size": [1000, 1000, 1000],
  "num_workers": 5,
  "fragments_dataset": "/volumes/fragments",
  "run_type": "test"
}

Extract segmentation

This script does use daisy to write the segmentation to file, but doesn't necessarily require bsub/sbatch to distribute (you can run it locally).
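
The relabelling at the core of this step can be sketched with plain numpy (the repository's scripts rely on funlib.segment for a fast blockwise version, see the note above); the fragments array and LUT below are toy placeholders.

import numpy as np

# toy fragments block and a fragment -> segment lookup table
fragments = np.array([[1, 1, 2],
                      [3, 3, 2],
                      [0, 0, 0]], dtype=np.uint64)   # 0 = background
lut = {0: 0, 1: 10, 2: 10, 3: 20}

# vectorized relabelling via a dense mapping array
mapping = np.zeros(int(fragments.max()) + 1, dtype=np.uint64)
for fragment_id, segment_id in lut.items():
    mapping[fragment_id] = segment_id
segmentation = mapping[fragments]

print(segmentation)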

Example extract segmentation config
{
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "edges_collection": "edges_hist_quant_75",
  "threshold": 0.4,
  "block_size": [1000, 1000, 1000],
  "out_file": "foo.zarr",
  "out_dataset": "volumes/segmentation_40",
  "num_workers": 3,
  "run_type": "test"
}

Evaluate volumes

Evaluates VOI scores. Assumes dense voxel ground truth (not skeletons). This also assumes the ground truth (and segmentation) can fit into memory, which was fine for the hemi and fib25 volumes assuming ~750 GB of RAM. The script should probably be refactored to run blockwise.
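
For a quick local check of the same metric (not the repository's script, which depends on funlib.evaluate, see the note under Parallel processing), scikit-image provides a variation-of-information implementation; the arrays below are toy placeholders.

import numpy as np
from skimage.metrics import variation_of_information

# toy dense ground truth and segmentation (same shape, integer labels)
gt = np.array([[1, 1, 2],
               [1, 1, 2],
               [3, 3, 3]])
seg = np.array([[1, 1, 1],
                [1, 1, 1],
                [3, 3, 3]])

# returns the two conditional entropies whose sum is the VOI score;
# interpret split/merge according to argument order (see the skimage docs)
voi_components = variation_of_information(gt, seg)
print(sum(voi_components))  # total VOI; lower is better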

Example evaluate volumes config
{
  "experiment": "hemi",
  "setup": "setup01",
  "iteration": 400000,
  "gt_file": "hemi_roi_1.zarr",
  "gt_dataset": "volumes/labels/neuron_ids",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "db_host": "mongodb client",
  "rag_db_name": "foo",
  "edges_collection": "edges_hist_quant_75",
  "scores_db_name": "scores",
  "thresholds_minmax": [0, 1],
  "thresholds_step": 0.02,
  "num_workers": 4,
  "method": "vanilla",
  "run_type": "test"
}

Evaluate annotations

For the zebrafinch, ground truth skeletons were used due to the size of the dataset. These skeletons were cropped, masked, and relabelled for the sub-ROIs that were tested in the paper. We evaluated VOI, ERL, and the min-cut metric on the consolidated skeletons. The current implementation could be refactored / made more modular. It also uses node_collections, which are now deprecated in daisy. To use the current implementation, you should check out daisy commit 39723ca.

Example evaluate annotations config
{
  "experiment": "zebrafinch",
  "setup": "setup01",
  "iteration": 400000,
  "config_slab": "mtlsd",
  "fragments_file": "foo.zarr",
  "fragments_dataset": "/volumes/fragments",
  "edges_db_host": "mongodb client",
  "edges_db_name": "foo",
  "edges_collection": "edges_hist_quant_75",
  "scores_db_name": "scores",
  "annotations_db_host": "mongo client",
  "annotations_db_name": "foo",
  "annotations_skeletons_collection_name": "zebrafinch",
  "node_components": "zebrafinch_components",
  "node_mask": "zebrafinch_mask",
  "roi_offset": [50800, 43200, 44100],
  "roi_shape": [10800, 10800, 10800],
  "thresholds_minmax": [0.5, 1],
  "thresholds_step": 1,
  "run_type": "11_micron_roi_masked"
}

