Connected components on 3D images, supports multiple labels.
Project description
Connected Components 3D
Implementation of connected components in three dimensions using a 26, 18, or 6 connected neighborhood. This package uses a 3D variant of the two pass method by Rosenfeld and Pflatz augmented with UnionFind and a decision tree based on the 2D 8connected work of Wu, Otoo, and Suzuki. This implementation is compatible with images containing many different labels, not just binary images. It can be used with 2D or 3D images.
I wrote this package because I was working on densely labeled 3D biomedical images of brain tissue (e.g. 512x512x512 voxels). Other off the shelf implementations I reviewed were limited to binary images. This rendered these other packages too slow for my use case as it required masking each label and running the connected components algorithm once each time. For reference, there are often between hundreds to thousands of labels in a given volume. The benefit of this package is that it labels all connected components in one shot, improving performance by one or more orders of magnitude.
Check out benchmarks to see a comparison with SciPy on a few different tasks.
Python pip
Installaction
If compatible binaries are available for your platform, installation is particularly simple.
pip install connectedcomponents3d
If compatible binaries are not available, you can install from source as follows.
Requires a C++ compiler.
pip install numpy pip install connectedcomponents3d nobinary :all:
Occasionally, you may appear to successfully install cc3d, but on import you'll see an error that includes: numpy.ufunc size changed, may indicate binary incompatibility
. cc3d was compiled against numpy 1.16+ and unfortunately, there was a backwards incompatibilty between numpy 1.15 and 1.16. You can either try upgrading numpy or compiling from source in this case.
Python Manual Installation
Requires a C++ compiler.
pip install r requirements.txt python setup.py develop
Python Use
Important limitation: Only label values less than or equal to the size of the image in voxels (pixels) are supported currently. If you want to use larger values, consider using fastremap.renumber.
import cc3d import numpy as np labels_in = np.ones((512, 512, 512), dtype=np.int32) labels_out = cc3d.connected_components(labels_in) # 26connected connectivity = 6 # only 26, 18, and 6 are allowed labels_out = cc3d.connected_components(labels_in, connectivity=connectivity) # You can adjust the bit width of the output to accomodate # different expected image statistics with memory usage tradeoffs. # uint16, uint32 (default), and uint64 are supported. labels_out = cc3d.connected_components(labels_in, out_dtype=np.uint16) # You can extract individual components like so: N = np.max(labels_out) for segid in range(1, N+1): extracted_image = labels_out * (labels_out == segid) process(extracted_image) # We also include a region adjacency graph function # that returns a set of undirected edges. graph = cc3d.region_graph(labels_out, connectivity=connectivity)
If you know approximately how many labels you are going to generate, you can save some memory by specifying a number a safety factor above that range. The max label ID in your input labels must be less than max_labels
.
labels_out = connected_components(labels_in, max_labels=20000)
Note: C and Fortran order arrays will be processed in row major and column major order respectively, so the numbering of labels will be "transposed". The scare quotes are there because the dimensions of the array will not change.
C++ Use
#include "cc3d.hpp" // 3d array represented as 1d array int* labels = new int[512*512*512](); uint32_t* cc_labels = cc3d::connected_components3d<int>( labels, /*sx=*/512, /*sy=*/512, /*sz=*/512 ); // The default template parameter for output type is uint32_t uint64_t* cc_labels = cc3d::connected_components3d<int, uint64_t>( labels, /*sx=*/512, /*sy=*/512, /*sz=*/512 ); uint16_t* cc_labels = cc3d::connected_components3d<int, uint16_t>( labels, /*sx=*/512, /*sy=*/512, /*sz=*/512, /*connectivity=*/18 // default is 26 connected ); // edges is [ e11, e12, e21, e22, ... ] std::vector<uint64_t> edges = cc3d::extract_region_graph<uint64_t>( labels, /*sx=*/512, /*sy=*/512, /*sz=*/512, /*connectivity=*/18 // default is 26 connected );
Algorithm Description
The algorithm contained in this package is an elaboration into 3D images of the 2D image connected components algorithm described by Rosenfeld and Pflatz (RP) in 1968 [1] (which is well illustrated by this youtube video) using an equivalency list implemented as Tarjan's UnionFind disjoint set with path compression and balancing [2] and augmented with a decision tree based on work by Wu, Otoo, and Suzuki (WOS). [3] The description below describes the 26connected algorithm, but once you understand it, deriving 18 and 6 are simple.
First Principles in 2D
In RP's 4connected twopass method for binary 2D images, the algorithm raster scans and every time it first encounters a foreground pixel (the pixels to its top and left are background), it marks it with a new label. If there is a preexisting label in its neighborhood, it uses that label instead. Whenever two labels are adjacent, it records they are equivalent so that they can be relabeled consistently in the second pass. This equivalency table can be constructed in several ways, but some popular approaches are UnionFind with path compression with balancing by rank and Selkow's algorithm (which can avoid pipeline stalls). [4] However, Selkow's algorithm is designed for two trees of depth two, appropriate for binary images. We would like to process multiple labels at the same time, making UnionFind preferable.
In the second pass, the pixels are relabeled using the equivalency table. UnionFind establishes one label as the root label of a tree, and the root is considered the representative label. Each pixel is then labeled with the representative label. UnionFind is therefore appropriate for representing disjoint sets. Path compression with balancing radically reduces the height of the tree, which accelerates the second pass.
WOS approached the problem of accelerating 8connected 2D connected components on binary images. 8connected labeling is achieved by extending RP's forward pass mask to the top left and top right corner pixels. In UnionFind based connected components algorithms, the unify step in the first pass is the most expensive step. WOS showed how to optimize away a large fraction of these calls using a decision tree that takes advantage of local topology. For example, since the topcenter neighbor of the current pixel is also adjacent to the other mask elements, all of which have already been processed by virtue of the raster scan direction, if it is present it is sufficient to copy its value and move on. If it is absent, pick one of the remaining foreground pixels, copy their value, and use unify for the mask element on the right as it is now known to be nonneighboring with the left hand side. WOS's algorithm continues in this fashion until a match is found or all mask elements are processed at which point a new label is created.
For several years, this algorithm was the world's fastest, though it has been superceded by a newer work that exchanges the static decision tree for a dynamic one or precalculated generated one amongst other improvements. However, WOS's work is significant for both its simplicity and speed and thus serves as the inspiration for this library.
Extending to 3D
The approach presented below is very similar to that of Sutheebanjard [6]. To move to a 3D 26connected neighborhood, the mask must be extended into three dimensions in order to connect neighboring planes. Observe that the 8connected mask covers the trailing half of the neighborhood (the part that will have been already processed) such that the current pixel can rely on those labels. Thus the mask for the 26connected neighborhood covers only two out of three potential planes: the entire lower plane (nine voxels), and a mask identical to WOS's (four voxels) on the current plane. While some further optimizations are possible, to begin, the problem can be conceptually decomposed into two parts: establishing a 9connected link to the bottom plane and then an 8connected link to the current plane. This works because the current pixel functions as a hub that transmits the connection information from the 9connected step to the 8connected step.
Fig. 1: Mask for an 8connected plane. If J,K,L, and M are all eliminated, only N remains and a new label is assigned.
j  k  l 

m  n  . 
.  .  . 
The very first Z plane (Z=0) the algorithm runs against is special: the edge effect omits the bottom plane of the mask. Therefore, as the remaining mask is only comprosed of the 8connected 2D mask, after this pass, the bottom of the image is 8connected. At Z=1, the 9connected part of the mask kicks in, forming connections to Z=0, making the current plane now (8 + 9) 17connected. At Z=2, the 9connected bottom mask now forms connections from Z=1 to Z=2 on the top, making Z=1 (17 + 9) 26connected. By induction, when this process proceeds to completion it results in a 26connected labeling of the volume.
Following inspiration from WOS, we construct a decision tree on the densely labeled bottom plane that minimizes the number of unifications we need to perform.
Fig 2. The mask for the lower plane in 3D.
a  b  c 

d  e  f 
g  h  i 
As e
is connected to all other voxels, if present, it can simply be copied. If e
is absent, b
and h
fully cover the mask. If b
is absent, h
, a
, c
comprise a covering. If h
is absent, b
, g
, i
are one. Below is a list of coverings such that each proceeding entry in the list assumes the first letters in the entries above are background.
e
b
, (h
g
,i
)h
,a
,c
d
, (f
c
,i
)f
,g
,a
a
,c
,g
,i
c
,g
,i
g
,i
i
The decision tree is then constructed such that each of these coverings will be evaluated using the fewest unifications possible. It's possible to further optimize this by noting that e
and b
are both fully connected to the upper 2D mask. Therefore, if either of them are present, we can skip the 8connected unification step. It's also possible to try the DF covering first if B is background, which would save one unification versus HAC given even statistics, but it seems to be slightly slower on the dataset I attempted. To move from binary data to multilabel data, I simply replaced tests for foreground and background with tests for matching labels.
In order to make a reasonably fast implementation, I implemented unionfind with path compression. I conservatively used an IDs array qual to the size of the image for the unionfind data structure instead of a sparse map. The unionfind data structure plus the output labels means the memory consumption will be input + output + rank + equivalences. If your input labels are 32bit, the memory usage will be 4x the input size. This becomes more problematic when 64bit labels are used, but if you know something about your data, you can decrease the size of the unionfind data structure. I previously used unionbysize but for some reason it merely reduced performance and increased memory usage so it was removed.
For more information on the history of connected components algorithms, and an even faster approach for 2D 8connected components, consult Grana et al's paper on Block Based Decision Trees. [5]
References
 A. Rosenfeld and J. Pfaltz. "Sequential Operations in Digital Picture Processing". Journal of the ACM. Vol. 13, Issue 4, Oct. 1966, Pg. 471494. doi: 10.1145/321356.321357 (link)
 R. E. Tarjan. "Efficiency of a good but not linear set union algorithm". Journal of the ACM, 22:215225, 1975. (link)
 K. Wu, E. Otoo, K. Suzuki. "Two Strategies to Speed up Connected Component Labeling Algorithms". Lawrence Berkely National Laboratory. LBNL29102, 2005. (link)
 S. Selkow. "The TreetoTree Editing Problem". Information Processing Letters. Vol. 6, No. 6. June 1977. doi: 10.1016/00200190(77)900643 (link)
 C. Grana, D. Borghesani, R. Cucchiara. "Optimized Blockbased Connected Components Labeling with Decision Trees". IEEE Transactions on Image Porcessing. Vol. 19, Iss. 6. June 2010. doi: 10.1109/TIP.2010.2044963 (link)
 P. Sutheebanjard. "Decision Tree for 3D Connected Components Labeling". Proc. 2012 International Symposium on Information Technology in Medicine and EEducation. doi: 10.1109/ITiME.2012.6291402 (link)
Project details
Release history Release notifications  RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename, size  File type  Python version  Upload date  Hashes 

Filename, size connected_components_3d1.7.0cp27cp27mmacosx_10_15_x86_64.whl (199.8 kB)  File type Wheel  Python version cp27  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp27cp27mmanylinux1_x86_64.whl (906.3 kB)  File type Wheel  Python version cp27  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp34cp34mmanylinux1_x86_64.whl (884.0 kB)  File type Wheel  Python version cp34  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp35cp35mmacosx_10_6_intel.whl (364.0 kB)  File type Wheel  Python version cp35  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp35cp35mmanylinux1_x86_64.whl (897.7 kB)  File type Wheel  Python version cp35  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp36cp36mmacosx_10_9_x86_64.whl (198.0 kB)  File type Wheel  Python version cp36  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp36cp36mmanylinux1_x86_64.whl (908.9 kB)  File type Wheel  Python version cp36  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp36cp36mwin32.whl (145.8 kB)  File type Wheel  Python version cp36  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp36cp36mwin_amd64.whl (161.2 kB)  File type Wheel  Python version cp36  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp37cp37mmacosx_10_9_x86_64.whl (19.9 kB)  File type Wheel  Python version cp37  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp37cp37mmanylinux1_x86_64.whl (908.5 kB)  File type Wheel  Python version cp37  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp37cp37mwin32.whl (145.8 kB)  File type Wheel  Python version cp37  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp37cp37mwin_amd64.whl (161.2 kB)  File type Wheel  Python version cp37  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp38cp38macosx_10_9_x86_64.whl (377.9 kB)  File type Wheel  Python version cp38  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp38cp38manylinux1_x86_64.whl (947.1 kB)  File type Wheel  Python version cp38  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp38cp38win32.whl (148.5 kB)  File type Wheel  Python version cp38  Upload date  Hashes View 
Filename, size connected_components_3d1.7.0cp38cp38win_amd64.whl (164.3 kB)  File type Wheel  Python version cp38  Upload date  Hashes View 
Filename, size connectedcomponents3d1.7.0.tar.gz (355.0 kB)  File type Source  Python version None  Upload date  Hashes View 
Hashes for connected_components_3d1.7.0cp27cp27mmacosx_10_15_x86_64.whl
Algorithm  Hash digest  

SHA256  56c225257afe709c2085095b6a2e2a2af1444a1eccdf0ecfb4cb1f09451cf79b 

MD5  9a4222ad6094bde53d030ad8bcc0ef85 

BLAKE2256  32581002b262a8453c83d2b1deb4874c510d35718c01e64ecd248943a3743160 
Hashes for connected_components_3d1.7.0cp27cp27mmanylinux1_x86_64.whl
Algorithm  Hash digest  

SHA256  a2990b1c505465fb0e4c4c28d4107c8bacc4282e0d1232c085ac8e5b02ff4d54 

MD5  a6de6eac2641989651a2bcaeeb55fd6d 

BLAKE2256  d2f1f3e2fd42cf236e99727c15bdeb14b9b33752afa1dcee56827817ddc247e1 
Hashes for connected_components_3d1.7.0cp34cp34mmanylinux1_x86_64.whl
Algorithm  Hash digest  

SHA256  d93934b4aa799af5de3fafe02af629a549a953e061858698a6567563f937b512 

MD5  b79addef6166268cf049a6c2ab770d0f 

BLAKE2256  4108c3b2ec5558ccced54d67afb676076a6141e1d7713d36abcab55a468d9d40 
Hashes for connected_components_3d1.7.0cp35cp35mmacosx_10_6_intel.whl
Algorithm  Hash digest  

SHA256  38aab312eaa42e6f73e275012ce8653a3b9237e39a6a7ae9346b6e603f35c1ea 

MD5  9b50eb84a8380857a66352e8e248311f 

BLAKE2256  e5c649308d44fd0b64fec1b1bee00389cd6f70977bf44b160581357bd8d6a43a 
Hashes for connected_components_3d1.7.0cp35cp35mmanylinux1_x86_64.whl
Algorithm  Hash digest  

SHA256  8645df5f0daa5f3b5dffad2a1b56ff798cd209e7d3812a0f634e17c6a4820eab 

MD5  9981a3429eb155f254ecad00831952e0 

BLAKE2256  6162eb33ea0fe6bd74bb307776cde789eccb31d320663669cca722f230ac6f31 
Hashes for connected_components_3d1.7.0cp36cp36mmacosx_10_9_x86_64.whl
Algorithm  Hash digest  

SHA256  92644e16c06552ad86cd356bb406e6ee7e060b0e93d51bb19e239240362230b5 

MD5  5eeec6e60401ca137b96e5e01861d704 

BLAKE2256  72b0d74d7999b33872e21674038574d3d1d4a96d57f2554bb7962d1b2ae71b8a 
Hashes for connected_components_3d1.7.0cp36cp36mmanylinux1_x86_64.whl
Algorithm  Hash digest  

SHA256  a1ff260843485e96c06772feaf641dc5c1ca923f9b5a46ef6f2274c6a3a0f114 

MD5  7e7e92d01905201a1a093140dc1c2c77 

BLAKE2256  a1bc683a784f6b9291a5b49e39643080d8d26733a02e1b3472d49460f52fc499 
Hashes for connected_components_3d1.7.0cp36cp36mwin32.whl
Algorithm  Hash digest  

SHA256  7469a6f4d878cf18ecec268e066bf5ab09efd1aad1b3dbc591218b32accfd9b4 

MD5  20a80c6195467ed7810753e609dbbd08 

BLAKE2256  165227461fa79044edda99b499b52bee8487e2d510bcd7becfcbb95eca117511 
Hashes for connected_components_3d1.7.0cp36cp36mwin_amd64.whl
Algorithm  Hash digest  

SHA256  1ad9982dc1122c1aa9c58bc812b3f20797f54965107e3559c68e9733782a07f6 

MD5  631e70b4c348d51ab46dac31abfbf82f 

BLAKE2256  ca6041ace744f31512a32b4f941705655d4f758a96fff67b151b6dd759a48700 
Hashes for connected_components_3d1.7.0cp37cp37mmacosx_10_9_x86_64.whl
Algorithm  Hash digest  

SHA256  eff8f99a7ae0ebe280ab6d5d2e641af3f062094a777905927081a1bab2082b8b 

MD5  46a32ef30753d329cc26bbe417710ac8 

BLAKE2256  27f95a28eb02509e0d88944dcf5990db6d93323a13d578a54461d152679ac37b 
Hashes for connected_components_3d1.7.0cp37cp37mmanylinux1_x86_64.whl
Algorithm  Hash digest  

SHA256  ee8fb47ba058b71bd457bcf79f493e45af8d57d169eb93b5549badb9f66604a2 

MD5  c858acf844cccc5c96d782884545c076 

BLAKE2256  90f4f1ed6dfccfa7416691837c05b25acc7d902f56e65c22068ebce6d4d24e25 
Hashes for connected_components_3d1.7.0cp37cp37mwin32.whl
Algorithm  Hash digest  

SHA256  574146ef32fcec28d569fac4a0b6eb8978ffd0eefa9c78e7e770ac0088f8c6f3 

MD5  818f2b0cde5f1ede0145ca5ba1ceeb12 

BLAKE2256  9c775fad0ab610ce98298a77aa3f3041b1ea407f0efc1cad23d3272001f8e891 
Hashes for connected_components_3d1.7.0cp37cp37mwin_amd64.whl
Algorithm  Hash digest  

SHA256  40de31320f31c376d3fe27cfddfd9dcc9523557a3b76a73df1072e6e5dffaae1 

MD5  75c01592f62253e59908e3d617c48466 

BLAKE2256  fc3e9724726ea5f9ef1be779142396be3b6b1dc2436f8fcd3c14403dace426ef 
Hashes for connected_components_3d1.7.0cp38cp38macosx_10_9_x86_64.whl
Algorithm  Hash digest  

SHA256  3f50f585bb02d63105fc7a7521b696b812e8032ec7f19d6f9c1b2d31d36dd5c7 

MD5  439e85bfdc764492a362397b0a9853e8 

BLAKE2256  0ef41d4c2df34a2a2066d1e344b61ba92288e782a830e4d685ee156823a49d58 
Hashes for connected_components_3d1.7.0cp38cp38manylinux1_x86_64.whl
Algorithm  Hash digest  

SHA256  0d345cb15f8e1d9dd2b467fcced3f371ead9bda01003ddccfcac791cb1a47298 

MD5  e6cd44900eaa80469ddb78155b16b2c4 

BLAKE2256  5c8d60d6ac3d53f09f555e62fd8c7cab969c32e568de9ecc9615a185ba83eab1 
Hashes for connected_components_3d1.7.0cp38cp38win32.whl
Algorithm  Hash digest  

SHA256  207eb74055e104be23fe79cdff075ef9e6b7883886cfa8375ec97ef83ebf4cf9 

MD5  76cca6b0aa5b2f34ad575905a7753cc6 

BLAKE2256  ddc769f38b23560bc3dd4819a6464c998e8a321cd4f03dae18e18db0374f2701 
Hashes for connected_components_3d1.7.0cp38cp38win_amd64.whl
Algorithm  Hash digest  

SHA256  4781537c038bb96426b9f564d9b1384c687d0ebc089470e313009b27417ee4f2 

MD5  96c3f26ec2aede67c4d98c17c3ab23bb 

BLAKE2256  ba336a5bd358818481e6063d1c9a2c01f31ce771c875feee618df259fd4e77ed 
Hashes for connectedcomponents3d1.7.0.tar.gz
Algorithm  Hash digest  

SHA256  34873fd4ca7c9fdf42254f063d9dd19f9fa4c8027a047fd8a088f41a18d67e96 

MD5  f13ec995fa6ba8a517df628c53999ef2 

BLAKE2256  39168811112eea7cce6f900675c0c77258778021b2646f018ac6b1bfafcc4568 