A Python library to read, transform, manipulate images and manage databases

CERVO-dcclab Python module

This module simplifies the loading and processing of images at CERVO (or in any lab), and also helps manage databases of data. Its ultimate goal is to make it quick to extract useful and pertinent information from microscopy images.

Image Analysis

This module is a task-oriented module for image analysis: it provides simple tools (classes) to easily read image files, inspect them and manipulate them. For instance, the following classes:

  1. Image: can read most image formats, including Zeiss microscope files (.czi).
  2. Channel: each image has one or several channels. The channels, which correspond to specific fluorophores, can be manipulated with filters, thresholding, segmentation and other operations. More complex methods such as watershed are also available.
  3. ImageCollection: can read a collection of image files (e.g., a directory, a z-stack, a map, etc.).

Installation

To install development versions, use:

python setup.py install -f

Required modules should be installed automatically. If anything is missing, let us know.

You should then be able to simply import the module in your own scripts:

from dcclab import Image  # assumes Image is exported at the package top level

# ... your script
img = Image('yourFile.tiff')
img.display()

Required modules for Image

If installed through python setup.py install -f, all modules should install automatically. If needed, you may install the following modules:

pip install Pillow
pip install scikit-image
pip install numpy
pip install scipy
pip install matplotlib
pip install tifffile
pip install czifile
pip install opencv-python

Documentation

There are many image libraries, and this is one more. Every module has a purpose, and this one aims to be easy to use. Clarity for users is therefore key and, to a certain extent, so is clarity for developers: the module was developed by trainees in the laboratory of Daniel Côté at the CERVO Brain Research Center in Québec City, and new trainees contribute to it every year. The primary design consideration is therefore readability, not high performance. That said, performance remains good in practice. Below you will find definitions and conventions for the code, classes and files.

The module makes heavy use of numpy, because most Python packages that manipulate images ultimately rely on this high-performance module for storing arrays. We are no different.

Definitions

  1. Channel: a channel is what most people would consider a grayscale image. It represents a collection of pixels in 2D for a single contrast mechanism (for instance, GFP, DAPI, Raman, wide field, etc.). When needed as an array, it is a numpy.ndarray that has 2 dimensions: width ⨉ height. It is never a 3D array because it never has more than one contrast mechanism. The Channel also contains any information recovered from the file it was read from (objective, magnification, scale, etc.), if that information was available.

  2. Image: an image is what anyone would consider "an image": it represents a collection of channels. For instance, an image from a microscope may contain three colours, red, green and blue (commonly stored as an RGB image, in a TIFF file for instance). When needed as an array, it is a numpy.ndarray that has 3 dimensions: width ⨉ height ⨉ channels. Note that there is always a channel dimension, even with only one channel.

  3. ImageCollection: as the name implies, it is a collection of Images. We often deal with collections of images, and we often want to operate on a group of images (e.g., segment many images, strip a channel from images, obtain the average of a given colour, etc.). The images may or may not be stored as separate files. They may be stored as separate frames in a movie. They may be stored in some proprietary format. Regardless, to the scientist, it is a collection of images that may or may not have more than one Channel. The images may or may not have the same format, therefore it is not always possible to obtain a numpy.ndarray representing the entire collection. When all images have the same format (width, height and number of channels), one can obtain a 4D numpy.ndarray with width ⨉ height ⨉ channels ⨉ collection. Note that the ImageCollection always has a channel dimension, even with only one channel, and there is always a collection dimension, even with only one image in the collection.

  4. ZStack: of all ImageCollections, the z-stack is the most common. It represents a one-dimensional series of images that were all taken at different z depths. It is a special type of ImageCollection because all images have the same contrast, are taken at essentially the same (x,y) position but at different z, and all have the same "properties" (size, laser, etc.). It is therefore always possible to obtain a 4D numpy.ndarray representing the z-stack with width ⨉ height ⨉ channels ⨉ collection. Note that the stack always has a channel dimension, even with only one channel, and there is always a collection dimension, even with only one image in the collection.
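
The dimension conventions above can be illustrated with plain numpy arrays. This is only a sketch of the array shapes, not the module's own classes: the sizes and variable names are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
width, height, n_channels, n_images = 64, 48, 3, 5

# Channel: a single contrast mechanism, always 2D (width x height).
channel = np.zeros((width, height))

# Image: always 3D (width x height x channels), even with one channel.
image = np.stack([channel] * n_channels, axis=-1)
single_channel_image = channel[:, :, np.newaxis]

# ImageCollection or ZStack: 4D (width x height x channels x collection).
# Stacking only works when every image has the same shape.
collection = np.stack([image] * n_images, axis=-1)
single_image_collection = image[..., np.newaxis]

for name, array in [
    ("channel", channel),
    ("image", image),
    ("single-channel image", single_channel_image),
    ("collection", collection),
    ("single-image collection", single_image_collection),
]:
    print(name, array.shape)
```

Keeping the channel and collection axes even when they have length one means downstream code can index the arrays the same way regardless of how many channels or images are present.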

Operations

With images, we want to filter noise, segment images, find cells, threshold them, mask them, blur them, etc. All these operations are defined in the module with a "task-oriented" vocabulary: the function for removing noise is called filterNoise, the function to threshold is called threshold, etc. That way, code reads like a sentence: image.filterNoise() or image.threshold(), or even image.filterNoise().maskWithThreshold().labelMaskComponents(). FIXME: functions currently do not return self, so the chained form does not work yet.

Operations to manipulate "images", as we say, are really operations on Channels, not Images. Indeed, when a scientist wants to segment an "image", they really want to segment either a single channel, or all channels separately. Hence most (but not all) operations are defined at the level of Channel, where all the work takes place. Before going any further, we can already hear people taking offense: "But I want to segment my images! I don't care about this abstract separation of channels and images defined in your module. And the examples above use images, not channels!". And you are right. This is why Image and ImageCollection also define most operations: Image loops through its Channels to operate, and ImageCollection loops through its Images, which in turn loop through their Channels to actually get the work done. In the vast majority of cases, this is what people expect. If one had the following script:

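from dcclab import ImageCollection  # assumes ImageCollection is exported at the package top level

# Raw string so that \d is passed through as a pattern; details on loading patterns later
coll = ImageCollection(filePath=r'somefiles-\d+.tiff')
coll.filterNoise()
coll.applyGaussianFilter()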

one would expect the noise to be removed from every channel of every image in the collection. If one knows that the collection has several unnecessary channels (say GFP is in the green channel (2), and the red (1) and blue (3) channels are unused), these can be removed from the images before filtering out the noise:

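from dcclab import ImageCollection  # assumes ImageCollection is exported at the package top level

# Raw string so that \d is passed through as a pattern; details on loading patterns later
coll = ImageCollection(filePath=r'somefiles-\d+.tiff')
coll.removeChannels(1,3)  # drop the unused red (1) and blue (3) channels
coll.filterNoise()
coll.applyGaussianFilter()
coll.align()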
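
The delegation described in this section (a collection loops through its images, which loop through their channels) can be sketched in a few lines of plain Python. The classes below are hypothetical stand-ins written for illustration, not the module's implementation. They assume 1-based channel numbers as in the example above, and they return self from each operation, which is what chaining would require.

```python
class SketchChannel:
    """Hypothetical stand-in for a single-contrast 2D channel."""

    def __init__(self, pixels):
        self.pixels = [list(row) for row in pixels]

    def threshold(self, value):
        # All the real work happens at the channel level.
        self.pixels = [[1 if p >= value else 0 for p in row] for row in self.pixels]
        return self  # returning self is what makes chaining possible


class SketchImage:
    """Hypothetical stand-in for an image: an ordered list of channels."""

    def __init__(self, channels):
        self.channels = list(channels)

    def threshold(self, value):
        for channel in self.channels:  # delegate to every channel
            channel.threshold(value)
        return self

    def removeChannels(self, *numbers):
        # Assumption: channel numbers are 1-based, as in the example above.
        self.channels = [c for i, c in enumerate(self.channels, start=1)
                         if i not in numbers]
        return self


class SketchCollection:
    """Hypothetical stand-in for a collection: an ordered list of images."""

    def __init__(self, images):
        self.images = list(images)

    def threshold(self, value):
        for image in self.images:  # delegate to every image
            image.threshold(value)
        return self

    def removeChannels(self, *numbers):
        for image in self.images:
            image.removeChannels(*numbers)
        return self


def make_image():
    red = SketchChannel([[0, 10], [20, 30]])
    green = SketchChannel([[5, 15], [25, 35]])
    blue = SketchChannel([[1, 2], [3, 4]])
    return SketchImage([red, green, blue])


coll = SketchCollection([make_image(), make_image()])
coll.removeChannels(1, 3).threshold(20)  # keep green only, then threshold it
print(coll.images[0].channels[0].pixels)
```

With this layout, each operation is implemented once, at the channel level, and the two container classes only forward the call.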

Database

A Database class allows one to manage files that may be spread over different file servers. A local example at CERVO, the Plateforme d'Outils Moléculaires, is supported, and support for the DCCLab, PDK, and Martin Levesque groups is planned for the near future.

For each specific database, a new class inheriting from Database can be queried through a general SQL API as well as a task-specific API. MySQL and sqlite3 are supported, as is MySQL over ssh. For example, the database allows requests such as:

  1. all images using the viral vector AAV-173,
  2. all images of microglia,
  3. all images of neurons from the subthalamic nucleus.

The database is ready to use (i.e., connected) upon creation. To begin using the Database, whether to make queries or to insert data, use the exposed API (e.g., select(table, columns, condition) -> Row) or execute an explicit SQL command (e.g., execute(statement)). To create a new database, a Database object has to be created with writePermission=True (sqlite3), or the database has to be created on the MySQL server directly. If it does not exist yet, the database will be created at the Database.path location (given as a URI).
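
As a rough idea of what such requests look like in SQL, here is a minimal sketch using Python's standard sqlite3 module directly. It does not use the module's Database class: the images table, its columns, the sample rows, and the params argument of the select helper are all assumptions made up for this illustration.

```python
import sqlite3

# In-memory database with a hypothetical schema and hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be indexed by column name
conn.execute(
    "CREATE TABLE images (path TEXT, viral_vector TEXT, cell_type TEXT, region TEXT)"
)
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    [
        ("a.czi", "AAV-173", "neuron", "subthalamic nucleus"),
        ("b.czi", "AAV-173", "microglia", "cortex"),
        ("c.czi", "AAV-200", "neuron", "hippocampus"),
    ],
)


def select(table, columns="*", condition=None, params=()):
    """Hypothetical helper loosely mirroring select(table, columns, condition).

    Table and column names are interpolated into the statement, so they must
    come from trusted code; values are passed separately as parameters.
    """
    statement = f"SELECT {columns} FROM {table}"
    if condition:
        statement += f" WHERE {condition}"
    return conn.execute(statement, params).fetchall()


vector_rows = select("images", "path", "viral_vector = ?", ("AAV-173",))
microglia_rows = select("images", "path", "cell_type = ?", ("microglia",))
stn_rows = select(
    "images", "path", "cell_type = ? AND region = ?", ("neuron", "subthalamic nucleus")
)

for label, rows in [("AAV-173", vector_rows), ("microglia", microglia_rows),
                    ("STN neurons", stn_rows)]:
    print(label, sorted(row["path"] for row in rows))
```

Passing values as parameters rather than formatting them into the statement avoids quoting problems and SQL injection when the values come from user input.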

Disclaimer

Copyright DCCLab Members (2019-).
