
A framework of utilities to help with using the CCAgT dataset

Project description


CCAgT-utils

CCAgT-utils is a package for working with the CCAgT dataset: Images of Cervical Cells with AgNOR Stain Technique. The package provides customized code for annotation format conversion, mask generation, plotting samples, etc.

Package context

I have been working with images of cervical cells stained with AgNOR since January 2020 for my master's thesis. The results of my thesis can be found at CCAgT-benchmarks. In general, the objective of the thesis is to automate the main steps that help in the diagnosis/prognosis of these cells. As part of this work, I have also developed some code to preprocess this dataset or simply to help with its use.

The code for working with the dataset will be available in this package.

Contents

  1. Links to download the dataset
  2. What is this dataset like?
  3. Examples of use of this package

Links to download the CCAgT dataset

  1. Version 1.1 - drive or UFSC repository
  2. Version 2.1 (will be available soon) - Mendeley Data

What is this dataset like?

The explanations and examples below refer to version 2.0 or later of the dataset. To use older versions, you will need to make some modifications, for example to the organization of the data directories.

This is a computer vision dataset created by collaborators from different departments at the Universidade Federal de Santa Catarina (UFSC). The dataset contains images annotated for semantic segmentation and other tasks. The annotation tool used was Labelbox. The data repositories will contain the images, the masks (semantic segmentation), and COCO annotations for object detection. The code to convert annotations from the Labelbox format to other formats will be in this package.

Stain coloration can differ from slide to slide; figure 1 shows an image composed of images from different slides.

Figure 1: Image sample composed of samples from different slides.

The directory ./data/samples/images/ contains the original images of each tile from different slides/patients. The dataset presents a wide variety of colors, textures, nucleus shapes, and other characteristics of the cell nuclei. This variety depends on factors such as the type of lesion, the staining process, sample acquisition, and the sensor/microscope setup used for image acquisition.

Version 1.x of the dataset has 3 annotated categories, and version 2.x will have 7. However, the main objective in helping with the diagnosis/prognosis of these samples is to detect, identify, and measure the Nucleolus Organizer Regions (NORs) inside each nucleus. The NORs (the black dots inside the nuclei) were labeled as two different categories: satellites and clusters.

Figure 2 shows an example with two highlighted nuclei. The nucleus on the left (highlighted in black) has three clusters. The nucleus on the right (highlighted in gray) has one cluster (the black dot at the top of the nucleus) and two satellites (the other two black dots).

Figure 2: Image from a tile highlighting two nuclei.

For more details about the dataset, see the dataset pages or the related papers.

Examples of use

Converter

Different approaches to using the dataset require different formats. This module provides the transformation between the format exported by the annotation tool (Labelbox) and current state-of-the-art formats (e.g. COCO). It also makes it possible to work with the data as a DataFrame, which I consider the easiest way to manipulate these annotations. The annotations DataFrame format is not intended for use with any specific deep learning library or approach. It was built only to manipulate the dataset, facilitate conversions between formats, perform analyses, and support the internal use of this library.

$ CCAgT-converter -h  # to show help message

Labelbox to COCO format

$ CCAgT-converter labelbox_to_COCO -t OD -r ./data/samples/sanitized_sample_labelbox.json\
                                         -a ./data/samples/CCAgT_dataset_metadata.json\
                                         -o ./data/samples/out/CCAgT_COCO_OD.json
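For readers unfamiliar with the target format, the sketch below shows what a single COCO object detection annotation looks like when it is built from a polygon. It is not the converter's implementation: the helpers `polygon_to_coco_annotation` and `polygon_area` are hypothetical, and the sketch assumes each Labelbox annotation is a polygon given as (x, y) pixel coordinates. The field layout (`bbox` as `[x, y, width, height]`, `segmentation` as a flattened coordinate list, `iscrowd`) follows the public COCO format; `area` is approximated here with the shoelace formula on the polygon.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Polygon area via the shoelace formula (vertices in order, not closed)."""
    acc = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        acc += x1 * y2 - x2 * y1
    return abs(acc) / 2.0


def polygon_to_coco_annotation(
    ann_id: int, image_id: int, category_id: int, points: Sequence[Point]
) -> Dict:
    """Build one COCO-style object detection annotation from a polygon.

    Hypothetical helper for illustration; not part of CCAgT-utils.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO boxes are [x, y, width, height], with (x, y) the top-left corner.
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        # COCO polygons are flattened lists: [x1, y1, x2, y2, ...].
        "segmentation": [[coord for p in points for coord in p]],
        "area": polygon_area(points),
        "iscrowd": 0,
    }
```

For example, the rectangle `[(2, 3), (12, 3), (12, 7), (2, 7)]` yields `bbox == [2, 3, 10, 4]` and `area == 40.0`. A complete COCO file wraps such entries in a dictionary with `images`, `annotations` and `categories` lists.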

Labelbox to CCAgT format

$ CCAgT-converter labelbox_to_CCAgT -r ./data/samples/sanitized_sample_labelbox.json \
                                    -a ./data/samples/CCAgT_dataset_metadata.json \
                                    -o ./data/samples/out/CCAgT.parquet.gzip\
                                    -p True

CCAgT to masks (categorical masks for semantic segmentation)

$ CCAgT-converter generate_masks -l ./data/samples/out/CCAgT.parquet.gzip\
                                 -o ./data/samples/masks/semantic_segmentation/\
                                 --split-by-slide
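A categorical mask stores, for every pixel, the id of the category that covers it, with 0 for background. As a minimal illustration, not the code behind `generate_masks`, the sketch below rasterizes polygon annotations using an even-odd (ray casting) point-in-polygon test on pixel centres. It assumes annotations are `(category_id, polygon)` pairs; the helper names are hypothetical, and resolving overlaps by drawing later annotations on top is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: is (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def categorical_mask(
    height: int,
    width: int,
    annotations: Sequence[Tuple[int, Sequence[Point]]],
    background: int = 0,
) -> List[List[int]]:
    """Rasterize (category_id, polygon) pairs into a per-pixel category mask.

    A pixel belongs to a polygon when its centre (col + 0.5, row + 0.5)
    lies inside it; later annotations overwrite earlier ones.
    """
    mask = [[background] * width for _ in range(height)]
    for category_id, polygon in annotations:
        for row in range(height):
            for col in range(width):
                if point_in_polygon(col + 0.5, row + 0.5, polygon):
                    mask[row][col] = category_id
    return mask
```

In a 5 x 5 image, a square `[(0, 0), (4, 0), (4, 4), (0, 4)]` with the hypothetical id 1 (a nucleus) followed by a square `[(1, 1), (2, 1), (2, 2), (1, 2)]` with the hypothetical id 2 (a NOR) gives 15 pixels of category 1, one pixel of category 2, and 9 background pixels. These ids are not the dataset's real category ids. The pure-Python loops are only meant to show the idea; a practical implementation would typically rely on an image processing library.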

CCAgT to Panoptic segmentation COCO

$ CCAgT-converter CCAgT_to_COCO  -t PS -l ./data/samples/out/CCAgT.parquet.gzip\
                                       -o ./data/samples/masks/panoptic_segmentation\
                                       --out-file ./data/samples/out/CCAgT_COCO_PS.json
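Assuming the panoptic output follows the standard COCO panoptic layout (a directory of PNG files plus a JSON file whose `segments_info` entries map segment ids to categories), each PNG pixel encodes a segment id in its RGB channels as `id = R + 256*G + 256*256*B`, and id 0 marks unlabeled pixels. The helpers below use hypothetical names and are not part of CCAgT-utils; they convert between the two representations and look up the category of each pixel.

```python
from typing import Dict, List, Optional, Sequence, Tuple

RGB = Tuple[int, int, int]


def id_to_rgb(segment_id: int) -> RGB:
    """Encode a segment id as an (R, G, B) triple, COCO panoptic style."""
    if not 0 <= segment_id < 256 ** 3:
        raise ValueError("segment id must fit in 24 bits")
    r = segment_id % 256
    g = (segment_id // 256) % 256
    b = segment_id // (256 * 256)
    return r, g, b


def rgb_to_id(rgb: RGB) -> int:
    """Decode an (R, G, B) pixel of a panoptic PNG back to its segment id."""
    r, g, b = rgb
    return r + 256 * g + 256 * 256 * b


def pixel_categories(
    pixels: Sequence[RGB], segments_info: Sequence[Dict]
) -> List[Optional[int]]:
    """Map each pixel to its category_id; unlabeled (id 0) or unknown ids give None."""
    id_to_category = {seg["id"]: seg["category_id"] for seg in segments_info}
    return [id_to_category.get(rgb_to_id(p)) for p in pixels]
```

For example, segment id 1000 is stored as the colour `(232, 3, 0)`, and `rgb_to_id((232, 3, 0))` recovers 1000.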

Visualization

This module helps display samples from the dataset and create figures from them.

$ CCAgT-visualization -h  # to show help message

Show images with boxes

$ CCAgT-visualization show -l ./data/samples/out/CCAgT.parquet.gzip\
                           -a ./data/samples/CCAgT_dataset_metadata.json\
                           -d ./data/samples/images/

Show images and masks

$ CCAgT-visualization show -t image-and-mask\
                           -l ./data/samples/out/CCAgT.parquet.gzip\
                           -a ./data/samples/CCAgT_dataset_metadata.json\
                           -d ./data/samples/images/\
                           -m ./data/samples/masks/semantic_segmentation/

Show images with boxes and masks

$ CCAgT-visualization show -t image-with-boxes-and-mask\
                           -l ./data/samples/out/CCAgT.parquet.gzip\
                           -a ./data/samples/CCAgT_dataset_metadata.json\
                           -d ./data/samples/images/\
                           -m ./data/samples/masks/semantic_segmentation/

Download files

Source Distribution

CCAgT_utils-0.1.5a0.tar.gz (37.9 kB)

Uploaded Source

Built Distribution

CCAgT_utils-0.1.5a0-py2.py3-none-any.whl (44.9 kB)

Uploaded Python 2 Python 3

File details

Details for the file CCAgT_utils-0.1.5a0.tar.gz.

File metadata

  • Download URL: CCAgT_utils-0.1.5a0.tar.gz
  • Upload date:
  • Size: 37.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.11

File hashes

Hashes for CCAgT_utils-0.1.5a0.tar.gz
Algorithm Hash digest
SHA256 3e3372b5e3cff5f53d2fde7e75b3a7fe3518f83b287eed876287d88bba432cad
MD5 7c8cb59e1736e2db9ea5059c6103e077
BLAKE2b-256 fb2fbcbac548090a77c942511df102b9809eb794518e5c797d6f2de1cb7e492f

File details

Details for the file CCAgT_utils-0.1.5a0-py2.py3-none-any.whl.

File metadata

  • Download URL: CCAgT_utils-0.1.5a0-py2.py3-none-any.whl
  • Upload date:
  • Size: 44.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.11

File hashes

Hashes for CCAgT_utils-0.1.5a0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 f145ec2abe8da373159cf633501b539ccf671f79e06b07fb80a3dc1871494004
MD5 c2beb69b6b949e15f8c6d7e43da202b7
BLAKE2b-256 b1bf0d5f6c95d5a4890502e7e8a6d84b868d8845d3432292f539a745d0a29bf2
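To check that a downloaded distribution matches the SHA256 digests listed above, one option is to recompute the digest with Python's standard `hashlib` module. The helper names below are illustrative only, not part of CCAgT-utils or any packaging tool.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For an intact copy of the wheel, `verify_download(Path("CCAgT_utils-0.1.5a0-py2.py3-none-any.whl"), "f145ec2abe8da373159cf633501b539ccf671f79e06b07fb80a3dc1871494004")` should return `True`. pip can perform an equivalent check at install time in its hash-checking mode (`--require-hashes`).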
