Score cells for centrioles in IF data

CenFind

A command line interface to score cells for centrioles.

Introduction

cenfind is a command line interface to detect and assign centrioles in immunofluorescence images of human cells. Specifically, it orchestrates:

  • the z-max projection of the raw files;
  • the detection of centrioles;
  • the detection of the nuclei;
  • the assignment of the centrioles to the nearest nucleus.
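
The first step of this pipeline, the z-max projection, can be illustrated with a minimal NumPy sketch. It is not cenfind's own code: the function name max_project and the (channel, z, y, x) axis order are assumptions made for the example, and the stack is synthetic.

```python
import numpy as np


def max_project(stack: np.ndarray) -> np.ndarray:
    """Collapse a z-stack into a 2D image per channel.

    Assumes the axis order (channel, z, y, x); this order is a guess
    for illustration and may differ from your files.
    """
    if stack.ndim != 4:
        raise ValueError(f"expected a 4D stack, got {stack.ndim} dimensions")
    # Keep, for every pixel, the brightest value found along z.
    return stack.max(axis=1)


# Synthetic field: 4 channels, 5 z-planes, 8 x 8 pixels.
rng = np.random.default_rng(0)
stack = rng.integers(0, 4096, size=(4, 5, 8, 8), dtype=np.uint16)
projection = max_project(stack)
print(projection.shape)  # (4, 8, 8)
```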

Installation

  1. Install python via pyenv
  2. Download and set up 3.9.5 as local version
  3. Set up Python interpreter
pyenv local 3.9.5
pyenv global 3.9.5
  4. Create a virtual environment for CenFind
python -m venv venv-cenfind
source venv-cenfind/bin/activate
  5. Check that cenfind's programs are correctly installed by running:
cenfind squash --help

Basic usage

Before scoring the cells, you need to prepare the dataset folder. cenfind assumes a fixed folder structure. In the following, we assume that the .ome.tif files are all located directly in raw/. Each file is a z-stack field of view (hereafter referred to as a field) containing 4 channels (0, 1, 2, 3). Channel 0 contains the nuclei and channels 1-3 contain centriolar markers.

<project_name>/
└── raw/
  1. Run prepare to initialise the dataset folder with a list of output folders:
cenfind prepare /path/to/dataset --splits 1 2 --projection_suffix _max
usage: CENFIND prepare [-h] [--projection_suffix PROJECTION_SUFFIX] [--splits SPLITS [SPLITS ...]] dataset

positional arguments:
  dataset               Path to the dataset

optional arguments:
  -h, --help            show this help message and exit
  --projection_suffix PROJECTION_SUFFIX
                        Suffix indicating projection, e.g., `_max` or `Projected`, empty if not specified (default: )
  --splits SPLITS [SPLITS ...]
                        Write the train and test splits for continuous learning using the channels specified (default: None)
  2. Run squash with the path to the project folder. projections/ is then populated with the max-projection files (*_max.tif).
cenfind squash path/to/dataset
usage: CENFIND squash [-h] path

positional arguments:
  path        Path to the dataset folder
  3. Run score with the path to the dataset, the path to the model, the index of the nuclei channel (usually 0 or 3) and the channels to score. The model weights must first be downloaded from https://figshare.com/articles/software/Cenfind_model_weights/21724421
cenfind score /path/to/dataset /path/to/model/ --channel_nuclei 0 --channel_centrioles 1 2 3
usage: CENFIND score [-h] --channel_nuclei CHANNEL_NUCLEI --channel_centrioles CHANNEL_CENTRIOLES [CHANNEL_CENTRIOLES ...] [--vicinity VICINITY] [--factor FACTOR] [--cpu] dataset model

positional arguments:
  dataset               Path to the dataset
  model                 Absolute path to the model folder

optional arguments:
  -h, --help            show this help message and exit
  --channel_nuclei CHANNEL_NUCLEI
                        Channel index for nuclei segmentation, e.g., 0 or 3 (default: None)
  --channel_centrioles CHANNEL_CENTRIOLES [CHANNEL_CENTRIOLES ...]
                        Channel indices to analyse, e.g., 1 2 3 (default: None)
  --vicinity VICINITY   Distance threshold in micrometer (default: -5 um) (default: -5)
  --factor FACTOR       Factor to use: given a 2048x2048 image, 256 if 63x; 2048 if 20x: (default: 256)
  --cpu                 Only use the cpu (default: False)
  4. Check that the predictions are satisfactory by looking at the folders visualisations/ and statistics/.

  5. If you are interested in categorising the number of centrioles, run cenfind analyse path/to/dataset --by <well>. The --by option is useful if you want to group your scoring by well, provided the file names obey the rule <WELLID_FOVID>.

usage: CENFIND analyse [-h] --by BY dataset

positional arguments:
  dataset     Path to the dataset

optional arguments:
  -h, --help  show this help message and exit
  --by BY     Grouping (field or well) (default: None)
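
To make the <WELLID_FOVID> naming rule concrete, here is a small sketch of grouping per-cell centriole counts by well. It is not the implementation of cenfind analyse: the helper names well_id and centriole_categories_by_well, the input format and the example values are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def well_id(field_name: str) -> str:
    """Return the part before the first underscore, assumed to be the well ID."""
    return field_name.split("_", 1)[0]


def centriole_categories_by_well(cells):
    """Count, per well, how many cells have 0, 1, 2, ... centrioles.

    `cells` is an iterable of (field_name, n_centrioles) pairs, one per cell.
    """
    table = defaultdict(Counter)
    for field_name, n_centrioles in cells:
        table[well_id(field_name)][n_centrioles] += 1
    return dict(table)


# Made-up scores: three cells in well A1 (two fields), one cell in well B2.
cells = [("A1_000", 2), ("A1_000", 4), ("A1_001", 2), ("B2_000", 1)]
for well, counts in sorted(centriole_categories_by_well(cells).items()):
    print(well, dict(sorted(counts.items())))
# A1 {2: 2, 4: 1}
# B2 {1: 1}
```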

Running cenfind score in the background

When you exit the shell, running programs receive the SIGHUP signal, which aborts them. This is undesirable if you need to close your shell for some reason. Fortunately, you can make your program ignore this signal by prepending the nohup command. Moreover, if you want to run your program in the background, you can append an ampersand (&). In practice, run nohup cenfind score ... & instead of cenfind score ....

The output is written to the file nohup.out, and you can follow the progress by running tail -F nohup.out; the -F flag keeps printing new lines as the file is being written. Press Ctrl-C to exit tail.

If you want to kill the score program, run jobs to list the background jobs and then run kill %<jobid>. If you see no jobs, check the log nohup.out: the run may have finished, or the program may have crashed, in which case the error is reported there.

Evaluating the quality of the model on a new dataset

The initial model M is fitted using a set of five representative datasets, hereafter referred to as the standard datasets (DS1-5). If your type of data deviates too much from the standard datasets, M may perform less well.

Specifically, when setting out to score a new dataset, you may be faced with one of three situations, as reflected by the corresponding F1 score, i.e., 2TP / (2TP + FN + FP), where TP, FP and FN are the numbers of true positives, false positives and false negatives:

  1. The initial model M performs well on the new dataset (0.9 ≤ F1 ≤ 1); in this case, model M is used.
  2. Model M performs significantly worse on the new dataset (0.5 ≤ F1 < 0.9); in this case, you may want to consider retraining the model (see below).
  3. The model does not work at all (0 ≤ F1 < 0.5); such a low F1 value probably means that the features of the dataset are too distant from the original representative datasets to warrant retraining starting from M.
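
The F1 score and the three situations above can be written down in a few lines of Python. This is only a sketch: the function names f1_score and recommend are hypothetical, the counts are made up, and returning 0.0 when there are no counts at all is a convention chosen for the example.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FN + FP)."""
    denominator = 2 * tp + fn + fp
    # Convention for this sketch: no predictions and no annotations gives 0.0.
    return 2 * tp / denominator if denominator else 0.0


def recommend(f1: float) -> str:
    """Map an F1 score to one of the three situations described above."""
    if not 0.0 <= f1 <= 1.0:
        raise ValueError("F1 must lie in [0, 1]")
    if f1 >= 0.9:
        return "use model M as is"
    if f1 >= 0.5:
        return "consider retraining the model"
    return "data too distant from the standard datasets"


# Made-up counts: 90 matched foci, 5 spurious detections, 5 missed foci.
f1 = f1_score(tp=90, fp=5, fn=5)
print(round(f1, 3), "->", recommend(f1))  # 0.947 -> use model M as is
```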

Before retraining a model (case 2), verify once more the quality of the data, which needs a sufficiently high signal-to-noise ratio to enable efficient learning. Otherwise, the model will not be able to learn well. If you, as a human being, cannot tell the difference between a real focus and a stray spot using the single channel at hand (i.e., without looking at the other channels), the same will hold for the model.

To retrain the model, you must first annotate the dataset and divide it randomly into training and test sets (90% versus 10% of the data, respectively). Next, the model is trained on the 90% set, thus generating a new model, M*. Lastly, you evaluate the gain in performance on the new dataset, as well as the potential loss of performance on the standard datasets.
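
A reproducible random 90/10 split can be sketched as follows. This is not how cenfind prepare --splits is implemented: the function name split_fields, the choice of splitting (field, channel) pairs and the fixed seed are assumptions made for the example.

```python
import random


def split_fields(fields, channels, test_fraction=0.1, seed=0):
    """Randomly split (field, channel) pairs into training and test sets.

    Sorting before shuffling with a fixed seed makes the split reproducible
    regardless of the order in which the fields are listed.
    """
    pairs = [(field, channel) for field in sorted(fields) for channel in channels]
    random.Random(seed).shuffle(pairs)
    n_test = round(len(pairs) * test_fraction)
    return pairs[n_test:], pairs[:n_test]


# Ten hypothetical fields scored in channels 1 and 2: 20 pairs in total.
fields = [f"A1_{i:03d}" for i in range(10)]
train, test = split_fields(fields, channels=(1, 2))
print(len(train), len(test))  # 18 2
```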

Detailed training procedure:

  1. Split the dataset into training (90%) and test (10%) sets, each containing one field of view and the channel to use. This helps trace back issues during the training and renders the model fitting reproducible.
  2. Label all the images present in the training and test sets using Labelbox. To upload the images, create the vignettes first and then upload them once you have a project set up.
cenfind vignettes /path/to/dataset
cenfind upload /path/to/dataset --env /path/to/.env
  3. Save all foci coordinates (x, y), origin at top-left, present in one field of view as one text file under /path/to/dataset/annotation/centrioles/ with the naming scheme <dataset_name>_max_C<channel_index>.txt.
cenfind download dataset-name --env /path/to/.env
  4. Evaluate the newly annotated dataset using the model M by computing the F1 score.
cenfind evaluate /path/to/dataset /path/to/model --channel_nuclei 0 --channel_centrioles 1 2 3
usage: CENFIND evaluate [-h] [--performances_file PERFORMANCES_FILE] [--tolerance TOLERANCE] --channel_nuclei CHANNEL_NUCLEI --channel_centrioles CHANNEL_CENTRIOLES [CHANNEL_CENTRIOLES ...]
                        [--vicinity VICINITY]
                        dataset model

positional arguments:
  dataset               Path to the dataset folder
  model                 Path to the model

optional arguments:
  -h, --help            show this help message and exit
  --performances_file PERFORMANCES_FILE
                        Path of the performance file, STDOUT if not specified (default: None)
  --tolerance TOLERANCE
                        Distance in pixels below which two points are deemed matching (default: 3)
  --channel_nuclei CHANNEL_NUCLEI
                        Channel index for nuclei segmentation, e.g., 0 or 3 (default: None)
  --channel_centrioles CHANNEL_CENTRIOLES [CHANNEL_CENTRIOLES ...]
                        Channel indices to analyse, e.g., 1 2 3 (default: None)
  --vicinity VICINITY   Distance threshold in micrometer (default: -5 um) (default: -5)
  5. If the performance is poor (i.e., F1 score < 0.9), fit a new model instance, M*, with the standard datasets plus the new dataset (90% in each case).
  6. Test the performance of model M* on the new dataset; hopefully the F1 score will now be ≥ 0.9 (if not, consider increasing the size of the annotated data).
  7. Test the performance of model M* on the standard datasets; if F1* ≥ F1, save M* as the new M (otherwise keep M* as a separate model for the new type of dataset).
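
The decision taken in the last steps of this procedure can be summarised in a short sketch. The function name next_step is hypothetical, the F1 values are made up, and requiring F1* ≥ F1 on every standard dataset individually (rather than on an aggregate) is an assumption about how the comparison is done.

```python
def next_step(f1_new, f1_std_old, f1_std_new, target=0.9):
    """Decide what to do with a retrained model M*.

    f1_new:     F1 of M* on the new dataset.
    f1_std_old: F1 of M on each standard dataset, e.g. {"DS1": 0.95}.
    f1_std_new: F1 of M* on the same standard datasets.
    """
    if f1_new < target:
        return "increase the size of the annotated data and refit M*"
    # Assumption: M* must not be worse than M on any standard dataset.
    if all(f1_std_new[name] >= f1_std_old[name] for name in f1_std_old):
        return "save M* as the new M"
    return "keep M* as a separate model for this type of dataset"


# Made-up values: M* is good on the new data but slightly worse on DS2.
old = {"DS1": 0.95, "DS2": 0.92}
new = {"DS1": 0.95, "DS2": 0.90}
print(next_step(0.93, old, new))
# keep M* as a separate model for this type of dataset
```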

Internal API

cenfind consists of two core classes: Dataset and Field.

A Dataset represents a collection of related fields, i.e., same pixel size, same channels, same cell type.

It should:

  • return the name
  • iterate over the fields
  • construct the file name for the projections and the z-stacks
  • read the fields.txt
  • write the fields.txt file
  • set up the folders projections, predictions, visualisations and statistics
  • set and get the splits

A Field represents a field of view and should:

  • construct file names for projections, annotation
  • get Dataset
  • load the projection as np.ndarray
  • load the channel as np.ndarray
  • load annotation as np.ndarray
  • load mask as np.ndarray
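
The responsibilities listed above can be pictured with a minimal sketch of the two classes. It is not cenfind's actual API: the class names DatasetSketch and FieldSketch, their attributes and methods are hypothetical, and the annotation file name assumes that the prefix of the naming scheme is the field name. Loading images as np.ndarray is left out to keep the sketch dependency-free.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class DatasetSketch:
    path: Path
    projection_suffix: str = "_max"

    @property
    def name(self) -> str:
        return self.path.name

    def setup(self) -> None:
        """Create the output folders next to raw/."""
        for folder in ("projections", "predictions", "visualisations", "statistics"):
            (self.path / folder).mkdir(parents=True, exist_ok=True)

    def write_fields(self) -> None:
        """List the raw .ome.tif files, without extension, in fields.txt."""
        raw_files = (self.path / "raw").glob("*.ome.tif")
        names = sorted(p.name[: -len(".ome.tif")] for p in raw_files)
        (self.path / "fields.txt").write_text("\n".join(names) + "\n")

    def fields(self) -> Iterator["FieldSketch"]:
        """Iterate over the fields recorded in fields.txt."""
        for line in (self.path / "fields.txt").read_text().splitlines():
            if line.strip():
                yield FieldSketch(line.strip(), self)


@dataclass(frozen=True)
class FieldSketch:
    name: str
    dataset: DatasetSketch

    @property
    def projection_path(self) -> Path:
        suffix = self.dataset.projection_suffix
        return self.dataset.path / "projections" / f"{self.name}{suffix}.tif"

    def annotation_path(self, channel: int) -> Path:
        suffix = self.dataset.projection_suffix
        folder = self.dataset.path / "annotation" / "centrioles"
        return folder / f"{self.name}{suffix}_C{channel}.txt"


# Usage on a throwaway folder containing one empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_project"
    (root / "raw").mkdir(parents=True)
    (root / "raw" / "A1_000.ome.tif").touch()
    dataset = DatasetSketch(root)
    dataset.setup()
    dataset.write_fields()
    for field in dataset.fields():
        print(field.name, field.projection_path.name)  # A1_000 A1_000_max.tif
```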

Using those two objects, cenfind should:

  • detect centrioles (data, model) => points
  • extract nuclei (data, model) => contours
  • assign centrioles to nuclei (contours, points) => pairs
  • outline centrioles and nuclei (data, points) => image
  • create composite vignettes (data) => composite_image
  • flag partial nuclei (contours, tolerance) => contours
  • compare predictions with annotation (points, points) => metrics_namespace
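
Two of these operations, the assignment of centrioles to nuclei and the comparison of predictions with annotation, can be sketched with NumPy alone. This is an illustration under simplifying assumptions, not cenfind's implementation: nuclei are reduced to centroids instead of contours, distances are in pixels, the matching is greedy, and the function names assign_to_nearest and match_points are hypothetical.

```python
import numpy as np


def assign_to_nearest(points, centroids, vicinity):
    """Index of the nearest nucleus centroid per point, or -1 if too far."""
    points = np.asarray(points, dtype=float).reshape(-1, 2)
    centroids = np.asarray(centroids, dtype=float).reshape(-1, 2)
    if len(points) == 0 or len(centroids) == 0:
        return np.full(len(points), -1, dtype=int)
    # Pairwise Euclidean distances, shape (n_points, n_centroids).
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    nearest = dist.argmin(axis=1)
    nearest[dist.min(axis=1) > vicinity] = -1
    return nearest


def match_points(predicted, annotated, tolerance=3.0):
    """Greedy one-to-one matching; returns (TP, FP, FN)."""
    predicted = np.asarray(predicted, dtype=float).reshape(-1, 2)
    annotated = np.asarray(annotated, dtype=float).reshape(-1, 2)
    if len(predicted) == 0 or len(annotated) == 0:
        return 0, len(predicted), len(annotated)
    dist = np.linalg.norm(predicted[:, None, :] - annotated[None, :, :], axis=-1)
    used_pred, used_anno = set(), set()
    # Visit candidate pairs from closest to farthest.
    for flat_index in np.argsort(dist, axis=None):
        i, j = (int(k) for k in np.unravel_index(flat_index, dist.shape))
        if dist[i, j] >= tolerance:
            break  # all remaining pairs are too far apart to match
        if i not in used_pred and j not in used_anno:
            used_pred.add(i)
            used_anno.add(j)
    tp = len(used_pred)
    return tp, len(predicted) - tp, len(annotated) - tp


# Made-up coordinates in pixels.
predicted = [(10, 10), (50, 50), (90, 90)]
annotated = [(11, 11), (50, 52), (200, 200)]
centroids = [(12, 12), (48, 55)]
print(assign_to_nearest(predicted, centroids, vicinity=20).tolist())  # [0, 1, -1]
print(match_points(predicted, annotated, tolerance=3))  # (2, 1, 1)
```

The resulting TP, FP and FN counts are the inputs of the F1 score, 2TP / (2TP + FN + FP). A globally optimal matching, rather than this greedy one, may give slightly different counts when foci are crowded.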
