Package for pre- and post-processing of images and data for working with ilastik-software

caactus

caactus (cell analysis and counting tool using ilastik software) is a collection of Python scripts that provides a streamlined workflow for ilastik, including data preparation, processing and analysis. It aims to give biologists an easy-to-use tool for counting and analyzing cells from a large number of microscopy images.

workflow

Introduction

The goal of this script collection is to provide an easy-to-use complement to the Boundary-based Segmentation with Multicut workflow in ilastik. The workflow automates cell counting from messy microscopy images that contain different (touching) cell types for biological research. Commands are provided in grey code boxes with one-click copy & paste.

Installation

Install miniconda, create an environment and install Python and vigra

Download and install miniconda for your respective operating system according to the instructions.
- Miniconda provides a lightweight package and environment manager. It allows you to create isolated environments so that Python versions and package dependencies required by caactus do not interfere with your system Python or other projects.
Once installed, create an environment for caactus by running the following command from your command line:
```
conda create -n caactus-env -c pytorch -c conda-forge python=3.12 pytorch vigra h5py
```

Install caactus

Activate the caactus-env from the cmd-line with
```
conda activate caactus-env
```
To install caactus plus the needed dependencies inside your environment, use
```
pip install caactus
```
During the steps described below that call the caactus scripts, make sure the caactus-env is activated.

Install ilastik

Download and install ilastik for your respective operating system.

[!NOTE] We developed the pipeline on ilastik 1.4.0. For optimal user experience, we recommend installing ilastik 1.4.0. Scroll down to "Previous stable versions" on the ilastik download webpage.

Quick Overview of the workflow

Below is a short version of the steps performed. For more detail, please consult Detailed Description of the Workflow.

Culture the organism of interest in a 96-well plate.
Acquire images of cells via microscopy.
Create the project directory.
Rename files with caactus.
Convert files to HDF5 format with caactus.
Train a pixel classification model in ilastik and later run it in batch mode.
Train a boundary-based segmentation with Multicut model in ilastik and later run it in batch mode.
Remove the background from the *_Multicut Segmentation.h5 files with caactus.
Train an object classification model in ilastik and later run it in batch mode.
Pool all csv-tables from the individual images into one global table with caactus.

output generated:
- df_clean.csv

Summarize the data with caactus.

output generated:
- a) df_summary_complete.csv = .csv-table that also contains the "not usable" category
- b) df_refined_complete.csv = .csv-table without the "not usable" category
- c) counts.csv = count data used for PLN modelling
- d) barchart.png = stacked bar graph

Model the count data with caactus

output generated:
- a) correlation_circle.png
- b) pca_plot.png

[!NOTE] Power users may directly edit the config.toml and run the scripts from the cmd-line. For instructions, go to 7.1-7.7.

Sample Dataset

A sample dataset to quickly test the workflow can be accessed via Zenodo.
To showcase the functionalities, the ilastik steps have been pretrained. Use caactus in batch mode.

[!IMPORTANT] Go to 8.1-8.11 for a detailed tutorial with the sample dataset.

Detailed Description of the Workflow

1. Culturing

Culture your cells in a flat bottom plate of your choice and according to the needs of the organisms being researched.

2. Image acquisition

In your respective microscopy software environment, export the images of interest to .tif-format.
For the workflow and file conversion steps, caactus currently supports grayscale (1-channel) and RGB (3-channel) images in .tif-format.
From the image metadata, copy the pixel size.

[!NOTE] We recommend exporting the images without scale bars, because they will introduce distraction for the classifier during the annotation.

3. Data Preparation

3.1 Create Project Directory

For portability of the ilastik projects, create the directory with the following structure:

[!NOTE] The directory structure below already includes examples of resulting files in each sub-directory.

This allows you to copy an already trained workflow and use it multiple times with new datasets, when relative paths are enabled.

project_directory = Main folder  
├── 1_pixel_classification.ilp  
├── 2_boundary_segmentation.ilp  
├── 3_object_classification.ilp
├── renaming.csv
├── config.toml
├── 0_1_original_tif_training_images
  ├── training-1.tif
  ├── training-2.tif
  ├── ...
├── 0_2_original_tif_batch_images
  ├── image-1.tif
  ├── image-2.tif
  ├── ..
├── 0_3_batch_tif_renamed
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1.tif
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2.tif
  ├── ..
├── 1_images
  ├── training-1.h5
  ├── training-2.h5
  ├── ...
├── 2_probabilities
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_Probabilities.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_Probabilities.h5
  ├── ...
├── 3_multicut
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_Multicut Segmentation.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_Multicut Segmentation.h5
  ├── ...
├── 4_objectclassification
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_Object Predictions.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_table.csv
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_Object Predictions.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_table.csv
  ├── ...
├── 5_batch_images
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2.h5
  ├── ...
├── 6_batch_probabilities
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_Probabilities.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_Probabilities.h5
  ├── ...
├── 7_batch_multicut
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_Multicut Segmentation.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_Multicut Segmentation.h5
  ├── ...
├── 8_batch_objectclassification
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_Object Predictions.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-1-data_table.csv
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_Object Predictions.h5
  ├── strain-xx_day-yymmdd_condition1-yy_timepoint-zz_parallel-2-data_table.csv
  ├── ...
├── 9_data_analysis
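The empty sub-directories of the tree above can also be created programmatically. The following is a minimal sketch, not part of caactus: the helper name `create_project_dirs` is hypothetical, and the folder names are copied from the tree above. The .ilp projects, renaming.csv and config.toml still have to be added by hand.

```python
import tempfile
from pathlib import Path

# Sub-directory names copied from the directory tree above.
SUBDIRS = [
    "0_1_original_tif_training_images",
    "0_2_original_tif_batch_images",
    "0_3_batch_tif_renamed",
    "1_images",
    "2_probabilities",
    "3_multicut",
    "4_objectclassification",
    "5_batch_images",
    "6_batch_probabilities",
    "7_batch_multicut",
    "8_batch_objectclassification",
    "9_data_analysis",
]


def create_project_dirs(main_folder):
    """Create the empty sub-directories inside main_folder (hypothetical helper)."""
    main = Path(main_folder)
    for name in SUBDIRS:
        # exist_ok=True makes the call safe to repeat on an existing project
        (main / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in main.iterdir() if p.is_dir())


if __name__ == "__main__":
    # Demo in a temporary location; pass your own main folder in practice.
    with tempfile.TemporaryDirectory() as tmp:
        for name in create_project_dirs(Path(tmp) / "project_directory"):
            print(name)
```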

3.2 Getting started

Open the caactus Graphical User Interface (GUI) by opening the command line in Unix or Anaconda Powershell/Prompt in Windows.
Make sure you have the caactus environment activated:

conda activate caactus-env

Type caactus and hit Enter to start the GUI:

caactus

3.3 The Graphical User Interface (GUI)

The graphic user interface is structured in four parts.

3.3.1 Global Settings

global_settings

At the top, enter the path to your Main Folder (use the Browse button or type/ copy&paste the full path).
Select Mode between training and batch.
- training refers to all steps during annotation of the ilastik classifiers
- batch refers to all steps performed on large datasets with ready trained ilastik models used in batch mode and subsequent data analysis.
Set shared analysis parameters once in Global Settings: Pixel Size, Variable Names, Class Order, Color Mapping. EUCAST-specific settings can be expanded below.
When working with a EUCAST dataset, edit EUCAST settings from the dropdown menu.

3.3.2 Pre-Processing

pre_process

The workflow is shown as a numbered list of steps. Select the respective mode (training or batch) from the dropdown in Global Settings.
Click Run to execute a step.
Each step includes a description in its ? Help pop-up.
Processing messages appear in the log panel at the bottom.
The output can be accessed in the respective subdirectory of your main folder.

3.3.3 ilastik

ilastiks_steps

For ilastik steps, detailed step-by-step instructions are included.
Select the respective mode (training or batch) from the dropdown in Global Settings.
Click Run to execute Background Processing. Processing messages appear in the log panel at the bottom.
All other steps have to be performed in ilastik.
Each step includes a description in its ? Help pop-up.

3.3.4 Data analysis

data_analysis

Data analysis steps are performed with the results of batch-processing.
Click Run to execute a step and ? Help for pop-up window instructions.
Use 9. instead of 8. when working with a EUCAST dataset.

4. Training

To facilitate cross-platform reusability of the ilastik models, make sure to store Raw Data, Probabilities and Prediction Maps in Relative Links. This allows for portability of the models to other storage locations.

relative_link

If an absolute file path is selected, right-click on the location and select Edit properties. Under Storage, the path logic can be changed.

relative_storage

4.1. Selection of Training Images and Conversion

4.1.1 Selection of Training data

Select a set of images that best represent the different experimental conditions.
Store them in 0_1_original_tif_training_images.

4.1.2 Conversion

In the caactus GUI, select training from the dropdown menu in Global Settings
Find 2. Tif to H5.

The script converts .tif files to .h5 format for better performance in ilastik.

Click Run.

4.2. Pixel Classification

When first training a pixel classification model in ilastik, open ilastik.
Create a new project and select Pixel Classification as the workflow.
Save it as 1_pixel_classification.ilp inside the main project directory.
Under Raw Data, add the *.h5 files from 1_images folder.
Feature selection. Select the features you want to use for training. It is recommended to use all features.
For working with neighbouring / touching cells, it is suggested to create three classes: 0 = interior, 1 = background, 2 = boundary (This follows python's 0-indexing logic where counting is started at 0).

pixel_classes

Annotate the classes by drawing on the images.
Export the Predictions. In prediction export change the settings to

Convert to Data Type: integer 8-bit
Renormalize from 0.00 1.00 to 0 255

File:

{dataset_dir}/../2_probabilities/{nickname}_{result_type}.h5

export_prob

Click OK.
Click Export All.
The output will be saved as *_Probabilities.h5 files in the 2_probabilities folder.

For more information, consult the documentation for pixel classification with ilastik.
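To see where the export template used above ends up on disk, the placeholders can be filled in by hand. This is only an illustration of the path logic, not ilastik code: the helper `resolve_export_path` is hypothetical, and the nickname `training-1-data` is an assumption based on the file names shown in the directory tree (image name followed by `-data`).

```python
import posixpath


def resolve_export_path(template, dataset_dir, nickname, result_type):
    """Fill an ilastik-style export template and collapse the '..' step."""
    filled = template.format(
        dataset_dir=dataset_dir, nickname=nickname, result_type=result_type
    )
    # normpath turns ".../1_images/../2_probabilities" into ".../2_probabilities"
    return posixpath.normpath(filled)


template = "{dataset_dir}/../2_probabilities/{nickname}_{result_type}.h5"
print(
    resolve_export_path(
        template,
        dataset_dir="/project/1_images",   # folder of the input .h5 file
        nickname="training-1-data",        # assumed nickname, see lead-in
        result_type="Probabilities",
    )
)
# prints /project/2_probabilities/training-1-data_Probabilities.h5
```

Because `{dataset_dir}` is resolved relative to each input file, the same template keeps working when the whole project directory is moved.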

4.3 Boundary-based Segmentation with Multicut

When first training a boundary-based Segmentation model in ilastik, open ilastik.
Create a new project and select Boundary-based Segmentation with Multicut as the workflow.
Save it as 2_boundary_segmentation.ilp inside the main project directory.
Under Raw Data, add the .h5 files from 1_images folder.
Under Probabilities, add the data_Probabilities.h5 files from 2_probabilities folder.
In DT Watershed, use the input channel that corresponds to the class order you defined in the pixel classification project (in this case input channel = 2, the boundary class).

watershed

Annotate the edges by clicking on the edges between cells. Annotate the background by clicking on the background.
Export the Multicut Segmentation. In prediction export change the settings to

Convert to Data Type: integer 8-bit
Renormalize from 0.00 1.00 to 0 255
Format: compressed hdf5

File:

{dataset_dir}/../3_multicut/{nickname}_{result_type}.h5

export_multicut

Click OK.
Click Export All.
The output will be saved as *_Multicut Segmentation.h5 files in the 3_multicut folder.

For more information follow the documentation for boundary-based segmentation with Multicut.

4.4 Background Processing

For further processing in object classification, the background must be removed from the multicut data sets. This script sets the numerical value of the largest region to 0, making it transparent in the next step. The operation runs in-place on all *_Multicut Segmentation.h5 files in 3_multicut/.

In the caactus GUI, find 5. Background Processing.
Make sure training is still selected in Mode under Global Settings.
Click Run.
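The idea behind this step can be sketched on a small label array. This is an illustration under assumptions, not the caactus implementation: it treats the "largest region" as the label value that covers the most pixels, works on an in-memory NumPy array instead of the HDF5 files, and the function name `remove_background` is hypothetical.

```python
import numpy as np


def remove_background(labels):
    """Return a copy of labels in which the most frequent label is set to 0."""
    values, counts = np.unique(labels, return_counts=True)
    background = values[np.argmax(counts)]  # label covering the most pixels
    cleaned = labels.copy()
    cleaned[cleaned == background] = 0
    return cleaned


# Tiny segmentation: label 3 is the background, labels 1 and 2 are cells.
segmentation = np.array(
    [[3, 3, 3, 3],
     [3, 1, 1, 3],
     [3, 3, 2, 3]],
    dtype=np.uint8,
)
print(remove_background(segmentation))
```

In the example, all pixels with label 3 become 0 while the two cell labels are kept unchanged.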

4.5. Object Classification

When first training an Object classification model in ilastik, open ilastik.
Create a new project and select Object Classification [Inputs: Raw Data, Pixel Prediction Map] as the workflow.
Save it as 3_object_classification.ilp inside the main project directory.
Under Raw Data, add the .h5 files from 1_images folder.
Under Segmentation Image, add the *_Multicut Segmentation.h5 files from 3_multicut folder.
Define your cell types plus an additional category for not usable objects, e.g. cell debris and cut-off objects on the side of the images.

object_label

[!NOTE] Default class names in caactus are resting, swollen, germling, hyphae, notusable (and mycelium for the EUCAST workflow). You are welcome to change them — just make sure to also update the names in the caactus GUI when performing the analysis steps below.

Annotate the objects by clicking on them with the matching class label selected.
Export the Object_Predictions. Under 4. Object Information Export:
- From the dropdown select Object Predictions

object_pred

In Choose Export Image Settings change settings to

Convert to Data Type: integer 8-bit
Renormalize from 0.00 1.00 to 0 255
Format: compressed hdf5

File:

{dataset_dir}/../4_objectclassification/{nickname}_{result_type}.h5

export_multicut

Export the object data_table.csv files. In Configure Feature Table Export, under General, set the format to .csv and the output directory (File:) to

{dataset_dir}/../4_objectclassification/{nickname}.csv

Select the features you are interested in exporting.

Click OK.
Click Export All.
The output will be saved as *_Object Predictions.h5 files and *_table.csv in the 4_objectclassification folder.

For more information follow the documentation for object classification.

5. Batch Processing

Once you have successfully trained all three ilastik models, you are ready to process large image datasets with the caactus pipeline.

Store the images you want to process in the 0_2_original_tif_batch_images directory.
Perform steps 4.1 to 4.5 in batch mode, as explained in detail below (5.1 to 5.6).
Select Mode batch in the dropdown menu in Global Settings in the caactus GUI.

For more information, follow the documentation for batch processing.

5.1 Rename Files

Rename the .tif files so that they contain information about your cells and experimental conditions.

Create a csv-file that contains the information you need in columns. Each row corresponds to one image. The rows must follow the same (alphabetical) order in which your image files are stored in the respective directory.

The script renames your files to the format columnA-value1_columnB-value2_columnC_etc.tif. In the example below, picture 1 (well A1 from our plate) will be named

strain-ATCC11559_date-20241707_timepoint-6h_biorep-A_techrep-1.tif

[!CAUTION] Do not use underscores (_) or dashes (-) in column names or values — these characters are used as delimiters in the new file names.

[!IMPORTANT] The only hardcoded column names required are biorep and techrep. They are needed in downstream analysis for calculating averages.
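The naming rule described above can be sketched in a few lines. This is an illustrative reimplementation, not the caactus script: the helpers `build_new_name` and `plan_renaming` are hypothetical, the CSV content is made up to reproduce the example file name, and the file names `image-1.tif` and `image-2.tif` are taken from the directory tree.

```python
import csv
import io


def build_new_name(row, extension=".tif"):
    """Join 'column-value' pairs with underscores, e.g. strain-X_biorep-A.tif."""
    for column, value in row.items():
        # "_" and "-" are the delimiters, so they may not appear in the data
        if any(ch in column or ch in value for ch in "_-"):
            raise ValueError(f"'_' or '-' found in {column!r}: {value!r}")
    return "_".join(f"{col}-{val}" for col, val in row.items()) + extension


def plan_renaming(file_names, rows):
    """Pair alphabetically sorted files with the CSV rows, one row per image."""
    if len(file_names) != len(rows):
        raise ValueError("number of images and number of CSV rows differ")
    return {old: build_new_name(row) for old, row in zip(sorted(file_names), rows)}


renaming_csv = io.StringIO(
    "strain,date,timepoint,biorep,techrep\n"
    "ATCC11559,20241707,6h,A,1\n"
    "ATCC11559,20241707,6h,A,2\n"
)
rows = list(csv.DictReader(renaming_csv))
plan = plan_renaming(["image-2.tif", "image-1.tif"], rows)
for old, new in plan.items():
    print(old, "->", new)
```

The sketch only builds the mapping from old to new names; it does not touch any files.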

In the caactus GUI under Pre-Processing, find 1. Renaming and click Run.

[!TIP] After renaming, we recommend deleting the contents of 0_2_original_tif_batch_images to save disk space.

5.2 Conversion

In the caactus GUI, find 2. Tif to H5. Select batch from the dropdown menu.

The script converts .tif files to .h5 format for better performance in ilastik. Click Run.

[!TIP] After converting, we recommend deleting the contents of 0_3_batch_tif_renamed to save disk space.

5.3 Batch Processing Pixel Classification

In the caactus GUI, find 3. Pixel Classification, click ? Help for the full ilastik instructions. Summary:

Open ilastik.
Open your trained pixel classification project (e.g. 1_pixel_classification.ilp).

[!CAUTION] DO NOT CHANGE anything in 1. Input Data, 2. Feature Selection, or 3. Training when running Batch Processing!

Under 4. Prediction Export:
- Select Probabilities from the dropdown.
- Click Choose Export Image Settings and set the output file path at File:

{dataset_dir}/../6_batch_probabilities/{nickname}_{result_type}.h5

batch_pixel

Click OK
Go to 5. Batch processing tab
Under Raw data, add the .h5 files from 5_batch_images folder.
Now click Process all files.
The output will be saved as *_Probabilities.h5 files in the output folder (6_batch_probabilities).

5.4 Batch Processing Multicut Segmentation

In the caactus GUI, find 4. Boundary Segmentation, click ? Help for the full ilastik instructions. Summary:

Open ilastik.
Open your trained Boundary Segmentation project (e.g. 2_boundary_segmentation.ilp).

[!CAUTION] DO NOT CHANGE anything in 1. Input Data, 2. DT Watershed, or 3. Training and Multicut when running Batch Processing!

[!NOTE] The *_Multicut Segmentation.h5 output files are generated by ilastik in this step — they do not exist beforehand.

Under 4. Data Export,

batch_multicut

Click Choose Export Image Settings and set the output path at File:
```
{dataset_dir}/../7_batch_multicut/{nickname}_{result_type}.h5
```

Click OK
Go to 5. Batch processing.
Under Raw data, add the .h5 files from 5_batch_images folder.
Under Probabilities, add the data_Probabilities.h5 files from 6_batch_probabilities folder.

batch_multicut

Go to 5. Batch Processing and click Process all files.
The output will be saved as *_Multicut Segmentation.h5 files in the output folder (7_batch_multicut).

5.5 Background Processing

In the caactus GUI, find 5. Background Processing, click ? Help for the description.
Click Run. This removes the background from all *_Multicut Segmentation.h5 files in 7_batch_multicut/ by setting the largest region value to 0 (transparent in ilastik).

5.6 Batch processing Object classification

In the caactus GUI, find 6. Object Classification, click ? Help for the full ilastik instructions. Summary:

Open ilastik.
Open your trained object classification project (3_object_classification.ilp).

[!CAUTION] DO NOT CHANGE anything in 1. Input Data, 2. Object Feature Selection, or 3. Object Classification when running Batch Processing!

Under 4. Object Information Export:
- From the dropdown select Object Predictions (default).
- Click Choose Export Image Settings and set the output path at File:

{dataset_dir}/../8_batch_objectclassification/{nickname}_{result_type}.h5

object_image

Under "4. Object Information Export", choose "Configure Feature Table Export" with the following settings:
In Configure Feature Table Export General choose format .csv and change output directory to:

{dataset_dir}/../8_batch_objectclassification/{nickname}.csv

Choose Features to choose the Feature you are interested in exporting

feature_features

Click OK
Go to 5. Batch Processing tab
Under Raw data, add the .h5 files from 5_batch_images folder.
Under Segmentation Image, add the data_Multicut Segmentation.h5 files from 7_batch_multicut folder.
Go to 5. Batch Processing and click Process all files.
The output will be saved as *_Object Predictions.h5 files and *_table.csv in the output folder (8_batch_objectclassification).

6. Post-Processing and Data Analysis

[!NOTE]
Please be aware that the last two scripts, summary_statistics.py and pln_modelling.py, are currently written for the analysis and visualization of two independent variables. They take the results of the batch-processing steps as input.

In the caactus GUI, set Variable Names, Class Order, Color Mapping and Pixel Size (µm) in Global Settings.

6.1 Merging Data Tables and Table Export

The next script will combine all tables from all images into one global table for further analysis. Additionally, the information stored in the file name will be added as columns to the dataset.

Find 7. CSV Summary and click Run.
The output generated will be df_clean.csv in 9_data_analysis.
This spreadsheet combines all feature tables produced in 5.6 Object classification into one table.
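The merging logic can be illustrated with a small sketch. It is not the caactus script: the helpers `parse_name` and `merge_tables` are hypothetical, the input is given as in-memory CSV text keyed by the renamed image name (without the `-data_table.csv` suffix), and the feature column `Size in pixels` is only an example.

```python
import csv
import io


def parse_name(stem):
    """Split 'colA-val1_colB-val2' into {'colA': 'val1', 'colB': 'val2'}."""
    return dict(part.split("-", 1) for part in stem.split("_"))


def merge_tables(tables):
    """Combine per-image feature tables and add the file-name fields as columns."""
    merged = []
    for stem, text in tables.items():
        meta = parse_name(stem)
        for row in csv.DictReader(io.StringIO(text)):
            merged.append({**meta, **row})
    return merged


tables = {
    "strain-A_timepoint-6h_biorep-A_techrep-1":
        "Predicted Class,Size in pixels\nresting,40\nswollen,95\n",
    "strain-A_timepoint-6h_biorep-A_techrep-2":
        "Predicted Class,Size in pixels\nhyphae,310\n",
}
merged_rows = merge_tables(tables)
for row in merged_rows:
    print(row)
```

This also shows why underscores and dashes are not allowed in column names or values: they are needed to split the file name back into columns.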

[!TIP]
Technically, from this point on you can continue with whatever software or workflow is easiest for you for subsequent data analysis (e.g. GraphPad Prism, Excel).

6.2 Creating Summary Statistics

This script generates summary statistics and a stacked bar plot of the predicted classes (cell categories).
If working with EUCAST antifungal susceptibility testing, use 9. EUCAST Summary Statistics instead.
For the stacked bar plot, it groups data by the two variables that you enter.
It computes the average count and percentage of each predicted class, across replicates (technical and biological), for each combination of the two grouping variables.
It visualizes the distribution in stacked bar plots of classes across different conditions.
The first variable you enter will be displayed on the x-axis (e.g. incubation temperature), and the second variable will be used for faceting (e.g. timepoint).
This will create separate subplots for each level of that variable.
The plot will show the percentage distribution of predicted classes for each condition, allowing you to compare how the classes are distributed across different experimental conditions defined by the two grouping variables.
The colors of the bars will correspond to the predicted classes, as defined in your color mapping.
By default the IBM color-blind friendly palette is used, but you can customize the colors by providing the HEX color code.

Find 8. Summary Statistics and click Run.
Output:
- a) df_summary_complete.csv — full table including "not usable" category
- b) df_refined_complete.csv — table without "not usable" category
- c) counts.csv — count data used for PLN modelling
- d) barchart.png — stacked bar chart
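The aggregation described above can be sketched as follows. This is one possible reading, not the caactus implementation: it counts objects per class within each replicate, averages those counts over all replicates of a condition with equal weight, and derives percentages from the mean counts. The function `summarize` and the helper `make_rows` are hypothetical; the column names `biorep`, `techrep` and `Predicted Class` are taken from this document.

```python
from collections import Counter, defaultdict


def summarize(rows, var1, var2, class_order):
    """Mean count and percentage per class for each (var1, var2) combination."""
    # 1) count objects per class within every single replicate
    per_replicate = defaultdict(Counter)
    for row in rows:
        key = (row[var1], row[var2], row["biorep"], row["techrep"])
        per_replicate[key][row["Predicted Class"]] += 1
    # 2) group the replicate counts by condition
    per_condition = defaultdict(list)
    for (v1, v2, _bio, _tech), counts in per_replicate.items():
        per_condition[(v1, v2)].append(counts)
    # 3) average over replicates and convert to percentages
    summary = {}
    for condition, replicates in per_condition.items():
        means = {c: sum(r[c] for r in replicates) / len(replicates)
                 for c in class_order}
        total = sum(means.values())
        summary[condition] = {
            c: (means[c], 100 * means[c] / total if total else 0.0)
            for c in class_order
        }
    return summary


def make_rows(var1, var2, biorep, techrep, classes):
    """Build one row per detected object (demo data only)."""
    return [{"condition1": var1, "condition2": var2, "biorep": biorep,
             "techrep": techrep, "Predicted Class": c} for c in classes]


rows = (make_rows("30C", "6h", "A", "1", ["resting"] * 3 + ["swollen"])
        + make_rows("30C", "6h", "A", "2", ["resting"] + ["swollen"] * 3))
summary = summarize(rows, "condition1", "condition2", ["resting", "swollen"])
print(summary)
```

In the demo, both classes average 2 objects per replicate, so each accounts for 50 percent of the condition.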

6.3 PLN Modelling

This script runs ZIPln modelling on input data with dynamic design and generates PCA visualizations and a correlation circle plot.
The two grouping variables you enter will be used in the model formula and for visualizing the PCA results.
They will be combined into a single factor for the model, and the PCA plot will show the latent variable projections colored by this combined category.
The correlation circle plot will show how the original variables relate to the latent dimensions, helping you interpret the PCA results in terms of the original grouping variables.

[!WARNING] The limit of categories displayed in the PCA plot is n=15.
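Whether a dataset stays within this limit can be checked before running the model. The sketch below only illustrates the idea of combining the two grouping variables into one factor and counting its levels; the helper names and the `_` separator are assumptions, not the caactus implementation.

```python
from itertools import product


def combine_factors(rows, var1, var2, sep="_"):
    """Combine two grouping variables into one label per row."""
    return [f"{row[var1]}{sep}{row[var2]}" for row in rows]


def within_plot_limit(labels, limit=15):
    """True if the number of distinct combined categories fits the PCA plot."""
    return len(set(labels)) <= limit


# Demo: 4 levels x 4 levels = 16 combined categories, one more than the limit.
rows = [{"condition1": a, "condition2": b}
        for a, b in product(["A", "B", "C", "D"], ["0h", "2h", "4h", "6h"])]
labels = combine_factors(rows, "condition1", "condition2")
print(len(set(labels)), within_plot_limit(labels))
```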

In the caactus GUI, make sure Variable Names and Class Order are set correctly in Global Settings.
Find step 10 – PLN Modelling and click Run.
Output:
- a) correlation_circle.png
- b) pca_plot.png

[!NOTE] Variable Names and Class Order are shared with Summary Statistics — set them once in Global Settings.

7. Running caactus from the command line

7.1 Setup config.toml-file

Copy config/config.toml to your working directory and modify it as needed.
The caactus scripts read the information they need for running from this file.
- CAVE: Windows users, make sure to use backslashes in the path (\path\to\config.toml instead of /path/to/config.toml) when copying the path to your working directory.
Open the command line (for Windows: Anaconda Powershell) and save the path to your config.toml file in a variable.
- whole command UNIX:
```
p="/path/to/config.toml"
```
- whole command Windows:
```
$p = "\path\to\config.toml"
```
Pass "-c" followed by the path to config.toml.
Pass "-m" followed by "training" or "batch" to switch between modes.
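For readers who want to wrap their own scripts in the same calling convention, the two flags can be modelled with Python's argparse. This is a sketch of the convention only, not the argument parser used by caactus; the function `build_parser`, the attribute names `config` and `mode`, and the choice to make both flags required are assumptions.

```python
import argparse


def build_parser(needs_mode=True):
    """Parser with -c (config path) and, optionally, -m (training or batch)."""
    parser = argparse.ArgumentParser(description="caactus-style command line")
    parser.add_argument("-c", dest="config", required=True,
                        help="path to config.toml")
    if needs_mode:
        # steps such as renaming are called without -m in this document
        parser.add_argument("-m", dest="mode", required=True,
                            choices=["training", "batch"],
                            help="processing mode")
    return parser


# Parse an explicit argument list instead of sys.argv for the demo.
args = build_parser().parse_args(["-c", "/path/to/config.toml", "-m", "batch"])
print(args.config, args.mode)
```

Restricting `-m` to the two known values lets argparse reject a misspelled mode before any file is touched.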

7.2 Conversion

call the tif2h5py script from the cmd prompt to transform all .tif-files to .h5-format.
whole command UNIX:
```
tif2h5py -c "$p" -m training
```
whole command Windows:
```
tif2h5py.exe -c $p -m training
```
For batch processing enter "-m batch" for batch mode.
- whole command UNIX:
```
tif2h5py -c "$p" -m batch
```
- whole command Windows:
```
tif2h5py.exe -c $p -m batch
```

7.3 Background Processing

call the background-processing script from the cmd prompt

whole command UNIX:

background_processing -c "$p" -m training

whole command Windows:

background_processing.exe -c $p -m training

For batch processing enter "-m batch" for batch mode
- whole command Unix:
```
background_processing -c "$p" -m batch
```
- whole command Windows:
```
background_processing.exe -c $p -m batch
```
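The effect of the mode switch on the folders used by these two steps can be summarized in a small lookup table. The folder names follow the directory tree in 3.1; the table itself and the helper `resolve_dirs` are hypothetical, since the real paths are read from config.toml.

```python
from pathlib import Path

# (input folder, output folder) per step and mode, following the tree in 3.1.
# Background processing runs in-place, so input and output are identical.
STEP_DIRS = {
    "tif2h5py": {
        "training": ("0_1_original_tif_training_images", "1_images"),
        "batch": ("0_3_batch_tif_renamed", "5_batch_images"),
    },
    "background_processing": {
        "training": ("3_multicut", "3_multicut"),
        "batch": ("7_batch_multicut", "7_batch_multicut"),
    },
}


def resolve_dirs(main_folder, step, mode):
    """Return the input and output folder of a step for the given mode."""
    in_name, out_name = STEP_DIRS[step][mode]
    main = Path(main_folder)
    return main / in_name, main / out_name


src, dst = resolve_dirs("/data/project", "tif2h5py", "batch")
print(src, "->", dst)
```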

7.4 Rename Files

Call the rename script from the cmd prompt to rename all your original .tif-files to their new name.
- whole command Unix:
```
renaming -c "$p"
```
- whole command Windows:
```
renaming.exe -c $p
```

7.5 CSV-Summary

call the csv_summary.py script from the cmd prompt
- whole command Unix:
```
csv_summary -c "$p"
```
- whole command Windows:
```
csv_summary.exe -c $p
```

7.6 Creating Summary Statistics

call the summary_statistics.py script from the cmd prompt
- whole command Unix:
```
summary_statistics -c "$p"
```
- whole command Windows:
```
summary_statistics.exe -c $p
```
if working with EUCAST antifungal susceptibility testing, call
- whole command Unix:
```
summary_statistics_eucast -c "$p"
```
- whole command Windows:
```
summary_statistics_eucast -c $p
```

7.7 PLN Modelling

call the pln_modelling.py script from the cmd prompt
whole command Unix:
```
pln_modelling -c "$p"
```
whole command Windows:
```
pln_modelling.exe -c $p
```

8. Tutorial

8.1 Download Sample Data

Go to zenodo to download the sample data.
Unpack the .zip-file into your project folder.
The path to where you unpacked the sample data will be your main folder (e.g. /home/usr/Documents/sampledata_CD6_zenodo).
To showcase the functionalities, the ilastik steps have been pretrained. Use caactus in batch-mode for the following steps. From the dropdown menu in Global settings in the GUI, select batch

[!NOTE] Some subdirectories are intentionally left empty. The tutorial is designed to show how the batch mode works with pretrained models. 0_1_original_tif_training_images stays empty; the other empty subdirectories will be filled as you follow the steps below.

Make sure you have caactus installed (see Installation above).
Make sure you have the caactus environment activated:

conda activate caactus-env

Now type caactus and hit Enter to start the graphical user interface:

caactus

We recommend working with two screens. This allows you to follow the instructions in the caactus GUI while performing the steps in ilastik, and to switch back quickly to the caactus steps to complete the pipeline.

8.2 Global Settings

On the top, enter the path to your main folder.

main_folder

Change the default values ['strain', 'timepoint'] to

['condition1','condition2']

change_variable

Set Mode to batch.

mode_batch

8.3 Pre-Processing - Renaming

In your main folder (sampledata_CD6_zenodo), inspect the renaming.csv spreadsheet to see how it is constructed.
In the GUI, go to Pre-Processing 1. Renaming and click Run.
If you click on the dropdown menu Advanced paths, a menu will open that will allow you to change the input and output folders, as well as the name of the renaming file.

renaming

8.4 Pre-Processing - Tif to h5

In the GUI, go to Pre-Processing 2. Tif to h5 and click Run.
If you click on the dropdown menu Advanced paths, a menu will open that will allow you to change the input and output folders.

tif2h5

8.5 Batch Pixel Classification

In the caactus GUI, find 3. Pixel Classification, click ? Help for the full instructions. Summary:

Open ilastik.
Open the pre-trained pixel classification project from the sample data (1_pixel_classification.ilp).

[!CAUTION] DO NOT CHANGE anything in 1. Input Data, 2. Feature Selection, or 3. Training when running Batch Processing!

Under 4. Prediction Export:
- Select Probabilities from the dropdown.
- Click Choose Export Image Settings and set the output path at File:

{dataset_dir}/../6_batch_probabilities/{nickname}_{result_type}.h5

batch_pixel

Click OK
Go to 5. Batch processing tab
Under Raw data, add the .h5 files from 5_batch_images folder.
Now click Process all files.
The output will be saved as *_Probabilities.h5 files in the output folder (6_batch_probabilities).

8.6 Batch Processing Multicut Segmentation

In the caactus GUI, find 4. Boundary Segmentation, click ? Help for the full instructions. Summary:

In ilastik, open the pre-trained Boundary Segmentation project (2_boundary_segmentation.ilp).

[!CAUTION] DO NOT CHANGE anything in 1. Input Data, 2. DT Watershed, or 3. Training and Multicut when running Batch Processing!

[!NOTE] The *_Multicut Segmentation.h5 files are generated here — they do not exist beforehand.

Under 4. Data Export,

batch_multicut

click Choose Export Image Settings and set the output path at File:
```
{dataset_dir}/../7_batch_multicut/{nickname}_{result_type}.h5
```

Click OK
Go to 5. Batch processing.
Under Raw data, add the .h5 files from 5_batch_images folder.
Under Probabilities, add the data_Probabilities.h5 files from 6_batch_probabilities folder.

batch_multicut

Go to 5. Batch Processing and click Process all files.
The output will be saved as *_Multicut Segmentation.h5 files in the output folder (7_batch_multicut).
Close the 2_boundary_segmentation.ilp project-file in ilastik.

8.7 Batch Background Processing

Switch back to the caactus GUI.
Find 5. Background Processing.
Click Run. The background is now removed and you can continue with object classification in ilastik.

batch_background

8.8 Batch Object Classification

In the caactus GUI, find 6. Object Classification, and click ? Help for the full instructions. Summary:

Switch back to ilastik.
Open your trained object classification project (3_object_classification.ilp).

[!CAUTION] DO NOT CHANGE anything in 1. Input Data, 2. Object Feature Selection, or 3. Object Classification when running Batch Processing!

Under 4. Object Information Export:
- Select Object Predictions from the dropdown.

object_pred

Click Choose Export Image Settings and set the output path at File:

{dataset_dir}/../8_batch_objectclassification/{nickname}_{result_type}.h5

object_image

Under "4. Object Information Export", choose "Configure Feature Table Export" with the following settings:
In Configure Feature Table Export General choose format .csv and change output directory to:

{dataset_dir}/../8_batch_objectclassification/{nickname}.csv

Choose Features to choose the Feature you are interested in exporting.

feature_features

Click OK
Go to 5. Batch Processing tab
Under Raw data, add the .h5 files from 5_batch_images folder.
Under Segmentation Image, add the data_Multicut Segmentation.h5 files from 7_batch_multicut folder.
Go to 5. Batch Processing and click Process all files.
The output will be saved as *_Object Predictions.h5 files and *_table.csv in the output folder (8_batch_objectclassification).
Now you have performed all steps in ilastik. You can close ilastik.

8.9 CSV Summary

Switch back to the caactus GUI. Scroll down to Data Analysis

data_analysis

The default Pixel Size is already set in Global Settings — you can leave it as-is for the sample data.
Find 7. CSV Summary and click Run.
Inspect the generated df_clean.csv. This spreadsheet combines all feature tables from Object Classification into one file for downstream analysis.

8.10 Summary Statistics

Find 8. Summary Statistics and click Run.
Inspect the generated results. The output consists of:
- a) df_summary_complete.csv: a .csv table that still contains the "not usable" category,
- b) df_refined_complete.csv: a .csv table without the "not usable" category,
- c) counts.csv: the dataframe used in PLN Modelling,
- d) barchart.png: a bar graph with condition1 on the x-axis, the percentage of each morphotype ("Predicted Class") on the y-axis, and condition2 as the faceting variable in rows. You can play around by putting 'condition2' first and 'condition1' second to see how it changes the plot.
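
The percentages shown in the bar graph can be understood as the share of each predicted class within one combination of condition1 and condition2. The sketch below shows that calculation with the standard library only. It is an illustration under assumptions, not the caactus code: it assumes each row carries the keys condition1, condition2 and Predicted Class, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def class_percentages(rows, group_keys=("condition1", "condition2"),
                      class_key="Predicted Class"):
    """Return {group: {class: percent}} for rows given as dicts."""
    counts = defaultdict(Counter)
    for row in rows:
        group = tuple(row[key] for key in group_keys)
        counts[group][row[class_key]] += 1

    percentages = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        percentages[group] = {
            label: 100.0 * n / total for label, n in counter.items()
        }
    return percentages
```

Swapping the order of `group_keys` changes only how the groups are labelled, which mirrors swapping 'condition1' and 'condition2' in the plot settings.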
You may also change the colors. In Global Settings, change the default Color Mapping from

{'resting': '#FE6100', 'swollen': '#648FFF', 'germling': '#785EF0', 'hyphae': '#DC267F'}

to

{'resting': 'yellow', 'swollen': 'cyan', 'germling': 'blue', 'hyphae': 'magenta'}

Again, find 8. Summary Statistics and click Run. The colors now should be changed.
Similarly, you may change the morphotype names. Open df_clean.csv in spreadsheet software (e.g. Excel). Replace all occurrences of resting with dormant (use Ctrl+F, Replace all) and save df_clean.csv. Now re-do step 8.10 Summary Statistics.

Before you click Run, make sure you replace resting with dormant in both the Class Order field

['dormant', 'swollen', 'germling', 'hyphae']

and the Color Mapping field.

{'dormant': 'yellow', 'swollen': 'cyan', 'germling': 'blue', 'hyphae': 'magenta'}

Again, find 8. Summary Statistics and click Run. The names now should be changed.
Let's imagine you only have 3 cell categories in your dataset. Again, open df_clean.csv in spreadsheet software (e.g. Excel). Replace all occurrences of dormant with spores (use Ctrl+F, Replace all) and save df_clean.csv.

Similarly, replace all occurrences of swollen with spores (use Ctrl+F, Replace all) and save df_clean.csv. Now change the Class Order field to

['spores', 'germling', 'hyphae']

and the Color Mapping field to

{'spores': 'yellow', 'germling': 'blue', 'hyphae': 'magenta'}

Again, find 8. Summary Statistics and click Run. The names now should be changed.
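
The renaming and merging steps above can also be scripted instead of using Replace all in a spreadsheet. The following is a minimal sketch under assumptions: it works on rows loaded as dicts (for example via csv.DictReader), it assumes the class labels live in a column called Predicted Class, and both function names are hypothetical. It also checks that Class Order and Color Mapping agree with the labels that are actually present.

```python
def relabel_classes(rows, mapping, class_key="Predicted Class"):
    """Return copies of the rows with class labels replaced via the mapping."""
    return [
        {**row, class_key: mapping.get(row[class_key], row[class_key])}
        for row in rows
    ]


def check_settings(rows, class_order, color_mapping,
                   class_key="Predicted Class"):
    """Return (labels missing from class_order, classes without a color)."""
    labels = {row[class_key] for row in rows}
    not_in_order = sorted(labels - set(class_order))
    without_color = sorted(set(class_order) - set(color_mapping))
    return not_in_order, without_color


# Merge two morphotypes into one, as in the three-category example above.
mapping = {"dormant": "spores", "swollen": "spores"}
class_order = ["spores", "germling", "hyphae"]
color_mapping = {"spores": "yellow", "germling": "blue", "hyphae": "magenta"}
```

Labels reported by `check_settings` are not necessarily errors (the data may contain additional categories), but they are worth a look before clicking Run.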

8.11 PLN Modelling

Find 10. PLN Modelling and click Run.
Inspect the generated results in the subdirectory /sampledata_CD6_zenodo/9_data_analysis/. The output consists of:
- a) correlation_circle.png: for the sample data, it shows that PCA1, accounting for ~57% of the variance, primarily separates samples by condition2, whereas PCA2, accounting for ~25% of the variance, separates them by condition1.
- b) pca_plot.png: the PCA plot shows how the images group together in 2D space based on the combined category of condition1 and condition2 (the categorical levels are combined).
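
Combining the categorical levels simply means that every image gets one label built from its condition1 and condition2 values. A minimal sketch of that idea follows; the function name, the separator and the example values are assumptions for illustration, and the labels produced by caactus may be formatted differently.

```python
def combined_condition(row, keys=("condition1", "condition2"), sep="_"):
    """Join the condition values of one row into a single group label."""
    return sep.join(str(row[key]) for key in keys)


# Hypothetical example values for one image.
label = combined_condition({"condition1": "A", "condition2": "24h"})
print(label)  # -> A_24h
```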
