Processing of Bacmman measurement tables

PyBerries

PyBerries is a Python package that can be used to import, manipulate and plot data from Bacmman measurement tables.

It relies mainly on Pandas for data handling and Seaborn/Matplotlib for plotting.


Installation

Anaconda (recommended)

Anaconda installs both Python and Jupyter Lab (used to run Python notebooks). Note, however, that it requires ~5 GB of free disk space. For a lighter installation, see the "Command line install" section below.

  • Download Anaconda from the official website
  • Run the installer (leave all options as default)
  • Check git is installed on your computer
    • Open a terminal (macOS/Linux) or Powershell (Windows)
    • Enter the command git --version
    • If an error is shown, download and install git from the official git website (leave all install options as default)
    • After installing, restart your terminal/powershell; the git --version command should display a version number (e.g. 2.40.0)
  • Start "Anaconda Navigator"
  • In Anaconda, launch the "Jupyter Lab" module

Command line install (advanced users)

  • Open a terminal (macOS/Linux) or Powershell (Windows)
  • Install Python
    • Enter the command python --version
    • If an error or a version < 3.8 is shown, download and install Python from the official website
  • Install git
    • Enter the command git --version
    • If an error is shown, download and install git from the official git website (leave all install options as default)
  • After installing, restart your terminal/powershell; both of the above commands should display a version number
  • Install Jupyter Lab
    • In a terminal/powershell, run the command python -m pip install jupyterlab
    • After the installation completes, Jupyter Lab can be started using the command jupyter-lab
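
The version checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names python_is_recent_enough and tool_version are illustrative and are not part of PyBerries.

```python
import shutil
import subprocess
import sys


def python_is_recent_enough(minimum=(3, 8)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


def tool_version(command):
    """Return the output of `<command> --version`, or None if the tool is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run([command, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or result.stderr.strip()


if __name__ == "__main__":
    print("Python OK:", python_is_recent_enough())
    print("git:", tool_version("git") or "not found")
```

If git is reported as "not found" right after installing it, restarting the terminal usually refreshes the PATH.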

Using existing notebooks

  • Download the relevant notebooks from the Notebook folder (you will have to click on individual notebooks and click on the "download" button at the top-right).
  • Start Jupyter Lab
  • In the left panel of Jupyter lab, click on "Upload file" and select the notebook you have downloaded
    • The notebook will appear in the list of files and folders
    • Click on the notebook on the list to open it

A Python notebook consists of a mix of text and code cells.

  • Update the code where necessary (e.g. "Input" cell, plot options...)
  • Run individual code cells by clicking on them and pressing Shift + Enter
  • Once a dataset has been imported, you can run any cell from the "Figures" section (order is not important)
  • If you change plot options, re-run the corresponding cell to update the plot
  • When hovering your mouse over a plot, a "save" button should appear

Using the PyBerries package in your own code (advanced users)

To install the package, use the following command in a terminal:

python -m pip install PyBerries

Creating a DatasetPool

📖 DatasetPool documentation

To import Bacmman measurement tables with PyBerries, you must create a "DatasetPool" (an object that will contain one or several Bacmman datasets). The minimum required arguments to create a DatasetPool are:

  • dsList: name(s) of the Bacmman datasets to be imported
  • path: path to the Bacmman folder containing the datasets

Optional arguments can be added:

  • groups: set legend labels for the datasets. If two datasets have the same label, they will be concatenated (and error bars can be shown if supported)
    • Format: groups = ['Group1', 'Group2', 'Group3'] with a number of groups equal to the number of datasets in dsList
  • filters: filter the datasets using the syntax of pandas.DataFrame.query
    • Format: filters = {'object':'filter'}, where object is the name of the target Bacmman object
    • Example: filters = {'Bacteria':'SpineLength > 3'} to keep only Bacteria that have a length > 3
    • Note that filtering an object will also filter out any child objects (e.g. if a bacteria is removed, the spots it contains will be removed as well)
  • metadata: enter the name of a metadata field (found in the SourceImageMetadata folder of the dataset) to add a column with the corresponding metadata value for each position.
    • Format: metadata = {'object':'metadata_name'} where object is the Bacmman object to which the metadata should be added
    • Example: metadata = {'Bacteria':'DateTime'} will add the acquisition time for each position in the Bacteria table
  • preprocessing: a function to be applied to each measurement table before it is added to the dataset
    • Format: preprocessing = {'object':function}
    • Tip: lambda functions can be an easy way to perform simple tasks such as renaming a column: preprocessing = {'Bacteria':lambda df: df.rename(columns={'Old_name':'New_name'})}
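
When a lambda becomes too long, a named function can be used instead. The sketch below assumes, as the lambda example suggests, that the preprocessing function receives a table as a Pandas DataFrame and returns the modified DataFrame. The function name and column names are placeholders.

```python
import pandas as pd


def preprocess_bacteria(df):
    """Rename a column and add a derived one; returns a new DataFrame."""
    df = df.rename(columns={"Old_name": "New_name"})
    return df.assign(New_name_doubled=df["New_name"] * 2)


# Try the function on a toy table before handing it to PyBerries
raw = pd.DataFrame({"Old_name": [1.0, 2.0]})
print(preprocess_bacteria(raw))

# Dictionary in the format described for the preprocessing argument
preprocessing = {"Bacteria": preprocess_bacteria}
```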

Note: all arguments can either take a single value to be applied to all datasets, or one value per dataset in dsList.

  • Example: filters = {'Bacteria':['SpineLength > 3','']} will apply the cell length filter to the first, but not to the second dataset
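
One way to picture the "single value or one value per dataset" rule is the small helper below. It is a hypothetical sketch (expand_per_dataset is not part of PyBerries, and the package may handle this differently internally): a list is read as one value per dataset, and anything else is repeated for every dataset.

```python
def expand_per_dataset(value, n_datasets):
    """Return a list with one value per dataset.

    Hypothetical helper for illustration only; not the PyBerries implementation.
    """
    if isinstance(value, list):
        if len(value) != n_datasets:
            raise ValueError(
                f"Expected {n_datasets} values (one per dataset), got {len(value)}"
            )
        return list(value)
    # A single value is applied to every dataset
    return [value] * n_datasets


ds_list = ["dataset_A", "dataset_B"]

# One filter per dataset: the empty string means "no filter" for the second one
print(expand_per_dataset(["SpineLength > 3", ""], len(ds_list)))
# A single filter applied to both datasets
print(expand_per_dataset("SpineLength > 3", len(ds_list)))
```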

Example of DatasetPool creation:

from pyberries.data import DatasetPool
data = DatasetPool(path=['D:/Daniel/BACMMAN/Timelapse'], dsList=['230118_DT23'], groups=[], metadata={'Bacteria':'DateTime'}, filters={}, preprocessing={})

About filtering

Filtering is applied when creating a DatasetPool, but can also be applied afterwards with the apply_filters method. Example:

data.apply_filters({'Bacteria':'SpineLength > 3'})
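
Because filters use the pandas.DataFrame.query syntax, a filter string can be tried on a plain DataFrame before passing it to PyBerries. The sketch below uses invented values and hypothetical BacteriaId/ParentId columns to link spots to bacteria (real Bacmman tables may use different columns). The second step mimics, in plain Pandas, the removal of child objects when their parent is filtered out.

```python
import pandas as pd

# Toy tables with invented values; 'BacteriaId' and 'ParentId' are hypothetical columns
bacteria = pd.DataFrame({"BacteriaId": [0, 1, 2], "SpineLength": [2.5, 3.4, 4.1]})
spots = pd.DataFrame({"SpotId": [0, 1, 2, 3], "ParentId": [0, 1, 1, 2]})

# Same filter string as in filters={'Bacteria': 'SpineLength > 3'}
kept_bacteria = bacteria.query("SpineLength > 3")

# Keep only the spots whose parent bacterium passed the filter
kept_spots = spots[spots["ParentId"].isin(kept_bacteria["BacteriaId"])]

print(kept_bacteria)
print(kept_spots)
```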

Data format

The Bacmman measurement tables will be imported, and tables from objects that have the same name will be concatenated into a single Pandas DataFrame. The Dataset column specifies which Bacmman dataset a given row belongs to.

Measurement tables are stored in a dictionary ({object_name:table}) under the table property.

For example, to display the data contained in the 'Bacteria' table, run in a Jupyter Notebook:

display(data.table['Bacteria'])
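
The resulting layout can be pictured with plain Pandas: tables of the same object from different datasets are stacked, and a Dataset column records the origin of each row. The dataset names and values below are invented, and this is only a sketch of the layout, not of the import code.

```python
import pandas as pd

# Invented 'Bacteria' tables from two datasets
tables = {
    "dataset_A": pd.DataFrame({"SpineLength": [2.1, 3.3]}),
    "dataset_B": pd.DataFrame({"SpineLength": [4.0]}),
}

# Stack the tables and record which dataset each row came from
bacteria = pd.concat(
    [df.assign(Dataset=name) for name, df in tables.items()],
    ignore_index=True,
)

# Same access pattern as data.table['Bacteria']
table = {"Bacteria": bacteria}
print(table["Bacteria"])
```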

Dataset summary

You can use the describe method to print a summary of all numerical columns in the DatasetPool. One or several aggregation methods can be specified, for example:

data.describe('median')

to print the median value for each column, or

data.describe(['mean', 'std'])

to print mean and standard deviation.

Other aggregations are possible, including (but not limited to): 'max', 'min', 'sum', 'sem'. For more details on aggregations, consult pandas.DataFrame.aggregate.

Output can be limited to certain columns by using the keyword include:

data.describe(['mean', 'std'], include=['SpineLength', 'SpineWidth'])
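
The aggregation names are those understood by Pandas. As a rough illustration in plain Pandas (not the PyBerries implementation; the values are invented), a comparable summary restricted to two columns can be obtained with DataFrame.agg:

```python
import pandas as pd

table = pd.DataFrame(
    {
        "SpineLength": [2.0, 3.0, 4.0],
        "SpineWidth": [0.8, 1.0, 1.2],
        "Frame": [0, 1, 2],
    }
)

# One row per aggregation, one column per selected measurement
summary = table[["SpineLength", "SpineWidth"]].agg(["mean", "std"])
print(summary)
```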

Adding columns

The add_columns method allows adding predefined calculations (metrics) to the dataset. Current possible metrics are:

  • 'heatmap'
  • 'is_col_larger'
  • 'bin_column'
  • 'Dapp'

Example use:

data.add_columns(object_name='Spots', metrics=['heatmap'])

Details on metrics can be found in DatasetPool.add_columns

Adding a column from a parent table

If 'Bacteria' is a parent of 'Spots', data from the parent table can be added to the child table. For example, if the 'Bacteria' table contains lineage information, the lineage of the parent bacterium can be added to each spot:

data.add_from_parent(object_name='Spots', col='lineage')

Note that the parent table will be automatically inferred from the Bacmman configuration file.
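
Conceptually, this resembles a left merge of the child table onto the parent table. The sketch below shows the idea in plain Pandas with invented values; the ParentId/BacteriaId linking columns are hypothetical, and PyBerries may match parents and children differently.

```python
import pandas as pd

bacteria = pd.DataFrame({"BacteriaId": [0, 1], "lineage": ["A", "B"]})
spots = pd.DataFrame({"SpotId": [0, 1, 2], "ParentId": [1, 0, 1]})

# Left merge keeps every spot and copies the parent's 'lineage' value onto it
spots_with_lineage = spots.merge(
    bacteria[["BacteriaId", "lineage"]],
    left_on="ParentId",
    right_on="BacteriaId",
    how="left",
).drop(columns="BacteriaId")

print(spots_with_lineage)
```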

Timeseries data

If the metadata 'DateTime' has been included in the dataset, it is possible to perform a time-binning on the data in order to plot metrics at different time resolutions. This is done by using the method get_timeseries. The different timeseries metrics available are:

  • 'SpineLength'
  • 'ObjectCount'
  • 'ObjectClass'
  • 'Intensity'
  • 'Quantile'
  • 'Aggregation'
  • 'Fluo_intensity'
  • 'FOV_Positions'

The resulting dataframe is stored in the timeseries property of the dataset (can be shown by display(data.timeseries['Bacteria'])).

Example use:

timeseries_parameters = {'metric':'ObjectCount', # Metric to be plotted
                         'col':'SpotCount', # Column to be used from the source data
                         'timeBin':2, # Time interval in min
                         'thr':1 # For 'ObjectCount': threshold on number of objects to include in 'ObjectFrac' column
                        }
data.get_timeseries(object_name='Bacteria', **timeseries_parameters)

For more details on timeseries options, see DatasetPool.get_timeseries
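
To illustrate what time-binning means, the sketch below groups a few invented observations into 2-minute bins and computes, for each bin, the fraction of cells with at least thr spots. It uses only the standard library and is not the get_timeseries implementation; the function name and the exact threshold convention (>=) are assumptions.

```python
from collections import defaultdict


def object_fraction_by_time_bin(times_min, counts, time_bin=2, thr=1):
    """Fraction of observations with count >= thr in each time bin.

    times_min: time of each observation, in minutes since the start
    counts:    number of objects (e.g. spots) in each observation
    """
    bins = defaultdict(list)
    for t, c in zip(times_min, counts):
        bin_start = int(t // time_bin) * time_bin  # left edge of the bin
        bins[bin_start].append(c >= thr)
    return {start: sum(flags) / len(flags) for start, flags in sorted(bins.items())}


times = [0.0, 0.5, 1.9, 2.0, 3.5, 4.2]
spot_counts = [0, 1, 2, 0, 0, 3]
fractions = object_fraction_by_time_bin(times, spot_counts, time_bin=2, thr=1)
print(fractions)
```

A larger time_bin gives smoother curves at the cost of time resolution.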

Making figures

PyBerries uses Seaborn and Matplotlib to plot data. There are three different ways to create plots:

  • Through a DatasetPool method (plot_preset)
    • This is the preferred method, since it will take care of properly displaying all plot elements for the given task
    • Presets also include several plots which combine several elements (e.g. plot_timeseries which displays both a scatter and a lineplot)
  • By importing plots from pyberries.plots
    • This allows a bit more flexibility, while still taking care of legend, axis labels, etc.
  • By importing plot functions from Seaborn
    • This will give you the most flexibility, but will require a lot of manual fixing for plot limits, axis labels, legend, etc.

For more details on Seaborn, see the official Seaborn documentation.

DatasetPool plotting methods

The plot_preset method takes the following arguments:

  • preset (str): type of plot to make
  • object_name (str): table to plot from
  • timeseries (bool): set to True to plot from a timeseries table, and to False (default) to plot from the normal measurement table
  • drop_duplicates_by (list of str): before plotting, remove all lines that are duplicates according to the column (or combination of columns) specified
  • return_axes (bool): return figure axis to enable further changes/additional plots to be added
  • title (str): plot title
  • xlabel, ylabel (str): X and Y axis labels
  • xlim, ylim (2-tuple): X and Y axis limits
  • **kwargs: any arguments to be passed to the seaborn plot

Available presets are:

  • histogram
  • bar
  • line
  • scatter
  • datapoints_and_mean
  • heatmap
  • timeseries
  • boxenplot
  • spot_tracks

Example use:

plot_args = {'x':'Bacteria_Size',
            'hue':'Group',
            'binwidth':2,
            'stat':'probability',
            'common_norm':False,
            'errorbars':None,
            'title':'',
            'xlabel':'Cell area (µm$^2$)',
            'ylabel':'Probability',
            'xlim':(None, None),
            'ylim':(None, None),
            'multiple':'layer',
            'element':'poly',
            'kde':False,
            'palette':'deep',
            }

data.plot_preset(preset='histogram', object_name='Bacteria', **plot_args)

The additional argument return_axes can be passed to all dataset plot methods to enable further modifications to the figure:

import seaborn as sns

ax = data.plot_preset(preset='histogram', object_name='Bacteria', return_axes=True, **plot_args)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1), labelspacing=1)

moves the legend outside of the plot.

Importing from pyberries.plots

Seaborn plots can be imported from pyberries.plots. The example above could then be written:

import matplotlib.pyplot as plt
from pyberries.plots import histplot

plot_args = {'x':'Bacteria_Size',
            'hue':'Group',
            'binwidth':2,
            'stat':'probability',
            'common_norm':False,
            'title':'',
            'xlabel':'Cell area (µm$^2$)',
            'ylabel':'Probability',
            'xlim':(None, None),
            'ylim':(None, None),
            'multiple':'layer',
            'element':'poly',
            'kde':False,
            'palette':'deep',
            }

_,ax = plt.subplots(dpi=130)
ax = histplot(data.table['Bacteria'], ax=ax, **plot_args)

Note that histogram errorbars are only available when plotting through the dataset method.

Importing from Seaborn

When directly using Seaborn, the histogram above can be produced like this:

import matplotlib.pyplot as plt
import seaborn as sns

plot_args = {'x':'Bacteria_Size',
            'hue':'Group',
            'binwidth':2,
            'stat':'probability',
            'common_norm':False,
            'multiple':'layer',
            'element':'poly',
            'kde':False,
            'palette':'deep',
            }

_, ax = plt.subplots(dpi=130)
g = sns.histplot(data=data.table['Bacteria'], ax=ax, **plot_args)
g.set(xlabel='Cell area (µm$^2$)', ylabel='Probability', title='Plot title', xlim=(None, None), ylim=(None, None))
# Remove the legend title, if a legend was drawn
if g.get_legend() is not None:
    g.get_legend().set_title("")

File utilities

Collection of functions to manipulate:

  • File names
    • Zero-padding on numbers
    • Replace a string by another
    • Add a string to the end of all file names
  • Tiff files
    • Make a tiff stack from single tiff files that have the same ending
    • Update axis description in file metadata
    • Make copies of a tiff file with an increasing ID number as suffix
  • Folders downloaded from Omero
    • Move files from nested Omero folders to the same folder
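
As an illustration of the first item, zero-padding numbers in file names can be done with a regular expression. This is a generic sketch, not the PyBerries function; the name zero_pad_numbers and its width parameter are hypothetical.

```python
import re


def zero_pad_numbers(file_name, width=3):
    """Left-pad every run of digits in the file stem with zeros up to `width`."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot:  # no extension: pad the whole name
        stem, extension = file_name, ""
    padded = re.sub(r"\d+", lambda m: m.group().zfill(width), stem)
    return padded + dot + extension


print(zero_pad_numbers("position_7.tif"))
```

Padding keeps files in the expected order when they are sorted alphabetically (e.g. position_002 before position_010).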
