Processing of Bacmman measurement tables

PyBerries

PyBerries is a Python package that can be used to import, manipulate and plot data from Bacmman measurement tables.

It relies mainly on Pandas for data handling and Seaborn/Matplotlib for plotting.

Installation

Anaconda (recommended)

Anaconda installs both Python and Jupyter Lab (used to run Python notebooks). Note, however, that it requires about 5 GB of free disk space. For a lighter installation procedure, see the next section, "Command line install".

Download Anaconda from the official website
Run the installer (leave all options as default)
Check git is installed on your computer
- Open a terminal (macOS/Linux) or PowerShell (Windows)
- Enter the command git --version
- If an error is shown, download and install git from the official git website (leave all install options as default)
- After installing, restart your terminal/PowerShell; the git --version command should display a version number (e.g. 2.40.0)
Start "Anaconda Navigator"
In Anaconda, launch the "Jupyter Lab" module

Command line install (advanced users)

Open a terminal (macOS/Linux) or PowerShell (Windows)
Install Python
- Enter the command python --version
- If an error or a version lower than 3.8 is shown, download and install Python from the official website
Install git
- Enter the command git --version
- If an error is shown, download and install git from the official git website (leave all install options as default)
After installing, restart your terminal/PowerShell; both of the above commands should display a version number
Install Jupyter Lab
- In a terminal/PowerShell, run the command python -m pip install jupyterlab
- After the installation completes, Jupyter Lab can be started using the command jupyter-lab
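
The prerequisite checks above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name check_prerequisites is hypothetical and not part of PyBerries. It reports whether the running interpreter is at least version 3.8 and whether a git executable is found on the PATH.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Report whether Python and git meet the install requirements.

    Hypothetical helper for illustration; not part of PyBerries.
    """
    return {
        # True if the interpreter running this script is recent enough
        "python_ok": sys.version_info >= min_python,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # shutil.which returns None when no git executable is on the PATH
        "git_found": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If python_ok or git_found is False, follow the corresponding installation step above before continuing.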

Using existing notebooks

Download the relevant notebooks from the Notebook folder (you will have to click on individual notebooks and click on the "download" button at the top-right).
Start Jupyter Lab
In the left panel of Jupyter Lab, click on "Upload file" and select the notebook you have downloaded
- The notebook will appear in the list of files and folders
- Click on the notebook on the list to open it

A Python notebook consists of a mix of text and code cells.

Update the code where necessary (e.g. "Input" cell, plot options...)
Run individual code cells by clicking on them and pressing Shift + Enter
Once a dataset has been imported, you can run any cell from the "Figures" section (order is not important)
If you change plot options, re-run the corresponding cell to update the plot
When hovering your mouse over a plot, a "save" button should appear

Using the PyBerries package in your own code (advanced users)

To install the package, use the following command in a terminal:

python -m pip install PyBerries

Creating a DatasetPool

📖 DatasetPool documentation

To import Bacmman measurement tables with PyBerries, you must create a "DatasetPool" (an object that will contain one or several Bacmman datasets). The minimum required arguments to create a DatasetPool are:

dsList: name(s) of the Bacmman datasets to be imported
path: path to the Bacmman folder containing the datasets

Optional arguments can be added:

groups: set legend labels for the datasets. If two datasets have the same label, they will be concatenated (and error bars can be shown if supported)
- Format: groups = ['Group1', 'Group2', 'Group3'] with a number of groups equal to the number of datasets in dsList
filters: filter the datasets using the syntax of pandas.DataFrame.query
- Format: filters = {'object':'filter'}, where object is the name of the target Bacmman object
- Example: filters = {'Bacteria':'SpineLength > 3'} to keep only Bacteria that have a length > 3
- Note that filtering an object will also filter out any child objects (e.g. if a bacteria is removed, the spots it contains will be removed as well)
metadata: enter the name of a metadata field (found in the SourceImageMetadata folder of the dataset) to add a column with the corresponding metadata value for each position.
- Format: metadata = {'object':'metadata_name'} where object is the Bacmman object to which the metadata should be added
- Example: metadata = {'Bacteria':'DateTime'} will add the acquisition time for each position in the Bacteria table
preprocessing: a function to be applied to each measurement table before it is added to the dataset
- Format: preprocessing = {'object':function}
- Tip: lambda functions can be an easy way to perform simple tasks such as renaming a column: preprocessing = {'Bacteria':lambda df: df.rename(columns={'Old_name':'New_name'})}

Note: all arguments can either take a single value to be applied to all datasets, or one value per dataset in dsList.

Example: filters = {'Bacteria':['SpineLength > 3','']} will apply the cell length filter to the first, but not to the second dataset
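
To make the "single value or one value per dataset" rule concrete, here is a minimal sketch of how such an argument could be expanded. This is an illustration of the rule only, not the PyBerries implementation: the helper expand_per_dataset, the second dataset name and the 'Spots' filter are all hypothetical.

```python
def expand_per_dataset(value, n_datasets):
    """Return a list with exactly one entry per dataset.

    Hypothetical helper: a single value is repeated for every dataset,
    while a list must already contain one entry per dataset.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != n_datasets:
            raise ValueError(
                f"Expected {n_datasets} values, got {len(value)}"
            )
        return list(value)
    return [value] * n_datasets


# Hypothetical dataset names and filters, for illustration only
ds_list = ['230118_DT23', '230119_DT24']
filters = {
    'Bacteria': ['SpineLength > 3', ''],  # one filter per dataset
    'Spots': 'Intensity > 100',           # same filter for all datasets
}

expanded = {
    obj: expand_per_dataset(flt, len(ds_list)) for obj, flt in filters.items()
}
print(expanded)
```

Here the 'Bacteria' filter applies only to the first dataset, while the single 'Spots' filter is repeated for both.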

Example of DatasetPool creation:

from pyberries.data import DatasetPool
data = DatasetPool(path=['D:/Daniel/BACMMAN/Timelapse'], dsList=['230118_DT23'], groups=[], metadata={'Bacteria':'DateTime'}, filters={}, preprocessing={})

About filtering

Filtering is applied when creating a DatasetPool, but can also be applied afterwards with the apply_filters method. Example:

data.apply_filters({'Bacteria':'SpineLength > 3'})
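
Filter strings follow the pandas.DataFrame.query syntax, and filtering a parent object also removes its children. The sketch below reproduces both behaviours on toy tables with plain pandas rather than PyBerries; the linking columns Idx and ParentIdx and all values are hypothetical, and the real tables may link parents and children differently.

```python
import pandas as pd

# Toy parent and child tables (hypothetical columns and values)
bacteria = pd.DataFrame({'Idx': [0, 1, 2],
                         'SpineLength': [2.5, 3.4, 4.1]})
spots = pd.DataFrame({'ParentIdx': [0, 0, 1, 2],
                      'Intensity': [10, 12, 9, 15]})

# Keep only bacteria matching the query string
kept_bacteria = bacteria.query('SpineLength > 3')

# Drop spots whose parent bacteria was filtered out
kept_spots = spots[spots['ParentIdx'].isin(kept_bacteria['Idx'])]

print(kept_bacteria)
print(kept_spots)
```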

Data format

The Bacmman measurement tables will be imported, and tables from objects that have the same name will be concatenated as a single Pandas DataFrame. The Dataset column specifies which Bacmman dataset a given line belongs to.

Measurement tables are stored in a dictionary ({object_name:table}) under the table property.

For example, to display the data contained in the 'Bacteria' table, run in a Jupyter Notebook:

display(data.table['Bacteria'])
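
The layout described above can be pictured with a small pandas sketch. It assumes nothing about PyBerries internals: the dataset names and values are made up, and the code only shows how per-dataset tables for one object end up in a single DataFrame with a Dataset column, stored in a dictionary keyed by object name.

```python
import pandas as pd

# One 'Bacteria' table per dataset (hypothetical names and values)
tables_by_dataset = {
    'dataset_A': pd.DataFrame({'SpineLength': [2.1, 3.5]}),
    'dataset_B': pd.DataFrame({'SpineLength': [4.0]}),
}

# Tag each table with its dataset name, then concatenate
bacteria = pd.concat(
    [df.assign(Dataset=name) for name, df in tables_by_dataset.items()],
    ignore_index=True,
)

# Same shape as the table property: {object_name: table}
table = {'Bacteria': bacteria}
print(table['Bacteria'])
```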

Dataset summary

You can use the describe method to print a summary of all numerical columns in the DatasetPool. One or several aggregation methods can be specified, for example:

data.describe('median')

to print the median value for each column, or

data.describe(['mean', 'std'])

to print mean and standard deviation.

Other aggregations are possible, including (but not limited to): 'max', 'min', 'sum', 'sem'. For more details on aggregations, consult pandas.DataFrame.aggregate.

Output can be limited to certain columns by using the keyword include:

data.describe(['mean', 'std'], include=['SpineLength', 'SpineWidth'])
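
Since the aggregation names are those of pandas.DataFrame.aggregate, the kind of summary produced can be previewed with plain pandas. This is a sketch on a toy table with made-up values, not the describe method itself, whose exact output format may differ.

```python
import pandas as pd

# Toy measurement table (hypothetical values)
bacteria = pd.DataFrame({
    'SpineLength': [2.0, 4.0, 6.0],
    'SpineWidth': [1.0, 1.0, 1.0],
    'Group': ['a', 'a', 'b'],
})

# Restrict to selected numerical columns, then aggregate
include = ['SpineLength', 'SpineWidth']
summary = bacteria[include].agg(['mean', 'std'])
print(summary)
```

Each requested aggregation becomes a row of the result, and each included column stays a column.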

Adding columns

The add_columns method allows adding predefined calculations (metrics) to the dataset. Current possible metrics are:

'heatmap'
'is_col_larger'
'bin_column'
'Dapp'

Example use:

data.add_columns(object_name='Spots', metrics=['heatmap'])

Details on metrics can be found in DatasetPool.add_columns

Adding a column from a parent table

If 'Bacteria' is a parent of 'Spots', it is possible to add data from the parent table to the child's. For example if the 'Bacteria' table contains lineage information, we can add to each spot the lineage of its parent bacteria.

For example:

data.add_from_parent(object_name='Spots', col='lineage')

Note that the parent table will be automatically inferred from the Bacmman configuration file.
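
Conceptually, adding a parent column to a child table is a left join from child to parent. The following sketch shows that idea with plain pandas on toy tables; it is not the add_from_parent implementation, and the key columns Idx and ParentIdx are hypothetical stand-ins for whatever links spots to bacteria in the real tables.

```python
import pandas as pd

# Toy parent and child tables (hypothetical key columns and values)
bacteria = pd.DataFrame({'Dataset': ['ds1', 'ds1'],
                         'Idx': [0, 1],
                         'lineage': ['A', 'B']})
spots = pd.DataFrame({'Dataset': ['ds1', 'ds1', 'ds1'],
                      'ParentIdx': [0, 1, 1],
                      'Intensity': [10, 20, 30]})

# Left join: every spot is kept and receives its parent's lineage
parent_cols = bacteria[['Dataset', 'Idx', 'lineage']].rename(
    columns={'Idx': 'ParentIdx'}
)
spots_with_lineage = spots.merge(
    parent_cols, on=['Dataset', 'ParentIdx'], how='left'
)
print(spots_with_lineage)
```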

Timeseries data

If the metadata 'DateTime' has been included in the dataset, it is possible to perform a time-binning on the data in order to plot metrics at different time resolutions. This is done by using the method get_timeseries. The different timeseries metrics available are:

'SpineLength'
'ObjectCount'
'ObjectClass'
'Intensity'
'Quantile'
'Aggregation'
'Fluo_intensity'
'FOV_Positions'

The resulting dataframe is stored in the timeseries property of the dataset (can be shown by display(data.timeseries['Bacteria'])).

Example use:

timeseries_parameters = {'metric':'ObjectCount', # Metric to be plotted
                         'col':'SpotCount', # Column to be used from the source data
                         'timeBin':2, # Time interval in min
                         'thr':1 # For 'ObjectCount': threshold on number of objects to include in 'ObjectFrac' column
                        }
data.get_timeseries(object_name='Bacteria', **timeseries_parameters)

For more details on timeseries options, see DatasetPool.get_timeseries
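
As an illustration of time-binning, the sketch below groups toy measurements into 2-minute bins and computes, per bin, the fraction of cells with at least thr spots. It uses only the standard library and is not the get_timeseries implementation; in particular, reading 'ObjectFrac' as "fraction of objects with a count at or above the threshold" is an assumption, and all timestamps and counts are made up.

```python
from collections import defaultdict
from datetime import datetime

# (acquisition time, SpotCount) for individual cells: hypothetical values
rows = [
    (datetime(2023, 1, 18, 10, 0, 0), 0),
    (datetime(2023, 1, 18, 10, 1, 0), 2),
    (datetime(2023, 1, 18, 10, 2, 30), 1),
    (datetime(2023, 1, 18, 10, 3, 0), 1),
    (datetime(2023, 1, 18, 10, 5, 0), 0),
]
time_bin_min = 2  # bin width in minutes
thr = 1           # minimum spot count for a cell to be counted

t0 = min(t for t, _ in rows)  # time zero is the first acquisition

# Assign each row to the start of its time bin (minutes since t0)
bins = defaultdict(list)
for t, spot_count in rows:
    elapsed_min = (t - t0).total_seconds() / 60
    bin_start = int(elapsed_min // time_bin_min) * time_bin_min
    bins[bin_start].append(spot_count)

# Fraction of cells per bin whose spot count reaches the threshold
object_frac = {
    bin_start: sum(c >= thr for c in counts) / len(counts)
    for bin_start, counts in sorted(bins.items())
}
print(object_frac)
```

Increasing time_bin_min merges neighbouring bins, which smooths the curve at the cost of time resolution.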

Making figures

PyBerries uses Seaborn and Matplotlib to plot data. There are three different ways to create plots:

Through a DatasetPool method (plot_preset)
- This is the preferred method, since it will take care of properly displaying all plot elements for the given task
- Presets also include several plots which combine several elements (e.g. plot_timeseries which displays both a scatter and a lineplot)
By importing plots from pyberries.plots
- This allows a bit more flexibility, while still taking care of legend, axis labels, etc.
By importing plot functions from Seaborn
- This will give you the most flexibility, but will require a lot of manual fixing for plot limits, axis labels, legend, etc.

For more details on Seaborn, visit the official Seaborn documentation.

DatasetPool plotting methods

The plot_preset method takes the following arguments:

preset (str): type of plot to make
object_name (str): table to plot from
timeseries (bool): set to True to plot from a timeseries table, and to False (default) to plot from the normal measurement table
drop_duplicates_by (list of str): before plotting, remove all lines that are duplicates according to the column (or combination of columns) specified
return_axes (bool): return figure axis to enable further changes/additional plots to be added
title (str): plot title
xlabel, ylabel (str): X and Y axis labels
xlim, ylim (2-tuple): X and Y axis limits
**kwargs: any arguments to be passed to the seaborn plot
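
The effect of drop_duplicates_by can be previewed with pandas.DataFrame.drop_duplicates and its subset argument. This sketch assumes the option behaves like that pandas call, which the source does not state explicitly; the table and column values are hypothetical. It is useful, for instance, when a parent measurement is repeated on every child row and should only be plotted once per parent.

```python
import pandas as pd

# Spots table where the parent's SpineLength is repeated on each spot row
# (hypothetical columns and values)
spots = pd.DataFrame({
    'Dataset': ['ds1', 'ds1', 'ds1', 'ds2'],
    'ParentIdx': [0, 0, 1, 0],
    'SpineLength': [3.2, 3.2, 4.5, 2.8],
})

# Keep the first row for each (Dataset, ParentIdx) combination
one_row_per_cell = spots.drop_duplicates(subset=['Dataset', 'ParentIdx'])
print(one_row_per_cell)
```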

Available presets are:

histogram
bar
line
scatter
datapoints_and_mean
heatmap
timeseries
boxenplot
spot_tracks

Example use:

plot_args = {'x':'Bacteria_Size',
            'hue':'Group',
            'binwidth':2,
            'stat':'probability',
            'common_norm':False,
            'errorbars':None,
            'title':'',
            'xlabel':'Cell area (µm$^2$)',
            'ylabel':'Probability',
            'xlim':(None, None),
            'ylim':(None, None),
            'multiple':'layer',
            'element':'poly',
            'kde':False,
            'palette':'deep',
            }

data.plot_preset(preset='histogram', object_name='Bacteria', **plot_args)

The additional argument return_axes can be passed to all dataset plot methods to enable further modifications to the figure:

import seaborn as sns

ax = data.plot_preset(preset='histogram', object_name='Bacteria', return_axes=True, **plot_args)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1), labelspacing=1)

moves the legend outside of the plot.

Importing from pyberries.plots

Seaborn plots can be imported from pyberries.plots. The example above could then be written:

import matplotlib.pyplot as plt
from pyberries.plots import histplot

plot_args = {'x':'Bacteria_Size',
            'hue':'Group',
            'binwidth':2,
            'stat':'probability',
            'common_norm':False,
            'title':'',
            'xlabel':'Cell area (µm$^2$)',
            'ylabel':'Probability',
            'xlim':(None, None),
            'ylim':(None, None),
            'multiple':'layer',
            'element':'poly',
            'kde':False,
            'palette':'deep',
            }

_,ax = plt.subplots(dpi=130)
ax = histplot(data.table['Bacteria'], ax=ax, **plot_args)

Note that histogram errorbars are only available when plotting through the dataset method.

Importing from Seaborn

When directly using Seaborn, the histogram above can be produced like this:

import matplotlib.pyplot as plt
import seaborn as sns

plot_args = {'x':'Bacteria_Size',
            'hue':'Group',
            'binwidth':2,
            'stat':'probability',
            'common_norm':False,
            'multiple':'layer',
            'element':'poly',
            'kde':False,
            'palette':'deep',
            }

_,ax = plt.subplots(dpi=130)
g = sns.histplot(data=data.table['Bacteria'], **plot_args)
g.set(xlabel='Cell area (µm$^2$)', ylabel='Probability', title='Plot title', xlim=(None, None), ylim=(None, None))
# Remove the legend title if a legend was drawn
if g.get_legend() is not None:
    g.get_legend().set_title("")

File utilities

Collection of functions to manipulate:

File names
- Zero-padding on numbers
- Replace a string by another
- Add a string to the end of all file names
Tiff files
- Make a tiff stack from single tiff files that have the same ending
- Update axis description in files metadata
- Make copies of a tiff file with an increasing ID number as suffix
Folders downloaded from Omero
- Move files from nested Omero folders to the same folder
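
To give an idea of what the file-name operations do, here is a minimal standard-library sketch. The function names and signatures are hypothetical and are not the PyBerries API; placing the added string before the file extension is also an assumption about what "the end of the file name" means.

```python
import re


def zero_pad_numbers(name, width=3):
    """Left-pad every run of digits in a file name with zeros."""
    return re.sub(r'\d+', lambda m: m.group().zfill(width), name)


def replace_in_name(name, old, new):
    """Replace one string by another in a file name."""
    return name.replace(old, new)


def add_suffix(name, suffix):
    """Append a string to a file name, before the extension if there is one."""
    stem, dot, ext = name.rpartition('.')
    if not dot:  # no extension found
        return name + suffix
    return f"{stem}{suffix}{dot}{ext}"


print(zero_pad_numbers('img_7_t12.tif'))
print(replace_in_name('img_GFP.tif', 'GFP', 'mCherry'))
print(add_suffix('img_7.tif', '_raw'))
```

Zero-padding matters mostly for sorting: without it, 'img_10' sorts before 'img_2' in most file browsers.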

Project details

This version: 0.2.5 (released May 4, 2023). Releases range from 0.2.1 (May 3, 2023) to 0.2.26 (Dec 18, 2024).
