Processing of Bacmman measurement tables
PyBerries
PyBerries is a Python package that can be used to import, manipulate and plot data from Bacmman measurement tables.
It relies mainly on Pandas for data handling and Seaborn/Matplotlib for plotting.
Installation
Anaconda (recommended)
Anaconda installs both Python and Jupyter Lab (used to run Python notebooks) in one step. Note however that it requires ~5 GB of free disk space. For a lighter installation procedure, see the next section "Command line install".
- Download Anaconda from the official website
- Run the installer (leave all options as default)
- Start "Anaconda Navigator"
- In Anaconda, launch the "Jupyter Lab" module
Command line install (advanced users)
- Open a terminal (macOS/Linux) or PowerShell (Windows)
- Install Python
  - Enter the command:
    python --version
  - If an error or a version < 3.9 is shown, download and install Python from the official website
  - After installing, restart your terminal/PowerShell; the command above should now display a version number
- Install Jupyter Lab
  - In a terminal/PowerShell, run the command:
    python -m pip install jupyterlab
  - After the installation completes, Jupyter Lab can be started using the command:
    jupyter-lab
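The version requirement from the steps above can also be checked from inside Python, for instance in the first cell of a notebook. This is a minimal sketch; the helper name python_is_recent_enough is made up for illustration and is not part of PyBerries.

```python
import sys

MIN_VERSION = (3, 9)  # minimum Python version mentioned in the installation steps


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if python_is_recent_enough():
    print("Python version is sufficient:", sys.version.split()[0])
else:
    print("Python is older than 3.9, please install a newer version")
```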
Using existing notebooks
- Download the relevant notebooks from the Notebook folder (you will have to click on individual notebooks and click on the "download" button at the top-right).
- Start Jupyter Lab
- In the left panel of Jupyter Lab, click on "Upload file" and select the notebook you have downloaded
- The notebook will appear in the list of files and folders
- Click on the notebook on the list to open it
A Python notebook consists of a mix of text and code cells.
- Update the code where necessary (e.g. "Input" cell, plot options...)
- Run individual code cells by clicking on them and pressing Shift + Enter
- Once a dataset has been imported, you can run any cell from the "Figures" section (order is not important)
- If you change plot options, re-run the corresponding cell to update the plot
- When hovering your mouse over a plot, a "save" button should appear
Using the PyBerries package in your own code (advanced users)
To install the package, use the following command in a terminal:
python -m pip install PyBerries
You can also install a specific version number (useful e.g. to make sure your code won't be broken by a future update):
python -m pip install PyBerries==0.2.6.post1
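To confirm which version ended up installed (for example after pinning), the standard library can report it. A minimal sketch, assuming the distribution is registered under the name PyBerries as in the pip command above; the helper installed_version is illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print("PyBerries version:", installed_version("PyBerries"))
```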
Creating a DatasetPool
📖 DatasetPool documentation
To import Bacmman measurement tables with PyBerries, you must create a "DatasetPool" (an object that will contain one or several Bacmman datasets). The minimum required arguments to create a DatasetPool are:
- dsList: name(s) of the Bacmman datasets to be imported
- path: path to the Bacmman folder containing the datasets
Optional arguments can be added:
- groups: set legend labels for the datasets. If two datasets have the same label, they will be concatenated (and error bars can be shown if supported)
  - Format: groups = ['Group1', 'Group2', 'Group3'], with a number of groups equal to the number of datasets in dsList
- filters: filter the datasets using the syntax of pandas.DataFrame.query
  - Format: filters = {'object':'filter'}, where object is the name of the target Bacmman object
  - Example: filters = {'Bacteria':'SpineLength > 3'} to keep only Bacteria that have a length > 3
  - Note that filtering an object will also filter out any child objects (e.g. if a bacterium is removed, the spots it contains will be removed as well)
- metadata: enter the name of a metadata field (found in the SourceImageMetadata folder of the dataset) to add a column with the corresponding metadata value for each position
  - Format: metadata = {'object':'metadata_name'}, where object is the Bacmman object to which the metadata should be added
  - Example: metadata = {'Bacteria':'DateTime'} will add the acquisition time for each position in the Bacteria table
- preprocessing: a function to be applied to each measurement table before it is added to the dataset
  - Format: preprocessing = {'object':function}
  - Tip: lambda functions can be an easy way to perform simple tasks such as renaming a column: preprocessing = {'Bacteria':lambda df: df.rename(columns={'Old_name':'New_name'})}
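Since filter strings follow the pandas.DataFrame.query syntax, they can be tried out on a small table before being passed to a DatasetPool. The sketch below uses toy data; the BacteriaId and SpotId columns are invented for illustration, and the parent/child step only mimics the behaviour described above rather than reproducing the PyBerries implementation.

```python
import pandas as pd

# Toy tables; every column name except SpineLength is made up for this sketch
bacteria = pd.DataFrame({
    "BacteriaId": [0, 1, 2],
    "SpineLength": [2.5, 3.4, 5.1],
})
spots = pd.DataFrame({
    "SpotId": [0, 1, 2, 3],
    "BacteriaId": [0, 1, 1, 2],  # parent bacterium of each spot
})

filter_expr = "SpineLength > 3"  # same string as in the filters example
kept_bacteria = bacteria.query(filter_expr)

# Mimic the removal of child objects: keep only spots whose parent survived
kept_spots = spots[spots["BacteriaId"].isin(kept_bacteria["BacteriaId"])]

print(kept_bacteria)
print(kept_spots)
```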
Note: all arguments can either take a single value to be applied to all datasets, or one value per dataset in dsList.
- Example: filters = {'Bacteria':['SpineLength > 3','']} will apply the cell length filter to the first, but not to the second dataset
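The "single value or one value per dataset" convention can be pictured as a simple broadcasting step. The helper below is a hypothetical illustration of that idea, not the code PyBerries uses internally, and the dataset names are placeholders.

```python
def expand_per_dataset(value, n_datasets):
    """Return one value per dataset: repeat a single value, or validate a list."""
    if isinstance(value, (list, tuple)):
        if len(value) != n_datasets:
            raise ValueError(
                f"expected {n_datasets} values, got {len(value)}"
            )
        return list(value)
    return [value] * n_datasets


ds_list = ["dataset_A", "dataset_B"]  # placeholder dataset names

# A single filter string is applied to every dataset
print(expand_per_dataset("SpineLength > 3", len(ds_list)))

# A list gives one filter per dataset; '' means "no filter" for the second one
print(expand_per_dataset(["SpineLength > 3", ""], len(ds_list)))
```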
Example of DatasetPool creation:
from pyberries.data import DatasetPool
data = DatasetPool(path=['D:/Daniel/BACMMAN/Timelapse'], dsList=['230118_DT23'], groups=[], metadata={'Bacteria':'DateTime'}, filters={}, preprocessing={})
About filtering
Filtering is applied when creating a DatasetPool, but can also be applied afterwards with the apply_filters method. Example:
data.apply_filters({'Bacteria':'SpineLength > 3'})
Data format
The Bacmman measurement tables will be imported, and tables from objects that have the same name will be concatenated as a single Pandas DataFrame. The Dataset column specifies which Bacmman dataset a given line belongs to.
Measurement tables are stored in a dictionary ({object_name:table}) under the table property.
For example, to display the data contained in the 'Bacteria' table, run in a Jupyter Notebook:
display(data.table['Bacteria'])
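The resulting layout can be pictured with plain pandas: one DataFrame per object name, with a Dataset column recording where each row came from. This is a sketch of the data structure under assumed toy values, not of the actual import code.

```python
import pandas as pd

# Toy per-dataset measurement tables for a single object ('Bacteria')
tables_by_dataset = {
    "dataset_A": pd.DataFrame({"SpineLength": [2.1, 3.8]}),
    "dataset_B": pd.DataFrame({"SpineLength": [4.2]}),
}

# Tag each table with its dataset name, then concatenate into one DataFrame
frames = [df.assign(Dataset=name) for name, df in tables_by_dataset.items()]
table = {"Bacteria": pd.concat(frames, ignore_index=True)}

print(table["Bacteria"])
```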
Dataset summary
You can use the describe method to print a summary of all numerical columns in the DatasetPool. One or several aggregation methods can be specified, for example:
data.describe('median')
to print the median value for each column, or
data.describe(['mean', 'std'])
to print mean and standard deviation.
Other aggregations are possible, including (but not limited to): 'max', 'min', 'sum', 'sem'. For more details on aggregations, consult pandas.DataFrame.aggregate.
Output can be limited to certain columns by using the keyword include:
data.describe(['mean', 'std'], include=['SpineLength', 'SpineWidth'])
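For readers who want to see what such a summary looks like on a raw table, a comparable result can be obtained directly with pandas.DataFrame.aggregate. Whether describe calls it in exactly this way is an assumption; the values below are toy data.

```python
import pandas as pd

bacteria = pd.DataFrame({
    "SpineLength": [2.0, 4.0, 6.0],
    "SpineWidth": [1.0, 1.0, 1.0],
})

# Restrict to the columns of interest, then apply several aggregations at once
summary = bacteria[["SpineLength", "SpineWidth"]].agg(["mean", "std"])
print(summary)
```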
Adding columns
The add_columns method allows adding predefined calculations (metrics) to the dataset. Current possible metrics are:
- 'heatmap'
- 'is_col_larger'
- 'bin_column'
- 'Dapp'
Example use:
data.add_columns(object_name='Spots', metrics=['Heatmap'])
Details on metrics can be found in DatasetPool.add_columns
Adding a column from a parent table
If 'Bacteria' is a parent of 'Spots', it is possible to add data from the parent table to the child table. For instance, if the 'Bacteria' table contains lineage information, the lineage of its parent bacterium can be added to each spot.
Example use:
data.add_from_parent(object_name='Spots', col='lineage')
Note that the parent table will be automatically inferred from the Bacmman configuration file.
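Conceptually, copying a column from a parent table to a child table is a left join on the parent identifier. The sketch below shows that idea in plain pandas; the BacteriaId and SpotId key columns are hypothetical, and the real method resolves the parent/child relationship from the Bacmman configuration instead.

```python
import pandas as pd

bacteria = pd.DataFrame({
    "BacteriaId": [0, 1],        # hypothetical key column
    "lineage": ["A", "AH"],
})
spots = pd.DataFrame({
    "SpotId": [0, 1, 2],
    "BacteriaId": [0, 1, 1],     # parent bacterium of each spot
})

# Left join: every spot keeps its row and gains the lineage of its parent
spots_with_lineage = spots.merge(
    bacteria[["BacteriaId", "lineage"]], on="BacteriaId", how="left"
)
print(spots_with_lineage)
```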
Timeseries data
If the metadata 'DateTime' has been included in the dataset, it is possible to perform a time-binning on the data in order to plot metrics at different time resolutions. This is done by using the method get_timeseries. The different timeseries metrics available are:
- 'SpineLength'
- 'ObjectCount'
- 'ObjectClass'
- 'Intensity'
- 'Quantile'
- 'Aggregation'
- 'Fluo_intensity'
- 'FOV_Positions'
The resulting dataframe is stored in the timeseries property of the dataset (can be shown by display(data.timeseries['Bacteria'])).
Example use:
timeseries_parameters = {'metric':'ObjectCount', # Metric to be plotted
'col':'SpotCount', # Column to be used from the source data
'timeBin':2, # Time interval in min
'thr':1 # For 'ObjectCount': threshold on number of objects to include in 'ObjectFrac' column
}
data.get_timeseries(object_name='Bacteria', **timeseries_parameters)
For more details on timeseries options, see DatasetPool.get_timeseries
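To give an intuition for what time-binning does, here is a rough pandas sketch on toy data: acquisition times are converted to minutes since the first frame, assigned to bins of timeBin minutes, and a count plus a thresholded fraction are computed per bin. The output column names and the use of >= for the threshold are assumptions made for this illustration and may differ from what get_timeseries produces.

```python
import pandas as pd

bacteria = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2023-01-18 10:00:00", "2023-01-18 10:01:00",
        "2023-01-18 10:02:30", "2023-01-18 10:05:00",
    ]),
    "SpotCount": [0, 2, 1, 0],
})

time_bin = 2  # bin width in minutes (plays the role of 'timeBin')
thr = 1       # threshold on the number of spots (plays the role of 'thr')

# Minutes elapsed since the first acquisition, floored to the start of each bin
elapsed_min = (bacteria["DateTime"] - bacteria["DateTime"].min()).dt.total_seconds() / 60
bacteria["TimeBin"] = (elapsed_min // time_bin) * time_bin

# Per bin: number of cells, and fraction of cells with at least `thr` spots
binned = bacteria.groupby("TimeBin")["SpotCount"].agg(
    n_objects="size",
    frac_above_thr=lambda s: (s >= thr).mean(),
).reset_index()
print(binned)
```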
Making figures
PyBerries uses Seaborn and Matplotlib to plot data. There are three different ways to create plots:
- Through a DatasetPool method (plot_preset)
  - This is the preferred method, since it takes care of properly displaying all plot elements for the given task
  - Presets also include plots that combine several elements (e.g. plot_timeseries, which displays both a scatter and a lineplot)
- By importing plots from pyberries.plots
  - This allows a bit more flexibility, while still taking care of legend, axis labels, etc.
- By importing plot functions from Seaborn
  - This will give you the most flexibility, but will require a lot of manual fixing for plot limits, axis labels, legend, etc.
For more details on Seaborn, visit the official Seaborn documentation.
DatasetPool plotting methods
The plot_preset method takes the following arguments:
- preset (str): type of plot to make
- object_name (str): table to plot from
- timeseries (bool): set to True to plot from a timeseries table, and to False (default) to plot from the normal measurement table
- drop_duplicates_by (list of str): before plotting, remove all lines that are duplicates according to the column (or combination of columns) specified
- return_axes (bool): return figure axis to enable further changes/additional plots to be added
- title (str): plot title
- xlabel, ylabel (str): X and Y axis labels
- xlim, ylim (2-tuple): X and Y axis limits
- **kwargs: any arguments to be passed to the seaborn plot
Available presets are:
- histogram
- bar
- line
- scatter
- datapoints_and_mean
- heatmap
- timeseries
- boxenplot
- spot_tracks
Example use:
plot_args = {'x':'Bacteria_Size',
'hue':'Group',
'binwidth':2,
'stat':'probability',
'common_norm':False,
'errorbars':None,
'title':'',
'xlabel':'Cell area (µm$^2$)',
'ylabel':'Probability',
'xlim':(None, None),
'ylim':(None, None),
'multiple':'layer',
'element':'poly',
'kde':False,
'palette':'deep',
}
data.plot_preset(preset='histogram', object_name='Bacteria', **plot_args)
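Two of the options above, 'stat':'probability' and 'common_norm':False, control normalization: bar heights sum to 1, and each hue group is normalized on its own. The pure-Python sketch below illustrates that arithmetic on toy values. It simplifies binning by aligning bin edges to multiples of binwidth, which is not how Seaborn places its bins, and the group labels are invented.

```python
from collections import Counter, defaultdict


def probability_histogram(values, groups, binwidth):
    """Per-group histogram in which each group's bin heights sum to 1."""
    counts = defaultdict(Counter)
    for value, group in zip(values, groups):
        bin_start = (value // binwidth) * binwidth  # left edge of the bin
        counts[group][bin_start] += 1
    # Normalize each group independently (the effect of common_norm=False)
    return {
        group: {b: n / sum(c.values()) for b, n in sorted(c.items())}
        for group, c in counts.items()
    }


sizes = [1.0, 2.5, 3.0, 5.5, 1.5, 4.2]                 # toy cell areas
labels = ["WT", "WT", "WT", "WT", "mutant", "mutant"]  # toy group labels
hist = probability_histogram(sizes, labels, binwidth=2)
print(hist)
```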
The additional argument return_axes can be passed to all dataset plot methods to enable further modifications to the figure:
import seaborn as sns
ax = data.plot_preset(preset='histogram', object_name='Bacteria', return_axes=True, **plot_args)
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1), labelspacing=1)
This moves the legend outside of the plot.
Importing from pyberries.plots
Seaborn plots can be imported from pyberries.plots. The example above could then be written:
import matplotlib.pyplot as plt
from pyberries.plots import histplot
plot_args = {'x':'Bacteria_Size',
'hue':'Group',
'binwidth':2,
'stat':'probability',
'common_norm':False,
'title':'',
'xlabel':'Cell area (µm$^2$)',
'ylabel':'Probability',
'xlim':(None, None),
'ylim':(None, None),
'multiple':'layer',
'element':'poly',
'kde':False,
'palette':'deep',
}
_,ax = plt.subplots(dpi=130)
ax = histplot(data.table['Bacteria'], ax=ax, **plot_args)
Note that histogram errorbars are only available when plotting through the dataset method.
Importing from Seaborn
When directly using Seaborn, the histogram above can be produced like this:
import matplotlib.pyplot as plt
import seaborn as sns
plot_args = {'x':'Bacteria_Size',
'hue':'Group',
'binwidth':2,
'stat':'probability',
'common_norm':False,
'multiple':'layer',
'element':'poly',
'kde':False,
'palette':'deep',
}
_, ax = plt.subplots(dpi=130)
g = sns.histplot(data=data.table['Bacteria'], ax=ax, **plot_args)
g.set(xlabel='Cell area (µm$^2$)', ylabel='Probability', title='Plot title', xlim=(None, None), ylim=(None, None))
# Clear the legend title, if a legend was drawn
if g.get_legend() is not None:
    g.get_legend().set_title("")
File utilities
Collection of functions to manipulate:
- File names
  - Zero-padding on numbers
  - Replace a string by another
  - Add a string to the end of all file names
- Tiff files
  - Make a tiff stack from single tiff files that have the same ending
  - Update axis description in files metadata
  - Make copies of a tiff file with an increasing ID number as suffix
- Folders downloaded from Omero
  - Move files from nested Omero folders to the same folder
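As an illustration of the kind of file-name manipulation listed above, zero-padding the numbers in a file name can be done with the standard library alone. The function below is a hypothetical sketch written for this example; it is not the PyBerries utility and its name and signature are assumptions.

```python
import re


def zero_pad_numbers(file_name, width=3):
    """Pad every run of digits in the file name (extension excluded) to `width` digits."""
    stem, dot, ext = file_name.rpartition(".")
    if not dot:  # no extension: the whole name is the stem
        stem, ext = file_name, ""
    # zfill pads with leading zeros and leaves longer numbers untouched
    padded = re.sub(r"\d+", lambda m: m.group().zfill(width), stem)
    return padded + dot + ext


for name in ["img_7.tif", "pos12_t3.tiff", "frame5"]:
    print(name, "->", zero_pad_numbers(name))
```

Padding makes alphabetical sorting match numerical order, which matters when single tiff files are later assembled into a stack.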