Utilities for parsing files in a directory based on a file name pattern.
Filepattern Utility
Functions and a class to easily parse filename patterns for WIPP.
Documentation
Install
pip install filepattern
FilePattern Class
A number of functions are included in filepattern.py, but some return complex
output values that may be difficult to handle in an abstract way. To simplify
things, the FilePattern class was created. The usage of the FilePattern class
is described here, but if finer control over filename parsing is needed,
detailed descriptions of each function are provided in filepattern.py.
The two methods implemented by FilePattern are get_matching and iterate.
The get_matching function gets all filenames matching a specific set of
filename values, while the iterate function is an iterable that iterates over
every combination of filename values, with options for returning groups of
filenames according to a particular variable.
File Pattern Format
A file pattern is a string that follows the formatting of the MIST plugin.
It is similar to a regular expression, and regular expression values may be
included in the file pattern. However, the file pattern string includes
variable fields that are surrounded by curly brackets, {}, and the width of
each parsed variable is indicated. For example, if there is a set of files
with names:
image_c000_z000.ome.tif
image_c000_z001.ome.tif
image_c000_z002.ome.tif
image_c001_z000.ome.tif
image_c001_z001.ome.tif
image_c001_z002.ome.tif
Then the filename pattern that indicates a c-variable and a z-variable would be
image_c{ccc}_z{zzz}.ome.tif. Note that the width of each variable is indicated
by repeating the variable (for the c-variable, a width of 3 is indicated by
ccc).
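To make the format concrete, the sketch below converts a file pattern into a regular expression with one named group per variable. It is only an illustration of the idea, not the code in filepattern.py: the pattern_to_regex helper is hypothetical, and it assumes each variable field holds a fixed number of decimal digits.

```python
import re

VARIABLES = "xypzctr"  # variable letters listed in this README


def pattern_to_regex(pattern):
    """Hypothetical helper: turn {ccc}-style fields into named regex groups."""
    def replace_field(match):
        field = match.group(1)
        # A field must repeat a single known variable letter, e.g. "zzz".
        if field[0] not in VARIABLES or field != field[0] * len(field):
            raise ValueError("invalid variable field: {%s}" % field)
        # Assumption: the field width equals the number of digits to match.
        return "(?P<%s>[0-9]{%d})" % (field[0], len(field))

    return re.compile(re.sub(r"\{([a-z]+)\}", replace_field, pattern))


regex = pattern_to_regex("image_c{ccc}_z{zzz}.ome.tif")
match = regex.fullmatch("image_c001_z002.ome.tif")
# Convert the captured digit strings to integers.
values = {name: int(text) for name, text in match.groupdict().items()}
print(values)
```

Any regular expression syntax outside the curly-bracket fields is passed through unchanged, which matches the statement above that regular expression values may be included in a file pattern.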
NOTE: The only possible variables at this time are x, y, p, z, c, t, and r.
Further, either x and/or y may be defined, or p may be defined, but if p is
defined together with either x or y, then an error will be thrown. This will
likely change in the future as more complex data sets will need to be
processed.
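The rules in the note can be expressed as a short validation step. The check_variables function below is a hypothetical sketch of those rules, not the validation that the package itself performs, and the error messages are made up for illustration.

```python
import re

ALLOWED = set("xypzctr")  # variables named in the note above


def check_variables(pattern):
    """Hypothetical check mirroring the rules stated in this README."""
    # Collect the variable letter of every {…} field in the pattern.
    found = {field[0] for field in re.findall(r"\{([a-z]+)\}", pattern)}
    unknown = found - ALLOWED
    if unknown:
        raise ValueError("unsupported variables: %s" % sorted(unknown))
    # p may not appear together with x or y.
    if "p" in found and found & {"x", "y"}:
        raise ValueError("p cannot be combined with x or y")
    return found


print(sorted(check_variables("image_x{xxx}_y{yyy}_z{zzz}.ome.tif")))
```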
FilePattern Initialization
The FilePattern class is initialized using file_path (a folder path), pattern
(a file pattern as described above), and an optional var_order that describes
how files are sorted internally. In general, var_order shouldn't need to be
set since the object methods can handle most file organization issues. If
needed, var_order must be input as a string of the variables that will be
contained in the internal file organizational structure. An example var_order
would be xyzctr.
FilePattern.get_matching
This function retrieves all files that match specific variable values. Using
the example filenames presented in the File Pattern Format section, if C=0
is passed as an input argument, then get_matching will return all files that
contain _c000. The input does not need to be a single value; it can also be a
list. So if C=[0,1], then a list of all files will be returned such that each
file contains _c000 or _c001.
The list that is returned contains dictionaries. Each dictionary contains a key
for each variable parsed from the filename and a file key that indicates the
name of the file.
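The filtering behavior described above can be mimicked on plain dictionaries. The get_matching function below is a hypothetical stand-in written for illustration; it is not the FilePattern method, and it assumes that uppercase keyword names map to the lowercase variable keys of each dictionary.

```python
def get_matching(records, **criteria):
    """Hypothetical stand-in: keep records whose values match every criterion."""
    selected = []
    for record in records:
        keep = True
        for name, wanted in criteria.items():
            # Accept a single value or a list of values, as described above.
            wanted = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if record[name.lower()] not in wanted:
                keep = False
                break
        if keep:
            selected.append(record)
    return selected


# Records shaped like the returned dictionaries: one key per variable
# plus a "file" key, built from the example filenames.
records = [
    {"c": c, "z": z, "file": "image_c%03d_z%03d.ome.tif" % (c, z)}
    for c in range(2)
    for z in range(3)
]

only_c0 = get_matching(records, C=0)          # the three _c000 files
last_z = get_matching(records, C=[0, 1], Z=2)  # the _z002 file of each channel
print([r["file"] for r in last_z])
```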
FilePattern.iterate
This function is an iterable that returns a list of files on each iteration.
Each iteration returns the files that match a unique combination of variable
values, so that every image that matches the file pattern is contained in
exactly one of the returned lists. Specifying a list of variables in the
group_by parameter will return a list of files in which all variable values
are constant except those indicated by group_by. Using the example filenames
presented in the File Pattern Format section, if group_by='z' then the top
three files are returned by the first iteration and the bottom three files are
returned by the second iteration.
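The grouping rule can be sketched with itertools.groupby on plain dictionaries. The iterate function below is a hypothetical stand-in, not the FilePattern method: it assumes a non-empty list of records, treats group_by as a string of variable letters, and yields one list per combination of the remaining variables.

```python
from itertools import groupby


def iterate(records, group_by=""):
    """Hypothetical stand-in: yield lists whose non-grouped values are constant."""
    variables = [name for name in records[0] if name != "file"]
    # Variables not listed in group_by are held constant within each list.
    fixed = [name for name in variables if name not in group_by]

    def key(record):
        return tuple(record[name] for name in fixed)

    # groupby only merges adjacent items, so sort by the same key first.
    for _, group in groupby(sorted(records, key=key), key=key):
        yield list(group)


records = [
    {"c": c, "z": z, "file": "image_c%03d_z%03d.ome.tif" % (c, z)}
    for c in range(2)
    for z in range(3)
]

groups = list(iterate(records, group_by="z"))
for group in groups:
    print([r["file"] for r in group])
```

With group_by='z' this sketch yields two lists of three files each, one per c value. With no grouping it yields six single-file lists.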
In addition to the group_by argument, it is possible to pass the same
arguments accepted by the get_matching function. This will cause iterate to
return only files matching the specified variable values.
Examples
Simple iterator for all tiled tiff images in an input directory
Although it is probably overkill, a simple way to iterate over all images with a tiled tiff extension is:
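from filepattern import FilePattern

file_path = "/path/to/files"
extension = '.ome.tif'
pattern = ".*" + extension
files = FilePattern(file_path, pattern)
# Each iteration returns a list of dictionaries, one per file
for f_list in files.iterate():
    for f in f_list:
        print(f['file'])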
Stack z-slices
In some cases, a microscope will export each image in a z-stack as a separate image file. If a microscope takes an image at three z-positions in each well, and images 4 wells, assume the output files are:
image_x000_y000_z000.ome.tif
image_x000_y000_z001.ome.tif
image_x000_y000_z002.ome.tif
image_x000_y001_z000.ome.tif
image_x000_y001_z001.ome.tif
image_x000_y001_z002.ome.tif
image_x001_y000_z000.ome.tif
image_x001_y000_z001.ome.tif
image_x001_y000_z002.ome.tif
image_x001_y001_z000.ome.tif
image_x001_y001_z001.ome.tif
image_x001_y001_z002.ome.tif
To start, initialize the FilePattern object:
from filepattern import FilePattern

file_path = "/path/to/files"
pattern = "image_x{xxx}_y{yyy}_z{zzz}.ome.tif"
fp = FilePattern(file_path, pattern)
To get all of the z-slices for position x=1,y=0:
z_slices = fp.get_matching(X=1,Y=0)
print([f['file'] for f in z_slices]) # print the path for each file returned
Output:
image_x001_y000_z000.ome.tif
image_x001_y000_z001.ome.tif
image_x001_y000_z002.ome.tif
To loop through each position doing the same thing:
for z_slices in fp.iterate(group_by='z'):
    # x and y are constant within each group, so read them from the first file
    print('Files for (x,y): ({},{})'.format(z_slices[0]['x'], z_slices[0]['y']))
    print([f['file'] for f in z_slices])  # print the path for each file returned