A Python package for SeaFlow flow cytometer data.
## Table of Contents
- Read EVT/OPP/VCT Files
- Command-line Interface
- Integration with R
This package is compatible with Python 3.7 and 3.8.
The following will clone the repo and create a new virtual environment (`venv` can be replaced with your preferred virtual environment tool).

```sh
git clone https://github.com/armbrustlab/seaflowpy
cd seaflowpy
[[ -d ~/venvs ]] || mkdir ~/venvs
python3 -m venv ~/venvs/seaflowpy
source ~/venvs/seaflowpy/bin/activate
pip3 install -U pip setuptools wheel
pip3 install -r requirements-test.txt
pip3 install .
# Confirm the seaflowpy command-line tool is accessible
seaflowpy version
# Make sure basic tests pass
pytest
# Leave the new virtual environment
deactivate
```
```sh
pip3 install seaflowpy
```
Docker images are available from Docker Hub at `ctberthiaume/seaflowpy`.

```sh
docker pull ctberthiaume/seaflowpy
docker run -it ctberthiaume/seaflowpy seaflowpy version
```
The Docker build file is in this repo at `/Dockerfile`. The build process for the Docker image is detailed in `build.sh`.
## Read EVT/OPP/VCT Files
All file reading functions return a `pandas.DataFrame` of particle data.
Gzipped EVT, OPP, or VCT files can be read if they end with a ".gz" extension.
For these code examples, assume `seaflowpy` has been imported as `sfp`, `pandas` has been imported as `pd`, and each `*_filepath` variable has been set to the correct data file.

```python
import pandas as pd
import seaflowpy as sfp
```
Read an EVT file:

```python
evt = sfp.fileio.read_evt_labview(evt_filepath)
```
Read an OPP file as an Apache Arrow Parquet file, select the 50% quantile, and subset columns. VCT files created with `popcycle` are also standard Parquet files and can be read in a similar fashion.

```python
opp = pd.read_parquet(opp_filepath)
opp50 = opp[opp["q50"]]
opp50 = opp50[["fsc_small", "chl_small", "pe"]]
```
## Command-line Interface

All `seaflowpy` CLI tools are accessible from the `seaflowpy` command. Run `seaflowpy --help` to begin exploring the CLI usage documentation.
### SFL validation workflow
SFL validation sub-commands are available under the `seaflowpy sfl` command. The usage details for each command can be accessed as `seaflowpy sfl <cmd> -h`.
The basic workflow is:

1. If starting with an SDS file, first convert it to SFL with `seaflowpy sds2sfl`.

2. If the SFL file is output from `sds2sfl` or is a raw SeaFlow SFL file, convert it to a normalized format with `seaflowpy sfl print`. This command can also be used to concatenate multiple SFL files, e.g. to merge all SFL files in day-of-year directories.
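For illustration, the concatenation part of this step can be sketched in plain pandas. This is not the `seaflowpy sfl print` implementation (which also normalizes the output); it only assumes SFL files are tab-delimited text with a common header row:

```python
import glob

import pandas as pd


def concat_sfl(pattern):
    """Concatenate all tab-delimited SFL files matched by a glob pattern.

    Assumes every matched file shares the same header row.
    """
    frames = [pd.read_csv(path, sep="\t") for path in sorted(glob.glob(pattern))]
    return pd.concat(frames, ignore_index=True)
```

For example, `concat_sfl("2023_180/*.sfl")` would merge every SFL file in a hypothetical day-of-year directory into one DataFrame.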
3. Check for potential errors or warnings with `seaflowpy sfl validate`.
4. Fix errors and warnings. Duplicate file errors can be fixed with `seaflowpy sfl dedup`. Bad lat/lon errors may be fixed with `seaflowpy sfl convert-gga`, assuming the bad coordinates are GGA to begin with. This can be checked with `seaflowpy sfl detect-gga`. Other errors or missing values may need to be fixed manually.
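For reference, GGA-style coordinates pack degrees and decimal minutes into a single number (`DDMM.MMMM`), so conversion to decimal degrees just splits those two parts. A minimal sketch of the idea (not the actual `convert-gga` implementation):

```python
def gga_to_decimal_degrees(gga):
    """Convert a signed GGA coordinate (DDMM.MMMM) to decimal degrees."""
    sign = -1.0 if gga < 0 else 1.0
    gga = abs(gga)
    degrees = int(gga // 100)      # whole degrees
    minutes = gga - degrees * 100  # decimal minutes
    return sign * (degrees + minutes / 60.0)


# 47 degrees 30 minutes -> 47.5 decimal degrees
print(gga_to_decimal_degrees(4730.0))  # 47.5
```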
5. (Optional) Update event rates based on true event counts and file duration with `seaflowpy sfl fix-event-rate`. True event counts for raw EVT files can be determined with `seaflowpy evt count`. If filtering has already been performed, event counts can be pulled from the `all_count` column of the `opp` table in the SQLite3 database, e.g.

```sh
sqlite3 -separator $'\t' SCOPE_14.db 'SELECT file, all_count FROM opp ORDER BY file'
```
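The same query can be run from Python with the standard-library `sqlite3` module; a sketch, assuming a database containing the `opp` table described above:

```python
import sqlite3


def read_all_counts(db_path):
    """Return (file, all_count) rows from the opp table, ordered by file."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(
            "SELECT file, all_count FROM opp ORDER BY file"
        ).fetchall()
```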
6. (Optional) As a check for dataset completeness, the list of files in an SFL file can be compared to the actual EVT files present with `seaflowpy sfl manifest`. It's normal for a few files to differ, especially near midnight. If a large number of files are missing, it may be a sign that the data transfer was incomplete or the SFL file is missing some days.
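Conceptually the manifest check is a set comparison between the file names listed in the SFL and the EVT files actually present. A simplified illustration with hypothetical file lists (the real command resolves the lists from the SFL file and storage itself):

```python
def compare_manifest(sfl_files, evt_files):
    """Return (missing, extra) EVT file lists relative to the SFL listing."""
    sfl_set, evt_set = set(sfl_files), set(evt_files)
    missing = sorted(sfl_set - evt_set)  # listed in SFL but no EVT file found
    extra = sorted(evt_set - sfl_set)    # EVT files not listed in the SFL
    return missing, extra


missing, extra = compare_manifest(["f1", "f2", "f3"], ["f2", "f3", "f4"])
print(missing, extra)  # ['f1'] ['f4']
```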
7. Once all errors or warnings have been fixed, do a final `seaflowpy sfl validate` before adding the SFL file to the appropriate repository.
To use `seaflowpy sfl manifest`, AWS credentials need to be configured. The easiest way to do this is to install the `awscli` Python package and go through configuration.

```sh
pip3 install awscli
aws configure
```
This will store the AWS configuration in `~/.aws`, which `seaflowpy` will use to access SeaFlow data in S3 storage.
## Integration with R

To call `seaflowpy` from R, update the PATH environment variable in `~/.Renviron`. For example:
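A minimal `~/.Renviron` entry, assuming `seaflowpy` was installed into the `~/venvs/seaflowpy` virtual environment created in the install instructions above (adjust the path to match your installation):

```sh
PATH=${PATH}:${HOME}/venvs/seaflowpy/bin
```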
This project uses `pytest` for testing. Tests can be run from this directory as `pytest` to test the installed version of the package, or run `tox` to install the source into a temporary virtual environment for testing.
## Source code structure
This project follows the Git feature branch workflow. Active development happens on the `develop` branch and on feature branches which are eventually merged into `develop`.
To build a source tarball, wheel, and Docker image, run `./build.sh`. This will create a directory `seaflowpy-dist` containing the source tarball and wheel file (created during the Docker build), and build a Docker image.
To remove all build files, run `rm -rf ./seaflowpy-dist`.
## Updating requirements files
Create a new virtual environment:

```sh
python3 -m venv newenv
source newenv/bin/activate
```
Update pip, wheel, setuptools:

```sh
pip3 install -U pip wheel setuptools
```
Install this package:

```sh
pip3 install .
```
Then freeze the requirements:

```sh
pip3 freeze | grep -v seaflowpy >requirements.txt
```
Then install test dependencies, test, and freeze:

```sh
pip3 install pytest pytest-benchmark
pytest
pip3 freeze | grep -v seaflowpy >requirements-test.txt
```
Then install dev dependencies, test, and freeze:

```sh
pip3 install pylint twine
pytest
pip3 freeze | grep -v seaflowpy >requirements-dev.txt
```
Leave the virtual environment:

```sh
deactivate
```