Removes anomalous events from flow cytometry standard (FCS) files.
MetaClean3.0
MetaClean3.0 removes low-quality events, detected during irregular flow, from FCS (flow cytometry standard) data.
This method was developed for, and funded by, Metafora biosystems' MetaFlow platform.
Installation
Pre-requisites:
To install Python 3 on Linux:
metaclean3_public$ sudo apt install python3
metaclean3_public$ sudo apt install python3-pip
Install this package:
Install this package via PyPI:
metaclean3_public$ pip install metaclean3
OR install this package locally:
- Download this repository.
- Build and install the package; on Linux, go to the repository directory:
metaclean3_public$ python -m pip install pip-tools
metaclean3_public$ pip-compile pyproject.toml # generate requirements.txt
metaclean3_public$ pip-sync # install dependencies from the generated requirements.txt
metaclean3_public$ pip install .
Usage
See API here.
MetaClean3.0 contains three classes, two of which you will interact with directly.
Once you have installed metaclean3, in Python:
- Load your .fcs file using the fcsparser package. (Optional) We recommend that you compensate and transform your file before applying MetaClean3.0 for optimal results.
import fcsparser
fcs_file_local = '/path/to/fcs_file.fcs'
meta, data = fcsparser.parse(fcs_file_local, reformat_meta=False)
- Instantiate an FCSfile object, f, that will help format your .fcs file for MetaClean3.0. We strongly recommend that you compensate (and optionally transform) the .fcs file before applying MetaClean3.0.
from metaclean3 import FCSfile
from metaclean3.utils import (
    get_timestep,
    get_spillover_raw,
    apply_compensation_matrix,
)
# compensate .fcs file
sm = get_spillover_raw(meta=meta, dat_columns=data.columns)
if sm is not None:
    data[sm.columns] = apply_compensation_matrix(data[sm.columns], sm)
# instantiate FCSfile
f = FCSfile(data=data, time_step=get_timestep(meta))
- Run MetaClean3.0 on your file, f.
from metaclean3 import MetaCleanFCS
d = MetaCleanFCS().apply(fcs=f)
If you wish, MetaClean3.0 can save cleaning process plots to a directory of your choice:
png_dir = '/path/to/save/png/plots/in'
d = MetaCleanFCS(png_dir=png_dir).apply(fcs=f)
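The steps above can be collected into one helper. This is a minimal sketch that uses only the calls shown in this section; the function name clean_fcs and its arguments are hypothetical and not part of metaclean3. The imports sit inside the function so that the helper can be defined even where the packages are not yet installed.

```python
def clean_fcs(fcs_path, png_dir=None):
    """Parse, compensate, and clean one .fcs file (hypothetical helper)."""
    import fcsparser
    from metaclean3 import FCSfile, MetaCleanFCS
    from metaclean3.utils import (
        get_timestep,
        get_spillover_raw,
        apply_compensation_matrix,
    )

    # Load the file as (metadata, events DataFrame).
    meta, data = fcsparser.parse(fcs_path, reformat_meta=False)

    # Compensate only if the file provides a spillover matrix.
    sm = get_spillover_raw(meta=meta, dat_columns=data.columns)
    if sm is not None:
        data[sm.columns] = apply_compensation_matrix(data[sm.columns], sm)

    # Wrap the events and run the cleaning, optionally saving plots.
    f = FCSfile(data=data, time_step=get_timestep(meta))
    cleaner = MetaCleanFCS(png_dir=png_dir) if png_dir else MetaCleanFCS()
    return cleaner.apply(fcs=f)
```

Calling clean_fcs('/path/to/fcs_file.fcs') would then return the same DataFrame, d, as the step-by-step version.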
The output pandas.DataFrame, d, contains additional columns:
- clean_keep: The boolean column indicating which events to keep (True) or remove (False). This is the final result you are looking for!
- bin: The bin label for each event.
- outlier_keep: The boolean column indicating which events were deemed outliers (False).
- val_*: The feature values extracted based on the chosen fluorescent measurement.
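As a minimal sketch of how these columns might be used, the example below builds a toy stand-in for d and keeps only the clean events. The column names follow the list above; the FL1 channel name and all values are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by MetaCleanFCS().apply(fcs=f).
d = pd.DataFrame({
    "FL1": [10.0, 12.5, 300.0, 11.2, 9.8],
    "bin": [0, 0, 1, 1, 2],
    "outlier_keep": [True, True, False, True, True],
    "clean_keep": [True, True, False, True, False],
})

# Keep only the events flagged as clean.
cleaned = d.loc[d["clean_keep"]].reset_index(drop=True)

# Fraction of events that cleaning would remove.
removed_fraction = 1.0 - d["clean_keep"].mean()
```

Checking removed_fraction before discarding events is a quick way to spot a file where an unexpectedly large share of events was flagged.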
General-purpose cleaning
If you already have your own way of extracting features and binning your .fcs data, and you just want to run the cleaning portion of MetaClean3.0 on a pandas.DataFrame, binned_data, you can do so directly with MetaClean:
from metaclean3 import MetaClean
d = MetaClean().clean(data=binned_data) # all columns will be used.
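For illustration, one way to produce such a binned_data frame is sketched below. It assumes, and this is an assumption rather than something stated here, that MetaClean expects one row per bin and one column per feature. The FL1 channel name, the bin size, and the choice of the per-bin median as the feature are all hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up events in acquisition order; "FL1" is a hypothetical channel name.
rng = np.random.default_rng(0)
events = pd.DataFrame({"FL1": rng.normal(100.0, 5.0, size=1000)})

# Assign consecutive events to fixed-size bins (hypothetical bin size).
bin_size = 100
bin_id = np.arange(len(events)) // bin_size

# One row per bin, one column per feature (here: the per-bin median).
binned_data = events.groupby(bin_id).median()
```

The resulting frame could then be passed on as MetaClean().clean(data=binned_data).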
Fine-tuning MetaClean3.0
For most cases, we recommend using the default settings. However, if you want to fine-tune the results, some common arguments you can change are listed below. See the attributes of the MetaCleanFCS and MetaClean classes for more details.
FCSfile arguments control how events are binned
See FCSfile API here.
min_bin_size and max_bin_size (default: 2000 and 10000): The minimum and maximum number of bins allowed. If you have a large file with more than ten million events and you want MetaClean3.0 results to be more precise, you can increase max_bin_size. Adjust with moderation.
f = FCSfile(data=data, time_step=get_timestep(meta), min_bin_size=2000, max_bin_size=10000)
MetaCleanFCS arguments control feature selection and generation
See MetaCleanFCS API here.
fluo_chans_no (default: 4): The number of fluorescent measurements to consider.
d = MetaCleanFCS(fluo_chans_no=4).apply(fcs=f)
fluo_chans_clean (default: None): If you already know which fluorescent measurements you want MetaClean3.0 to refer to when determining which events to keep, you can list them here as a list of strings. To see which fluorescent measurements are in your file, refer to your FCSfile instance, f.fluo_chans.
d = MetaCleanFCS(fluo_chans_clean=['FL1', 'FL2']).apply(fcs=f)
candidate_chans_type (default: 'fluo'): The type of channels to use for cleaning, i.e. 'fluo' (fluorescent), 'phys' (physical morphology), or 'all'.
d = MetaCleanFCS(candidate_chans_type='fluo').apply(fcs=f)
rm_outliers (default: 'all'): MetaClean3.0 detects and removes outliers so that they do not skew its judgement when removing low-quality events. If you do not want all of these outliers removed from the final results, you can keep them by setting rm_outliers to 'all'. To keep only some of the less outlying events, set this parameter to 'some'; to remove all outliers, set it to 'none'.
d = MetaCleanFCS(rm_outliers='all').apply(fcs=f)
n_cores (default: -1): The number of cores to use while calculating the density feature. Set to -1 to use all cores.
d = MetaCleanFCS(n_cores=-1).apply(fcs=f)
MetaClean arguments control degree of leniency
See MetaClean API here.
min_ref_percent (default: 0.4, i.e. 40%; range: [0, 1]): The minimum fraction of bins you want MetaClean3.0 to keep.
d = MetaCleanFCS(min_ref_percent=0.4).apply(fcs=f)
min_ref_percent_to_keep (default: 0.4, i.e. 40%; range: [0, 1]): If a non-reference segment contains at least this fraction of the bins, it is kept even if it differs from the longest reference segment.
d = MetaCleanFCS(min_ref_percent_to_keep=0.4).apply(fcs=f)
p_thres (default: 0.05; range: [0, 1]): Adjacent event segments (along time) that do not have significantly different values (i.e. p-value > p_thres) are merged. If p_thres is small, MetaClean3.0 will remove fewer events (i.e. be more lenient).
d = MetaCleanFCS(p_thres=0.05).apply(fcs=f)
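To see why a smaller p_thres merges more segments, here is a minimal sketch of the merging rule. It assumes the p-values between neighbouring segments have already been computed; the values below are made up, and the statistical test MetaClean3.0 uses is not specified here. merge_adjacent is a hypothetical helper, not part of metaclean3, and for simplicity it compares the original neighbours rather than re-testing merged segments.

```python
def merge_adjacent(p_values, p_thres):
    """Group segments 0..len(p_values) into merged runs.

    p_values[i] is the p-value comparing segment i with segment i + 1.
    Neighbours are merged when the difference is not significant,
    i.e. when p_values[i] > p_thres.
    """
    groups = [[0]]
    for i, p in enumerate(p_values):
        if p > p_thres:
            groups[-1].append(i + 1)   # not significantly different: merge
        else:
            groups.append([i + 1])     # significantly different: new group
    return groups


# Made-up p-values between five consecutive segments.
p_values = [0.60, 0.03, 0.20, 0.001]

default_groups = merge_adjacent(p_values, p_thres=0.05)
lenient_groups = merge_adjacent(p_values, p_thres=0.01)
```

With the smaller threshold, fewer boundaries count as significant, so more segments are merged together.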
percent_shifts (default: [0.15, 0.2, 0.25, 0.3, 0.35, 0.4]; range: [0, 1]): Adjacent event segments (along time) that have quantile values within a specified range of each other are merged. percent_diff=0.05 indicates that the quantiles to be tested are 5% and 95%. percent_shifts=[0.15, 0.2] indicates that MetaClean3.0 will test whether the segments have quantile values within a 15% and a 20% range of each other, and use whichever yields the higher silhouette score. Small percent_shifts values mean that MetaClean3.0 will remove fewer events (i.e. be more lenient).
d = MetaCleanFCS(percent_shift=[0.15]).apply(fcs=f)
Edge cases
- Duplicate rows: if there are duplicate rows in the input matrix data, the last column of these duplicate rows will be perturbed by a negligible value of around 1/50000. This is to prevent infinite values when calculating the density feature.
- If you already deal with duplicate rows outside of MetaClean3.0, you can skip this step by setting randomize_duplicates_tf to False to shorten runtime:
d = MetaCleanFCS(percent_shift=[0.15]).apply(fcs=f, randomize_duplicates_tf=False)
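The idea behind the duplicate-row perturbation can be sketched as follows. This is not the package's implementation: perturb_duplicates is a hypothetical helper, and the choices to leave the first occurrence untouched and to draw the jitter from a fixed positive range are assumptions made for illustration.

```python
import random


def perturb_duplicates(rows, scale=1 / 50000, seed=0):
    """Nudge the last value of every repeated row by a tiny positive amount."""
    rng = random.Random(seed)
    seen = set()
    out = []
    for row in rows:
        row = tuple(row)
        if row in seen:
            # Repeated row: shift the last column by at most `scale`.
            jitter = rng.uniform(0.5 * scale, scale)
            row = row[:-1] + (row[-1] + jitter,)
        else:
            seen.add(row)
        out.append(row)
    return out


rows = [(1.0, 2.0), (1.0, 2.0), (3.0, 4.0), (1.0, 2.0)]
perturbed = perturb_duplicates(rows)
```

After the perturbation no two rows are identical, so distances between events are never exactly zero.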
API/Documentation
See API here.
This repository uses Sphinx to generate the documentation contained in the docs directory. To generate documentation locally on Linux:
- Install Sphinx for Python:
metaclean3_public$ sudo apt-get install python3-sphinx
metaclean3_public$ pip3 install -r docs/source/requirements.txt
metaclean3_public$ # sphinx-quickstart docs
- Generate the HTML documentation:
metaclean3_public$ sphinx-build -M html docs/source/ docs/build/
Now you can open the front page in your web browser: docs/build/html/index.html.
Development
Unit testing: put test .fcs files in tests/data before starting.
metaclean3_public$ python3 -m unittest discover tests
Build and compile package:
metaclean3_public$ python -m pip install pip-tools
metaclean3_public$ pip-compile --extra dev pyproject.toml # generate requirements.txt
metaclean3_public$ pip install -r requirements.txt # test installation of dependencies
metaclean3_public$ sudo python3 -m build # build the source distribution and wheel in dist/
metaclean3_public$ twine check dist/* # check if package description renders on PyPI
Upload to TestPyPI:
metaclean3_public$ twine upload -r testpypi dist/* # upload
metaclean3_public$ python -m pip install -i https://test.pypi.org/simple metaclean3 # test install
Upload package to PyPI:
metaclean3_public$ twine upload dist/*
metaclean3_public$ pip install metaclean3 # test install
Support
See the issues section of this repository.
Citation
If this package helped you in any way, please cite the MetaClean3.0 paper:
(TBP)