CyTOF Dimension Reduction Framework
A framework of dimension reduction and its evaluation for both CyTOF and general-purpose usages.
About
CytofDR is a framework of dimension reduction (DR) and its evaluation for both Cytometry by Time-of-Flight (CyTOF) and general-purpose usages. It allows you to conveniently run many different DR methods in one place and then evaluate them to pick your embedding using our extensive evaluation framework! We aim to provide you with a reliable, extensible, and convenient interface for all your DR needs for both data analyses and future research!
Key Resources
- For detailed benchmarks and methodology explanations, please check out our paper in Nature Communications!
- For an online version of interactive results, please check out CytofDR Playground.
- For documentation, please visit our free and detailed documentation page.
Installation
You can install our CytofDR package, which is currently on PyPI:
pip install CytofDR
Python (>=3.7) is required. This package is architecture agnostic: it should run wherever PyPI or conda is available. All dependencies should be installed automatically. For a list of optional dependencies, please visit our documentation page's detailed Installation Guide.
Installation should take less than a few minutes for most computers with reasonable network connections.
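After installing, one quick way to confirm that the package is visible to your interpreter is to query its installed version. The sketch below uses only the standard library (`importlib.metadata`, which needs Python 3.8 or newer). The helper name `installed_version` is ours and not part of CytofDR, and we assume the distribution name matches the name used with `pip install`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is registered under the name "CytofDR".
    version = installed_version("CytofDR")
    if version is None:
        print("CytofDR is not installed in this environment.")
    else:
        print(f"CytofDR {version} is installed.")
```

If the script reports that the package is missing, check that you are running the same interpreter or virtual environment that `pip` installed into.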
Conda Installation
I personally recommend using `conda` to install everything since it's so easy to work with virtual environments. If you need help on how to get `conda` installed in the first place, take a look here.
To install the package with `conda`:
conda install -c kevin931 cytofdr -c conda-forge -c bioconda
The core dependencies should automatically install!
Dependencies
Our dependencies are broken down into core dependencies and optional dependencies. Below is a list of core dependencies:
- scikit-learn
- numpy
- scipy
- umap-learn
- openTSNE
- phate
- annoy
- matplotlib
- seaborn
The most current compatible versions will work with `CytofDR`, except for `numpy`. New versions of `numpy` can cause issues with `conda`. If you wish to use `PyCytoData`, you need to install `numpy` version 1.20 or 1.21.
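If you want to check this constraint programmatically before using `PyCytoData`, a small version test is enough. The following is a minimal sketch: the helper names are hypothetical, and the set of compatible series simply encodes the 1.20 and 1.21 requirement stated above.

```python
from typing import Tuple

# Assumption: only the numpy 1.20 and 1.21 series are compatible with
# PyCytoData, as stated in the text above.
COMPATIBLE_NUMPY_SERIES = {(1, 20), (1, 21)}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.21.6'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def numpy_ok_for_pycytodata(version: str) -> bool:
    """Check whether a numpy version falls in the 1.20 or 1.21 series."""
    return major_minor(version) in COMPATIBLE_NUMPY_SERIES
```

In practice you would pass `numpy.__version__` to `numpy_ok_for_pycytodata` and pin `numpy` in your environment if the check fails.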
We also have some optional dependencies which are much trickier to install and manage. Refer to our Installation Guide for more details.
PyCytoData Integration
`CytofDR` is a member of the PyCytoData Alliance Plus, meaning that we're compatible with the `PyCytoData` package. The `PyCytoData` package is used mainly for loading datasets and managing every step of the CyTOF workflow. By creating and maintaining this ecosystem, we hope to create a robust workflow as a one-stop solution for CyTOF practitioners using Python. To install `PyCytoData`, you can simply use the following command:
pip install PyCytoData
To see how you can perform DR using `PyCytoData`, follow this tutorial, which walks through every step.
Quick Tutorial
`CytofDR` makes it easy to run many DR methods while also evaluating them for your CyTOF samples. We have a greatly simplified pipeline for your needs. To get started, follow this example:
>>> import numpy as np
>>> from CytofDR import dr
# Load Dataset
>>> expression = np.loadtxt(fname="PATH_To_file", dtype=float, skiprows=1, delimiter=",")
# Run DR and evaluate
>>> results = dr.run_dr_methods(expression, methods=["umap", "pca"])
Running PCA
Running UMAP
>>> results.evaluate(category = ["global", "local", "downstream"])
Evaluating global...
Evaluating local...
Evaluating downstream...
>>> results.rank_dr_methods()
{'PCA': 1.0, 'UMAP': 2.0}
# Save Results
>>> results.save_all_reductions(save_dir="PATH_to_DIR", delimiter=",")
>>> results.save_evaluations(path="PATH_to_FILE")
We strive to make our pipeline as simple as possible with natural language-like method names. Depending on your dataset size, the runtime of the above example may vary. PCA is extremely fast, whereas UMAP can take upwards of 10 minutes if the dataset is much larger than 100,000 cells. For the `evaluate` command, the silhouette score and clustering step of the downstream category can take some time, but for a small dataset, evaluation can finish within a few minutes.
For large datasets, we recommend using efficient DR methods and providing your own clustering algorithm if possible.
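The cost of the silhouette step follows from its definition: every cell is compared against every other cell, so the work grows roughly quadratically with the number of cells. The pure-Python sketch below illustrates the computation on a toy example. It is not CytofDR's implementation; the function name and data are ours, and it uses `math.dist`, which needs Python 3.8 or newer.

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient with Euclidean distance (needs >= 2 clusters)."""
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("The silhouette score needs at least two clusters.")

    total = 0.0
    for i, point in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, points[j]) for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


if __name__ == "__main__":
    toy = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette_score(toy, [0, 0, 1, 1]))  # well-separated clusters score near 1
```

Because the nested loops touch every pair of cells, doubling the number of cells roughly quadruples the work, which is why the downstream evaluation slows down on large datasets.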
Example Dataset
We have included an example dataset generated by `cytomulate` in the `/example` folder. The data are artificial, with 1000 cells, and mimic real CyTOF data. To use the dataset, you can substitute `PATH_To_file` with the path to the example dataset `exprs.txt`, which is in the expression matrix format.
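In the quick tutorial, `skiprows=1` discards the header row of the file. If you also want the column names, a standard-library sketch like the one below reads a delimited text file with one header row followed by numeric rows. The helper name `read_expression_matrix` is hypothetical, and the delimiter is an assumption that you should match to your own file.

```python
import csv
from typing import List, Tuple


def read_expression_matrix(
    path: str, delimiter: str = ","
) -> Tuple[List[str], List[List[float]]]:
    """Read a header row of column names followed by numeric rows (one per cell)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        columns = next(reader)  # first row holds the column names
        rows = [[float(value) for value in row] for row in reader if row]
    return columns, rows
```

The returned `rows` can be converted with `numpy.array(rows)` before being passed on, and `columns` lets you keep track of which column corresponds to which channel.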
Examples using PyCytoData
You can use `PyCytoData` to load your dataset:
>>> from CytofDR import dr
>>> from PyCytoData import FileIO
# Load Dataset
>>> dataset = FileIO.load_expression("PATH_To_file", col_names = True)
# Run DR and evaluate
>>> results = dr.run_dr_methods(dataset.expression_matrix, methods=["umap", "pca"])
Running PCA
Running UMAP
Or with a benchmark dataset:
>>> from CytofDR import dr
>>> from PyCytoData import DataLoader
# Load Dataset
>>> dataset = DataLoader.load_dataset(dataset = "levine13")
# Run DR and evaluate
>>> results = dr.run_dr_methods(dataset.expression_matrix, methods=["umap", "pca"])
Running PCA
Running UMAP
All subsequent workflows remain the same.
Documentation
Of course, there are many more customizations and ways you can use `CytofDR`. For detailed tutorials and other guides, we suggest that you visit our Official Documentation.
There you will find ways to install our package and get started! Also, we offer tutorials on customizations, working with DR methods, and finally our detailed evaluation framework. We hope that you can find what you need over there!
Latest Release: v0.3.1
This is a minor maintenance update of v0.3.x with updated references and documentation.
Changes and New Features
- Updated references and citation information in all relevant documentation pages
- Removed a warning on SAUCIE's installation documentation
Improvements
- Up-to-date documentation and references
Deprecations
- (Since v0.2.0) The `comparison_classes` parameter of the `EvaluationMetrics.embedding_concordance` method will no longer accept `str` input.
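If existing code passes a single class name as a `str`, one way to adapt is to wrap it in a list before the call. This is only a sketch, under the assumption that a list of class names remains an accepted input for `comparison_classes`; `as_class_list` is a hypothetical helper and not part of CytofDR.

```python
from typing import Iterable, List, Union


def as_class_list(classes: Union[str, Iterable[str]]) -> List[str]:
    """Wrap a lone string in a list; return other iterables as a list."""
    if isinstance(classes, str):
        return [classes]
    return list(classes)
```

Normalizing the value at the call site keeps older scripts working without changing how the class names themselves are chosen.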
Issues and Contributions
If you run into issues or have questions, feel free to open an issue here. I'd love to help you out! We also welcome any contributions, but you may want to look at our contribution guide first. Even if you just have an idea, that'll be great!
References
Our preprint "Comparative Analysis of Dimension Reductions Methods for Cytometry by Time-of-Flight Data" is on bioRxiv and can be accessed right here. If you use our package in your research or deployment, a citation of our paper is highly appreciated:
@article{wang2023comparative,
title={Comparative analysis of dimension reduction methods for cytometry by time-of-flight data},
author={Wang, Kaiwen and Yang, Yuqiu and Wu, Fangjiang and Song, Bing and Wang, Xinlei and Wang, Tao},
journal={Nature Communications},
volume={14},
number={1},
pages={1--18},
year={2023},
publisher={Nature Publishing Group}
}
For a list of references for the methods, metrics, etc. used in the package, please visit our References page and the bibliography of our paper.