Package of CellPLM: A pretrained cell language model beyond single cells. Paper link: https://www.biorxiv.org/content/10.1101/2023.10.03.560734

Project description

CellPLM

This is the official codebase for CellPLM: Pre-training of Cell Language Model Beyond Single Cells. The paper has been accepted to the ICLR 2024 conference.

CellPLM is the first single-Cell Pre-trained Language Model that encodes cell-cell relations. It consistently outperforms existing pre-trained and non-pre-trained models on diverse downstream tasks, with 100x higher inference speed than existing pre-trained models. You can also find a brilliant blog post about the idea behind CellPLM here.

Installation

We recommend PyPI for quick installation. We recommend Python 3.9 and CUDA >= 11.7, but other versions may also work.

Quick Installation with PyPI

Make sure the GPU version of PyTorch (>=1.13.0) is installed before installing CellPLM.

pip install cellplm
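Before installing, it can help to confirm that the installed PyTorch meets the >=1.13.0 requirement stated above. A minimal sketch (the helper `meets_min_version` is our own illustration, not part of CellPLM):

```python
# Check a PyTorch version string against the >=1.13.0 requirement.
# Apply it to torch.__version__ after `import torch`.
def meets_min_version(version: str, minimum: tuple) -> bool:
    """Return True if an 'X.Y.Z...' version string is at least `minimum`."""
    parts = version.split("+")[0].split(".")  # drop local tags like '+cu117'
    major, minor = int(parts[0]), int(parts[1])
    return (major, minor) >= minimum

print(meets_min_version("1.13.1+cu117", (1, 13)))  # True
print(meets_min_version("1.12.0", (1, 13)))        # False
```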

Full Installation (recommended for HPC users and developers)

conda create -n cellplm python=3.9 -y && conda activate cellplm
conda install cudatoolkit=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

The full installation reproduces the environment we used during development, including RAPIDS, which is used to accelerate evaluation.

Tutorials

We offer several notebooks for various downstream tasks as introductory tutorials. Our latest studies demonstrate that CellPLM is competitive on cell-type annotation tasks compared with other SOTA methods and pretrained models. The results are shown below:

| Method | PBMC12K | Pancreas | HLCA | Immune | Brain | Liver |
| --- | --- | --- | --- | --- | --- | --- |
| SingleCellNet | 0.845±0.0064 | 0.644±0.0006 | 0.811±0.0046 | 0.775±0.0009 | 0.877±0.0033 | 0.872±0.0023 |
| ACTINN | 0.614±0.0709 | 0.528±0.0926 | 0.218±0.0440 | 0.236±0.0300 | 0.695±0.0624 | 0.614±0.0349 |
| scANVI | 0.930±0.0148 | 0.963±0.0083 | 0.708±0.0183 | 0.851±0.0133 | 0.933±0.0010 | 0.908±0.0144 |
| CellTypist | 0.883±0.0055 | 0.882±0.0011 | 0.776±0.0079 | 0.822±0.0020 | 0.901±0.0031 | 0.764±0.0132 |
| scDiff | 0.967±0.0042 | 0.968±0.0143 | 0.893±0.0070 | 0.844±0.0076 | 0.947±0.0074 | 0.844±0.0042 |
| scGPT | 0.963 | 0.954 | 0.863 | 0.907 | 0.950 | 0.864 |
| Geneformer | 0.979 | - | 0.833 | 0.856 | 0.934 | 0.871 |
| CellPLM | 0.975 | 0.983 | 0.929 | 0.902 | 0.967 | 0.913 |

(The evaluation follows the setting in the scDiff paper.)
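As a quick sanity check on the table above, one can rank methods per dataset using the mean accuracies (ignoring the ± standard deviations). The snippet below simply restates the numbers from the table; Geneformer is omitted because its Pancreas entry is missing:

```python
# Mean accuracies per dataset, copied from the results table above
# (standard deviations dropped; Geneformer excluded due to a missing value).
results = {
    "SingleCellNet": [0.845, 0.644, 0.811, 0.775, 0.877, 0.872],
    "ACTINN":        [0.614, 0.528, 0.218, 0.236, 0.695, 0.614],
    "scANVI":        [0.930, 0.963, 0.708, 0.851, 0.933, 0.908],
    "CellTypist":    [0.883, 0.882, 0.776, 0.822, 0.901, 0.764],
    "scDiff":        [0.967, 0.968, 0.893, 0.844, 0.947, 0.844],
    "scGPT":         [0.963, 0.954, 0.863, 0.907, 0.950, 0.864],
    "CellPLM":       [0.975, 0.983, 0.929, 0.902, 0.967, 0.913],
}
datasets = ["PBMC12K", "Pancreas", "HLCA", "Immune", "Brain", "Liver"]

# Best-scoring method for each dataset column.
best = {d: max(results, key=lambda m: results[m][i])
        for i, d in enumerate(datasets)}
print(best)
```

Among the listed methods, CellPLM leads on most datasets, with scGPT ahead on Immune.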

Pretrained CellPLM Model Checkpoints

The checkpoints can be acquired from our Dropbox. We may update them from time to time.

[10/10/2023] The latest version is 20230926_85M.

Citation

@article{wen2023cellplm,
  title={CellPLM: Pre-training of Cell Language Model Beyond Single Cells},
  author={Wen, Hongzhi and Tang, Wenzhuo and Dai, Xinnan and Ding, Jiayuan and Jin, Wei and Xie, Yuying and Tang, Jiliang},
  journal={bioRxiv},
  pages={2023--10},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

Download files

Download the file for your platform.

Source Distribution

cellplm-0.1.0.post2.tar.gz (43.3 kB)

Uploaded Source

Built Distribution

cellplm-0.1.0.post2-py3-none-any.whl (58.2 kB)

Uploaded Python 3

File details

Details for the file cellplm-0.1.0.post2.tar.gz.

File metadata

  • Download URL: cellplm-0.1.0.post2.tar.gz
  • Upload date:
  • Size: 43.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.15

File hashes

Hashes for cellplm-0.1.0.post2.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 65dbf382bc99f7bdfc929b38303d55fe2be18962331af7528f9a99e4e0106d7b |
| MD5 | 0a2d6b85ae7373fcf5cb5c6647f46eaa |
| BLAKE2b-256 | 07fb39d2d3820b270e88c2d4da14182df0f04cc81f67dab6701f2d30d106a58f |
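To confirm that a downloaded file matches the published digests, a small helper like the following can compute its SHA-256 (a generic sketch using the standard library; `sha256_of` is not part of CellPLM):

```python
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare the result against the published digest, e.g.:
# sha256_of("cellplm-0.1.0.post2.tar.gz") == "65dbf382...0106d7b"
```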

File details

Details for the file cellplm-0.1.0.post2-py3-none-any.whl.

File hashes

Hashes for cellplm-0.1.0.post2-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | dbf64cdec707eae3be44746e3d7962951aa90307e97514ff964acbdac73a5a20 |
| MD5 | 0a32bfce53cd2e517e384cb813902a4d |
| BLAKE2b-256 | 6072790db363ebd94d4ae491d0f47f7ec4b2ea05e0a3f07fb37dbec65edf15e7 |
