Implements *CellAnnotator (aka *CAT/starCAT), annotating scRNA-Seq with predefined gene expression programs

starCAT

Implements starCellAnnoTator (AKA starCAT), annotating scRNA-Seq with predefined gene expression programs

Citation

If you use starCAT, please cite our manuscript.

Installation

You can install starCAT and its dependencies via the Python Package Index.

pip install starcatpy

We tested it with scikit-learn 1.3.2, AnnData 0.9.2, and Python 3.8. To run the tutorials, you also need jupyter or jupyterlab, as well as scanpy and cnmf:

pip install jupyterlab scanpy cnmf
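
To compare your environment against the tested versions listed above, a small check along these lines may help. It is only a sketch: the helper name installed_versions is hypothetical, and the distribution names "scikit-learn" and "anndata" are assumed to be the PyPI names of the packages mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions reported as tested in the installation notes above.
# None means no specific tested version is stated for that package.
TESTED = {"starcatpy": None, "scikit-learn": "1.3.2", "anndata": "0.9.2"}


def installed_versions(packages):
    """Return {distribution name: installed version, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions(TESTED).items():
        tested = TESTED[name]
        note = f" (tested with {tested})" if tested else ""
        print(f"{name}: {installed or 'not installed'}{note}")
```

A different installed version does not necessarily mean starCAT will fail; the listed versions are simply the ones reported as tested.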

Basic starCAT usage

Please see our tutorials in Python and R. A sample pipeline using a pre-built set of reference programs (TCAT.V1) is shown below.

from starcat import starCAT

# Load default TCAT reference from the starCAT database
tcat = starCAT(reference='TCAT.V1')

tcat.ref.iloc[:5, :5]
#                     A1BG       AARD     AARSD1      ABCA1     ABCB1
# CellCycle-G2M   2.032614  22.965553  17.423538   3.478179  2.297279
# Translation    35.445282   0.000000   9.245893   0.477994  0.000000
# HLA            18.192997  14.632670   2.686475   3.937182  0.000000
# ISG             0.436212   0.000000  18.078197  17.354506  0.000000
# Mito           10.293049   0.000000  52.669895  14.615502  3.341488

# Load cell x gene counts data (datafn is the path to your counts file)
adata = tcat.load_counts(datafn)

# Run starCAT. fit_transform expects raw counts stored in adata.X,
# not in adata.layers['counts'].
usage, scores = tcat.fit_transform(adata)

usage.iloc[0:2, 0:4]
#                             CellCycle-G2M  Translation       HLA       ISG
# CATGCCTAGTCGATAA-1-gPlexA4       0.000039     0.001042  0.001223  0.000162
# AAGACCTGTAGCGTCC-1-gPlexC6       0.000246     0.100023  0.002991  0.042354

scores.iloc[0:2, :]
#                                  ASA  Proliferation  ASA_binary  \
# CATGCCTAGTCGATAA-1-gPlexA4  0.001556        0.00052       False   
# AAGACCTGTAGCGTCC-1-gPlexC6  0.012503        0.01191       False   

#                             Proliferation_binary Multinomial_Label  
# CATGCCTAGTCGATAA-1-gPlexA4                 False         CD8_TEMRA  
# AAGACCTGTAGCGTCC-1-gPlexC6                 False         CD4_Naive  
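
As noted in the sample above, starCAT expects raw counts in adata.X rather than in adata.layers['counts']. If your object keeps raw counts in a layer, one possible way to prepare it is sketched below. The helper name use_counts_layer is hypothetical, and the layer name 'counts' is taken from the comment in the sample; your data may use a different name. The function relies only on the .X and .layers attributes, so the demonstration uses a simple stand-in object instead of a real AnnData.

```python
from types import SimpleNamespace


def use_counts_layer(adata, layer="counts"):
    """Copy adata.layers[layer] into adata.X, in place, if the layer exists.

    Returns True when adata.X was replaced and False when the layer is absent.
    Whatever was previously in adata.X is overwritten.
    """
    if layer in adata.layers:
        adata.X = adata.layers[layer].copy()
        return True
    return False


# Stand-in for an AnnData object: normalized values in X, raw counts in a layer.
demo = SimpleNamespace(X=[[0.5, 0.0]], layers={"counts": [[3, 0]]})
moved = use_counts_layer(demo)
print(moved, demo.X)
```

With a real AnnData object, you would call use_counts_layer(adata) before tcat.fit_transform(adata). Keep a copy of the original adata.X first if you still need the normalized values.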

starCAT can also be run from the command line.

starcat --reference "TCAT.V1" --counts {counts_fn} --output-dir {output_dir} --name {output_name}
  • --reference - name of a default reference to download (e.g. TCAT.V1) OR filepath to a reference set of GEPs by genes (*.tsv/.csv/.txt). Default is 'TCAT.V1'
  • --counts - filepath to the input (cell x gene) counts matrix as a matrix market file (.mtx.gz), tab-delimited text file, or anndata file (.h5ad)
  • --scores - optional path to a yaml file for calculating score add-ons. Not necessary for pre-built references
  • --output-dir - the output directory. All output will be placed in {output-dir}/{name}... The default directory is '.'
  • --name - the output analysis prefix name. Default is 'starCAT'
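
If you want to drive the command line tool from a Python script, for example to process several count files, the options above can be assembled into an argument list. This is a minimal sketch: build_starcat_command and run_starcat are hypothetical helper names, and it assumes the starcat executable is on your PATH after installation. Only the flags listed above are used.

```python
import subprocess


def build_starcat_command(counts_fn, reference="TCAT.V1", output_dir=".",
                          name="starCAT", scores_fn=None):
    """Build the argument list for the starcat command line tool."""
    cmd = [
        "starcat",
        "--reference", str(reference),
        "--counts", str(counts_fn),
        "--output-dir", str(output_dir),
        "--name", str(name),
    ]
    # --scores is optional and not needed for pre-built references.
    if scores_fn is not None:
        cmd += ["--scores", str(scores_fn)]
    return cmd


def run_starcat(counts_fn, **kwargs):
    """Run starcat and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_starcat_command(counts_fn, **kwargs), check=True)


print(" ".join(build_starcat_command("counts.h5ad", name="my_run")))
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.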

For code to reproduce figures and analyses from our manuscript, please refer to the TCAT analysis GitHub.

Alternate implementation

For small datasets (fewer than ~50,000 cells or smaller than 700 MB), try running starCAT on our website, which requires no package installation.

Creating your own reference

We provide example scripts for constructing custom starCAT references from a single cNMF run or multiple cNMF runs.
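
The example scripts are not reproduced here. Purely as an illustration of the file layout, the sketch below writes a small GEPs-by-genes table as tab-delimited text, mirroring the orientation of tcat.ref shown earlier (programs as rows, genes as columns). The helper name write_reference_table is hypothetical, and the exact header and index conventions starCAT expects are an assumption, so check the example scripts before relying on this layout. The values are copied from the tcat.ref preview above.

```python
import csv
import io


def write_reference_table(handle, genes, programs):
    """Write a GEPs-by-genes table as tab-delimited text.

    genes: list of gene names (columns).
    programs: dict mapping GEP name -> list of weights aligned with genes.
    """
    # Validate everything first so a bad row never leaves a partial file.
    for gep, weights in programs.items():
        if len(weights) != len(genes):
            raise ValueError(f"{gep}: expected {len(genes)} weights, got {len(weights)}")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(genes))  # empty corner cell above the GEP names
    for gep, weights in programs.items():
        writer.writerow([gep] + list(weights))


genes = ["A1BG", "AARD"]
programs = {"CellCycle-G2M": [2.032614, 22.965553], "Translation": [35.445282, 0.0]}

buffer = io.StringIO()
write_reference_table(buffer, genes, programs)
table_text = buffer.getvalue()
print(table_text)
```

To write to disk instead, open a file with open(path, "w", newline="") and pass the handle in place of the StringIO buffer.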

Please let us know if you are interested in making your reference publicly available for others to use, analogous to our TCAT.V1 reference. You can email me at dkotliar@broadinstitute.org.

Download files

Source Distribution

starcatpy-1.0.10.tar.gz (18.1 kB view details)

Uploaded Source

Built Distribution

starcatpy-1.0.10-py3-none-any.whl (17.4 kB view details)

Uploaded Python 3

File details

Details for the file starcatpy-1.0.10.tar.gz.

File metadata

  • Download URL: starcatpy-1.0.10.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for starcatpy-1.0.10.tar.gz
Algorithm Hash digest
SHA256 ff1b7e7a6d3e9432a7a8443bff810a44780bda188722388a6565ae09c03d4186
MD5 88cb12d60a772af2533b4b1adcfb74b2
BLAKE2b-256 268fb99f2a6e4d0d9596e3c2a9b0415ec564cfb9ac69e1ce1a32b2d9ca217045

File details

Details for the file starcatpy-1.0.10-py3-none-any.whl.

File metadata

  • Download URL: starcatpy-1.0.10-py3-none-any.whl
  • Upload date:
  • Size: 17.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for starcatpy-1.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 311a11d152ad627d575f2fc87b9c154c65597c713fdeba721bdb3055b09ea182
MD5 fc2d7bffcafca7a5f63636d287d79147
BLAKE2b-256 3644a3cee010560a79fbe94ffa33857c2463acabd947647d87abd5c29bae0f46
