scfind is a method for searching large single-cell datasets for specific cell types using a gene-list query. It can suggest subqueries scored by the TF-IDF method, and it can perform a hypergeometric test to evaluate marker genes specific to each cell type within a dataset.

Project description

scfind

scfind - Fast searches of large collections of single cell data

Single-cell technologies have enabled the profiling of millions of cells. However, for these vast resources to be fully leveraged, they must be easily queryable and accessible. To facilitate interactive and intuitive access to single-cell data, we have developed scfind, a search engine for cell atlases. Scfind can be utilized to evaluate marker genes, perform in silico gating, and identify both cell-type specific and housekeeping genes. An interactive interface with access to nine single-cell datasets is available at scfind.sanger.ac.uk.

Installation

scfind is available as a package for Python and R, with C++ extensions. The scfind R library is accessible at our GitHub repository. Before installing the scfind Python package, the Armadillo library, a C++ linear algebra library, must be installed.

Step 1: Install Armadillo

If you have Homebrew installed, Armadillo can be installed with the following command:

brew install armadillo

Alternatively, you can download the source files and compile them manually. See the Armadillo documentation for more details.

Step 2: Install the scfind package

git clone https://github.com/ShaokunAn/tmp-scfind_py.git
cd tmp-scfind_py
pip install -r requirements.txt
python setup.py build_ext --inplace
python setup.py sdist bdist_wheel
pip install .
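
After the steps above, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of scfind.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate `module_name`."""
    # find_spec locates a top-level module without importing it.
    return importlib.util.find_spec(module_name) is not None


# Prints True once the installation has succeeded in this environment.
print("scfind importable:", is_installed("scfind"))
```

If this is run from inside the cloned repository, Python may pick up the local source directory, so running it from another directory gives a more reliable answer.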

Tutorials

scfind offers efficient querying and access to large single-cell datasets through an interface that is both fast and user-friendly. Its primary function is to build an index, which enables users to query the dataset efficiently.

Examples

import scfind
import anndata

# Read the original AnnData
adata = anndata.read_h5ad('your/path/to/data.h5ad')

# Build the index
scfind_index = scfind.SCFind()
scfind_index.buildCellTypeIndex(adata=adata, dataset_name='your_data_name',
                                cell_type_label='your_cell_type_label',
                                feature_name='your_feature_name')

The cell_type_label should correspond to a column in adata.obs that contains the cell type annotations. The feature_name should correspond to a column in adata.var that contains feature annotations, like gene names.
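
Because both labels must match existing column names, a quick check before building the index can catch typos early. The sketch below is an illustration only: check_index_inputs is a hypothetical helper, not part of scfind, and it assumes only that the object exposes obs.columns and var.columns, as an AnnData object does. A small stand-in object is used so the example runs without any data file.

```python
from types import SimpleNamespace


def check_index_inputs(adata, cell_type_label, feature_name):
    """Return a list of problems; an empty list means both columns were found."""
    problems = []
    if cell_type_label not in adata.obs.columns:
        problems.append(f"'{cell_type_label}' is not a column in adata.obs")
    if feature_name not in adata.var.columns:
        problems.append(f"'{feature_name}' is not a column in adata.var")
    return problems


# Stand-in for an AnnData object (assumption: real data would be loaded
# with anndata.read_h5ad instead).
demo = SimpleNamespace(
    obs=SimpleNamespace(columns=["cell_type", "batch"]),
    var=SimpleNamespace(columns=["gene_symbol"]),
)

print(check_index_inputs(demo, "cell_type", "gene_symbol"))
print(check_index_inputs(demo, "celltype", "gene_symbol"))
```

With a real AnnData object, the same function can be called directly on adata before buildCellTypeIndex.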

With the built index, users can perform various queries, such as finding cell type markers, identifying housekeeping genes across cell types, and conducting hypergeometric tests to discover significantly enriched cell types for provided genes. Below are some query functions in scfind. For additional functionalities, please refer to the scfind Nature Methods paper.

# Find cell type markers
cell_types = ['your_cell_types_of_interest']
ct_markers = scfind_index.cellTypeMarkers(cell_types=cell_types)
print(ct_markers)

# Find housekeeping genes across cell types
hk_genes = scfind_index.findHouseKeepingGenes(cell_types=scfind_index.cellTypeNames())
print(hk_genes)

# Find significantly enriched cell types for specific genes
genes = ['your_genes_of_interest']
hypQ_cts = scfind_index.hyperQueryCellTypes(genes)
print(hypQ_cts)

# Merge two indices
index1 = scfind.SCFind()
index1.buildCellTypeIndex(adata=adata1, ...) # build the first index
index2 = scfind.SCFind()
index2.buildCellTypeIndex(adata=adata2, ...)
index1.mergeDataset(index2) # now index1 contains both adata1 and adata2

# Save the index
scfind_index.saveObject("your/save/path.bin")

# Load the index
load_index = scfind.SCFind()
load_index.loadObject("your/load/path.bin")
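
To give some intuition for the hypergeometric test mentioned above, the following sketch computes a generic one-sided enrichment p-value with the standard library. It is not scfind's implementation, and the function name, argument names, and counts are all made up for illustration. It asks: if a query matches query_hits cells out of total_cells, how surprising is it that at least hits_in_type of them fall in a cell type containing cells_in_type cells?

```python
from math import comb


def hypergeom_pvalue(total_cells, cells_in_type, query_hits, hits_in_type):
    """Upper-tail probability P(X >= hits_in_type) for a hypergeometric draw.

    query_hits cells are drawn without replacement from total_cells,
    of which cells_in_type belong to the cell type of interest.
    """
    upper = min(query_hits, cells_in_type)
    # Sum the probability mass for every outcome at least as extreme.
    tail = sum(
        comb(cells_in_type, k) * comb(total_cells - cells_in_type, query_hits - k)
        for k in range(hits_in_type, upper + 1)
    )
    return tail / comb(total_cells, query_hits)


# Hypothetical counts: 1000 cells, 50 in the cell type, 20 cells match
# the query, and 10 of those matches are in the cell type.
p = hypergeom_pvalue(1000, 50, 20, 10)
print(f"p-value: {p:.3g}")
```

A small p-value indicates that the matching cells are concentrated in that cell type more than chance alone would suggest.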

Citation

Please cite our work using the following reference:

@article{lee2021fast,
  title={Fast searches of large collections of single-cell data using scfind},
  author={Lee, Jimmy Tsz Hang and Patikas, Nikolaos and Kiselev, Vladimir Yu and Hemberg, Martin},
  journal={Nature Methods},
  volume={18},
  number={3},
  pages={262--271},
  year={2021},
  publisher={Nature Publishing Group US New York}
}

