scfind is a method for searching large single-cell datasets for specific cell types using a gene-list query. It can suggest subqueries scored by TF-IDF, and it can run a hypergeometric test to evaluate how specific marker genes are to each cell type within a dataset.

scfind

scfind - Fast searches of large collections of single cell data

Single-cell technologies have enabled the profiling of millions of cells. However, for these vast resources to be fully leveraged, they must be easily queryable and accessible. To facilitate interactive and intuitive access to single-cell data, we have developed scfind, a search engine for cell atlases. Scfind can be utilized to evaluate marker genes, perform in silico gating, and identify both cell-type specific and housekeeping genes. An interactive interface with access to nine single-cell datasets is available at scfind.sanger.ac.uk.

Installation

scfind is available as a package for Python and R, with C++ extensions. The scfind R library is accessible at our GitHub repository. Before installing the scfind Python package, the Armadillo library, a C++ linear algebra library, must be installed.

Step 1: Install Armadillo

If you have Homebrew installed, Armadillo can be installed with the following command:

brew install armadillo

Alternatively, you can download the source files and compile them manually. See the Armadillo documentation for more details.

Step 2: Install the scfind package

git clone https://github.com/ShaokunAn/tmp-scfind_py.git
cd tmp-scfind_py
pip install -r requirements.txt
python setup.py build_ext --inplace
python setup.py sdist bdist_wheel
pip install .
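
After the steps above, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the Python standard library; `is_installed` is a hypothetical helper name, not part of scfind.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether the scfind package built above can be imported.
print("scfind importable:", is_installed("scfind"))
```

If this prints False, the install most likely went into a different environment than the one running the check.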

Tutorials

scfind offers efficient querying and access to large single-cell datasets through an interface that is both fast and user-friendly. Its primary function is to build an index, which lets users query the dataset efficiently.

Examples

import scfind
import anndata

# Read the original AnnData
adata = anndata.read_h5ad('your/path/to/data.h5ad')

# Build the index
scfind_index = scfind.SCFind()
scfind_index.buildCellTypeIndex(
    adata=adata,
    dataset_name='your_data_name',
    cell_type_label='your_cell_type_label',  # column in adata.obs
    feature_name='your_feature_name',  # column in adata.var
)

The cell_type_label should correspond to a column in adata.obs that contains the cell type annotations. The feature_name should correspond to a column in adata.var that contains feature annotations, like gene names.
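
Because both labels must name existing columns, a quick check before building the index can catch typos early. The sketch below is an illustration only: `check_index_columns` is a hypothetical helper, not part of scfind, and it takes plain sequences of column names so that it runs without any dataset loaded. With a real AnnData object you would pass `adata.obs.columns` and `adata.var.columns`.

```python
def check_index_columns(obs_columns, var_columns, cell_type_label, feature_name):
    """Raise ValueError if either label is missing (hypothetical helper, not scfind API)."""
    missing = []
    if cell_type_label not in list(obs_columns):
        missing.append(f"adata.obs has no column {cell_type_label!r}")
    if feature_name not in list(var_columns):
        missing.append(f"adata.var has no column {feature_name!r}")
    if missing:
        raise ValueError("; ".join(missing))


# Plain lists stand in for adata.obs.columns and adata.var.columns here.
check_index_columns(["cell_type", "batch"], ["gene_name"], "cell_type", "gene_name")
```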

With the built index, users can perform various queries, such as finding cell type markers, identifying housekeeping genes across cell types, and running hypergeometric tests to find the cell types that are significantly enriched for a given set of genes. Below are some query functions in scfind. For additional functionality, please refer to the scfind Nature Methods paper listed under Citation.

# Find cell type markers
cell_types = ['your_interested_cell_types']
ct_markers = scfind_index.cellTypeMarkers(cell_types=cell_types)
print(ct_markers)

# Find housekeeping genes across cell types
hk_genes = scfind_index.findHouseKeepingGenes(cell_types=scfind_index.cellTypeNames())
print(hk_genes)

# Find significantly enriched cell types for specific genes
genes = ['your_interested_genes']
hypQ_cts = scfind_index.hyperQueryCellTypes(genes)
print(hypQ_cts)

# Merge two indices
index1 = scfind.SCFind()
index1.buildCellTypeIndex(adata=adata1, ...) # build the first index
index2 = scfind.SCFind()
index2.buildCellTypeIndex(adata=adata2, ...)
index1.mergeDataset(index2) # now index1 contains both adata1 and adata2

# Save the index
scfind_index.saveObject("your/save/path.bin")

# Load the index
load_index = scfind.SCFind()
load_index.loadObject("your/load/path.bin")
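
To give some intuition for the hypergeometric query above, here is a minimal sketch of a textbook one-sided hypergeometric test written with the standard library only. It is not scfind's implementation, and scfind's exact formulation and multiple-testing correction may differ. The function name `hypergeom_sf` and the parameter names are assumptions made for illustration: `N` is the number of cells in the dataset, `K` the number of cells in one cell type, `n` the number of cells matching the gene query, and `k` the number of matching cells that fall in that cell type.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric(N cells, K in the cell type, n query hits)."""
    upper = min(K, n)  # the overlap cannot exceed either group size
    total = comb(N, n)  # number of ways to choose the n matching cells
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), upper + 1))
    return tail / total


# Toy numbers: 10 cells, 3 in the cell type, 4 match the query, all 3 overlap.
p_value = hypergeom_sf(3, 10, 3, 4)
print(p_value)
```

A small p-value means the observed overlap would be unlikely if the matching cells were spread across cell types at random, which is the sense in which a cell type is called enriched for the query.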

Citation

Please cite our work using the following reference:

@article{lee2021fast,
  title={Fast searches of large collections of single-cell data using scfind},
  author={Lee, Jimmy Tsz Hang and Patikas, Nikolaos and Kiselev, Vladimir Yu and Hemberg, Martin},
  journal={Nature Methods},
  volume={18},
  number={3},
  pages={262--271},
  year={2021},
  publisher={Nature Publishing Group US New York}
}
