scfind is a method for searching large single-cell datasets for specific cell types using a query gene list. It can suggest subqueries scored with the TF-IDF method, and it can perform a hypergeometric test to evaluate marker genes specific to each cell type within a dataset.
Project description
scfind - Fast searches of large collections of single cell data
Single-cell technologies have enabled the profiling of millions of cells. However, for these vast resources to be fully leveraged, they must be easily queryable and accessible. To facilitate interactive and intuitive access to single-cell data, we have developed scfind, a search engine for cell atlases. Scfind can be utilized to evaluate marker genes, perform in silico gating, and identify both cell-type specific and housekeeping genes. An interactive interface with access to nine single-cell datasets is available at scfind.sanger.ac.uk.
Installation
scfind is available as a package for Python and R, with C++ extensions. The scfind R library is accessible at our GitHub repository. Before installing the scfind Python package, the Armadillo library, a C++ linear algebra library, must be installed.
Step 1: Install Armadillo
If you have Homebrew installed, Armadillo can be installed with the following command:
brew install armadillo
Alternatively, you can download the source files and compile them manually. See the Armadillo documentation for more details.
Step 2: Install the scfind package
git clone https://github.com/ShaokunAn/tmp-scfind_py.git
cd tmp-scfind_py
pip install -r requirements.txt
python setup.py build_ext --inplace
python setup.py sdist bdist_wheel
pip install .
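After installation, a quick sanity check is to confirm that Python can locate the package. The sketch below uses only the standard library; the helper name `is_installed` is illustrative and is not part of scfind.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # 'scfind' is the import name used in the examples of this README.
    print("scfind importable:", is_installed("scfind"))
```

If this prints `False`, the build step most likely failed, often because the Armadillo headers could not be found.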
Tutorials
scfind offers efficient querying and access to large single-cell datasets through an interface that is both fast and user-friendly. Its primary function is to build an index, which enables users to query the dataset efficiently.
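To give an intuition for why an index makes queries fast, here is a toy inverted index in pure Python. It is a conceptual sketch only: the real scfind index is built by the package's C++ extension, and the functions `build_toy_index` and `query_all`, as well as the input layout, are assumptions made for illustration.

```python
from collections import defaultdict


def build_toy_index(expression, cell_types):
    """Map gene -> cell type -> set of cell ids in which the gene is detected.

    expression: dict of gene name -> list of counts, one entry per cell
    cell_types: list of cell type labels, one entry per cell
    """
    index = defaultdict(lambda: defaultdict(set))
    for gene, counts in expression.items():
        for cell_id, count in enumerate(counts):
            if count > 0:
                index[gene][cell_types[cell_id]].add(cell_id)
    return index


def query_all(index, genes):
    """Return, per cell type, the cells that express every gene in `genes`."""
    genes = list(genes)
    hits = {}
    if not genes:
        return hits
    first, rest = genes[0], genes[1:]
    for cell_type, cells in index.get(first, {}).items():
        common = set(cells)
        for gene in rest:
            common &= index.get(gene, {}).get(cell_type, set())
        if common:
            hits[cell_type] = sorted(common)
    return hits


if __name__ == "__main__":
    # Hypothetical toy data: two genes measured in four cells.
    expression = {"GeneA": [3, 0, 1, 0], "GeneB": [1, 2, 0, 0]}
    cell_types = ["T", "T", "B", "B"]
    index = build_toy_index(expression, cell_types)
    print(query_all(index, ["GeneA", "GeneB"]))
```

Because each query only touches the entries for the requested genes, the cost depends on the number of matching cells rather than on the size of the full expression matrix.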
Examples
import scfind
import anndata
# Read the original AnnData
adata = anndata.read_h5ad('your/path/to/data.h5ad')
# Build the index
scfind_index = scfind.SCFind()
scfind_index.buildCellTypeIndex(adata=adata, dataset_name='your_data_name',
                                cell_type_label='your_cell_type_label',
                                feature_name='your_feature_name')
The cell_type_label should correspond to a column in adata.obs that contains the cell type annotations. The feature_name should correspond to a column in adata.var that contains feature annotations, like gene names.
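Since both arguments must name existing columns, it can help to check them before building the index. The helper below is a minimal sketch and not part of scfind; `check_index_inputs` is a hypothetical name, and it accepts any iterable of column names, such as `adata.obs.columns` and `adata.var.columns`.

```python
def check_index_inputs(obs_columns, var_columns, cell_type_label, feature_name):
    """Return a list of problems; the list is empty if both columns exist."""
    problems = []
    if cell_type_label not in set(obs_columns):
        problems.append(f"'{cell_type_label}' is not a column of adata.obs")
    if feature_name not in set(var_columns):
        problems.append(f"'{feature_name}' is not a column of adata.var")
    return problems


if __name__ == "__main__":
    # Hypothetical column names standing in for adata.obs.columns / adata.var.columns.
    obs_columns = ["cell_type", "batch"]
    var_columns = ["gene_symbol"]
    for problem in check_index_inputs(obs_columns, var_columns,
                                      "cell_type", "gene_name"):
        print(problem)
```

With a real AnnData object the call would be `check_index_inputs(adata.obs.columns, adata.var.columns, 'your_cell_type_label', 'your_feature_name')`.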
With the built index, users can perform various queries, such as finding cell type markers, identifying housekeeping genes across cell types, and conducting hypergeometric tests to discover cell types that are significantly enriched for the provided genes. Below are some of the query functions in scfind. For additional functionality, please refer to the scfind Nature Methods paper.
# Find cell type markers
cell_types = ['your_interested_cell_types']
ct_markers = scfind_index.cellTypeMarkers(cell_types=cell_types)
print(ct_markers)
# Find housekeeping genes across cell types
hk_genes = scfind_index.findHouseKeepingGenes(cell_types=scfind_index.cellTypeNames())
print(hk_genes)
# Find significantly enriched cell types for specific genes
genes = ['your_interested_genes']
hypQ_cts = scfind_index.hyperQueryCellTypes(genes)
print(hypQ_cts)
# Merge two indices
index1 = scfind.SCFind()
index1.buildCellTypeIndex(adata=adata1, ...) # build the first index
index2 = scfind.SCFind()
index2.buildCellTypeIndex(adata=adata2, ...)
index1.mergeDataset(index2) # now index1 contains both adata1 and adata2
# Save the index
scfind_index.saveObject("your/save/path.bin")
# Load the index
load_index = scfind.SCFind()
load_index.loadObject("your/load/path.bin")
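The hypergeometric test mentioned above asks whether the cells matching a gene query are over-represented in a given cell type. The following standard-library sketch computes the upper-tail probability for that question; it illustrates the general test and is not taken from the scfind source, so the exact statistic and any multiple-testing correction applied by `hyperQueryCellTypes` may differ. The function name `hypergeom_upper_tail` and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total_cells, type_cells, hit_cells, type_hits):
    """P(X >= type_hits) for a hypergeometric variable X.

    total_cells: number of cells in the dataset
    type_cells:  number of cells belonging to the cell type of interest
    hit_cells:   number of cells matching the gene query
    type_hits:   number of matching cells that belong to the cell type
    """
    max_hits = min(type_cells, hit_cells)
    denominator = comb(total_cells, hit_cells)
    tail = sum(
        comb(type_cells, k) * comb(total_cells - type_cells, hit_cells - k)
        for k in range(type_hits, max_hits + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: 10 cells, 4 of one type, 3 query hits, all in that type.
    p_value = hypergeom_upper_tail(total_cells=10, type_cells=4,
                                   hit_cells=3, type_hits=3)
    print(f"p = {p_value:.4f}")
```

A small p-value indicates that the query genes co-occur in that cell type more often than expected by chance.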
Citation
Please cite our work using the following reference:
@article{lee2021fast,
title={Fast searches of large collections of single-cell data using scfind},
author={Lee, Jimmy Tsz Hang and Patikas, Nikolaos and Kiselev, Vladimir Yu and Hemberg, Martin},
journal={Nature Methods},
volume={18},
number={3},
pages={262--271},
year={2021},
publisher={Nature Publishing Group US New York}
}
File details
Details for the file scfind-0.1.2.tar.gz.
File metadata
- Download URL: scfind-0.1.2.tar.gz
- Upload date:
- Size: 44.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06c78bf2b575327ab6134b1bae4c61fe117b65d9fee5d0a3c724a68caffa657a |
| MD5 | 0cb8f803b39c643a605cbc9ba034f44d |
| BLAKE2b-256 | 4824df939a5640fc313858063851649f1c7eb4513f4d2f9ac4ed8331a836a784 |
File details
Details for the file scfind-0.1.2-cp310-cp310-macosx_11_0_arm64.whl.
File metadata
- Download URL: scfind-0.1.2-cp310-cp310-macosx_11_0_arm64.whl
- Upload date:
- Size: 227.6 kB
- Tags: CPython 3.10, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2b8749ff5b346a18654edf8fa51ee75e63f957e29766ead102d1e03f29525ba |
| MD5 | f11d9ed926a577a7846d16fece448603 |
| BLAKE2b-256 | a61179abf6fed74aa82feeddb23f38d419642bc7c1ed1709ae30522bd39d7d5a |