scalex

Integrating heterogeneous single-cell data in a generalized cell embedding space for construction of continuously expandable single-cell atlases


Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
Programming Language
- Python :: 3.7
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

[![Stars](https://img.shields.io/github/stars/jsxlei/scalex?logo=GitHub&color=yellow)](https://github.com/jsxlei/scalex/stargazers) [![PyPI](https://img.shields.io/pypi/v/scalex.svg)](https://pypi.org/project/scalex) [![Documentation Status](https://readthedocs.org/projects/scalex/badge/?version=latest)](https://scalex.readthedocs.io/en/latest/?badge=stable) [![Downloads](https://pepy.tech/badge/scalex)](https://pepy.tech/project/scalex)

# SCALEX: Single-cell integrative Analysis via latent Feature Extraction

## [Documentation](https://scalex.readthedocs.io/en/latest/index.html)

## Installation

#### install from PyPI

```
pip install scalex
```

#### install from GitHub

```
git clone https://github.com/jsxlei/scalex.git
cd scalex
python setup.py install
```

SCALEX is implemented in the [PyTorch](https://pytorch.org/) framework. Running SCALEX on a CUDA-capable GPU is recommended if one is available. Installation takes only a few minutes.
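Before a long run, you may want to confirm that PyTorch can actually see a CUDA device. Here is a minimal sketch built on PyTorch's standard `torch.cuda.is_available()` check. The helper name `cuda_available` is hypothetical and is not part of SCALEX.

```python
def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # SCALEX's deep-learning backend
    except ImportError:
        # PyTorch is not installed, so there is no CUDA support to report.
        return False
    return torch.cuda.is_available()


if __name__ == "__main__":
    if cuda_available():
        print("CUDA is available; SCALEX can train on the GPU.")
    else:
        print("No usable CUDA device found (or PyTorch is not installed).")
```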

## Quick Start

SCALEX can be used either from the command line or through its API functions, e.g. in a Jupyter notebook.

### 1. Command line

```
SCALEX.py --data_list data1 data2 dataN --batch_categories batch1 batch2 batchN
```

#### Options

* `--data_list`: A list of matrix files (each treated as one batch), or a single file containing one batch or several merged batches.
* `--batch_categories`: Categories for the batch annotation. If not given, increasing numbers are used by default.
* `--profile`: The single-cell profile, RNA or ATAC. Default: RNA.
* `--min_features`: Filter out cells with fewer than `min_features` detected features. Default: 600 for RNA, 100 for ATAC.
* `--min_cells`: Filter out genes detected in fewer than `min_cells` cells. Default: 3.
* `--n_top_features`: Number of highly variable genes to keep. Default: 2000 for RNA, 30000 for ATAC.
* `--outdir`: Output directory. Default: 'output/'.
* `--projection`: Project a new dataset onto a pre-trained model; pass the folder containing that model. Default: None.
* `--impute`: If True, compute the imputed gene expression and store it in `adata.layers['impute']`. Default: False.
* `--chunk_size`: Number of cells from the same batch to transform at a time. Default: 20000.
* `--ignore_umap`: If True, skip UMAP visualization and Leiden clustering. Default: False.
* `--join`: Use the intersection ('inner') or union ('outer') of the variables across batches.
* `--batch_key`: Key under which the batch annotation is added to `obs`. Default: 'batch'.
* `--batch_name`: Annotation in `obs` to use as batches when training the model. Default: 'batch'.
* `--batch_size`: Number of samples per mini-batch to load. Default: 64.
* `--lr`: Learning rate. Default: 2e-4.
* `--max_iteration`: Maximum number of training iterations, where one iteration processes one `batch_size` of samples. Default: 30000.
* `--seed`: Random seed for torch and numpy. Default: 124.
* `--gpu`: Index of the GPU to use, if a GPU is available. Default: 0.
* `--verbose`: Verbosity, True or False. Default: False.
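The `--join` option controls which features survive when batches measure different gene (or peak) sets. The following is a minimal pure-Python sketch of the inner/outer idea. It assumes each batch is represented only by its list of feature names. `joined_features` is a hypothetical helper, not SCALEX's own code. SCALEX presumably operates on AnnData objects instead; anndata's concatenation also accepts `join='inner'` or `'outer'`.

```python
from typing import List


def joined_features(batches: List[List[str]], join: str = "inner") -> List[str]:
    """Combine per-batch feature names like an inner or outer join.

    'inner' keeps only features present in every batch (intersection);
    'outer' keeps features present in any batch (union).
    """
    if join not in ("inner", "outer"):
        raise ValueError("join must be 'inner' or 'outer'")
    if not batches:
        return []
    sets = [set(b) for b in batches]
    combined = set.intersection(*sets) if join == "inner" else set.union(*sets)
    # Sorted only to make the result deterministic for this illustration.
    return sorted(combined)


batch1 = ["CD3E", "CD19", "MS4A1", "NKG7"]
batch2 = ["CD3E", "CD19", "LYZ"]
print(joined_features([batch1, batch2], "inner"))  # ['CD19', 'CD3E']
print(joined_features([batch1, batch2], "outer"))  # ['CD19', 'CD3E', 'LYZ', 'MS4A1', 'NKG7']
```

An inner join discards batch-specific features. An outer join keeps them, so each batch has features it never measured, and those entries must be filled in somehow.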

#### Output

Output is saved in the output folder and includes:
* checkpoint: the saved model, which can be used with the `--checkpoint` (`-c`) option to reproduce results
* [adata.h5ad](https://anndata.readthedocs.io/en/stable/anndata.AnnData.html#anndata.AnnData): preprocessed data and results, including the latent representation, clustering and imputation
* umap.png: UMAP visualization of the latent cell representations
* log.txt: log of the training process
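After a run, you can quickly confirm that the expected files were written by checking the output folder for the names listed above. Here is a minimal standard-library sketch. The helper name is hypothetical, the file names come from the list above, and `output/` is the documented default for `--outdir`.

```python
from pathlib import Path
from typing import List

# Output names listed in the README; "checkpoint" may be a file or a folder.
EXPECTED_OUTPUTS = ["checkpoint", "adata.h5ad", "umap.png", "log.txt"]


def missing_outputs(outdir: str = "output/") -> List[str]:
    """Return the expected SCALEX output names that are absent from outdir."""
    root = Path(outdir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_outputs("output/")
    print("all outputs present" if not missing else f"missing: {missing}")
```

If `--ignore_umap` was used, `umap.png` is presumably not produced, so it would show up as missing here.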

#### Useful options

* output folder for saving results: `-o` or `--outdir`
* filter rare genes, default 3: `--min_cells`
* filter low-quality cells, default 600: `--min_features`
* select the number of highly variable genes (use -1 to keep all genes), default 2000: `--n_top_features`
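The two filtering options above amount to simple thresholds on a cells × genes count matrix. Below is a minimal pure-Python sketch. It assumes a dense list-of-lists matrix in which "detected" means a nonzero count, and it filters cells before genes. `filter_matrix` is a hypothetical helper. SCALEX itself works on AnnData objects, so details such as the order of the two filters may differ.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = cells, columns = genes


def filter_matrix(
    counts: Matrix, min_features: int = 600, min_cells: int = 3
) -> Tuple[List[int], List[int]]:
    """Return indices of the cells and genes that pass both detection thresholds."""
    # Keep cells in which at least `min_features` genes have a nonzero count.
    kept_cells = [
        i for i, row in enumerate(counts)
        if sum(1 for x in row if x > 0) >= min_features
    ]
    n_genes = len(counts[0]) if counts else 0
    # Among the kept cells, keep genes detected in at least `min_cells` cells.
    kept_genes = [
        j for j in range(n_genes)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes


toy = [
    [1, 0, 2, 0],
    [0, 0, 1, 0],
    [3, 1, 1, 0],
]
print(filter_matrix(toy, min_features=2, min_cells=2))  # ([0, 2], [0, 2])
```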

#### Help

For more usage information on SCALEX, run:

```
SCALEX.py --help
```

### 2. API function

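```python
from scalex import SCALEX
# data_list: data files (one per batch) or a single merged file, as for --data_list
# batch_categories: optional batch labels, as for --batch_categories
adata = SCALEX(data_list, batch_categories)
```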

The parameters work like the corresponding command-line options. The output is an AnnData object that can be analyzed further with scanpy.

## [Tutorial](https://scalex.readthedocs.io/en/latest/tutorial/index.html)

## Previous version [SCALE](https://github.com/jsxlei/SCALE)

The previous SCALE model for single-cell ATAC-seq analysis is still available in SCALEX, via the command line (`--version 1`) or the API (`SCALE_v1`).

### Command line

```
SCALEX.py -d data --version 1
```

### API

```python
from scale.extensions import SCALE_v1
SCALE_v1(data)
```

Usage is the same as in the previous SCALE version 1.


Release history

| Version | Release date |
|---|---|
| 1.0.3 | Apr 16, 2024 |
| 1.0.2 | Oct 25, 2022 |
| 1.0.1 | Oct 19, 2022 |
| 1.0.0 | Aug 29, 2022 |
| 0.2.0 (this version) | Mar 29, 2021 |
| 0.0.13 | Feb 24, 2021 |
| 0.0.12 | Feb 19, 2021 |
| 0.0.11 | Jan 5, 2021 |
| 0.0.10 | Dec 27, 2020 |
| 0.0.9 | Dec 23, 2020 |
| 0.0.8 | Dec 18, 2020 |
| 0.0.7 | Dec 8, 2020 |
| 0.0.6 | Dec 8, 2020 |
| 0.0.5 | Dec 5, 2020 |
| 0.0.5b0 (pre-release) | Dec 5, 2020 |
| 0.0.4 | Nov 27, 2020 |
| 0.0.3 | Sep 9, 2020 |
| 0.0.3rc0 (pre-release) | Nov 27, 2020 |
| 0.0.2 | Sep 9, 2020 |

Download files


Source Distribution

scalex-0.2.0.tar.gz (2.9 MB)

Uploaded Mar 29, 2021 Source

Built Distributions

scalex-0.2.0-py3.9.egg (49.5 kB)

Uploaded Aug 29, 2022 Source

scalex-0.2.0-py3-none-any.whl (23.3 kB)

Uploaded Mar 29, 2021 Python 3

Hashes for scalex-0.2.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3a5721d755e0c4be9597559dd9db0d63fadbd76b085204261adc2b90ff01f588` |
| MD5 | `765164a02ce27b77147b7fa3cea23195` |
| BLAKE2b-256 | `591ce85a69a75b7e521a4dc6ce37de749649d6b5ae0385aa80c0aab7861642d1` |

Hashes for scalex-0.2.0-py3.9.egg

| Algorithm | Hash digest |
|---|---|
| SHA256 | `b958efd41456de796a5639800d6ecd6fa15be7b6c904ab3ee07a6ee623d3008f` |
| MD5 | `1f97a0a5cf5489fab35af74d6216c8d7` |
| BLAKE2b-256 | `a800109a65d722cc733a58b4011e24de6b378505cd5c3a1849a0a7d4c383a3ad` |

Hashes for scalex-0.2.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4935c19c291e251f2b20560a954cacd60ec94f305f2e01c7f362e01321586d88` |
| MD5 | `eeeac0e96282341d628440150d0f0eff` |
| BLAKE2b-256 | `4449f2b3fa424f6e4cf32be856c723eafcea9eb7d2f69d8a39f13dbe07a6affc` |
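To check a downloaded file against the SHA256 digest listed above, the standard-library `hashlib` module is enough. Here is a minimal sketch. The local file name is an assumption about where you saved the download, and the expected digest is the one listed for scalex-0.2.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for the scalex-0.2.0.tar.gz source distribution.
EXPECTED_SHA256 = "3a5721d755e0c4be9597559dd9db0d63fadbd76b085204261adc2b90ff01f588"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    path = "scalex-0.2.0.tar.gz"  # assumed download location
    if Path(path).exists():
        print("hash OK" if sha256_of(path) == EXPECTED_SHA256 else "hash MISMATCH")
    else:
        print(f"{path} not found")
```

Reading in chunks keeps memory use constant, which matters little for a 2.9 MB file but lets the same helper handle much larger downloads.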