PyPsupertime

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

PyPsupertime

PyPsupertime is a scalable Python re-implementation of the R package psupertime for the analysis of single-cell RNA sequencing data in which the cells have an ordinal annotation (e.g. a time series or dosage). It can be used to identify a small subset of genes that contribute to the ordering and to reconstruct a pseudotime.

The original methodology is published in Bioinformatics: https://doi.org/10.1093/bioinformatics/btac227

Getting Started

Install via pip:

pip install pypsupertime

This installs pypsupertime and its dependencies automatically.

We recommend installing inside a virtualenv or pipenv environment.

Description

This package implements a modular API for preprocessing single-cell data, restructuring the input data under different statistical assumptions, and creating and fitting memory-efficient supervised ordinal logistic models.

The central idea of the original work remains unchanged: find a linear model that accurately predicts the ordinal labels while being restricted to a sparse set of features (genes), by searching along a path of regularization hyperparameters. To address memory inefficiencies in the original work, the coordinate descent approach is replaced by a stochastic gradient descent model with online fitting. Additionally, new parametrized penalties are possible, allowing more control over the sparsity.
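The interplay between the penalty and sparsity can be illustrated with a small, self-contained sketch. It is not the package's implementation: it fits a binary (not ordinal) logistic model in pure Python, applies an L1 penalty through a soft-thresholding step after each single-sample update, and uses toy data. The names soft_threshold and fit_sparse_logistic, the learning rate, and the penalty values are all assumptions made for illustration.

```python
import math
import random


def soft_threshold(value, amount):
    """Shrink value toward zero; anything within [-amount, amount] becomes 0."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0


def fit_sparse_logistic(X, y, l1, lr=0.1, epochs=50, seed=0):
    """Binary logistic regression fitted by SGD with a proximal L1 step."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:  # one sample per update (online fitting)
            z = bias + sum(w * x for w, x in zip(weights, X[i]))
            error = 1.0 / (1.0 + math.exp(-z)) - y[i]
            bias -= lr * error  # the intercept is left unpenalized
            weights = [
                soft_threshold(w - lr * error * x, lr * l1)
                for w, x in zip(weights, X[i])
            ]
    return weights, bias


# Toy data: feature 0 carries the signal, features 1 and 2 are noise.
data_rng = random.Random(1)
X, y = [], []
for _ in range(200):
    label = data_rng.randint(0, 1)
    X.append([0.8 if label else -0.8,
              data_rng.uniform(-1, 1),
              data_rng.uniform(-1, 1)])
    y.append(label)

# Walk a path of penalties from strong to weak and count surviving features.
n_selected = {}
for l1 in (10.0, 0.1, 0.0):
    weights, _ = fit_sparse_logistic(X, y, l1)
    n_selected[l1] = sum(w != 0.0 for w in weights)
    print(f"l1={l1}: {n_selected[l1]} non-zero weights")
```

A very strong penalty removes every feature, while no penalty keeps them; the useful models lie in between, which is why the penalty is searched along a path rather than fixed in advance.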

From a statistical perspective, the ordinal nature of the input data can be modeled under the cumulative proportional odds, forward continuation ratio, or backward continuation ratio assumption.
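As a rough illustration of how these three assumptions differ, the sketch below expands ordinal labels into the binary sub-problems each one implies. It is not the package's internal code: the function name restructure_ordinal and the row layout are hypothetical, and the direction of each comparison follows one common textbook convention.

```python
def restructure_ordinal(labels, scheme="cumulative"):
    """Expand ordinal labels into (sample index, threshold, binary target) rows."""
    if scheme not in ("cumulative", "forward", "backward"):
        raise ValueError(f"unknown scheme: {scheme!r}")
    levels = sorted(set(labels))
    rows = []
    if scheme == "backward":
        # P(y < k | y <= k): at each level k above the lowest, keep only the
        # samples that do not exceed k and ask whether they lie below it.
        for k in levels[1:]:
            rows += [(i, k, int(y < k)) for i, y in enumerate(labels) if y <= k]
        return rows
    for k in levels[:-1]:
        for i, y in enumerate(labels):
            # Cumulative odds, P(y > k), use every sample at every threshold.
            # Forward continuation ratio, P(y > k | y >= k), drops samples
            # that already fell below the threshold.
            if scheme == "cumulative" or y >= k:
                rows.append((i, k, int(y > k)))
    return rows


labels = [0, 0, 1, 2]
for scheme in ("cumulative", "forward", "backward"):
    print(scheme, len(restructure_ordinal(labels, scheme)))
```

The continuation ratio variants condition on a subset of the samples at each threshold, so they produce fewer binary rows than the cumulative variant for the same labels.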

All model and preprocessing classes implement scikit-learn estimators or transformers to fit seamlessly into its ecosystem. They are wrapped by the Psupertime class at the core of this package, which allows all input and output data to be represented as AnnData objects.

A more detailed description is available in the documentation hosted on Read the Docs.

Basic Usage

The code below runs a psupertime analysis with default settings on a data set represented as an AnnData object and stored in a .h5ad file. The data has a numeric ordinal cell annotation stored in the obs dataframe under the key "time".

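from pypsupertime import Psupertime

# Create a Psupertime instance with default settings
p = Psupertime()
# Run the analysis; "time" is the obs key holding the ordinal labels.
# The returned AnnData object is reused as `adata` in the plots below.
adata = p.run("/path/to/data_sce.h5ad", "time")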

Input Data: n_genes=24153, n_cells=992
Preprocessing: done. mode='all', n_genes=11305, n_cells=992
Grid Search CV: CPUs=4, n_folds=5
Regularization: done   
Refit on all data: done. accuracy=0.5195, n_genes=113
Total elapsed time:  0:01:26.141356

The code loads the single-cell data and performs default preprocessing, then runs a 5-fold cross-validated grid search along the default regularization path to identify the hyperparameter that results in the best-scoring sparse model, and finally refits the model with that regularization on all data. Using the refitted model, relevant genes are identified and the psupertime is predicted for all cells.
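The selection step of such a grid search can be sketched in a few lines. This is a generic illustration, not the package's code: select_regularization and its n_se parameter are hypothetical names, and the fold scores are made-up numbers. With n_se=0 it returns the best-scoring penalty; with n_se=0.5 it returns the strongest penalty whose mean score lies within half a standard error of the best, which is one plausible reading of the 1/2 standard error shorthand mentioned in the changelog.

```python
from statistics import mean, stdev


def select_regularization(cv_scores, n_se=0.0):
    """Pick a penalty from a {penalty: [score per fold]} mapping.

    n_se=0 returns the penalty with the best mean score. n_se>0 returns the
    largest penalty whose mean score is within n_se standard errors
    (of the best penalty's scores) of the best mean.
    """
    summary = {
        penalty: (mean(scores), stdev(scores) / len(scores) ** 0.5)
        for penalty, scores in cv_scores.items()
    }
    best_penalty = max(summary, key=lambda p: summary[p][0])
    best_mean, best_se = summary[best_penalty]
    cutoff = best_mean - n_se * best_se
    return max(p for p, (m, _) in summary.items() if m >= cutoff)


# Made-up accuracies from a 5-fold cross-validation over four penalties.
cv_scores = {
    0.001: [0.50, 0.52, 0.48, 0.51, 0.49],
    0.01: [0.50, 0.56, 0.48, 0.55, 0.51],
    0.1: [0.51, 0.53, 0.50, 0.54, 0.50],
    1.0: [0.30, 0.32, 0.28, 0.31, 0.29],
}

print("best score:", select_regularization(cv_scores))
print("within 0.5 SE:", select_regularization(cv_scores, n_se=0.5))
```

Trading a small loss in score for a stronger penalty usually yields a sparser model, which matters here because the selected genes are themselves a result of the analysis.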

The following snippets show how to quickly inspect and evaluate the results. Only the regularization progress, model performance, selected genes, and labels over psupertime are shown here; many more plots are available.

p.plot_grid_search(title="Grid Search")

[Figure: grid search]

p.plot_model_perf((adata.X, adata.obs.time), figsize=(6,5))

[Figure: confusion matrix]

p.plot_identified_gene_coefficients(adata, n_top=20)

[Figure: identified genes]

p.plot_labels_over_psupertime(adata, "time")

[Figure: labels over psupertime]

We highly recommend being aware of the preprocessing steps that are performed, or performing key preprocessing manually. For a complete overview, see the documentation.

Development Roadmap

- Extension of the pypsupertime.plots module with further analyses
- Extension of the preprocessing to allow custom pipelines (see version 1.1.0)
- Integration into the scanpy project
- Unit tests, once the code is stable enough

Changelog:

Version 2.1.7:
- Refactoring of model classes and PsupertimeBaseModel to remove redundancies and improve readability
- Bugfixes
- Changes defaults
- Tracks scores when refitting
Version 2.0.0:
- Changes the underlying binary model from sklearn.SGDClassifier to a PyTorch model to gain more control over the loss and penalty calculation. The thresholds can now be trained without penalty. These changes remove or change some of the previous parameters and attributes in the model classes.
Version 1.2.2:
- Bugfixes
- Removes inplace option in run(). It cannot currently be enforced in preprocessing. A copy of the input adata will always be created and the processed object returned.
Version 1.2.0:
- Returns an anndata object in Psupertime.run() if a filename is given
Versions 1.1.1, 1.1.2, 1.1.3:
- Bugfixes
Version 1.1.0:
- Adds preprocessing_class parameter to enable using custom / no preprocessing
- Adds heuristic for selecting the lowest regularization parameter when none is specified
- Adds shorthand for selecting optimal regularization parameter at 1/2 standard error from the best score
- Fixes bug when using smooth with sparse matrices

Release history

- 2.2.2 (Oct 17, 2023)
- 2.1.14 (Oct 13, 2023)
- 2.1.9 (Sep 25, 2023)
- 2.1.8 (Sep 25, 2023)
- 2.1.7 (Sep 24, 2023), this version
- 1.2.2 (Sep 18, 2023)
- 1.2.1 (Sep 17, 2023)
- 1.1.3 (Sep 17, 2023)
- 1.0.1 (Aug 15, 2023)

Download files

Source Distribution

pypsupertime-2.1.7.tar.gz (22.4 kB)

Uploaded Sep 24, 2023 Source

Built Distribution

pypsupertime-2.1.7-py3-none-any.whl (22.7 kB)

Uploaded Sep 24, 2023 Python 3

File details

Details for the file pypsupertime-2.1.7.tar.gz.

File metadata

Download URL: pypsupertime-2.1.7.tar.gz
Upload date: Sep 24, 2023
Size: 22.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for pypsupertime-2.1.7.tar.gz
Algorithm	Hash digest
SHA256	`8d99cb7a6e3edac7bf08a3bf5d2b813c2303beb65439dd4b6a5ffd0e41c98930`
MD5	`656ea461f0b8605de8ff2e6198ad98c4`
BLAKE2b-256	`9b94a3d774a9df6a2b963a70cdae8053a1fae89c42c4b05cf4918c3cbcd33645`

File details

Details for the file pypsupertime-2.1.7-py3-none-any.whl.

File metadata

Download URL: pypsupertime-2.1.7-py3-none-any.whl
Upload date: Sep 24, 2023
Size: 22.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for pypsupertime-2.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`236afe75f2b185fa8f0d1e7b84e2bcc3c3f59dc4e04df2fc1831b45ba32e01ab`
MD5	`de2c1d77657ef3b5881fb75b898490f4`
BLAKE2b-256	`0f777b6048edfb7d4bb0f55d4d28481a2cc019a790fdd1548508a844aa23dd87`
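
A downloaded file can be checked against the digests listed above with Python's standard hashlib module. The sketch below is illustrative: the helper names sha256_of and matches are made up for this example, and the archive is assumed to sit in the current working directory under its original file name.

```python
import hashlib
import os

# SHA256 digest listed above for the source distribution.
EXPECTED_SHA256 = "8d99cb7a6e3edac7bf08a3bf5d2b813c2303beb65439dd4b6a5ffd0e41c98930"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.lower()


archive = "pypsupertime-2.1.7.tar.gz"  # assumed download location
if os.path.exists(archive):
    print("hash ok" if matches(archive, EXPECTED_SHA256) else "hash MISMATCH")
```

pip can perform the same check at install time when the requirements file pins hashes and the --require-hashes option is used.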
