Spatial Aging Clocks
This Python software package provides tools for deploying our spatial aging clocks (trained on brain coronal sections across 20 ages) to single-cell datasets (either spatial or non-spatial).
Installation and setup
Complete installation (including dependencies) in a new Conda environment should take less than 5 minutes on a typical desktop or laptop (Windows, macOS, Linux). The base version requires only SquidPy and its dependencies.
The package can be installed either from PyPI with pip or locally from a clone of the repository.
Install with pip
Install the package from PyPI with pip. We recommend first setting up a conda environment (https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) or another virtual environment, since spatial-aging-clock currently relies on specific versions of its dependencies. It should generally work with other versions, but this hasn't been thoroughly tested:
conda create -n myenv python=3.8
conda activate myenv
<pip install any additional dependencies>
pip install spatial-aging-clock
Note that you will want to separately download the data from this repository (tests/data/) to run our tutorials.
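After the pip install, one quick way to confirm that the package was registered in the active environment is to query its installed version. The sketch below uses only the Python standard library; the helper name installed_version is ours and is not part of the package.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "spatial-aging-clock" is the distribution name used in the pip command above
version = installed_version("spatial-aging-clock")
print(version if version is not None else "spatial-aging-clock is not installed")
```

If the helper reports that the package is missing, check that the intended conda environment is activated before rerunning pip.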
Install locally
An alternative way to install the package along with the associated test and tutorial files is to clone the repository and then install its requirements. To do this, first clone the repository using git (you can install git following the instructions here):
git clone https://github.com/sunericd/SpatialAgingClock.git
We recommend setting up a conda environment to install the requirements for the package (instructions for installing conda and what conda environments can do can be found here). The requirements can then be installed with the following commands:
conda create -n SpatialClock python=3.8
conda activate SpatialClock
cd SpatialAgingClock
pip install -r requirements.txt
To test that the installation is working correctly, you can run python test.py in the cloned directory.
Documentation
Below we include documentation and several mini-tutorials to highlight the main utilities of this package for deploying our spatial aging clocks.
Modules and functions
Deploy spatial aging clocks
spatialclock.deploy - module for deploying spatial aging clocks
- get_predictions - generates predicted ages with spatial aging clocks
  - Inputs:
    - adata [AnnData] - data for which to obtain predictions
    - clock_obj_name [str] - identifier for the directory containing the cell type-specific pkl model files and training adata
    - fill_missing [str] - how to impute missing gene values for prediction
      - "mean" --> impute with mean value
      - "spage" --> uses SpaGE algorithm to impute missing values from training data
    - smooth [bool] - whether to smooth; change to False if there is no adata.obsm["spatial"] in adata
    - pseudobulk_data [bool] - if smooth is False, whether to pseudobulk data instead
    - normalize [bool] - whether to normalize data using utils.normalize_adata
    - standardize [bool] - whether to standardize data using the pipeline in the pkl files
    - add_in_place [bool] - whether to add predictions to adata.obs["predicted_age"] in place
  - Outputs:
    - [DataFrame] - columns with cell type, pred_age, cohort, and age
- get_age_acceleration - computes age acceleration for predicted ages
  - Inputs:
    - adata [AnnData] - must have adata.obs.predicted_age
  - Outputs: Adds 'normalized_age_acceleration' to adata.obs
Cell proximity effects
spatialclock.proximity - module for performing cell proximity effect analysis
- nearest_distance_to_celltype - computes near/far cell proximity sets
  - Inputs:
    - adata [AnnData] - AnnData containing the spatial transcriptomics data; must have adata.obs['celltype'] and adata.obsm['spatial']
    - celltype_list [list of str] - cell type string identifiers found in adata.obs['celltype']
    - sub_id [str or None] - name of adata.obs column to use to subset before identifying distances
  - Output: Saves cell proximity sets into adata
- compute_proximity_effects - computes proximity effect and other statistics
  - Inputs:
    - cutoff [int or dict]
      - if int, then use this cutoff distance for all examples
      - if dict, then use the region-specific distance cutoff[region] (an int)
    - celltypes [list] - list of strings specifying effector cell types to compute proximity effects for
    - cutoff_multiplier [float] - multiplier for radius cutoff
    - ring_width [None or float] - width of ring to sample near cells, where the outer distance is cutoff*cutoff_multiplier; if None, then sample all cells within cutoff (i.e. circle)
    - region_obs [str] - key in adata.obs to get region labels
    - celltype_obs [str] - key in adata.obs to get cell type labels
    - animal_obs [str] - key in adata.obs to get animal/sample labels
    - comparison [str] - how to determine the "far" comparison group ("farthest", "random", "transcript_count")
    - min_pairs [int] - minimum number of cells in near/far set to compute proximity effect for
  - Output:
    - [DataFrame] - containing the following columns:
      - "Near Cell", effector cell type name
      - "AgeAccel Cell", target cell type name
      - "n", cutoff multiplier used
      - "t", t-test statistic
      - "p", p-value from t-test
      - "Aging Effect", Proximity Effect
      - "Near Freq", normalized frequency of interactions
      - "Near Num", number of interactions
Tutorial
# import packages
import spatialclock.deploy # for deploying spatial aging clocks
import spatialclock.proximity # for running proximity effect analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scanpy as sc
import squidpy as sq
import anndata as ad
import os
# turn off warnings
import warnings
warnings.filterwarnings('ignore')
Loading dataset as Anndata object
For deploying spatial aging clocks, we rely on Anndata (annotated data) objects. These objects must contain the following:
- .X - raw counts matrix (rows are cells, columns are genes)
- .var_names - names of genes formatted with first letter capitalized and the rest lowercase (e.g. "Lamp5")
- .obsm["spatial"] - (n x 2) dimensional array with the two-dimensional spatial coordinates for all n cells
Inside .obs, there must be the following keys:
- mouse_id - categorical indicating mouse/subject that cell comes from
- celltype - categorical indicating the cell type annotation for each cell
The following keys must also be in .obs. They aren't used by the clocks and are for bookkeeping so you can put in placeholder values if you want:
- age - float indicating chronological age of mouse/subject
- cohort - categorical indicating group that mouse/subject belongs to
- region - categorical indicating region where cell is located
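The requirements above can be checked before deploying the clocks. The following is a minimal sketch, not part of the package: check_clock_inputs is a hypothetical helper that only inspects the attributes named above (.obs, .obsm, .var_names), so it works on an AnnData object as well as on the small stand-in object used here for illustration.

```python
from types import SimpleNamespace

REQUIRED_OBS_KEYS = ["mouse_id", "celltype", "age", "cohort", "region"]


def check_clock_inputs(adata):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # .obs must provide every key listed above (placeholders are fine for age/cohort/region)
    for key in REQUIRED_OBS_KEYS:
        if key not in adata.obs:
            problems.append(f"missing .obs key: {key}")
    # spatial coordinates must form an (n x 2) array
    if "spatial" not in adata.obsm:
        problems.append('missing .obsm["spatial"]')
    elif any(len(row) != 2 for row in adata.obsm["spatial"]):
        problems.append('.obsm["spatial"] must have two columns')
    # gene names: first letter capitalized, the rest lowercase (e.g. "Lamp5")
    bad = [g for g in adata.var_names if g != g.capitalize()]
    if bad:
        problems.append(f"gene names not in 'Lamp5' style: {bad[:5]}")
    return problems


# stand-in object with made-up values; a real AnnData can be passed the same way
toy = SimpleNamespace(
    obs={"mouse_id": ["YC1", "YC1"], "celltype": ["Neuron-MSN", "OPC"]},
    obsm={"spatial": [[0.0, 1.0], [2.0, 3.0]]},
    var_names=["Lamp5", "GFAP"],
)
for problem in check_clock_inputs(toy):
    print(problem)
```

Gene names flagged by the last check can usually be converted with str.capitalize(), although symbols that legitimately contain later capital letters deserve a manual look.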
Generally, Anndata objects are saved as .h5ad extension files and these can be loaded in with Scanpy or Anndata commands. You can also convert text/csv files containing the counts and metadata into Anndata objects (for example, see: https://github.com/sunericd/TISSUE)
In our case, we will load in a sample dataset (shipped with package):
# read in data with scanpy
adata = sc.read_h5ad("data/small_data.h5ad")
adata
AnnData object with n_obs × n_vars = 4788 × 200
obs: 'mouse_id', 'cohort', 'age', 'celltype', 'region'
obsm: 'spatial'
Deploying spatial aging clocks on a dataset
Next, we can generate predicted ages with the spatial aging clocks using the get_predictions() command. If there are missing genes in the dataset, you can specify the imputation method using the fill_missing parameter (options are "mean" and "spage"; the "spage" option is currently under development since it relies on the training dataset). If you would like to impute missing gene expression with an alternative method, you can do so by modifying the AnnData object with the imputed values before running get_predictions().
Note that the predicted ages will be returned as the output dataframe but will also be saved in the Anndata object under .obs.predicted_age if add_in_place is True (default).
The default clock object name is clock_obj_name = "lasso_cv5_nalphas20_spatialsmooth_alpha08_neigh20" (our spatial aging clocks) and these are the only clocks installed with this package. If you want to deploy your own clocks or other spatial aging clocks, you will need to copy the appropriately formatted clock files to the corresponding package directory and change the clock_obj_name value to the new model's name.
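For readers who want to supply their own imputed values, the sketch below shows one possible way to extend an expression matrix with columns for missing genes before building the AnnData object passed to get_predictions(). It is only an illustration: add_missing_genes, required_genes, and fill_values are hypothetical names, and the gene list expected by the clocks has to come from the clock files themselves.

```python
import numpy as np


def add_missing_genes(X, genes, required_genes, fill_values):
    """Append one column per missing required gene.

    X: (cells x genes) array; genes: column names of X;
    required_genes: genes the model expects (hypothetical list here);
    fill_values: dict mapping gene -> scalar or per-cell array from your own imputation.
    """
    n_cells = X.shape[0]
    present = set(genes)
    missing = [g for g in required_genes if g not in present]
    if not missing:
        return X, list(genes)
    # broadcast scalars to one value per cell; per-cell arrays pass through unchanged
    extra = np.column_stack(
        [np.broadcast_to(np.asarray(fill_values[g], dtype=float), (n_cells,)) for g in missing]
    )
    return np.hstack([X, extra]), list(genes) + missing


# toy example with made-up counts and a made-up imputed value for "Mbp"
X = np.array([[1.0, 0.0], [3.0, 2.0]])
X_full, genes_full = add_missing_genes(
    X, ["Lamp5", "Gfap"], ["Lamp5", "Gfap", "Mbp"], {"Mbp": 0.5}
)
print(genes_full)
print(X_full)
```

The completed matrix and gene list can then be wrapped in a new AnnData object that carries over .obs and .obsm["spatial"] from the original data.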
# predict age with spatial aging clocks
df = spatialclock.deploy.get_predictions(adata)
Found clock for Astrocyte
Imputing values for 100 missing genes
Found clock for Endothelial
Imputing values for 100 missing genes
Found clock for Ependymal
Imputing values for 100 missing genes
Found clock for Macrophage
Imputing values for 100 missing genes
Found clock for Microglia
Imputing values for 100 missing genes
Found clock for NSC
Imputing values for 100 missing genes
Found clock for Neuroblast
Imputing values for 100 missing genes
Found clock for Neuron-Excitatory
Imputing values for 100 missing genes
Found clock for Neuron-Inhibitory
Imputing values for 100 missing genes
Found clock for Neuron-MSN
Imputing values for 100 missing genes
Found clock for OPC
Imputing values for 100 missing genes
Found clock for Oligodendrocyte
Imputing values for 100 missing genes
Found clock for Pericyte
Imputing values for 100 missing genes
Found clock for VLMC
Imputing values for 100 missing genes
Found clock for VSMC
Imputing values for 100 missing genes
Let's try looking at the distribution of predicted ages between the young and old samples.
plt.hist(df[df["cohort"]=="young_control"]["pred_age"], label="Young", color="g", alpha=0.5)
plt.hist(df[df["cohort"]=="old_control"]["pred_age"], label="Old", color="tab:orange", alpha=0.5)
plt.legend(fontsize=16)
plt.xlabel("Predicted Age", fontsize=16)
plt.ylabel("Cell Density", fontsize=16)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.show()
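Alongside the histogram, a numeric summary per cohort can make the young/old separation easier to judge. The sketch below assumes only the "cohort" and "pred_age" columns used in the plotting code above and runs on a toy dataframe with made-up values; with real results, the dataframe returned by get_predictions() would take the place of toy_df.

```python
import pandas as pd

# made-up values standing in for the dataframe returned by get_predictions()
toy_df = pd.DataFrame({
    "cohort": ["young_control"] * 3 + ["old_control"] * 3,
    "pred_age": [4.0, 5.0, 6.0, 18.0, 20.0, 22.0],
})

# per-cohort summary of predicted ages
summary = toy_df.groupby("cohort")["pred_age"].agg(["mean", "median", "count"])
# difference in mean predicted age between the old and young cohorts
gap = summary.loc["old_control", "mean"] - summary.loc["young_control", "mean"]

print(summary)
print(f"old - young mean predicted age: {gap:.1f}")
```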
Age acceleration
We are often interested in comparing the outputs of the spatial aging clocks across different cells or different ages. To do so, we need to measure deviation of predicted age from expected predicted age. We call this measure the "age acceleration" and it is computed with deploy.get_age_acceleration().
# compute age acceleration
spatialclock.deploy.get_age_acceleration(adata)
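To make the idea of "deviation from expected predicted age" concrete, here is a small conceptual sketch. It is not the computation performed by get_age_acceleration(), whose details (including the normalization) may differ; it simply assumes that the expected predicted age comes from a least-squares line fitted on chronological age within one cell type, and takes the residual as the deviation. The function name and the toy numbers are ours.

```python
import numpy as np


def residual_age_acceleration(chron_age, pred_age):
    """Residual of predicted age after a least-squares linear fit on chronological age."""
    chron_age = np.asarray(chron_age, dtype=float)
    pred_age = np.asarray(pred_age, dtype=float)
    # expected predicted age at each chronological age (assumed linear trend)
    slope, intercept = np.polyfit(chron_age, pred_age, deg=1)
    expected = slope * chron_age + intercept
    # positive values: the cell looks older than expected for its age
    return pred_age - expected


# made-up ages for four cells of one cell type from young and old animals
chron = [3, 3, 24, 24]
pred = [4.0, 6.0, 19.0, 23.0]
accel = residual_age_acceleration(chron, pred)
print(np.round(accel, 2))
```

Because the residuals are centered within each age, cells from young and old animals can be compared on the same scale.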
We can visualize age acceleration across all cells in a sample spatially.
sc.pl.embedding(adata[adata.obs.mouse_id=="YC1"], "spatial",
color="normalized_age_acceleration", cmap="RdBu_r")
Cell proximity effect analysis
A useful application of the predicted ages from spatial aging clocks is to investigate the effects of cell type proximity on cell aging. We provide the spatialclock.proximity module with functionality for performing the proximity effect analysis.
First, the age acceleration must be computed (see above). Then, the spatialclock.proximity module will be used to perform proximity analysis. It relies on the following steps:
- Compute the nearest distance to each cell type for each cell using nearest_distance_to_celltype().
- Compute the proximity effect using compute_proximity_effects(), which will return a dataframe with all relevant statistics.
# Compute nearest to cell type distances
celltypes = pd.unique(adata.obs.celltype).sort_values()
spatialclock.proximity.nearest_distance_to_celltype(adata,
celltype_list=celltypes,
sub_id="mouse_id")
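As a conceptual illustration of the first step, the sketch below computes, for every cell, the distance to the nearest cell of one target cell type and then labels cells as "near" using a distance cutoff. It is a brute-force NumPy version written for clarity under our own assumptions (for example, a cell is not counted as its own neighbor); it is not the package's implementation, and the function name and coordinates are made up.

```python
import numpy as np


def nearest_distance_to_type(coords, celltypes, target_type):
    """For every cell, distance to the closest other cell of target_type (inf if none)."""
    coords = np.asarray(coords, dtype=float)
    celltypes = np.asarray(celltypes)
    target_idx = np.flatnonzero(celltypes == target_type)
    if target_idx.size == 0:
        return np.full(len(coords), np.inf)
    # pairwise distances between all cells and the target cells: (n_cells, n_targets)
    diff = coords[:, None, :] - coords[target_idx][None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # a target cell should not count itself as its own nearest neighbor
    dist[target_idx, np.arange(target_idx.size)] = np.inf
    return dist.min(axis=1)


# made-up coordinates and labels for four cells
coords = [[0, 0], [3, 4], [10, 0], [0, 1]]
labels = ["Neuron-MSN", "Microglia", "Neuron-MSN", "Microglia"]
d = nearest_distance_to_type(coords, labels, "Microglia")

cutoff = 5  # illustrative cutoff distance
near = d <= cutoff
print(np.round(d, 2), near)
```

For large datasets a spatial index (for example a k-d tree) would scale better than the full pairwise distance matrix used here.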
Due to the small size of the tutorial dataset, we don't expect to get reliable estimates of proximity effects (for that you would want many samples and many cells). For demonstration purposes, we will lower the filtering threshold to min_pairs=1 (instead of the recommended 50) and also set adata.obs.region to a single global value for all cells.
For accurate proximity effects, you will need a larger dataset (see manuscript for details).
# get proximity effects
cutoff = 30 # this can also be a region-specific dictionary of cutoffs
celltypes = pd.unique(adata.obs.celltype).sort_values()
adata.obs["region"] = "global" # for tutorial only
df = spatialclock.proximity.compute_proximity_effects(adata, cutoff, celltypes,
min_pairs=1) # for tutorial only
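Once the dataframe is returned, a typical next step is to keep only well-supported pairs and rank them. The sketch below uses the documented column names ("Near Cell", "AgeAccel Cell", "p", "Aging Effect", "Near Num") on a toy dataframe with made-up numbers. Treating "Near Num" as the support count to compare against the recommended threshold of 50, and using 0.05 as the significance level, are our assumptions for illustration.

```python
import pandas as pd

# made-up rows standing in for the output of compute_proximity_effects()
toy_results = pd.DataFrame({
    "Near Cell": ["Microglia", "OPC", "NSC"],
    "AgeAccel Cell": ["Neuron-MSN", "Neuron-MSN", "Astrocyte"],
    "p": [0.001, 0.20, 0.03],
    "Aging Effect": [0.8, -0.1, -0.5],
    "Near Num": [120, 300, 40],
})

min_support = 50  # recommended threshold mentioned above (assumed to apply to "Near Num")
alpha = 0.05      # illustrative significance level, an assumption

# keep well-supported, nominally significant pairs and rank by effect
supported = toy_results["Near Num"] >= min_support
significant = toy_results["p"] < alpha
kept = toy_results[supported & significant].sort_values("Aging Effect", ascending=False)

print(kept[["Near Cell", "AgeAccel Cell", "Aging Effect", "p"]])
```

When many cell type pairs are tested, a multiple-testing correction on the "p" column is worth considering before interpreting individual pairs.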
For visualization/plotting code, please refer to the Github repository for the associated manuscript.
Guidelines for spatial aging clocks
Which datasets can spatial aging clocks be used for?
Spatial aging clock performance can be judged on two fronts: (1) the accuracy of the predicted age, which is more sensitive, and (2) the preservation of order information (i.e. differences in predicted age across age groups and experimental conditions), which is more robust.
Spatial aging clocks will provide accurate predicted age and accurate order information for:
- Male C57BL/6J mice
- MERSCOPE or MERFISH spatial transcriptomics
- Gene panel including all 300 spatial aging clock genes
- Cortex, Striatum and adjacent regions, Corpus callosum, Lateral ventricles
Spatial aging clocks (most predicted ages and almost all order information) should generalize to:
- Female C57BL/6J mice
- C57BL/6J hybrid male mice
- Other single-cell transcriptomics technologies (STARmap, ISS, 10x Chromium, SmartSeq, etc)
- Note that some technologies have a higher percentage of zero counts than MERFISH (e.g. STARmap and ISS), and we observe slightly degraded accuracy in the absolute age prediction on these modalities.
- Gene panel including at least 60 genes shared with the spatial aging clocks
- Other brain regions not listed above
- Most experimental perturbations and disease states
Parameters where we have not tested spatial aging clocks:
- Different genetic backgrounds
- Non-mouse species
- Gene panels with fewer than 60 genes shared with the spatial aging clocks
- Non-brain tissues
Finally, we always recommend including young and old timepoints (for control conditions) which will provide you with information on how well the spatial aging clocks are calibrated for both accuracy of predicted age and for difference between age groups.
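The gene panel criteria above (all 300 clock genes, at least 60 shared genes, or fewer than 60) can be checked with a few lines of Python. This is a sketch under stated assumptions: the clock gene list must be obtained from the clock files, so the short list below is a made-up placeholder, and both helper names are ours.

```python
def shared_clock_genes(panel_genes, clock_genes):
    """Return the sorted genes shared between a panel and the clock gene list."""
    # compare in the "Lamp5" naming style expected by the clocks
    panel = {g.capitalize() for g in panel_genes}
    clock = {g.capitalize() for g in clock_genes}
    return sorted(panel & clock)


def overlap_tier(n_shared, n_clock_genes=300, min_shared=60):
    """Map an overlap count onto the three guideline tiers listed above."""
    if n_shared >= n_clock_genes:
        return "full panel"
    if n_shared >= min_shared:
        return "should generalize"
    return "untested"


# placeholder lists for illustration only (not the real clock genes)
clock_genes = ["Lamp5", "Gfap", "Mbp"]
panel_genes = ["LAMP5", "gfap", "Sox2"]
shared = shared_clock_genes(panel_genes, clock_genes)
print(shared, len(shared))
print(overlap_tier(300), overlap_tier(120), overlap_tier(59))
```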
How to speed up age prediction?
- If you are using SpaGE imputation for missing genes, we recommend using mean imputation instead, which is considerably faster and provides similar predictions.
- For training and deploying aging clocks, you can use multi-threading through the Scikit-Learn API (see https://scikit-learn.org/stable/computing/parallelism.html)
UNDER DEVELOPMENT:
- SpaGE-based imputation from training dataset
- Frameworks for building spatial aging clocks from AnnData objects