Statistically assess the level of both spatial and global heterogeneity within a spatial sample.

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Welcome to the HTA package!

The HTA package can statistically assess the level of both spatial, and global, heterogeneity within a spatial sample. HTA was specifically designed to handle multivariate spatial transcriptomics data, such as Visium samples (10x Genomics), but can be used in other domains (see our paper [1] for further details).

If you use hta in your research, pleas cite: Levy-Jurgenson et al. (Bioinformatics, 2021) [1].

Installation

You will need Python 3.8 or above.

To install the package and follow along with our examples below use the following command:

pip install hta

Support and bug reports

If you encounter any issues, please contact levy alona (as one word) through gmail.

Example 1 - Synthetic Data

Input format:

Since generating a trait-combination matrix may be complicated, HTA generates it for you from a simpler form of input: a stacked set of matrices where each matrix represents one trait, and each entry indicates if the trait manifests or not (0/1) at the corresponding spatial position.

For example, for 2 traits and a 2D space of 32x32, let's generate some random input to HTA:

from hta.stats import HTA
import numpy as np

n_traits = 2
t_shape = (32, 32, n_traits) 
t = np.random.random(t_shape)   # random values between 0 and 1
t = (t > 0.5)*1   # binarising to 0/1

Now t contains two stacked matrices, each of shape 32x32. The first represents trait no. 1, the second represents trait no. 2 and each entry has either 0/1 (1 indicates that the trait manifests in that entry).

HTA:

Once t is ready, we can run HTA.

For example, using t from above we can decide on a region size of 8 (i.e., each cell in the grid is 8x8) and run the following:

region_size = 8   # you can also use [8,8] or [8,16] etc., corresp. to [x,y].
hta = HTA(t, region_size)
hta_stat, hta_pval = hta.calc()
print('HTA {:.2f},  p-val: {:.2e}'.format(hta_stat, hta_pval))

Heterogeneity maps:

To correctly produce the heterogeneity maps, you must specify the name of the traits as they are ordered in t (i.e., the trait at index 0 should be the first (index 0) in the stacked matrices in t, and the trait at index n_traits-1 should be last in t). You can then call the function hta.plot_heterogeneity_map(..). This function will return a pyplot object which can then be modified to your liking (e.g. adding a legend, title etc.).

For example:

trait_names = ['Trait {}'.format(i+1) for i in range(n_traits)]

hm = hta.plot_heterogeneity_map(trait_names, dot_size=5)
hm.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=3)
title = 'HTA {:.2f} (p-val: {:.2e}), region_size={}' \
    .format(hta_stat, hta_pval, region_size)
font_dict = {'family': 'arial', 'size': 11}
hm.title(title, fontdict=font_dict)
hm.savefig('result.jpeg', dpi=350)
hm.close()

Region report:

To produce the region report mentioned in the paper (which provides additional information on each region) you can use the following code:

rr = hta.region_report(trait_names)
rr.to_csv('region_report.csv')

Example 2 - Visium data

If you are analysing Visium spatial gene expression data (e.g. any of those listed here, or this one used in our examples below) you can use our Visium built-in class to generate the input format for HTA (i.e., the t as in Example 1).

Make sure you have obtained the following files for your Visium data:

filtered_feature_bc_matrix

spatial

and place them in a folder hierarchy as shown below (all shown files are required, and all file/folder names must be identical to those shown below, except for the data folder name):

YOUR_DATA_FOLDER
└───filtered_feature_bc_matrix
│   │   barcodes.tsv.gz
│   │   features.tsv.gz
│   │   matrix.mtx.gz
└───spatial
   │   tissue_positions_list.csv
   │   ...

You are now ready to load your Visium data and use HTA. The following code loads the data (note that this may take around 30s), prepares it for HTA analysis, and computes the HTA p-value:

from hta.stats import HTA  
from hta.utils import Visium    

path = "/PATH_TO/YOUR_DATA_FOLDER" # TODO: insert path to 'YOUR_DATA_FOLDER' (no '/' on end)
trait_names = ['ERBB2', 'CD8A']   # names of features to use in features.tsv.gz  
  
# load and prepare visium data for HTA  
visium = Visium(path)  
visium.load()  
t, t_mask = visium.prep(trait_names)  
  
# compute HTA and HTA p-val  
region_size = 15   # modify region_size as needed
hta = HTA(t, region_size=region_size, tissue_mask=t_mask) 
hta_stat, hta_pval = hta.calc()
print("HTA p-value: ", hta_pval)

What is t_mask? It identifies, using barcodes.tsv.gz, which barcodes are under the tissue, and is used to discard barcodeds that are not.

Now we can proceed to produce the heterogeneity map and region report. The code below uses the results from the previous block of code to generate the heterogeneity map and region report. We've left the p-val title formatting code for your convenience so that you can easily replace it with your own title format:

import math

# generate heterogeneity map and legend
hm = hta.plot_heterogeneity_map(trait_names, dot_size=8)  
hm.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=4, fontsize=9)  
  
# set format for p-val and title
if hta_pval <= 10**-10000:  
    title = 'HTA {:.2f} (p-val ~ 0), region size: {}'.format(hta_stat, region_size)  
elif hta_pval < 0.00001:  
    p_power_with_base_10 = math.log10(hta_pval)  
    hta_pval = p_power_with_base_10  
    title = 'HTA {:.2f} (p-val: 10^{:.0f}), region size: {}'.format(hta_stat, hta_pval, region_size)  
else:  
    title = 'HTA {:.2f} (p-val: {:.2f}), region size: {}'.format(hta_stat, hta_pval, region_size)  

# save heterogeneity map 
font_dict = {'family': 'normal', 'size': 9}  
hm.title(title, fontdict=font_dict)  
hm.savefig('{}_hetero_map.jpeg'.format('_'.join(trait_names)), dpi=350)  
hm.close()  
  
# save region report  
rr = hta.region_report(trait_names)  
rr.to_csv('{}_region_report.csv'.format('_'.join(trait_names)))

Example 3 - Visium with cluster id per barcode

(*) This is slightly more advanced. We recommend going through the previous Visium example before attempting this one.

You can also use HTA with cluster IDs generated per barcode. The best example is using the cluster IDs provided in Visium's analysis folder, but you can use your own cluster IDs, provided they have the same format.

As an example, you can place Visium's 'analysis' folder (see links above) under your 'YOUR_DATA_FOLDER'. The 'analysis' folder contains many k-means clustering results where each barcode has a cluster ID.

The main differences in the code below compared to Example 2 above are in the lines of code marked with (***):

from hta.utils import Visium  
from hta.stats import HTA  
  
path = "/PATH_TO/YOUR_DATA_FOLDER" # TODO: insert path to 'YOUR_DATA_FOLDER' (no '/' on end)
k = 10
clusters_path = '{}/analysis/clustering/kmeans_{}_clusters/clusters.csv'.format(path, k)
trait_names = [str(i+1) for i in range(k)]  
  
visium = Visium(path)  
visium.load()  
t, t_mask = visium.prep_clusters(clusters_path)  # (***)

# copute HTA
region_size = 15  
hta = HTA(t, region_size=region_size, tissue_mask=t_mask)  
hta_stat, hta_pval = hta.calc()  
  
# plot heretogeneity map
hm = hta.plot_heterogeneity_map(trait_names, dot_size=8, is_clusters=True)  # (***)
hm.legend(loc='upper center', bbox_to_anchor=(0.5, -0.15), ncol=4, fontsize=9)  

# save heterogeneity map 
title = 'HTA {:.2f} (p-val: {:.2f}), region size: {}'.format(hta_stat, hta_pval, region_size)
font_dict = {'family': 'normal', 'size': 9}  
hm.title(title, fontdict=font_dict)  
hm.savefig('../out/{}_hetero_map.jpeg'.format('_'.join(trait_names)), dpi=350)  
hm.close()  
  
# save region report  
rr = hta.region_report(trait_names)  
rr.to_csv('../out/{}_region_report.csv'.format('_'.join(trait_names)))

[1] Assessing heterogeneity in spatial data using the HTA index with applications to spatial transcriptomics and imaging. Levy-Jurgenson et al.

Project details

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

0.1.0

Jul 12, 2023

0.0.2

Jul 30, 2021

0.0.1

Jul 30, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hta-0.1.0.tar.gz (20.9 kB view details)

Uploaded Jul 12, 2023 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

hta-0.1.0-py3-none-any.whl (16.4 kB view details)

Uploaded Jul 12, 2023 Python 3

File details

Details for the file hta-0.1.0.tar.gz.

File metadata

Download URL: hta-0.1.0.tar.gz
Upload date: Jul 12, 2023
Size: 20.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/6.8.0 pkginfo/1.9.6 requests/2.31.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for hta-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`06710a54015154e6c6db178264a76d29ff620c30a675618f2bbbed0673064761`
MD5	`b0b42d3f04bed658a7919f08ea16c6ad`
BLAKE2b-256	`71cb95b7501245f94bd211bfc061915dd613fa9523cef0e5dec4bdb892b5aa07`

See more details on using hashes here.

File details

Details for the file hta-0.1.0-py3-none-any.whl.

File metadata

Download URL: hta-0.1.0-py3-none-any.whl
Upload date: Jul 12, 2023
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/6.8.0 pkginfo/1.9.6 requests/2.31.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for hta-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`30ef6b863630457ecf2cec6c3de5f1c0d76bf94ef7cff393ad676dd0568b8688`
MD5	`586d67d2523c48cde1b0216f900737f0`
BLAKE2b-256	`ca7f5c52a6aebde3aae2b79af6741978c75de901848a307a933496e090aaa66e`

See more details on using hashes here.

hta 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Welcome to the HTA package!

Installation

Support and bug reports

Example 1 - Synthetic Data

Input format:

HTA:

Heterogeneity maps:

Region report:

Example 2 - Visium data

Example 3 - Visium with cluster id per barcode

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes