STHD: probabilistic cell typing of single Spots in whole Transcriptome spatial data with High Definition

  • Quick start: notebooks/tutorial.ipynb
  • Generates single-spot (2um) cell type labels and probabilities for VisiumHD data using a machine learning model.
  • Input: VisiumHD data and reference scRNA-seq dataset with cell type annotation.
  • Output: cell type labels and probabilities at 2um spot level.
  • Visualization with STHDviewer: an interactive, scalable, and fast spatial plot of spot cell type labels, delivered as an HTML file.
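
To illustrate how per-spot probabilities relate to per-spot labels, here is a minimal sketch. It is not STHD's own code: the function name call_cell_type, the 0.8 cutoff, the "ambiguous" fallback label, and the example cell type names are all assumptions made for illustration.

```python
from typing import Dict


def call_cell_type(probs: Dict[str, float], min_prob: float = 0.8) -> str:
    """Return the most probable cell type, or 'ambiguous' if it is below min_prob.

    The cutoff value and the 'ambiguous' label are illustrative assumptions.
    """
    best_type, best_prob = max(probs.items(), key=lambda item: item[1])
    return best_type if best_prob >= min_prob else "ambiguous"


# Hypothetical probabilities for two 2um spots.
spots = {
    "spot_a": {"Tumor": 0.93, "T cell": 0.05, "Fibroblast": 0.02},
    "spot_b": {"Tumor": 0.45, "T cell": 0.40, "Fibroblast": 0.15},
}
labels = {spot: call_cell_type(probs) for spot, probs in spots.items()}
print(labels)
```

Keeping the probabilities alongside the labels lets downstream steps filter out low-confidence spots.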

Install


  • Python version requirement: >=3.8.0
  • How to use
    • Create a new Python venv: python3.8 -m venv sthd_env
    • Activate the venv: source sthd_env/bin/activate
    • Install STHD from pip: pip install STHD
    • Or, install from source:
      • Download the repo: git clone git@github.com:yi-zhang/STHD.git
      • Install dependencies: pip install -r STHD/requirements.txt
      • Make sure ./STHD is on the Python path, e.g. by adding it via sys.path.append('./STHD')
      • Then, in your script: from STHD import {the module you need}
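
For the source-install route, the path setup can be wrapped in a small helper so that repeated runs do not add duplicate entries. This is a sketch only: add_repo_to_path is a hypothetical helper, and it assumes the repository was cloned to ./STHD.

```python
import importlib.util
import sys
from pathlib import Path


def add_repo_to_path(repo_dir: str = "./STHD") -> str:
    """Append the cloned repo directory to sys.path once; return the absolute path."""
    resolved = str(Path(repo_dir).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


repo_path = add_repo_to_path("./STHD")  # assumed clone location
# find_spec returns None when the package cannot be located on sys.path.
sthd_found = importlib.util.find_spec("STHD") is not None
print(f"STHD importable: {sthd_found}")
```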

STHD Quickstart using a colon cancer VisiumHD patch:

  • See notebooks/tutorial.ipynb
  • The test data includes a patch crop from the VisiumHD file in testdata/crop10

STHD pipeline on a larger VisiumHD region, or the full VisiumHD sample:

Step 1: prepare normalized gene expression profile (lambda) by cell type from reference scRNA-seq data.

  • This step generates the reference file. Details are in notebooks/s01_build_ref_scrna.ipynb
  • We provide the processed file ./testdata/crc_average_expr_genenorm_lambda_98ct_4618gs.txt
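
As a rough illustration of a per-cell-type normalized expression profile, the sketch below averages counts over the cells of each type and rescales each profile to sum to 1 across genes. This is an assumption about the general idea, not the procedure in the notebook, which remains the authoritative source. The function name build_reference_profile and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def build_reference_profile(
    counts: Sequence[Sequence[float]], cell_types: Sequence[str]
) -> Dict[str, List[float]]:
    """Average counts per cell type, then normalize each profile across genes."""
    n_genes = len(counts[0])
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_genes)
    n_cells: Dict[str, int] = defaultdict(int)
    for row, cell_type in zip(counts, cell_types):
        n_cells[cell_type] += 1
        for gene_idx, value in enumerate(row):
            sums[cell_type][gene_idx] += value

    profile: Dict[str, List[float]] = {}
    for cell_type, totals in sums.items():
        means = [total / n_cells[cell_type] for total in totals]
        scale = sum(means)
        # Leave an all-zero profile untouched rather than divide by zero.
        profile[cell_type] = [m / scale for m in means] if scale > 0 else means
    return profile


# Toy data: three cells, three genes, two cell types.
toy_counts = [[4, 0, 4], [2, 2, 4], [0, 9, 1]]
toy_types = ["A", "A", "B"]
toy_profile = build_reference_profile(toy_counts, toy_types)
print(toy_profile)
```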

Step 2: pre-processing of VisiumHD data files

  • The test data includes a larger region from the VisiumHD file in testdata/crop10large/

Preparing the VisiumHD sample.

  • 10X Genomics colon cancer sample can be downloaded from: https://www.10xgenomics.com/datasets/visium-hd-cytassist-gene-expression-libraries-of-human-crc
  • Required input includes the 2um-level spatial expression folder, square_002um, which usually contains filtered_feature_bc_matrix.h5 and spatial/tissue_positions.csv. It is usually found in the download named "Binned outputs (all bin levels)". Tissue positions in .parquet format can be converted using STHD/hdpp.py.
  • Required input also includes the full-resolution H&E image, Visium_HD_Human_Colon_Cancer_tissue_image.btf. It is usually found in the download named "Microscope image".
  • The scale factor (passed as --scale_factor in Step 3) is usually found in square_002um/spatial/scalefactors_json.json
  • Our processed data files are available in testdata/VisiumHD/
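
To find the scale factor, it can help to list the numeric entries in scalefactors_json.json. The sketch below makes no assumption about which key holds the value STHD expects; load_scale_factors is a hypothetical helper, and the path is relative to wherever the sample was unpacked.

```python
import json
from pathlib import Path
from typing import Dict, Union


def load_scale_factors(path: Union[str, Path]) -> Dict[str, float]:
    """Return every numeric entry of a scale factor JSON file as floats."""
    with open(path) as handle:
        data = json.load(handle)
    return {
        name: float(value)
        for name, value in data.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }


json_path = Path("square_002um/spatial/scalefactors_json.json")
if json_path.exists():
    for name, value in load_scale_factors(json_path).items():
        print(f"{name}: {value}")
else:
    print(f"{json_path} not found; adjust the path to your download location")
```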

Step 3: Patchify the large region

  • This step takes a large region and splits it into patches. Details are in notebooks/s11_patchify.ipynb
  • Or, use the example command line:
# Splitting patches from a large cropped test region:
python3 -m STHD.patchify \
--spatial_path ./testdata/crop10large/all_region/adata.h5ad.gzip \
--full_res_image_path ./testdata/crop10large/all_region/fullresimg_path.json \
--load_type crop \
--dx 1500 \
--dy 1500 \
--scale_factor 0.07973422 \
--refile ./testdata/crc_average_expr_genenorm_lambda_98ct_4618gs.txt \
--save_path ./testdata/crop10large_patchify \
--mode split
  • For the full sample, use the example command line below (this will take some disk space and time):
# Splitting patches from the full-size VisiumHD sample:
python3 -m STHD.patchify \
--spatial_path ./testdata/VisiumHD/square_002um/ \
--counts_data filtered_feature_bc_matrix.h5 \
--full_res_image_path ./testdata/VisiumHD/Visium_HD_Human_Colon_Cancer_tissue_image.btf \
--load_type original \
--dx 6000 \
--dy 6000 \
--scale_factor 0.07973422 \
--refile ./testdata/crc_average_expr_genenorm_lambda_98ct_4618gs.txt \
--save_path ./analysis/full_patchify \
--mode split
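
The role of --dx and --dy can be pictured as tiling a rectangle into a grid of patch origins. The sketch below shows only that general idea. It is not the STHD.patchify implementation, which may handle patch overlap and boundaries differently. The function name patch_origins and the bounds are hypothetical.

```python
from typing import List, Tuple


def patch_origins(
    x_min: int, x_max: int, y_min: int, y_max: int, dx: int, dy: int
) -> List[Tuple[int, int]]:
    """List the lower-left corner of each dx-by-dy tile covering the rectangle."""
    return [
        (x, y)
        for y in range(y_min, y_max, dy)
        for x in range(x_min, x_max, dx)
    ]


# Illustrative bounds, in the same coordinate units as dx and dy.
origins = patch_origins(0, 4500, 0, 3000, dx=1500, dy=1500)
print(len(origins), origins)
```

Larger dx and dy values give fewer, bigger patches, which trades per-patch memory use against the number of training jobs.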

Step 4: Obtain training command line for the patch list

  • This step trains STHD on each patch. The command can be modified to submit patches as separate Slurm jobs on an HPC cluster. Details are in notebooks/s12_per_patch_train.ipynb
  • Or, use the example command:
python3 -m STHD.train --refile ./testdata/crc_average_expr_genenorm_lambda_98ct_4618gs.txt \
--patch_list ./testdata/crop10large/patches/52979_9480 ./testdata/crop10large/patches/57479_9480 ./testdata/crop10large/patches/52979_7980 ./testdata/crop10large/patches/55979_7980 ./testdata/crop10large/patches/57479_7980 ./testdata/crop10large/patches/54479_9480 ./testdata/crop10large/patches/55979_9480 ./testdata/crop10large/patches/54479_7980
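
When there are many patches, the training command can be generated rather than typed by hand, for example one command per job. The sketch below uses only the --refile and --patch_list flags shown above. The helper train_commands and its per_job parameter are hypothetical, and wrapping each command in a scheduler submission is left out.

```python
import shlex
from typing import List


def train_commands(refile: str, patches: List[str], per_job: int = 1) -> List[str]:
    """Build one STHD.train command line per group of per_job patches."""
    commands = []
    for start in range(0, len(patches), per_job):
        chunk = patches[start:start + per_job]
        args = ["python3", "-m", "STHD.train", "--refile", refile, "--patch_list", *chunk]
        # shlex.join quotes any argument that contains shell-special characters.
        commands.append(shlex.join(args))
    return commands


example_patches = [
    "./testdata/crop10large/patches/52979_9480",
    "./testdata/crop10large/patches/57479_9480",
    "./testdata/crop10large/patches/52979_7980",
]
for command in train_commands(
    "./testdata/crc_average_expr_genenorm_lambda_98ct_4618gs.txt",
    example_patches,
    per_job=2,
):
    print(command)
```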

Step 5: Combine the patch results

  • This step combines the patch-wise STHD results. Details are in notebooks/s13_combine_patch.ipynb
  • Or, use the example command line:
# Combine predictions
python3 -m STHD.patchify \
--refile ./testdata/crc_average_expr_genenorm_lambda_98ct_4618gs.txt \
--save_path ./testdata/crop10large_patchify \
--mode combine

Step 6: Visualize!

  • This step takes STHD results on a large region and generates an STHDviewer for interactive exploration. Details are in notebooks/s21_visualize.ipynb

Step 7: Downstream analyses

  • One example is STHD-guided binning, using a bin size of your choice for --nspot
  • Details are in notebooks/s04_STHD_cell_type_guided_binning.ipynb. Or, use the example command line:
python -m STHD.binning_fast --patch_path ./testdata/crop10/ --nspot 4 --outfile ./testdata/crop10_STHDbin_nspot4.h5ad
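
One way to picture cell-type-guided binning is to pool spots that fall in the same nspot-by-nspot block and share a predicted label. The sketch below rests on that assumption and is not the STHD.binning_fast implementation. The function name bin_by_cell_type, the tuple layout, and the toy spots are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Spot = Tuple[int, int, str, int]  # (column index, row index, label, count)


def bin_by_cell_type(spots: Iterable[Spot], nspot: int) -> Dict[Tuple[int, int, str], int]:
    """Sum counts over spots sharing an nspot-by-nspot block and a label."""
    binned: Dict[Tuple[int, int, str], int] = defaultdict(int)
    for col, row, label, count in spots:
        binned[(col // nspot, row // nspot, label)] += count
    return dict(binned)


toy_spots = [
    (0, 0, "Tumor", 2),
    (1, 3, "Tumor", 1),
    (2, 2, "T cell", 4),
    (5, 0, "Tumor", 3),
]
binned_counts = bin_by_cell_type(toy_spots, nspot=4)
print(binned_counts)
```

Keeping the label in the bin key prevents counts from different cell types being mixed within one bin.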

Dependencies

Dependencies are listed in requirements.txt

Reference

Sun C*, Yi Zhang*#. "STHD: probabilistic cell typing of single Spots in whole Transcriptome spatial data with High Definition". (2024) bioRxiv 2024.06.20.599803.

Issues

Please contact Dr. Yi Zhang (yi.zhang[at]duke.edu) to request add-on features. You are welcome to follow our work through the Zhang lab at Duke!
