HistoSeg

HistoSeg is a Python toolkit for segmentation and geometry extraction in spatial transcriptomics data.

The current focus is generating Pattern1 isoline contours (level 0.5) from cell cluster assignments (e.g., 10x Xenium GraphClust output). The workflow:

  • Pick a set of “target clusters” (Pattern1)
  • Fit a KNN regressor to estimate P(target) over space
  • Smooth the probability field
  • Extract a contour (isoline) at level = 0.5
  • Save contour vertices and a quick preview plot
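
The steps above can be illustrated on synthetic data. The following is a minimal sketch, not HistoSeg's implementation: the point cloud, the grid extent, the value of sigma, and the use of matplotlib's contour routine are all assumptions chosen for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt
from scipy.ndimage import gaussian_filter
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic cells (assumption): a dense "target" blob plus uniform background.
target_xy = rng.normal(loc=0.0, scale=1.0, size=(400, 2))
background_xy = rng.uniform(low=-6.0, high=6.0, size=(1200, 2))
xy = np.vstack([target_xy, background_xy])
is_target = np.concatenate([np.ones(len(target_xy)), np.zeros(len(background_xy))])

# 1) KNN regression on 0/1 labels: the prediction at a point is the
#    fraction of target cells among its k nearest neighbours.
knn = KNeighborsRegressor(n_neighbors=30)
knn.fit(xy, is_target)

# 2) Evaluate the estimate on a regular mesh grid.
grid_n = 200
axis = np.linspace(-6.0, 6.0, grid_n)
gx, gy = np.meshgrid(axis, axis)
grid_xy = np.column_stack([gx.ravel(), gy.ravel()])
prob = knn.predict(grid_xy).reshape(grid_n, grid_n)

# 3) Smooth the probability field (sigma is in grid pixels here).
prob_smooth = gaussian_filter(prob, sigma=3.0)

# 4) Extract the 0.5 isoline as a list of (N, 2) vertex arrays.
fig, ax = plt.subplots()
cs = ax.contour(gx, gy, prob_smooth, levels=[0.5])
contours = [np.asarray(seg) for seg in cs.allsegs[0] if len(seg) > 2]
plt.close(fig)

print("number of contours:", len(contours))
```

Because the labels are 0/1, the regressor's output can be read as a local target fraction, which is why 0.5 is a natural level: it marks where target and non-target cells are equally represented among the neighbours.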

⚠️ License note

This project is distributed under the PolyForm Noncommercial 1.0.0 license. Academic and other noncommercial use is permitted. Any commercial use requires a separate commercial license from SPATHO AB. See LICENSE for the full terms.


Installation

Install from PyPI (recommended)

pip install -U histoseg

Install from source (for development)

git clone https://github.com/hutaobo/HistoSeg.git
cd HistoSeg
pip install -U pip
pip install -e .

Dependencies

The Pattern1 isoline workflow uses:

  • numpy, pandas
  • scipy
  • scikit-learn
  • matplotlib
  • a Parquet engine (pyarrow is recommended)

If you run into missing imports, install them explicitly:

pip install -U numpy pandas pyarrow scipy scikit-learn matplotlib

Optional:

  • Hugging Face downloader: pip install -U huggingface_hub

Tutorial: Pattern1 isoline (0.5)

What you need (inputs)

The isoline workflow expects the following files:

  1. clusters.csv

    • Typically from GraphClust: analysis/clustering/gene_expression_graphclust/clusters.csv
    • Must contain columns: Barcode, Cluster
  2. cells.parquet

    • A cell-level table with spatial coordinates (x/y-like columns)
    • Must contain at least:
      • coordinate columns (e.g. x/y or x_centroid/y_centroid)
      • an id column that can be aligned with clusters.csv:Barcode (the code tries several common column names)
  3. tissue_boundary.csv (optional but recommended if you enable synthetic background)

    • Must contain columns x,y or X,Y
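
To make the alignment requirement concrete, here is a hedged sketch of joining a cell table to cluster labels. The candidate column names in `ID_CANDIDATES` and `XY_CANDIDATES` and both helper functions are hypothetical; HistoSeg's own list of tried column names may differ. Small in-memory tables stand in for clusters.csv and cells.parquet.

```python
import pandas as pd

# In-memory stand-ins for clusters.csv and cells.parquet (illustrative data).
clusters = pd.DataFrame({"Barcode": ["a", "b", "c", "d"], "Cluster": [10, 3, 23, 7]})
cells = pd.DataFrame({
    "cell_id": ["a", "b", "c", "e"],
    "x_centroid": [1.0, 2.0, 3.0, 4.0],
    "y_centroid": [5.0, 6.0, 7.0, 8.0],
})

# Hypothetical candidate names; not taken from the HistoSeg source.
ID_CANDIDATES = ("cell_id", "barcode", "Barcode")
XY_CANDIDATES = (("x", "y"), ("x_centroid", "y_centroid"))


def pick_id_column(cells, barcodes, candidates=ID_CANDIDATES):
    """Return the candidate column that shares the most values with barcodes."""
    wanted = set(barcodes.astype(str))
    best, best_overlap = None, 0
    for col in candidates:
        if col in cells.columns:
            overlap = len(wanted & set(cells[col].astype(str)))
            if overlap > best_overlap:
                best, best_overlap = col, overlap
    if best is None:
        raise ValueError("no id column in cells matches clusters.csv:Barcode")
    return best


def pick_xy_columns(cells, candidates=XY_CANDIDATES):
    """Return the first (x, y) column pair present in the cell table."""
    for x_col, y_col in candidates:
        if x_col in cells.columns and y_col in cells.columns:
            return x_col, y_col
    raise ValueError("no coordinate columns found")


id_col = pick_id_column(cells, clusters["Barcode"])
x_col, y_col = pick_xy_columns(cells)

# Join on a string key so integer and string ids compare consistently.
merged = cells.assign(_key=cells[id_col].astype(str)).merge(
    clusters.assign(_key=clusters["Barcode"].astype(str))[["_key", "Cluster"]],
    on="_key",
    how="inner",
)

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)
merged["is_target"] = merged["Cluster"].isin(PATTERN1)
print(merged[[id_col, x_col, y_col, "Cluster", "is_target"]])
```

Checking the size of `merged` against the size of `clusters` is a quick way to spot an id mismatch before running the full pipeline.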

What you get (outputs)

By default, the pipeline writes into out_dir:

  • params.json — all parameters + inferred join columns
  • pattern1_isoline_<level>_<i>.npy — contour vertices (Nx2 arrays)
  • pattern1_isoline_<level>.png — quick preview plot
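
The saved contours are plain NumPy arrays, so they can be reloaded without HistoSeg. The sketch below assumes only the naming pattern listed above; the demo file name, the helper names `load_contours` and `polygon_area`, and the square polygon are illustrative stand-ins for real pipeline output.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_contours(out_dir):
    """Load every contour array in out_dir, sorted by file name."""
    paths = sorted(Path(out_dir).glob("pattern1_isoline_*.npy"))
    return [np.load(p) for p in paths]


def polygon_area(vertices):
    """Area of a closed polygon given as an (N, 2) array (shoelace formula)."""
    x, y = vertices[:, 0], vertices[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))


# Demo with a stand-in file; a real run would point at the pipeline's out_dir.
with tempfile.TemporaryDirectory() as tmp:
    square = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0]])
    np.save(Path(tmp) / "pattern1_isoline_0.5_0.npy", square)

    contours = load_contours(tmp)
    areas = [polygon_area(c) for c in contours]

print("contours:", len(contours), "areas:", areas)
```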

Quickstart

One-liner (from a Hugging Face dataset repo)

This follows the example notebook in examples/contour_generation_pattern1_from_hf.ipynb.

# pip install -U histoseg
# pip install -U huggingface_hub pandas pyarrow numpy scipy scikit-learn matplotlib

from histoseg import run_pattern1_isoline_from_hf

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

result = run_pattern1_isoline_from_hf(
    repo_id="hutaobo/output-XETG00082_C105",
    revision="main",  # or a commit hash for strict reproducibility
    out_dir="outputs/pattern1_isoline0p5_from_graphclust",
    pattern1_clusters=PATTERN1,

    # Defaults are intentionally exposed for tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

print("Outputs folder:", result.out_dir)
print("Preview image:", result.preview_png)
print("Contours:", len(result.contours))

Run on local files

from histoseg import Pattern1IsolineConfig, run_pattern1_isoline

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

cfg = Pattern1IsolineConfig(
    clusters_csv="/path/to/analysis/clustering/gene_expression_graphclust/clusters.csv",
    cells_parquet="/path/to/cells.parquet",
    tissue_boundary_csv="/path/to/tissue_boundary.csv",
    out_dir="outputs/pattern1_isoline0p5",
    pattern1_clusters=PATTERN1,

    # Optional tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

result = run_pattern1_isoline(cfg)
print(result)

How it works (workflow overview)

flowchart TD
  A["clusters.csv<br/>Barcode/Cluster"] --> C["Align barcodes<br/>with cells.parquet"]
  B["cells.parquet<br/>x/y + id-like column"] --> C
  C --> D["Select target clusters<br/>(Pattern1)"]
  D --> E["Sample background points<br/>(other cells)"]
  F["tissue_boundary.csv"] --> G["Generate synthetic background<br/>(optional)"]
  G --> E
  D --> H["KNN regression<br/>predict P(target)"]
  E --> H
  H --> I["Predict on mesh grid"]
  I --> J["Gaussian smoothing"]
  J --> K["Mask by tissue<br/>(nearest-cell threshold)"]
  K --> L["Extract isoline<br/>level = 0.5"]
  L --> M["Filter loops<br/>min_cells_inside"]
  M --> N["Save params.json<br/>+ contours .npy<br/>+ preview .png"]
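
Two steps in the diagram, the nearest-cell tissue mask and the min_cells_inside loop filter, can be sketched as below. This is one plausible reading of those steps, not HistoSeg's code: the function names, the `max_dist` parameter, and the toy coordinates are assumptions.

```python
import numpy as np
from matplotlib.path import Path
from scipy.spatial import cKDTree


def tissue_mask(grid_xy, cell_xy, max_dist):
    """True for grid points whose nearest cell lies within max_dist."""
    dist, _ = cKDTree(cell_xy).query(grid_xy, k=1)
    return dist <= max_dist


def filter_loops(contours, cell_xy, min_cells_inside):
    """Keep only contours that enclose at least min_cells_inside cells."""
    kept = []
    for verts in contours:
        inside = Path(verts).contains_points(cell_xy)
        if int(inside.sum()) >= min_cells_inside:
            kept.append(verts)
    return kept


# Toy data: three cells inside a large loop, one cell inside a small loop.
cell_xy = np.array([[1.0, 1.0], [1.5, 1.5], [2.0, 1.0], [9.0, 9.0]])
big = np.array([[0, 0], [3, 0], [3, 3], [0, 3], [0, 0]], dtype=float)
small = np.array([[8.5, 8.5], [9.5, 8.5], [9.5, 9.5], [8.5, 9.5], [8.5, 8.5]], dtype=float)

kept = filter_loops([big, small], cell_xy, min_cells_inside=2)

# One grid point near the cells, one far from every cell.
grid_xy = np.array([[1.0, 1.2], [5.0, 5.0]])
mask = tissue_mask(grid_xy, cell_xy, max_dist=1.0)

print("loops kept:", len(kept), "mask:", mask.tolist())
```

Under this reading, lowering min_cells_inside keeps smaller loops, which matches the tuning advice in the troubleshooting section.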

Troubleshooting & tuning

If no contour is found, try:

  • Decrease min_cells_inside (e.g. 10 → 3) so that small loops are not filtered out
  • Increase smooth_sigma (e.g. 5 → 8)
  • Increase knn_k (e.g. 30 → 50)

If a run is slow or uses too much memory, reduce grid_n (note: grid_n=1200 can be heavy).
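
A rough cost estimate shows why grid_n matters. The sketch assumes the mesh is a square grid of grid_n × grid_n points and that each point needs knn_k neighbour lookups; the helper `grid_cost` is hypothetical and the actual cost inside HistoSeg may differ.

```python
def grid_cost(grid_n, knn_k, bytes_per_value=8):
    """Back-of-envelope cost of predicting on a grid_n x grid_n mesh."""
    n_points = grid_n * grid_n
    return {
        "grid_points": n_points,
        "neighbour_lookups": n_points * knn_k,
        # Size of one float64 field over the grid, in megabytes.
        "field_megabytes": n_points * bytes_per_value / 1e6,
    }


for n in (300, 600, 1200):
    print(n, grid_cost(n, knn_k=30))
```

Under these assumptions the cost grows with the square of grid_n, so halving grid_n cuts the number of grid points by a factor of four.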

API reference (high-level)

Pattern1 isoline

  • Pattern1IsolineConfig
    Dataclass holding all parameters and input paths.

  • run_pattern1_isoline(cfg) -> Pattern1IsolineResult
    Runs the full pipeline on local files.

  • run_pattern1_isoline_from_hf(repo_id, revision="main", ...) -> Pattern1IsolineResult
    Convenience wrapper that downloads required files from a Hugging Face dataset repo and then runs the pipeline.

Hugging Face I/O helpers

  • download_xenium_outs(repo_id, revision="main", clusters_relpath=..., cache_dir=None)
    Downloads cells.parquet, tissue_boundary.csv, and the specified clusters.csv from a dataset repo.
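
For readers who want to fetch the files themselves, the following sketch uses `hf_hub_download` from huggingface_hub directly. It is not the implementation of download_xenium_outs: the helper names and the assumption that cells.parquet and tissue_boundary.csv sit at the repo root are hypothetical.

```python
from pathlib import Path


def build_file_list(clusters_relpath,
                    cells_relpath="cells.parquet",
                    boundary_relpath="tissue_boundary.csv"):
    """Relative paths to fetch; the two defaults are assumed locations."""
    return [cells_relpath, boundary_relpath, clusters_relpath]


def fetch_dataset_file(repo_id, filename, revision="main", cache_dir=None):
    """Download one file from a Hugging Face dataset repo and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        repo_type="dataset",
        revision=revision,
        cache_dir=cache_dir,
    )
    return Path(local_path)


files = build_file_list("analysis/clustering/gene_expression_graphclust/clusters.csv")

# Network access is required, so the download itself is left commented out:
# paths = [fetch_dataset_file("hutaobo/output-XETG00082_C105", f) for f in files]
print(files)
```

Passing a commit hash as `revision` pins the exact file versions, which is useful when results need to be reproducible.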

SFPlot utilities (legacy / optional)

This repository contains a small subset of SFPlot-style utilities and re-exports:

  • compute_cophenetic_distances_from_df(df, ...)
  • plot_cophenetic_heatmap(matrix, ...)

GUI (experimental)

A GUI entry point is configured as:

histoseg-gui

Notes:

  • The current GUI code path is still in flux and may require extra dependencies (e.g., Pillow) and/or an external sfplot installation.
  • For production workflows, prefer the Python API shown above.

Contributing

Issues and pull requests are welcome.

When reporting a bug, please include:

  • OS + Python version
  • histoseg version
  • Minimal reproducible code (or a small input subset)
  • Expected vs. actual behavior

License

This project is distributed under the PolyForm Noncommercial 1.0.0 license. Noncommercial use (including academic research) is permitted. Any commercial use requires a separate commercial license from SPATHO AB. See LICENSE for details.
