HistoSeg

HistoSeg is a Python toolkit for segmentation and geometry extraction in spatial transcriptomics data.

The current focus is Pattern1 isoline generation: extracting the contour at probability level 0.5 around a chosen set of cell clusters (e.g., 10x Xenium GraphClust output). The workflow:

  • Pick a set of “target clusters” (Pattern1)
  • Fit a KNN regressor to estimate P(target) over space
  • Smooth the probability field
  • Extract a contour (isoline) at level = 0.5
  • Save contour vertices and a quick preview plot
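
The steps above can be sketched on synthetic data with the libraries listed under Dependencies. This is a minimal illustration, not HistoSeg's implementation: the function name sketch_isoline, the small grid (100 instead of 1200), and the interpretation of smooth_sigma as a width in grid cells are all assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.neighbors import KNeighborsRegressor


def sketch_isoline(xy, is_target, grid_n=100, knn_k=30, smooth_sigma=2.0, level=0.5):
    """Return (smoothed probability grid, list of Nx2 contour vertex arrays)."""
    # 1) KNN regression on 0/1 labels: the prediction is the fraction of
    #    target cells among the k nearest cells, an estimate of P(target).
    knn = KNeighborsRegressor(n_neighbors=knn_k).fit(xy, is_target.astype(float))

    # 2) Evaluate the regressor on a regular grid covering the data extent.
    xs = np.linspace(xy[:, 0].min(), xy[:, 0].max(), grid_n)
    ys = np.linspace(xy[:, 1].min(), xy[:, 1].max(), grid_n)
    xx, yy = np.meshgrid(xs, ys)
    grid_pts = np.column_stack([xx.ravel(), yy.ravel()])
    prob = knn.predict(grid_pts).reshape(xx.shape)

    # 3) Gaussian smoothing (sigma expressed in grid cells here).
    smooth = gaussian_filter(prob, sigma=smooth_sigma)

    # 4) Extract the isoline at the requested level via matplotlib's contouring.
    fig, ax = plt.subplots()
    cs = ax.contour(xx, yy, smooth, levels=[level])
    contours = [np.asarray(seg) for seg in cs.allsegs[0] if len(seg) > 2]
    plt.close(fig)
    return smooth, contours


# Synthetic example: one compact "target" blob on a uniform background.
rng = np.random.default_rng(0)
target = rng.normal(loc=(50.0, 50.0), scale=5.0, size=(300, 2))
background = rng.uniform(0.0, 100.0, size=(1500, 2))
xy = np.vstack([target, background])
is_target = np.r_[np.ones(len(target), bool), np.zeros(len(background), bool)]

smooth, contours = sketch_isoline(xy, is_target)
print(len(contours), "contour(s) found")
```

Regressing on 0/1 labels rather than classifying keeps the output continuous, which is what makes a 0.5 level set meaningful after smoothing.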

⚠️ License note

This project is distributed under the PolyForm Noncommercial 1.0.0 license. Academic and other noncommercial use is permitted. Any commercial use requires a separate commercial license from SPATHO AB. See LICENSE for the full terms.


Installation

Install from PyPI (recommended)

pip install -U histoseg

Install from source (for development)

git clone https://github.com/hutaobo/HistoSeg.git
cd HistoSeg
pip install -U pip
pip install -e .

Dependencies

The Pattern1 isoline workflow uses:

  • numpy, pandas
  • scipy
  • scikit-learn
  • matplotlib
  • a Parquet engine (pyarrow is recommended)

Optional:

  • Hugging Face downloader: pip install -U huggingface_hub

Tutorial: Pattern1 isoline (0.5)

What you need (inputs)

The isoline workflow expects the following files:

  1. clusters.csv

    • Typically from GraphClust: analysis/clustering/gene_expression_graphclust/clusters.csv
    • Must contain columns: Barcode, Cluster
  2. cells.parquet

    • A cell-level table with spatial coordinates (x/y-like columns)
    • Must contain at least:
      • coordinate columns (e.g. x/y or x_centroid/y_centroid)
      • an id column that can be aligned with clusters.csv:Barcode (the code tries several common column names)
  3. tissue_boundary.csv (optional but recommended if you enable synthetic background)

    • Must contain columns x,y or X,Y
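
To make the alignment requirement concrete, here is a small pandas sketch of joining a cell table to a Barcode/Cluster table. The candidate column names in ID_CANDIDATES and XY_CANDIDATES are hypothetical; the list HistoSeg actually tries may differ. In practice the two frames would come from pd.read_parquet and pd.read_csv rather than being built in memory.

```python
import pandas as pd

# Hypothetical candidate names; HistoSeg's real lookup list may differ.
ID_CANDIDATES = ("cell_id", "barcode", "Barcode")
XY_CANDIDATES = (("x", "y"), ("x_centroid", "y_centroid"))


def align_cells_with_clusters(cells, clusters):
    """Join a cell table to a Barcode/Cluster table; return x, y, Cluster columns."""
    missing = {"Barcode", "Cluster"} - set(clusters.columns)
    if missing:
        raise ValueError(f"clusters table is missing columns: {sorted(missing)}")

    # Pick the first id column and the first coordinate pair that exist.
    id_col = next((c for c in ID_CANDIDATES if c in cells.columns), None)
    xy_cols = next((p for p in XY_CANDIDATES if set(p) <= set(cells.columns)), None)
    if id_col is None or xy_cols is None:
        raise ValueError("could not find an id column and a coordinate pair in cells")

    # Compare ids as strings so integer and string barcodes still match.
    left = cells[[id_col, *xy_cols]].copy()
    left[id_col] = left[id_col].astype(str)
    right = clusters[["Barcode", "Cluster"]].copy()
    right["Barcode"] = right["Barcode"].astype(str)

    merged = left.merge(right, left_on=id_col, right_on="Barcode", how="inner")
    merged = merged.rename(columns={xy_cols[0]: "x", xy_cols[1]: "y"})
    return merged[["x", "y", "Cluster"]]


cells = pd.DataFrame(
    {"cell_id": ["a", "b", "c"], "x_centroid": [1.0, 2.0, 3.0], "y_centroid": [4.0, 5.0, 6.0]}
)
clusters = pd.DataFrame({"Barcode": ["a", "b", "z"], "Cluster": [10, 3, 23]})
aligned = align_cells_with_clusters(cells, clusters)
print(aligned)
```

An inner join silently drops cells without a cluster label, so comparing len(aligned) with len(cells) is a quick check that the id columns really correspond.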

What you get (outputs)

By default, the pipeline writes into out_dir:

  • params.json — all parameters + inferred join columns
  • pattern1_isoline_<level>_<i>.npy — contour vertices (Nx2 arrays)
  • pattern1_isoline_<level>.png — quick preview plot
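
The saved outputs can be read back with the standard library and NumPy. The helper below is a sketch (load_isoline_outputs is not part of HistoSeg); it matches contour files by prefix, so it makes no assumption about how <level> and <i> are formatted in the file names.

```python
import json
from pathlib import Path

import numpy as np


def load_isoline_outputs(out_dir):
    """Return (params dict, list of Nx2 contour arrays) read from out_dir."""
    out_dir = Path(out_dir)
    params = json.loads((out_dir / "params.json").read_text())

    # Match by prefix so the exact <level>/<i> formatting does not matter.
    files = sorted(out_dir.glob("pattern1_isoline_*.npy"))
    contours = [np.load(f) for f in files]

    # Each contour file is expected to hold an Nx2 vertex array.
    for f, c in zip(files, contours):
        if c.ndim != 2 or c.shape[1] != 2:
            raise ValueError(f"{f.name}: expected an Nx2 array, got shape {c.shape}")
    return params, contours
```

Calling load_isoline_outputs on the out_dir of a finished run would return the recorded parameters alongside the contour vertices, which is convenient for downstream analysis.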

Quickstart

One-liner (from a Hugging Face dataset repo)

This follows the example notebook in examples/contour_generation_pattern1_from_hf.ipynb.

# pip install -U histoseg
# pip install -U huggingface_hub pandas pyarrow numpy scipy scikit-learn matplotlib

from histoseg import run_pattern1_isoline_from_hf

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

result = run_pattern1_isoline_from_hf(
    repo_id="hutaobo/output-XETG00082_C105",
    revision="main",  # or a commit hash for strict reproducibility
    out_dir="outputs/pattern1_isoline0p5_from_graphclust",
    pattern1_clusters=PATTERN1,

    # Defaults are intentionally exposed for tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

print("Outputs folder:", result.out_dir)
print("Preview image:", result.preview_png)
print("Contours:", len(result.contours))

Run on local files

from histoseg import Pattern1IsolineConfig, run_pattern1_isoline

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

cfg = Pattern1IsolineConfig(
    clusters_csv="/path/to/analysis/clustering/gene_expression_graphclust/clusters.csv",
    cells_parquet="/path/to/cells.parquet",
    tissue_boundary_csv="/path/to/tissue_boundary.csv",
    out_dir="outputs/pattern1_isoline0p5",
    pattern1_clusters=PATTERN1,

    # Optional tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

result = run_pattern1_isoline(cfg)
print(result)
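
Once a run has finished, the contours can be summarized numerically. The sketch below assumes each entry of result.contours is an Nx2 vertex array describing a closed loop, as described for the .npy outputs; polygon_area_and_perimeter is an illustrative helper, not a HistoSeg function, and it is demonstrated on a plain square rather than on real output.

```python
import numpy as np


def polygon_area_and_perimeter(vertices):
    """Shoelace area and perimeter of a closed loop given as an Nx2 array."""
    v = np.asarray(vertices, dtype=float)
    if np.allclose(v[0], v[-1]):
        v = v[:-1]  # drop a repeated closing vertex
    x, y = v[:, 0], v[:, 1]
    x_next, y_next = np.roll(x, -1), np.roll(y, -1)
    area = 0.5 * abs(np.sum(x * y_next - x_next * y))
    perimeter = np.sum(np.hypot(x_next - x, y_next - y))
    return float(area), float(perimeter)


square = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [0, 0]])
print(polygon_area_and_perimeter(square))  # (100.0, 40.0)
```

Both values come out in the units of the input coordinates (squared, for the area).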

How it works (workflow overview)

flowchart TD
  A["clusters.csv<br/>Barcode/Cluster"] --> C["Align barcodes<br/>with cells.parquet"]
  B["cells.parquet<br/>x/y + id-like column"] --> C
  C --> D["Select target clusters<br/>(Pattern1)"]
  D --> E["Sample background points<br/>(other cells)"]
  F["tissue_boundary.csv"] --> G["Generate synthetic background<br/>(optional)"]
  G --> E
  D --> H["KNN regression<br/>predict P(target)"]
  E --> H
  H --> I["Predict on mesh grid"]
  I --> J["Gaussian smoothing"]
  J --> K["Mask by tissue<br/>(nearest-cell threshold)"]
  K --> L["Extract isoline<br/>level = 0.5"]
  L --> M["Filter loops<br/>min_cells_inside"]
  M --> N["Save params.json<br/>+ contours .npy<br/>+ preview .png"]
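
The final filtering step in the flowchart (drop loops that enclose fewer than min_cells_inside cells) can be illustrated with a NumPy-only point-in-polygon test. This is a sketch of the idea, not the library's code: which cells HistoSeg counts (all cells or only target cells) is not stated here, so the example simply counts whatever points are passed in.

```python
import numpy as np


def points_in_polygon(points, polygon):
    """Even-odd (ray casting) test; returns a boolean array, one per point."""
    pts = np.asarray(points, dtype=float)
    poly = np.asarray(polygon, dtype=float)
    px, py = pts[:, 0], pts[:, 1]
    inside = np.zeros(len(pts), dtype=bool)
    x0, y0 = poly[-1]
    for x1, y1 in poly:
        crosses = (y0 > py) != (y1 > py)  # edge straddles the horizontal ray
        if crosses.any():
            # x coordinate where the edge meets the ray through each point
            x_hit = x0 + (py[crosses] - y0) * (x1 - x0) / (y1 - y0)
            inside[crosses] ^= px[crosses] < x_hit
        x0, y0 = x1, y1
    return inside


def filter_loops(contours, cell_xy, min_cells_inside=10):
    """Keep only loops that enclose at least min_cells_inside points."""
    return [c for c in contours if points_in_polygon(cell_xy, c).sum() >= min_cells_inside]


big = np.array([[0, 0], [10, 0], [10, 10], [0, 10]])
tiny = np.array([[20, 20], [21, 20], [21, 21], [20, 21]])
gx, gy = np.meshgrid(np.arange(0.5, 10, 1.0), np.arange(0.5, 10, 1.0))
cells = np.column_stack([gx.ravel(), gy.ravel()])  # 100 points, all inside `big`
kept = filter_loops([big, tiny], cells, min_cells_inside=10)
print(len(kept))  # 1
```

A filter of this kind removes small spurious loops, which is why lowering min_cells_inside can recover contours that would otherwise be discarded.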

Troubleshooting & tuning

If no contour is found, try:

  • Decrease min_cells_inside (e.g. 10 → 3)
  • Increase smooth_sigma (e.g. 5 → 8)
  • Increase knn_k (e.g. 30 → 50)

To speed up a slow run, reduce grid_n (grid_n=1200 can be heavy).
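
The suggestions above can be applied as a small retry loop. The sketch below is generic: run stands for any callable that accepts the tuning parameters as keyword arguments and returns a sequence of contours (for example, a thin wrapper you write around run_pattern1_isoline). fake_run is a stand-in used only so the example executes without data, and the cumulative ordering of the schedule is one possible choice.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Sequence, Tuple


def first_params_with_contours(
    run: Callable[..., Sequence[Any]],
    base: Dict[str, Any],
    overrides: Iterable[Dict[str, Any]],
) -> Optional[Tuple[Dict[str, Any], Sequence[Any]]]:
    """Try base params, then each override in turn; stop at the first non-empty result."""
    for extra in [{}] + list(overrides):
        params = {**base, **extra}
        contours = run(**params)
        if len(contours) > 0:
            return params, contours
    return None


# Relaxation schedule built from the example values in the tuning tips.
BASE = dict(grid_n=1200, knn_k=30, smooth_sigma=5.0, min_cells_inside=10)
SCHEDULE = [
    dict(min_cells_inside=3),
    dict(min_cells_inside=3, smooth_sigma=8.0),
    dict(min_cells_inside=3, smooth_sigma=8.0, knn_k=50),
]


def fake_run(**p):
    # Stand-in for the real pipeline: "finds" a contour only once sigma >= 8.
    return ["contour"] if p["smooth_sigma"] >= 8.0 else []


found = first_params_with_contours(fake_run, BASE, SCHEDULE)
print(found[0])
```

Returning the parameters together with the contours records which relaxation was actually needed, which helps when comparing samples.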

API reference (high-level)

Pattern1 isoline

  • Pattern1IsolineConfig
    Dataclass holding all parameters and input paths.

  • run_pattern1_isoline(cfg) -> Pattern1IsolineResult
    Runs the full pipeline on local files.

  • run_pattern1_isoline_from_hf(repo_id, revision="main", ...) -> Pattern1IsolineResult
    Convenience wrapper that downloads required files from a Hugging Face dataset repo and then runs the pipeline.

Hugging Face I/O helpers

  • download_xenium_outs(repo_id, revision="main", clusters_relpath=..., cache_dir=None)
    Downloads cells.parquet, tissue_boundary.csv, and the specified clusters.csv from a dataset repo.

SFPlot utilities (legacy / optional)

This repository contains a small subset of SFPlot-style utilities and re-exports:

  • compute_cophenetic_distances_from_df(df, ...)
  • plot_cophenetic_heatmap(matrix, ...)

GUI (experimental)

A GUI entry point is configured as:

histoseg-gui

Notes:

  • The current GUI code path is still in flux and may require extra dependencies (e.g., Pillow) and/or an external sfplot installation.
  • For production workflows, prefer the Python API shown above.

Contributing

Issues and pull requests are welcome.

When reporting a bug, please include:

  • OS + Python version
  • histoseg version
  • Minimal reproducible code (or a small input subset)
  • Expected vs. actual behavior

License

This project is distributed under the PolyForm Noncommercial 1.0.0 license. Noncommercial use (including academic research) is permitted. Any commercial use requires a separate commercial license from SPATHO AB. See LICENSE for details.
