Skip to main content

...

Project description

HistoSeg

PyPI Docs Publish to PyPI License: PolyForm Noncommercial 1.0.0

HistoSeg is a Python toolkit for spatial transcriptomics segmentation / geometry extraction.

The current focus is Pattern1 isoline (0.5) contour generation from cell clusters (e.g., 10x Xenium GraphClust output):

  • Pick a set of “target clusters” (Pattern1)
  • Fit a KNN regressor to estimate P(target) over space
  • Smooth the probability field
  • Extract a contour (isoline) at level = 0.5
  • Save contour vertices and a quick preview plot

Quick links

⚠️ License note

This project is distributed under the PolyForm Noncommercial 1.0.0 license. Academic and other noncommercial use is permitted. Any commercial use requires a separate commercial license from SPATHO AB. See LICENSE for the full terms.


Installation

Install from PyPI (recommended)

pip install -U histoseg

Install from source (for development)

git clone https://github.com/hutaobo/HistoSeg.git
cd HistoSeg
pip install -U pip
pip install -e .

Dependencies

The Pattern1 isoline workflow uses:

  • numpy, pandas
  • scipy
  • scikit-learn
  • matplotlib
  • a Parquet engine (pyarrow is recommended)

If you run into missing imports, install them explicitly:

pip install -U numpy pandas pyarrow scipy scikit-learn matplotlib

Optional:

  • Hugging Face downloader: pip install -U huggingface_hub

Tutorial: Pattern1 isoline (0.5)

What you need (inputs)

The isoline workflow expects the following files:

  1. clusters.csv

    • Typically from GraphClust: analysis/clustering/gene_expression_graphclust/clusters.csv
    • Must contain columns: Barcode, Cluster
  2. cells.parquet

    • A cell-level table with spatial coordinates (x/y-like columns)
    • Must contain at least:
      • coordinate columns (e.g. x/y or x_centroid/y_centroid)
      • an id column that can be aligned with clusters.csv:Barcode (the code tries several common column names)
  3. tissue_boundary.csv (optional but recommended if you enable synthetic background)

    • Must contain columns x,y or X,Y

What you get (outputs)

By default, the pipeline writes into out_dir:

  • params.json — all parameters + inferred join columns
  • pattern1_isoline_<level>_<i>.npy — contour vertices (Nx2 arrays)
  • pattern1_isoline_<level>.png — quick preview plot

Quickstart

One-liner (from a Hugging Face dataset repo)

This follows the example notebook in examples/contour_generation_pattern1_from_hf.ipynb.

# pip install -U histoseg
# pip install -U huggingface_hub pandas pyarrow numpy scipy scikit-learn matplotlib

from histoseg import run_pattern1_isoline_from_hf

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

result = run_pattern1_isoline_from_hf(
    repo_id="hutaobo/output-XETG00082_C105",
    revision="main",  # or a commit hash for strict reproducibility
    out_dir="outputs/pattern1_isoline0p5_from_graphclust",
    pattern1_clusters=PATTERN1,

    # Defaults are intentionally exposed for tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

print("Outputs folder:", result.out_dir)
print("Preview image:", result.preview_png)
print("Contours:", len(result.contours))

Run on local files

from histoseg import Pattern1IsolineConfig, run_pattern1_isoline

PATTERN1 = (10, 23, 19, 27, 14, 20, 25, 26)

cfg = Pattern1IsolineConfig(
    clusters_csv="/path/to/analysis/clustering/gene_expression_graphclust/clusters.csv",
    cells_parquet="/path/to/cells.parquet",
    tissue_boundary_csv="/path/to/tissue_boundary.csv",
    out_dir="outputs/pattern1_isoline0p5",
    pattern1_clusters=PATTERN1,

    # Optional tuning:
    grid_n=1200,
    knn_k=30,
    smooth_sigma=5.0,
    min_cells_inside=10,
)

result = run_pattern1_isoline(cfg)
print(result)

How it works (workflow overview)

flowchart TD
  A["clusters.csv<br/>Barcode/Cluster"] --> C["Align barcodes<br/>with cells.parquet"]
  B["cells.parquet<br/>x/y + id-like column"] --> C
  C --> D["Select target clusters<br/>(Pattern1)"]
  D --> E["Sample background points<br/>(other cells)"]
  F["tissue_boundary.csv"] --> G["Generate synthetic background<br/>(optional)"]
  G --> E
  D --> H["KNN regression<br/>predict P(target)"]
  E --> H
  H --> I["Predict on mesh grid"]
  I --> J["Gaussian smoothing"]
  J --> K["Mask by tissue<br/>(nearest-cell threshold)"]
  K --> L["Extract isoline<br/>level = 0.5"]
  L --> M["Filter loops<br/>min_cells_inside"]
  M --> N["Save params.json<br/>+ contours .npy<br/>+ preview .png"]

Troubleshooting & tuning

If no contour is found, try:

  • Decrease min_cells_inside (e.g. 10 → 3)
  • Increase smooth_sigma (e.g. 5 → 8)
  • Increase knn_k (e.g. 30 → 50)
  • Reduce grid_n to speed up (note: grid_n=1200 can be heavy)

API reference (high-level)

Pattern1 isoline

  • Pattern1IsolineConfig
    Dataclass holding all parameters and input paths.

  • run_pattern1_isoline(cfg) -> Pattern1IsolineResult
    Runs the full pipeline on local files.

  • run_pattern1_isoline_from_hf(repo_id, revision="main", ...) -> Pattern1IsolineResult
    Convenience wrapper that downloads required files from a Hugging Face dataset repo and then runs the pipeline.

Hugging Face I/O helpers

  • download_xenium_outs(repo_id, revision="main", clusters_relpath=..., cache_dir=None)
    Downloads cells.parquet, tissue_boundary.csv, and the specified clusters.csv from a dataset repo.

SFPlot utilities (legacy / optional)

This repository contains a small subset of SFPlot-style utilities and re-exports:

  • compute_cophenetic_distances_from_df(df, ...)
  • plot_cophenetic_heatmap(matrix, ...)

GUI (experimental)

A GUI entry point is configured as:

histoseg-gui

Notes:

  • The current GUI code path is still in flux and may require extra dependencies (e.g., Pillow) and/or an external sfplot installation.
  • For production workflows, prefer the Python API shown above.

Contributing

Issues and pull requests are welcome.

When reporting a bug, please include:

  • OS + Python version
  • histoseg version
  • Minimal reproducible code (or a small input subset)
  • Expected vs. actual behavior

License

This project is distributed under the PolyForm Noncommercial 1.0.0 license. Noncommercial use (including academic research) is permitted. Any commercial use requires a separate commercial license from SPATHO AB. See LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

histoseg-0.1.7.tar.gz (6.3 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

histoseg-0.1.7-py3-none-any.whl (26.4 kB view details)

Uploaded Python 3

File details

Details for the file histoseg-0.1.7.tar.gz.

File metadata

  • Download URL: histoseg-0.1.7.tar.gz
  • Upload date:
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for histoseg-0.1.7.tar.gz
Algorithm Hash digest
SHA256 48ae58bde78fa64cd990577d7caaa7db4851faf192bc157fe94958d26595f454
MD5 d9bbe33dd1f8d2ed36c1f12160e0e71d
BLAKE2b-256 81d919d5d7b98ca61d183d012f9bb0710e519cbf3c3059b94cb23246e9e3a1b1

See more details on using hashes here.

Provenance

The following attestation bundles were made for histoseg-0.1.7.tar.gz:

Publisher: publish.yml on hutaobo/HistoSeg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file histoseg-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: histoseg-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 26.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for histoseg-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 4216f55627737eb28480b6420b2715cce819b4c7df414522bb24a1ec159e8a02
MD5 f16a5675474462e39964d956c2c80804
BLAKE2b-256 b6cbf0334925e1355f0abaac8c071f9612e555db7eff54ffe638d5c9528d4e89

See more details on using hashes here.

Provenance

The following attestation bundles were made for histoseg-0.1.7-py3-none-any.whl:

Publisher: publish.yml on hutaobo/HistoSeg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page