Skip to main content

Blazing fast implementation of the Thema algorithm.

Project description

Pulsar

Rust-backed Python library for topological data analysis. Implements the Thema pipeline: imputation → scaling → PCA → Ball Mapper → Cosmic Graph.

Performance-critical algorithms are written in Rust (PyO3/maturin) and exposed as pulsar._pulsar. Python orchestrates the pipeline.

MCP!

Let the Agent Do the Math

Until now, topological data analysis required a Ph.D. in algebraic topology and a masochistic tolerance for parameter tuning.

By default, Pulsar exposes a rich Python API. This is great when you actually want to build custom pipelines, but when you just want to find out why your dataset is acting weird, writing boilerplate is tedious and unintuitive.

We get lots of complaints about it actually, with people asking things like:

Why is my cosmic graph a hairball? What the hell should my epsilon range be? Why did my PCA just drop 90% of the variance?

We hear you, but we're not convinced that writing a 50-line hyperparameter grid search is what you really want. You don't want to have to manually calculate k-NN distances every time you load a CSV. And I doubt you really want to stare at a raw NetworkX adjacency matrix either — you want answers. You want to point an LLM at your data and say, "Find the natural clusters and tell me why they exist."

The Pulsar MCP (Model Context Protocol) Server is our attempt to give you what you actually want, without any of the downsides of doing something stupid like guessing topological parameters.

[!NOTE] For more guides, workflows, and an end-to-end MCP example, see demos/penguins/README.md.

Setup

Don't overcomplicate this. Add the server to your Claude Desktop config (or Gemini CLI, or whatever you're using):

{
  "mcpServers": {
    "pulsar": {
      "command": "uv",
      "args": ["run", "pulsar-mcp"],
      "env": {}
    }
  }
}

Restart your client. Done.

General Overview

What follows from here is the exact workflow we designed to dogfood the pipeline. It covers every sensible step of a topological analysis, from geometry probing to statistical dossiers.

It's important you let the agent follow this exact sequence for a few reasons:

  1. We want the graph to actually have signal out of the box.
  2. Really just the first reason, that's the whole point of these tools.

Here is the exact loop the agent should run:

  1. Ingest the dataset to get a stable dataset_id handle.
  2. Create a calibrated config via create_config(dataset_id) — calibrates epsilon and PCA against the processed feature space.
  3. Sweep the topology using that config.
  4. Diagnose the graph to see if it's a giant useless blob or actually balanced. Use the metrics to decide what to adjust, then iterate via refine_config.
  5. Generate the dossier to explain the clusters in plain English.
  6. Compare clusters for academic-grade p-values.
  7. Export the labeled data.

Tool Fly-By

We didn't just wrap our Python functions in JSON schemas. We built Thick Tools—stateful, workflow-aware engines that pass configuration directly between each other so you don't have to watch the agent screw up file I/O.

  • create_config(dataset_id): The primary config generation tool. Analyzes k-NN distances and PCA variance in the processed feature space (after preprocessing + scaling) to produce a calibrated YAML config. Never let the agent guess parameters.
  • run_topological_sweep: Runs the heavy Rust pipeline. Takes inline YAML and returns structured JSON with metrics and experiment diff. Config persistence is opt-in via save_config=True.
  • diagnose_cosmic_graph: Returns pure graph metrics (density, components, weight quantiles). The agent interprets these to decide what to adjust — e.g., "hairball" means high density, "shattered" means too many small components.
  • generate_cluster_dossier: Returns structured JSON with per-cluster profiles (Z-scores, homogeneity, concentration) plus a Markdown summary. Includes clustering method metadata (method used, silhouette score).
  • compare_clusters_tool: Runs Welch's T-tests, KS-tests, and Cohen's d between two specific clusters. Because sometimes your boss wants a p-value.
  • export_labeled_data: Maps semantic names to the cluster IDs and dumps it to a CSV.

Pitfalls & Annoyances

We try to make things foolproof, but some of you goofballs are going to try to break it anyway. Here is what to avoid:

  • Don't let the agent write YAML files manually. The tools pass YAML strings directly in memory (suggested_params_yaml -> config_yaml). If you watch the agent try to use write_file to save a params.yaml before running the sweep, stop it. If you make the agent do unnecessary file I/O you belong in prison.
  • Don't skip the diagnosis step. If the graph is a giant hairball, your clusters will be garbage. Use diagnose_cosmic_graph to get metrics, then refine_config to adjust epsilon or PCA dimensions based on what the metrics tell you.
  • Handle non-numeric data appropriately. Pulsar is a geometric engine. It needs floats. characterize_dataset will automatically tell the agent which low-cardinality strings to one-hot encode and which high-cardinality strings to drop. Don't fight it.

Citation

If you use this package in your research, please cite:

@article{Gathrid2025,
  author  = {Gathrid, Sidney and Wayland, Jeremy and Wayland, Stuart and Deshmukh, Ranjit and Wu, Grace C.},
  title   = {Strategies to accelerate US coal power phase-out using contextual retirement vulnerabilities},
  journal = {Nature Energy},
  year    = {2025},
  volume  = {10},
  number  = {10},
  pages   = {1274--1288},
  month   = {October},
  doi     = {10.1038/s41560-025-01871-0},
  url     = {https://doi.org/10.1038/s41560-025-01871-0},
  issn    = {2058-7546}
}

Which introduced the original Thema algorithm.

Installation

Requires Rust and Python 3.10+.

uv sync
uv run maturin develop --release

Quick start

from pulsar import ThemaRS

model = ThemaRS("params.yaml").fit()

graph = model.cosmic_graph        # networkx.Graph with 'weight' edge attributes
adj   = model.weighted_adjacency  # np.ndarray, shape (n, n)
reps  = model.select_representatives()  # uses the configured default

Copy params.yaml.sample to params.yaml and edit it for your dataset.

Progress reporting

  • Stage weight constants for progress_callback live in pulsar.runtime.utils._STAGE_WEIGHTS; pulsar.runtime.utils._build_cumulative_fractions turns them into cumulative fractions (used by ThemaRS.fit).
  • _rayon_thread_override in pulsar.runtime.utils caps RAYON_NUM_THREADS for Rust-heavy stages when notebooks need stricter thread control.
  • For notebooks: use pulsar.runtime.progress.fit_with_progress(model, data) or fit_multi_with_progress(model, datasets) — renders a transient rich progress bar.

Demos

Demo scripts organized by domain under demos/:

Energy domain:

uv run python demos/energy/coal.py  # US Coal Plants (downloads dataset automatically)

EHR domain:

uv run python demos/ehr/physionet.py --synthetic             # Synthetic data mode
uv run python demos/ehr/physionet.py --data path/to/eicu.csv # Real eICU CSV data
uv run python demos/ehr/ecg_arrhythmia.py                    # ECG arrhythmia classification

LLM/MMLU domain:

jupyter notebook demos/mmlu/mmlu_topology_demo.ipynb

Configuration

Cosmic graph thresholding is automatic by default, and representative selection has a sensible default. Most users only need to configure data, preprocessing, and sweeps.

run:
  name: my_experiment
  data: path/to/data.csv # CSV or parquet

preprocessing:
  drop_columns: [id, timestamp]
  impute:
    age:
      method: sample_normal # fill_mean | fill_median | fill_mode |
      seed: 42 # sample_normal | sample_categorical
    category:
      method: sample_categorical
      seed: 7

sweep:
  pca:
    dimensions:
      values: [2, 3, 5]
    seed:
      values: [42, 7, 13]
  ball_mapper:
    epsilon:
      range: { min: 0.1, max: 1.5, steps: 8 } # or: values: [0.3, 0.5, 0.8]

Development

uv run maturin develop        # debug build
uv run maturin develop --release  # optimised build
uv run pytest tests/ -v

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

thema_pulsar-0.2.2-cp313-cp313-macosx_11_0_arm64.whl (475.0 kB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

File details

Details for the file thema_pulsar-0.2.2-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

  • Download URL: thema_pulsar-0.2.2-cp313-cp313-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 475.0 kB
  • Tags: CPython 3.13, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18 {"installer":{"name":"uv","version":"0.9.18","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for thema_pulsar-0.2.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 383fd3ce530c24873c41a485484120fdc1dca2efb11aad13b501bd37a8e33dfc
MD5 47f881876f91143397924baf62c3d10e
BLAKE2b-256 e8720f734655dc96c92021bf674510e733a7306d34f8ec987d54f0d8baef04ed

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page