Pulsar

Blazing fast implementation of the Thema algorithm.

Rust-backed Python library for topological data analysis. Implements the Thema pipeline: imputation → scaling → PCA → Ball Mapper → Cosmic Graph.
Performance-critical algorithms are written in Rust (PyO3/maturin) and exposed as pulsar._pulsar. Python orchestrates the pipeline.
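The Ball Mapper step is conceptually simple: a greedy epsilon-net picks landmark points, every point joins the ball of each landmark within epsilon, and balls that share points become connected nodes. A minimal NumPy sketch of the idea (an illustration only, not the Rust implementation):

```python
import numpy as np

def ball_mapper(points, epsilon):
    """Greedy epsilon-net Ball Mapper sketch (illustration only).

    Returns the landmark point indices and, for each point, the list
    of ball indices covering it; balls that share a point become
    connected nodes in the Ball Mapper graph.
    """
    landmarks = []
    covered = np.zeros(len(points), dtype=bool)
    for i, p in enumerate(points):
        if not covered[i]:
            landmarks.append(i)  # uncovered point becomes a new landmark
            covered |= np.linalg.norm(points - p, axis=1) <= epsilon
    membership = [
        [b for b, l in enumerate(landmarks)
         if np.linalg.norm(points[i] - points[l]) <= epsilon]
        for i in range(len(points))
    ]
    return landmarks, membership
```

The choice of `epsilon` controls the resolution of the cover, which is exactly why the MCP tools below calibrate it instead of letting the agent guess.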
MCP!
Let the Agent Do the Math
Until now, topological data analysis required a Ph.D. in algebraic topology and a masochistic tolerance for parameter tuning.
By default, Pulsar exposes a rich Python API. This is great when you actually want to build custom pipelines, but when you just want to find out why your dataset is acting weird, writing boilerplate is tedious and unintuitive.
We get lots of complaints about it, actually, with people asking things like:

> Why is my cosmic graph a hairball? What the hell should my epsilon range be? Why did my PCA just drop 90% of the variance?
We hear you, but we're not convinced that writing a 50-line hyperparameter grid search is what you really want. You don't want to manually calculate k-NN distances every time you load a CSV, and we doubt you really want to stare at a raw NetworkX adjacency matrix either — you want answers. You want to point an LLM at your data and say, "Find the natural clusters and tell me why they exist."
The Pulsar MCP (Model Context Protocol) Server is our attempt to give you what you actually want, without any of the downsides of doing something stupid like guessing topological parameters.
Setup
Don't overcomplicate this. Add the server to your Claude Desktop config (or Gemini CLI, or whatever you're using):
```json
{
  "mcpServers": {
    "pulsar": {
      "command": "uvx",
      "args": ["--from", "thema-pulsar[mcp]", "pulsar-mcp"],
      "env": {}
    }
  }
}
```
This pulls `thema-pulsar` straight from PyPI — no clone, no `uv sync` required. If you prefer a persistent install, run `pipx install "thema-pulsar[mcp]"` and use `"command": "pulsar-mcp"` instead.
Restart your client. Done.
General Overview
What follows from here is the exact workflow we designed to dogfood the pipeline. It covers every sensible step of a topological analysis, from geometry probing to statistical dossiers.
It's important you let the agent follow this exact sequence for a few reasons:
- We want the graph to actually have signal out of the box.
- Really just the first reason, that's the whole point of these tools.
Here is the exact loop the agent should run:
1. Ingest the dataset to get a stable `dataset_id` handle.
2. Create a calibrated config via `create_config(dataset_id)` — calibrates epsilon and PCA against the processed feature space.
3. Sweep the topology using that config.
4. Diagnose the graph to see if it's a giant useless blob or actually balanced. Use the metrics to decide what to adjust, then iterate via `refine_config`.
5. Generate the dossier to explain the clusters in plain English.
6. Compare clusters for academic-grade p-values.
7. Export the labeled data.
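Outside an agent, the same loop is ordinary control flow. A hypothetical driver sketch (the callables stand in for the MCP tools of the same names; the `balanced` key and `max_iters` are made-up placeholders, not Pulsar's API):

```python
def run_pipeline(ingest, create_config, sweep, diagnose, refine, dossier,
                 max_iters=3):
    """Drive the ingest -> config -> sweep -> diagnose -> refine loop.

    Hypothetical driver: the callables stand in for the MCP tools of
    the same names; nothing here is part of Pulsar's actual API.
    """
    dataset_id = ingest()               # stable handle for the dataset
    config = create_config(dataset_id)  # calibrated config
    for _ in range(max_iters):
        result = sweep(config)          # heavy Rust pipeline
        metrics = diagnose(result)      # pure graph metrics
        if metrics.get("balanced"):     # "balanced" key is made up
            break
        config = refine(config, metrics)  # adjust epsilon / PCA dims
    return dossier(result)              # plain-English cluster report
```

The point of the sketch is the ordering: config calibration happens once, while sweep and diagnose iterate until the graph looks sane.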
Tool Fly-By
We didn't just wrap our Python functions in JSON schemas. We built Thick Tools—stateful, workflow-aware engines that pass configuration directly between each other so you don't have to watch the agent screw up file I/O.
- `create_config(dataset_id)`: The primary config generation tool. Analyzes k-NN distances and PCA variance in the processed feature space (after preprocessing + scaling) to produce a calibrated YAML config. Never let the agent guess parameters.
- `run_topological_sweep`: Runs the heavy Rust pipeline. Takes inline YAML and returns structured JSON with metrics and experiment diff. Config persistence is opt-in via `save_config=True`.
- `diagnose_cosmic_graph`: Returns pure graph metrics (density, components, weight quantiles). The agent interprets these to decide what to adjust — e.g., "hairball" means high density, "shattered" means too many small components.
- `generate_cluster_dossier`: Returns structured JSON with per-cluster profiles (Z-scores, homogeneity, concentration) plus a Markdown summary. Includes clustering method metadata (method used, silhouette score).
- `compare_clusters_tool`: Runs Welch's T-tests, KS-tests, and Cohen's d between two specific clusters. Because sometimes your boss wants a p-value.
- `export_labeled_data`: Maps semantic names to the cluster IDs and dumps it to a CSV.
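The "hairball vs. shattered" interpretation the agent applies can be sketched as a plain decision rule. The metric keys and cutoffs below are illustrative guesses, not the tool's actual schema:

```python
def interpret_metrics(metrics, dense_cutoff=0.5, shard_cutoff=0.25):
    """Map diagnose-style graph metrics to an adjustment hint.

    Metric keys and cutoffs are illustrative guesses, not the real
    diagnose_cosmic_graph schema or thresholds.
    """
    if metrics["density"] > dense_cutoff:
        # nearly everything connects to everything: too-large epsilon
        return "hairball: lower epsilon or add PCA dimensions"
    if metrics["largest_component_frac"] < shard_cutoff:
        # no component holds a meaningful share of the nodes
        return "shattered: raise epsilon"
    return "balanced: proceed to dossier"
```

This is exactly the kind of judgment the agent makes between `diagnose_cosmic_graph` and `refine_config`.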
Pitfalls & Annoyances
We try to make things foolproof, but some of you goofballs are going to try to break it anyway. Here is what to avoid:
- Don't let the agent write YAML files manually. The tools pass YAML strings directly in memory (`suggested_params_yaml` -> `config_yaml`). If you watch the agent try to use `write_file` to save a `params.yaml` before running the sweep, stop it. If you make the agent do unnecessary file I/O you belong in prison.
- Don't skip the diagnosis step. If the graph is a giant hairball, your clusters will be garbage. Use `diagnose_cosmic_graph` to get metrics, then `refine_config` to adjust epsilon or PCA dimensions based on what the metrics tell you.
- Handle non-numeric data appropriately. Pulsar is a geometric engine. It needs floats. `characterize_dataset` will automatically tell the agent which low-cardinality strings to one-hot encode and which high-cardinality strings to drop. Don't fight it.
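In pandas terms, that recommendation amounts to something like the following (the `max_categories` cutoff is an illustrative guess, not what `characterize_dataset` actually uses):

```python
import pandas as pd

def encode_for_geometry(df, max_categories=10):
    """One-hot encode low-cardinality string columns and drop
    high-cardinality ones, so every remaining column is numeric.

    The max_categories cutoff is an illustrative guess.
    """
    obj_cols = df.select_dtypes(include="object").columns
    low = [c for c in obj_cols if df[c].nunique() <= max_categories]
    high = [c for c in obj_cols if df[c].nunique() > max_categories]
    # high-cardinality strings (IDs, free text) carry no geometry
    return pd.get_dummies(df.drop(columns=high), columns=low)
```

The rationale: a column with thousands of unique strings would one-hot into thousands of near-orthogonal axes and destroy any distance structure.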
> [!NOTE]
> For more guides, workflows, and an end-to-end MCP example, see `demos/penguins/README.md`.
Citation
If you use this package in your research, please cite:
```bibtex
@article{Gathrid2025,
  author  = {Gathrid, Sidney and Wayland, Jeremy and Wayland, Stuart and Deshmukh, Ranjit and Wu, Grace C.},
  title   = {Strategies to accelerate US coal power phase-out using contextual retirement vulnerabilities},
  journal = {Nature Energy},
  year    = {2025},
  volume  = {10},
  number  = {10},
  pages   = {1274--1288},
  month   = {October},
  doi     = {10.1038/s41560-025-01871-0},
  url     = {https://doi.org/10.1038/s41560-025-01871-0},
  issn    = {2058-7546}
}
```

This paper introduced the original Thema algorithm.
Installation
Requires Rust and Python 3.10+.
```shell
uv sync
uv run maturin develop --release
```
Quick start
```python
from pulsar import ThemaRS

model = ThemaRS("params.yaml").fit()
graph = model.cosmic_graph             # networkx.Graph with 'weight' edge attributes
adj = model.weighted_adjacency         # np.ndarray, shape (n, n)
reps = model.select_representatives()  # uses the configured default
```
Copy `params.yaml.sample` to `params.yaml` and edit it for your dataset.
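Since `weighted_adjacency` is a plain NumPy array, downstream analysis needs no Pulsar-specific tooling. For example, a sketch that drops weak edges and lists the remaining connected components (the `weight_floor` threshold is an arbitrary illustration):

```python
from collections import deque

import numpy as np

def strong_components(adj, weight_floor=0.5):
    """Connected components after dropping edges whose weight falls
    below weight_floor (an arbitrary illustrative threshold).

    adj is a symmetric weighted adjacency matrix, such as the one
    returned by model.weighted_adjacency. Components are returned
    largest first.
    """
    n = adj.shape[0]
    seen = np.zeros(n, dtype=bool)
    components = []
    for start in range(n):
        if seen[start]:
            continue
        queue, comp = deque([start]), []
        seen[start] = True
        while queue:  # BFS over strong edges only
            u = queue.popleft()
            comp.append(u)
            for v in np.nonzero(adj[u] >= weight_floor)[0]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        components.append(sorted(comp))
    return sorted(components, key=len, reverse=True)
```

Usage against a fitted model would be `strong_components(model.weighted_adjacency)`.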
Progress reporting
- Stage weight constants for `progress_callback` live in `pulsar.runtime.utils._STAGE_WEIGHTS`; `pulsar.runtime.utils._build_cumulative_fractions` turns them into cumulative fractions (used by `ThemaRS.fit`).
- `_rayon_thread_override` in `pulsar.runtime.utils` caps `RAYON_NUM_THREADS` for Rust-heavy stages when notebooks need stricter thread control.
- For notebooks: use `pulsar.runtime.progress.fit_with_progress(model, data)` or `fit_multi_with_progress(model, datasets)` — renders a transient rich progress bar.
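The cumulative-fraction bookkeeping is simple arithmetic. A sketch of what `_build_cumulative_fractions` does conceptually (stage names and weights here are made up):

```python
def cumulative_fractions(stage_weights):
    """Turn per-stage weights into cumulative progress fractions.

    Conceptual sketch of _build_cumulative_fractions; the stage
    names and weights used below are made up for illustration.
    """
    total = sum(stage_weights.values())
    fractions, running = {}, 0.0
    for stage, weight in stage_weights.items():
        running += weight
        # fraction of total work complete once this stage finishes
        fractions[stage] = running / total
    return fractions
```

So a stage weighted twice as heavily advances the progress bar twice as far when it completes.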
Demos
Demo scripts are organized by domain under `demos/`:

Energy domain:

```shell
uv run python demos/energy/coal.py   # US Coal Plants (downloads dataset automatically)
```

EHR domain:

```shell
uv run python demos/ehr/physionet.py --synthetic              # Synthetic data mode
uv run python demos/ehr/physionet.py --data path/to/eicu.csv  # Real eICU CSV data
uv run python demos/ehr/ecg_arrhythmia.py                     # ECG arrhythmia classification
```

LLM/MMLU domain:

```shell
jupyter notebook demos/mmlu/mmlu_topology_demo.ipynb
```
Configuration
Cosmic graph thresholding is automatic by default, and representative selection has a sensible default. Most users only need to configure data, preprocessing, and sweeps.
```yaml
run:
  name: my_experiment
  data: path/to/data.csv   # CSV or parquet

preprocessing:
  drop_columns: [id, timestamp]
  impute:
    age:
      method: sample_normal        # fill_mean | fill_median | fill_mode |
      seed: 42                     # sample_normal | sample_categorical
    category:
      method: sample_categorical
      seed: 7

sweep:
  pca:
    dimensions:
      values: [2, 3, 5]
    seed:
      values: [42, 7, 13]
  ball_mapper:
    epsilon:
      range: { min: 0.1, max: 1.5, steps: 8 }   # or: values: [0.3, 0.5, 0.8]
```
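The `range` form expands to an evenly spaced grid of epsilon values. Assuming linear spacing, the expansion looks like this (a sketch, not the engine's actual code):

```python
import numpy as np

def expand_epsilon_range(spec):
    """Expand a sweep spec into a list of epsilon values.

    Sketch only: assumes the engine uses linear spacing for the
    {"min", "max", "steps"} form; explicit "values" lists pass
    through unchanged.
    """
    if "values" in spec:
        return list(spec["values"])
    return np.linspace(spec["min"], spec["max"], spec["steps"]).tolist()
```

With the config above, `min: 0.1, max: 1.5, steps: 8` would yield epsilons 0.2 apart, from 0.1 up to 1.5.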
Development
```shell
uv run maturin develop            # debug build
uv run maturin develop --release  # optimised build
uv run pytest tests/ -v
```