p-SNE: Poisson Stochastic Neighbor Embedding. Nonlinear dimensionality reduction for sparse count data (neural spike counts, scRNA-seq, text corpora).


p-SNE: Poisson Stochastic Neighbor Embedding

A nonlinear dimensionality reduction method for sparse count data.

p-SNE embeds high-dimensional count matrices (neural spike counts, scRNA-seq, text corpora) into 2D or 3D. It measures pairwise dissimilarity with a Poisson KL divergence and fits the embedding by minimizing a Hellinger distance. Its interface mirrors scikit-learn's t-SNE (`fit_transform`, fitted attributes with a trailing underscore).

📄 Paper: Neighbor Embedding for High-Dimensional Sparse Poisson Data (arXiv 2604.16932)

💻 Code: github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding

📝 Blog post: Life Is Too Short for Wrong Metrics

Why p-SNE?

Standard dimensionality reduction methods (t-SNE, UMAP, PCA) implicitly assume continuous, Gaussian-like features. When applied to sparse count data, they treat zeros as ordinary, informative values in distance computations and ignore the mean-variance coupling inherent in Poisson observations (a Poisson count's variance equals its mean). This leads to distorted embeddings in which structure is lost or fabricated.

p-SNE replaces the Euclidean distance in t-SNE with a Poisson KL divergence that respects the discrete, non-negative nature of count data. On sparse neural recordings, text word counts, and single-cell RNA-seq data, p-SNE recovers cluster structure that t-SNE, UMAP, and PCA miss.
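A quick numeric illustration of the mean-variance point, as a minimal sketch: `poisson_kl` below is a hypothetical standalone version of the per-feature summand of $D_{ij}$ from the Method section (with the default `epsilon=1e-2`), not a function exported by the package. Both pairs of counts differ by exactly one, so their Euclidean distance is identical, but under a Poisson model a one-count difference near zero is far more surprising than one near a rate of 100.

```python
import numpy as np


def poisson_kl(a, b, eps=1e-2):
    """Per-feature smoothed Poisson KL term: a*log((a+eps)/(b+eps)) + b - a.

    Hypothetical helper mirroring the summand of D_ij; not part of the psne API.
    """
    return a * np.log((a + eps) / (b + eps)) + b - a


# Two pairs of counts that each differ by exactly one count.
low = poisson_kl(1.0, 0.0)       # 1 count vs. 0 counts (sparse regime)
high = poisson_kl(101.0, 100.0)  # 101 counts vs. 100 counts (dense regime)

print(f"Euclidean:  {abs(1.0 - 0.0):.3f} vs {abs(101.0 - 100.0):.3f}")
print(f"Poisson KL: {low:.3f} vs {high:.4f}")
```

With these values the 0-vs-1 pair scores about 3.6 and the 100-vs-101 pair about 0.005, while Euclidean distance scores both as 1.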

Installation

pip install psne-poisson-neighbor-python

Or from source:

git clone https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding.git
cd PSNE-Poisson-Stochastic-Neighbor-Embedding
pip install -r requirements.txt

Core dependencies: numpy, scipy, scikit-learn, matplotlib, seaborn.

Quick start

import numpy as np
from psne.psne_core import PSNE

# Toy count matrix: 50 features (rows) x 30 samples (columns)
X = np.random.poisson(5, size=(50, 30)).astype(float)
model = PSNE(n_components=2, max_iter=500, eta=100.0, verbose=True)
embedding = model.fit_transform(X)  # shape (30, 2): one row per sample

With your own data:

import numpy as np
from psne.psne_core import PSNE

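X = np.load('my_data.npy').astype(float)  # shape (N features, T samples)
assert np.all(X >= 0), 'p-SNE requires non-negative input'
# Remove all-zero samples (columns) before fitting, and drop the matching labels.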

model = PSNE(
    n_components=3,
    s_mode='weight_exp',
    weight_exp=1.0,
    eta=200.0,
    max_iter=1000,
    gamma=0.0,
    use_momentum=True,
    use_early_exaggeration=True,
    verbose=True,
)
embedding = model.fit_transform(X)

Plotting:

import matplotlib.pyplot as plt

labels = np.load('my_labels.npy')

fig, ax = plt.subplots()
ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='tab10', s=30)
ax.set_xlabel('$y_1$')
ax.set_ylabel('$y_2$')
plt.show()

For 3D:

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(embedding[:, 0], embedding[:, 1], embedding[:, 2], c=labels, cmap='tab10', s=30)
plt.show()

Method

Poisson KL distance matrix. Asymmetric divergence between all sample pairs:

$$D_{ij} = \frac{1}{N}\sum_n \left[ x_{n,i} \log\frac{x_{n,i}+\epsilon}{x_{n,j}+\epsilon} + x_{n,j} - x_{n,i} \right]$$

High-dimensional joint probabilities $S$: convert $D$ into a symmetric probability matrix via a global weight exponent or adaptive per-point perplexity.
Low-dimensional joint probabilities $Q$: Cauchy kernel over the embedding coordinates, as in t-SNE.
Hellinger cost: minimize the Hellinger distance $H(S, Q)$ instead of t-SNE's KL divergence $\mathrm{KL}(S \,\|\, Q)$.
Optional group-lasso penalty: $\gamma \sum_k \|y_k\|_2$, where $y_k \in \mathbb{R}^T$ is the $k$-th embedding dimension across all samples, promotes sparsity across embedding dimensions.
Optimizer: gradient descent with momentum and early exaggeration.
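The formulas above can be sketched in plain NumPy. This is an illustrative reimplementation under stated assumptions, not the code in `psne/psne_core.py`: `poisson_kl_matrix`, `cauchy_q`, `hellinger`, and `group_lasso` are hypothetical names; the Hellinger distance uses the common normalization $\frac{1}{\sqrt{2}}\|\sqrt{S} - \sqrt{Q}\|_F$, which the package may scale differently; and the group-lasso groups are assumed to be the rows of an `(n_components, T)` embedding. The construction of $S$ from $D$ is omitted because its exact form is not spelled out here.

```python
import numpy as np


def poisson_kl_matrix(X, eps=1e-2):
    """D[i, j] = (1/N) sum_n [x_ni log((x_ni+eps)/(x_nj+eps)) + x_nj - x_ni].

    X has shape (N, T): features in rows, samples in columns.
    """
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    logX = np.log(X + eps)                  # (N, T)
    self_term = np.sum(X * logX, axis=0)    # sum_n x_ni log(x_ni + eps), per sample i
    cross_term = X.T @ logX                 # [i, j] = sum_n x_ni log(x_nj + eps)
    col_sums = X.sum(axis=0)                # sum_n x_ni, per sample i
    D = self_term[:, None] - cross_term + col_sums[None, :] - col_sums[:, None]
    return D / N


def cauchy_q(Y):
    """t-SNE-style low-dimensional joint probabilities; Y has shape (T, n_components)."""
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq_dist)               # Cauchy (Student-t, 1 dof) kernel
    np.fill_diagonal(W, 0.0)                # no self-neighbors
    return W / W.sum()


def hellinger(S, Q):
    """Normalized Hellinger distance between two joint distributions (in [0, 1])."""
    return np.sqrt(0.5 * np.sum((np.sqrt(S) - np.sqrt(Q)) ** 2))


def group_lasso(Y_rows, gamma):
    """gamma * sum over embedding dimensions (rows of Y_rows) of each row's L2 norm."""
    return gamma * np.sum(np.linalg.norm(Y_rows, axis=1))


rng = np.random.default_rng(0)
X = rng.poisson(5, size=(50, 30)).astype(float)   # 50 features x 30 samples
D = poisson_kl_matrix(X)                          # (30, 30), asymmetric, zero diagonal
Y = rng.normal(size=(30, 2))                      # a random 2-D embedding, one row per sample
Q = cauchy_q(Y)
print(D.shape, hellinger(Q, Q), group_lasso(Y.T, gamma=0.1))
```

The vectorized `D` avoids a Python loop over the $T^2$ sample pairs by expanding the sum into a self term, a cross term (one matrix product), and column sums.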

Data format

Shape: $(N, T)$, where $N$ is the number of features (neurons, genes, words) and $T$ is the number of samples (conditions, cells, documents).
Type: float or int numpy array.
Values: non-negative.

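Samples are columns and features are rows. This is the transpose of scikit-learn's `(n_samples, n_features)` convention, so pass `X.T` if your matrix is stored samples × features. The output embedding has shape (T, n_components), with samples as rows. Remove all-zero samples (all-zero columns) before fitting.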
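A minimal preprocessing sketch for the requirements above. `prepare_counts` is a hypothetical helper (not part of `psne`) that checks the input, drops all-zero samples (columns), and keeps a label vector aligned with the remaining samples so it can still be passed to `c=` when plotting.

```python
import numpy as np


def prepare_counts(X, labels=None):
    """Validate an (N features, T samples) count matrix and drop all-zero samples.

    Hypothetical helper, not part of the psne package.
    Returns the filtered matrix, the filtered labels (or None), and the kept-column mask.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2:
        raise ValueError(f"expected a 2-D (N, T) array, got shape {X.shape}")
    if not np.all(np.isfinite(X)):
        raise ValueError("counts must be finite (no NaN or inf)")
    if np.any(X < 0):
        raise ValueError("p-SNE requires non-negative counts")
    keep = X.sum(axis=0) > 0          # samples are columns
    X = X[:, keep]
    if labels is not None:
        labels = np.asarray(labels)[keep]
    return X, labels, keep


# If your matrix is stored samples x features (scikit-learn layout), transpose it first.
X_raw = np.array([[0, 2, 0, 1],
                  [0, 0, 3, 4]])
labels = np.array(["a", "b", "c", "d"])
X_clean, labels_clean, keep = prepare_counts(X_raw, labels)
print(X_clean.shape, labels_clean)    # sample "a" is all zeros and is dropped
```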

Parameters

Model:

| Parameter | Default | Description |
|---|---|---|
| `n_components` | 3 | Embedding dimensionality. |
| `s_mode` | `'weight_exp'` | How to build $S$: `'weight_exp'` (global) or `'perplexity'` (adaptive). |
| `weight_exp` | 1.0 | Weight exponent for `s_mode='weight_exp'`. Higher values sharpen neighborhoods. |
| `perplexity` | 30.0 | Target perplexity for `s_mode='perplexity'`. Must be smaller than the number of samples $T$. |
| `epsilon` | 1e-2 | Smoothing constant $\epsilon$ in the Poisson KL $D_{ij}$. |
| `gamma` | 0.0 | Group-lasso regularization weight $\gamma$; $\gamma > 0$ encourages sparsity across embedding dimensions. |
| `random_state` | 42 | Random seed for initialization. |
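For `s_mode='perplexity'`, a common way to turn a dissimilarity matrix into per-point neighborhoods is the t-SNE binary search over one precision $\beta_i$ per row. The sketch below applies that standard procedure to the rows of $D$; it assumes p-SNE's adaptive mode works analogously, which may not match the package exactly, and `perplexity_affinities` is a hypothetical name.

```python
import numpy as np


def perplexity_affinities(D, perplexity=30.0, tol=1e-5, max_steps=100):
    """t-SNE-style per-point calibration applied to a (T, T) dissimilarity matrix D.

    For each row i, binary-search a precision beta_i so that the conditional
    distribution p_{j|i} proportional to exp(-beta_i * D[i, j]) has entropy
    log(perplexity). Returns the symmetrized joint matrix S = (P + P.T) / (2T).
    """
    D = np.asarray(D, dtype=float)
    T = D.shape[0]
    if not perplexity < T:
        raise ValueError("perplexity must be smaller than the number of samples")
    target = np.log(perplexity)
    P = np.zeros((T, T))
    for i in range(T):
        d = np.delete(D[i], i)                   # exclude the point itself
        lo, hi, beta = 0.0, np.inf, 1.0
        for _ in range(max_steps):
            w = np.exp(-beta * (d - d.min()))    # shift for numerical stability
            p = w / w.sum()
            H = -np.sum(p * np.log(p + 1e-300))  # entropy in nats
            if abs(H - target) < tol:
                break
            if H > target:                       # too flat: increase precision
                lo = beta
                beta = beta * 2.0 if hi == np.inf else 0.5 * (beta + hi)
            else:                                # too peaked: decrease precision
                hi = beta
                beta = 0.5 * (beta + lo)
        P[i, np.arange(T) != i] = p
    return (P + P.T) / (2.0 * T)


rng = np.random.default_rng(0)
D = rng.uniform(0.0, 5.0, size=(40, 40))        # stand-in for a Poisson KL matrix
np.fill_diagonal(D, 0.0)
S = perplexity_affinities(D, perplexity=10.0)
```

Symmetrizing with $(P + P^\top)/(2T)$ is the t-SNE convention; it makes $S$ a proper joint distribution even though $D$ itself is asymmetric.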

Optimizer:

| Parameter | Default | Description |
|---|---|---|
| `eta` | 200.0 | Learning rate. |
| `max_iter` | 1000 | Maximum number of iterations. |
| `tol` | 1e-8 | Convergence tolerance on the change in cost. |
| `use_momentum` | True | Enable momentum. |
| `momentum_alpha` | 0.5 | Initial momentum coefficient. |
| `momentum_alpha_final` | 0.8 | Final momentum coefficient. |
| `momentum_switch_iter` | 250 | Iteration at which momentum switches from `momentum_alpha` to `momentum_alpha_final`. |
| `use_early_exaggeration` | True | Multiply $S$ by `exaggeration_factor` during the first `exaggeration_iters` iterations. |
| `exaggeration_factor` | 12.0 | Exaggeration multiplier. |
| `exaggeration_iters` | 250 | Number of exaggeration iterations. |
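How these optimizer parameters interact can be sketched as a per-iteration schedule. This assumes the classic heavy-ball update ($v \leftarrow \alpha v - \eta \nabla$, $y \leftarrow y + v$), a momentum switch at the first iteration $\ge$ `momentum_switch_iter`, and exaggeration applied during iterations `0 .. exaggeration_iters - 1`. The helper names are hypothetical, and the package's update may differ (for example, some t-SNE implementations also use per-coordinate gains).

```python
import numpy as np


def momentum_at(it, alpha=0.5, alpha_final=0.8, switch_iter=250):
    """Momentum coefficient in effect at 0-based iteration `it`."""
    return alpha if it < switch_iter else alpha_final


def exaggeration_at(it, factor=12.0, n_iters=250, enabled=True):
    """Multiplier applied to S at 0-based iteration `it`.

    The gradient at iteration `it` would be computed from exaggeration_at(it) * S.
    """
    return factor if (enabled and it < n_iters) else 1.0


def momentum_step(Y, velocity, grad, it, eta=200.0):
    """One heavy-ball step: v <- alpha * v - eta * grad, then Y <- Y + v."""
    velocity = momentum_at(it) * velocity - eta * grad
    return Y + velocity, velocity


# Toy check on f(Y) = 0.5 * ||Y||^2, whose gradient is Y itself.
Y = np.ones((5, 2))
v = np.zeros_like(Y)
for it in range(300):
    Y, v = momentum_step(Y, v, grad=Y, it=it, eta=0.1)
print(np.abs(Y).max())
```

The toy objective uses `eta=0.1` because on a unit quadratic the heavy-ball update is only stable for $\eta < 2(1 + \alpha)$; the default `eta=200.0` is meant for the much smaller embedding gradients.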

Attributes (after fitting)

| Attribute | Shape | Description |
|---|---|---|
| `embedding_` | `(n_components, T)` | Learned embedding. `fit_transform` returns the transpose. |
| `cost_history_` | list | Total cost at each iteration. |
| `hellinger_history_` | list | Hellinger distance at each iteration. |
| `D_` | $(T, T)$ | Poisson KL distance matrix. |
| `S_` | $(T, T)$ | High-dimensional joint probabilities. |
| `Q_` | $(T, T)$ | Final low-dimensional joint probabilities. |
| `n_iter_` | int | Number of iterations run. |
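The fitted histories can be inspected directly. Assuming `tol` is compared against the absolute change between consecutive entries of `cost_history_` (the table above describes it only as a tolerance on cost change), these hypothetical helpers report whether and when that threshold was reached.

```python
import numpy as np


def converged(cost_history, tol=1e-8):
    """True if the last recorded change in cost is below `tol` in absolute value."""
    c = np.asarray(cost_history, dtype=float)
    return bool(c.size >= 2 and abs(c[-1] - c[-2]) < tol)


def first_converged_iter(cost_history, tol=1e-8):
    """Index of the first iteration whose cost change is below `tol`, or None."""
    deltas = np.abs(np.diff(np.asarray(cost_history, dtype=float)))
    hits = np.flatnonzero(deltas < tol)
    return int(hits[0]) + 1 if hits.size else None


# Stand-in history; after fitting, pass model.cost_history_ instead.
history = [10.0, 5.0, 4.9, 4.9, 4.9]
print(converged(history), first_converged_iter(history))
```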

Demo

python psne_demo_nonlinear.py

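Runs on two synthetic datasets (3-group and 4-group XOR), compares p-SNE against baselines (t-SNE, UMAP, PCA, ZIFA, scVI, GLM-PCA, Poisson GPFA), and saves embedding plots, cost curves, and .npy files. Some baselines (for example UMAP and scVI) may require packages beyond the core dependencies listed above.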
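To try p-SNE on data with known groups without running the full demo, a hypothetical generator like the one below produces sparse counts in the documented `(N, T)` layout, with each group driving its own block of features. It is not the 3-group or 4-group XOR construction used in `psne_demo_nonlinear.py`; the rates and sizes are arbitrary choices for illustration.

```python
import numpy as np


def grouped_poisson_counts(n_features=60, n_per_group=20, n_groups=3,
                           base_rate=0.2, active_rate=3.0, seed=0):
    """Sparse (n_features, T) counts in which each group has its own block of active features.

    Hypothetical generator for experimentation; not the demo's XOR construction.
    Returns X with shape (n_features, n_groups * n_per_group) and one label per sample.
    """
    rng = np.random.default_rng(seed)
    T = n_per_group * n_groups
    rates = np.full((n_features, T), base_rate)          # low background rate
    labels = np.repeat(np.arange(n_groups), n_per_group)
    blocks = np.array_split(np.arange(n_features), n_groups)
    for g, rows in enumerate(blocks):
        # Features in block g fire at a higher rate only for samples in group g.
        rates[np.ix_(rows, np.flatnonzero(labels == g))] = active_rate
    X = rng.poisson(rates).astype(float)
    return X, labels


X, labels = grouped_poisson_counts()
print(X.shape, np.bincount(labels))
# embedding = PSNE(n_components=2).fit_transform(X)  # then color the scatter by `labels`
```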

File structure

PSNE-Poisson-Stochastic-Neighbor-Embedding/
├── psne/
│   ├── __init__.py
│   ├── psne_core.py
│   ├── psne_config.py
│   └── psne_utils.py
├── psne_demo_nonlinear.py
├── pyproject.toml
├── requirements.txt
├── LICENSE
└── README.md

Citation

If you use p-SNE, please cite:

@article{mudrik2026neighbor,
  title={Neighbor Embedding for High-Dimensional Sparse Poisson Data},
  author={Mudrik, Noga and Charles, Adam S},
  journal={arXiv preprint arXiv:2604.16932},
  year={2026}
}

Release history

0.1.2 (May 23, 2026)
0.1.1 (May 23, 2026; this version)
0.1.0 (May 23, 2026)

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

psne_poisson_neighbor_python-0.1.1-py3-none-any.whl (18.5 kB)

Uploaded May 23, 2026 Python 3

File details

Details for the file psne_poisson_neighbor_python-0.1.1-py3-none-any.whl.

File metadata

Download URL: psne_poisson_neighbor_python-0.1.1-py3-none-any.whl
Upload date: May 23, 2026
Size: 18.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.7.13

File hashes

Hashes for psne_poisson_neighbor_python-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `152553eda773f7ced715a6c6afaa9d3e7156f6e4b3ea6602f3e176e7a9b2a8fd` |
| MD5 | `84ed648adb0283779bc7918e969eb143` |
| BLAKE2b-256 | `1b20551596ea50777145002b839f93352fc037a3c76b47b637d517de78c661a1` |


psne-poisson-neighbor-python 0.1.1
