p-SNE: Poisson Stochastic Neighbor Embedding. Nonlinear dimensionality reduction for sparse count data (neural spike counts, scRNA-seq, text corpora).
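p-SNE embeds high-dimensional count matrices (neural spike counts, scRNA-seq, text corpora) into 2D or 3D. It measures pairwise dissimilarity between samples with a Poisson KL divergence and optimizes the embedding under a Hellinger cost. The API follows the conventions of scikit-learn's t-SNE (fit_transform, trailing-underscore attributes), except that input matrices have features as rows and samples as columns (see Data format).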
📄 Paper: Neighbor Embedding for High-Dimensional Sparse Poisson Data (arXiv 2604.16932)
💻 Code: github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding
📝 Blog post: Life Is Too Short for Wrong Metrics
Why p-SNE?
Standard dimensionality reduction methods (t-SNE, UMAP, PCA) assume continuous, Gaussian-distributed features. When applied to sparse count data, they treat zeros as informative distances and ignore the mean-variance coupling inherent in Poisson observations (the variance of a Poisson count equals its mean). This leads to distorted embeddings in which real structure is lost or spurious structure is fabricated.
p-SNE replaces the Euclidean distance in t-SNE with a Poisson KL divergence that respects the discrete, non-negative nature of count data. On sparse neural recordings, text word counts, and single-cell RNA-seq data, p-SNE recovers cluster structure that t-SNE, UMAP, and PCA miss.
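To make the contrast concrete, the sketch below compares two pairs of single-feature count vectors that are equally far apart in Euclidean terms but sit at very different count levels. The helper name `poisson_kl` is hypothetical and not part of the package; it simply evaluates the per-feature term from the Method section with the documented default `epsilon=1e-2`.

```python
import numpy as np

def poisson_kl(x, y, eps=1e-2):
    """Mean over features of x*log((x+eps)/(y+eps)) + y - x (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(x * np.log((x + eps) / (y + eps)) + y - x))

# Two pairs with the same absolute difference (2 counts) at different rates.
low_a, low_b = np.array([1.0]), np.array([3.0])
high_a, high_b = np.array([100.0]), np.array([102.0])

euclid_low = float(np.linalg.norm(low_a - low_b))
euclid_high = float(np.linalg.norm(high_a - high_b))
kl_low = poisson_kl(low_a, low_b)
kl_high = poisson_kl(high_a, high_b)

print("Euclidean:", euclid_low, euclid_high)
print("Poisson KL:", kl_low, kl_high)
```

The Euclidean distances are identical, while the Poisson KL term is much larger for the low-count pair: a difference of 2 counts is a large relative change at a rate near 1 and a small one at a rate near 100.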
Installation
pip install p-sne
Or from source:
git clone https://github.com/NogaMudrik/PSNE-Poisson-Stochastic-Neighbor-Embedding.git
cd PSNE-Poisson-Stochastic-Neighbor-Embedding
pip install -r requirements.txt
Core dependencies: numpy, scipy, scikit-learn, matplotlib, seaborn.
Quick start
import numpy as np
from psne.psne_core import PSNE

# Synthetic counts: 50 features (rows) x 30 samples (columns)
X = np.random.poisson(5, size=(50, 30)).astype(float)

model = PSNE(n_components=2, max_iter=500, eta=100.0, verbose=True)
embedding = model.fit_transform(X)  # shape (30, 2): one row per sample
With your own data:
import numpy as np
from psne.psne_core import PSNE
X = np.load('my_data.npy').astype(float)
assert np.all(X >= 0), 'p-SNE requires non-negative input'
model = PSNE(
    n_components=3,
    s_mode='weight_exp',
    weight_exp=1.0,
    eta=200.0,
    max_iter=1000,
    gamma=0.0,
    use_momentum=True,
    use_early_exaggeration=True,
    verbose=True,
)
embedding = model.fit_transform(X)
Plotting:
import matplotlib.pyplot as plt
labels = np.load('my_labels.npy')
fig, ax = plt.subplots()
ax.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap='tab10', s=30)
ax.set_xlabel('$y_1$')
ax.set_ylabel('$y_2$')
plt.show()
For 3D:
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(embedding[:, 0], embedding[:, 1], embedding[:, 2], c=labels, cmap='tab10', s=30)
plt.show()
Method
- Poisson KL distance matrix. Asymmetric divergence between all sample pairs:
$$D_{ij} = \frac{1}{N}\sum_n \left[ x_{n,i} \log\frac{x_{n,i}+\epsilon}{x_{n,j}+\epsilon} + x_{n,j} - x_{n,i} \right]$$
- High-dimensional joint probabilities $S$: convert $D$ into a symmetric probability matrix via a global weight exponent or adaptive per-point perplexity.
- Low-dimensional joint probabilities $Q$: Cauchy kernel over the embedding coordinates, as in t-SNE.
- Hellinger cost: minimize $H(S, Q)$ instead of KL divergence.
- Optional group-lasso penalty: $\gamma \sum_n \|y_n\|_2$ promotes sparsity across embedding dimensions.
- Optimizer: gradient descent with momentum and early exaggeration.
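The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the code in psne_core.py: all function names are hypothetical, the construction of $S$ (symmetrize $D$, then normalize $\exp(-w D)$ with weight exponent $w$) is an assumption about what the global weight exponent does, and the cost uses the standard Hellinger distance $\sqrt{\tfrac{1}{2}\sum_{ij}(\sqrt{S_{ij}}-\sqrt{Q_{ij}})^2}$, which may differ from the package's cost by a square or a constant.

```python
import numpy as np

def poisson_kl_matrix(X, eps=1e-2):
    """D[i, j] = (1/N) * sum_n [x_ni*log((x_ni+eps)/(x_nj+eps)) + x_nj - x_ni], X is (N, T)."""
    X = np.asarray(X, dtype=float)
    log_x = np.log(X + eps)                   # (N, T)
    self_term = np.sum(X * log_x, axis=0)     # sum_n x_ni * log(x_ni + eps)
    cross = X.T @ log_x                       # cross[i, j] = sum_n x_ni * log(x_nj + eps)
    totals = X.sum(axis=0)                    # sum_n x_ni
    D = self_term[:, None] - cross + totals[None, :] - totals[:, None]
    return D / X.shape[0]

def high_dim_probs(D, weight_exp=1.0):
    """ASSUMED form of S: symmetrize D, exponentiate, zero the diagonal, normalize."""
    W = np.exp(-weight_exp * 0.5 * (D + D.T))
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def low_dim_probs(Y):
    """Cauchy (Student-t, one degree of freedom) kernel as in t-SNE; Y is (T, d)."""
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(W, 0.0)
    return W / W.sum()

def hellinger(S, Q):
    """Standard Hellinger distance between two joint distributions."""
    return float(np.sqrt(0.5 * np.sum((np.sqrt(S) - np.sqrt(Q)) ** 2)))

rng = np.random.default_rng(0)
X = rng.poisson(5, size=(30, 8)).astype(float)   # 30 features x 8 samples
D = poisson_kl_matrix(X)
S = high_dim_probs(D)
Y = rng.normal(scale=1e-2, size=(8, 2))          # random initial embedding
Q = low_dim_probs(Y)
cost = hellinger(S, Q)
print("initial Hellinger cost:", cost)
```

An optimizer would then move `Y` to reduce `cost`; the gradient is omitted here because its exact form depends on the cost definition used in the paper.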
Data format
- Shape: $(N, T)$ where $N$ is features (neurons, genes, words) and $T$ is samples (conditions, cells, documents).
- Type: float or int numpy array.
- Values: non-negative.
Samples are columns, features are rows. The output embedding has shape (T, n_components) with samples as rows. Remove all-zero samples before fitting.
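Data stored in the common samples-by-features layout needs to be transposed first. The helper below is a small sketch of that preparation step; `to_psne_layout` is a hypothetical name, not a function shipped with the package. It also returns the mask of kept samples so that labels can be filtered the same way.

```python
import numpy as np

def to_psne_layout(samples_by_features):
    """Validate counts, drop all-zero samples, and return a (features, samples) array."""
    X = np.asarray(samples_by_features, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected a 2D count matrix")
    if np.any(X < 0):
        raise ValueError("p-SNE requires non-negative input")
    keep = X.sum(axis=1) > 0          # samples with at least one nonzero count
    return X[keep].T, keep            # transpose: features as rows, samples as columns

# 4 samples x 3 features; the second sample is all zeros and gets removed.
counts = np.array([[0, 3, 1],
                   [0, 0, 0],
                   [2, 0, 5],
                   [1, 1, 0]])
X, keep = to_psne_layout(counts)
print(X.shape, keep)
```

Applying `labels[keep]` to a label array keeps it aligned with the rows of the returned embedding.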
Parameters
Model:
| Parameter | Default | Description |
|---|---|---|
| n_components | 3 | Embedding dimensionality. |
| s_mode | 'weight_exp' | How to build $S$: 'weight_exp' (global) or 'perplexity' (adaptive). |
| weight_exp | 1.0 | Weight exponent for s_mode='weight_exp'. Higher values sharpen neighborhoods. |
| perplexity | 30.0 | Target perplexity for s_mode='perplexity'. Must be < number of samples. |
| epsilon | 1e-2 | Smoothing constant for Poisson KL. |
| gamma | 0.0 | Group-lasso regularization weight ($\gamma > 0$ enforces sparsity). |
| random_state | 42 | Random seed for initialization. |
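For intuition about the adaptive 'perplexity' mode, the sketch below shows the per-point calibration used in standard t-SNE: for one row of distances, a scale `beta` is found by bisection so that the resulting conditional distribution has the target perplexity. Whether p-SNE calibrates in exactly this way is an assumption; the function names are hypothetical and the distances are synthetic.

```python
import numpy as np

def row_perplexity(d_row, beta):
    """Perplexity of p_j proportional to exp(-beta * d_j), together with p."""
    w = np.exp(-beta * (d_row - d_row.min()))   # shift for numerical stability
    p = w / w.sum()
    nz = p > 0
    entropy = -np.sum(p[nz] * np.log(p[nz]))
    return float(np.exp(entropy)), p

def calibrate_beta(d_row, target_perplexity, n_steps=200, tol=1e-6):
    """Bisection on beta so that the row perplexity matches the target."""
    if not 1.0 < target_perplexity < d_row.size:
        raise ValueError("perplexity must lie between 1 and the number of neighbours")
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(n_steps):
        perp, _ = row_perplexity(d_row, beta)
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:    # too flat: increase beta
            lo = beta
            beta = beta * 2.0 if np.isinf(hi) else 0.5 * (lo + hi)
        else:                           # too peaked: decrease beta
            hi = beta
            beta = 0.5 * (lo + hi)
    perp, p = row_perplexity(d_row, beta)
    return beta, p, perp

rng = np.random.default_rng(0)
d_row = rng.gamma(shape=2.0, scale=1.0, size=49)   # distances from one point to 49 others
beta, p, perp = calibrate_beta(d_row, target_perplexity=10.0)
print("beta:", beta, "perplexity:", perp)
```

The range check mirrors the constraint in the table: a point with fewer neighbours than the target perplexity cannot reach it, because a uniform distribution over its neighbours already has the maximum perplexity.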
Optimizer:
| Parameter | Default | Description |
|---|---|---|
| eta | 200.0 | Learning rate. |
| max_iter | 1000 | Maximum iterations. |
| tol | 1e-8 | Convergence tolerance on cost change. |
| use_momentum | True | Enable momentum. |
| momentum_alpha | 0.5 | Initial momentum coefficient. |
| momentum_alpha_final | 0.8 | Final momentum coefficient. |
| momentum_switch_iter | 250 | Iteration at which momentum switches. |
| use_early_exaggeration | True | Multiply $S$ by exaggeration_factor for the first iterations. |
| exaggeration_factor | 12.0 | Exaggeration multiplier. |
| exaggeration_iters | 250 | Number of exaggeration iterations. |
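The momentum and exaggeration settings describe two step schedules. The sketch below writes them out with the default values from the table and runs a momentum update on a toy quadratic. It is illustrative only: the function names are hypothetical, the exact boundary handling (`<` versus `<=`) is an assumption, and the toy learning rate is far smaller than the package default because the toy objective is scaled differently.

```python
import numpy as np

def momentum_at(it, alpha=0.5, alpha_final=0.8, switch_iter=250):
    """Momentum coefficient: alpha before switch_iter, alpha_final from then on."""
    return alpha if it < switch_iter else alpha_final

def exaggeration_at(it, factor=12.0, n_iters=250):
    """Multiplier applied to S: factor during the first n_iters iterations, then 1."""
    return factor if it < n_iters else 1.0

def minimize(grad_fn, y0, eta, n_iter):
    """Gradient descent with the momentum schedule above (heavy-ball update)."""
    y = np.array(y0, dtype=float)
    velocity = np.zeros_like(y)
    for it in range(n_iter):
        velocity = momentum_at(it) * velocity - eta * grad_fn(y)
        y = y + velocity
    return y

# Toy objective f(y) = 0.5 * ||y||^2, whose gradient is y itself.
y_final = minimize(lambda y: y, y0=[3.0, -2.0], eta=0.1, n_iter=300)
print("final point:", y_final)
```

In an embedding optimizer, `exaggeration_at(it)` would scale $S$ inside the cost and gradient at iteration `it`, which pulls neighbours together strongly early on before the schedule drops back to 1.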
Attributes (after fitting)
| Attribute | Shape | Description |
|---|---|---|
| embedding_ | (n_components, T) | Learned embedding. fit_transform returns the transpose. |
| cost_history_ | list | Total cost at each iteration. |
| hellinger_history_ | list | Hellinger distance at each iteration. |
| D_ | $(T, T)$ | Poisson KL distance matrix. |
| S_ | $(T, T)$ | High-dimensional joint probabilities. |
| Q_ | $(T, T)$ | Final low-dimensional joint probabilities. |
| n_iter_ | int | Number of iterations run. |
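The shapes in the table relate to each other as in the sketch below. Since the package may not be installed where this is read, the example uses a stand-in object with the documented attribute names instead of a fitted PSNE model; `summarize_fit` is a hypothetical helper, and all values in the stand-in are made up for illustration.

```python
from types import SimpleNamespace
import numpy as np

def summarize_fit(model, returned):
    """Check the documented shape relations and collect a few summary values."""
    n_components, n_samples = model.embedding_.shape
    if returned.shape != (n_samples, n_components):
        raise ValueError("fit_transform output should have shape (T, n_components)")
    if not np.allclose(returned, model.embedding_.T):
        raise ValueError("fit_transform output should be the transpose of embedding_")
    for name in ("D_", "S_", "Q_"):
        if getattr(model, name).shape != (n_samples, n_samples):
            raise ValueError(f"{name} should have shape (T, T)")
    return {
        "n_samples": n_samples,
        "n_components": n_components,
        "n_iter": model.n_iter_,
        "final_cost": model.cost_history_[-1],
    }

# Stand-in for a fitted model with T = 6 samples and 2 embedding dimensions.
T, d = 6, 2
rng = np.random.default_rng(0)
fake_model = SimpleNamespace(
    embedding_=rng.normal(size=(d, T)),
    D_=np.zeros((T, T)),
    S_=np.full((T, T), 1.0 / T**2),
    Q_=np.full((T, T), 1.0 / T**2),
    cost_history_=[0.9, 0.5, 0.4],
    hellinger_history_=[0.9, 0.5, 0.4],
    n_iter_=3,
)
summary = summarize_fit(fake_model, fake_model.embedding_.T)
print(summary)
```

With a real model, the same call would be `summarize_fit(model, embedding)` after `embedding = model.fit_transform(X)`.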
Demo
python psne_demo_nonlinear.py
Runs two synthetic datasets (3-group and 4-group XOR), compares p-SNE against baselines (t-SNE, UMAP, PCA, ZIFA, scVI, GLM-PCA, Poisson GPFA), and saves embedding plots, cost curves, and .npy files.
File structure
PSNE-Poisson-Stochastic-Neighbor-Embedding/
├── psne/
│ ├── __init__.py
│ ├── psne_core.py
│ ├── psne_config.py
│ └── psne_utils.py
├── psne_demo_nonlinear.py
├── pyproject.toml
├── requirements.txt
├── LICENSE
└── README.md
Citation
If you use p-SNE, please cite:
@article{mudrik2026neighbor,
  title={Neighbor Embedding for High-Dimensional Sparse Poisson Data},
  author={Mudrik, Noga and Charles, Adam S},
  journal={arXiv preprint arXiv:2604.16932},
  year={2026}
}