
Poisson topic modeling with Bayesian inference using JAX and NumPyro

poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference

poisson-topicmodels is a modern Python package for probabilistic topic modeling using Bayesian inference, built on JAX and NumPyro.

Package documentation

Full package documentation is available here.

Statement of Need

Traditional topic modeling packages (e.g., Gensim, scikit-learn's LDA) use older inference methods and lack flexibility for emerging research needs. poisson-topicmodels addresses key gaps:

  1. Modern Probabilistic Inference: Built on NumPyro, enabling automatic differentiation, probabilistic programming, and integration with cutting-edge Bayesian methods.

  2. Advanced Topic Models: Goes beyond LDA with guided topic discovery (keyword priors), covariate effects, ideal point estimation, and embeddings—all with principled Bayesian inference.

  3. GPU Acceleration: Leverages JAX for transparent GPU computation, essential for large-scale corpus analysis and enabling research that would be prohibitively slow on CPU.

  4. Scalability & Reproducibility: Optimized for mini-batch SVI training with built-in seed control for exact reproducibility—critical for research validation and publication.

  5. Research-Friendly API: Purpose-built for computational social science and NLP researchers who need interpretable, flexible models beyond black-box approaches.

Whether analyzing legislative text, social media discourse, or scientific abstracts, poisson-topicmodels enables researchers to extract interpretable semantic structure with confidence in results.

Features

poisson-topicmodels provides multiple topic modeling approaches:

Model | Use Case | Key Feature
Poisson Factorization (PF) | Unsupervised baseline | Fast, interpretable word-topic associations
Seeded PF (SPF) | Guided discovery | Incorporate domain knowledge via keyword priors
Covariate PF (CPF) | Covariate effects | Model topics influenced by document metadata
Covariate Seeded PF (CSPF) | Guided + covariates | Combine keyword guidance with external factors
Text-Based Ideal Points (TBIP) | Ideal point estimation | Estimate author positions from legislative/social text
Embedded Topic Models (ETM) | Modern embeddings | Integrate pre-trained word embeddings
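
To make the baseline PF row more concrete, the sketch below simulates a document-term matrix from a generic Poisson factorization model: non-negative document-topic and topic-word intensities are drawn from Gamma priors, and each count is Poisson with a rate given by their product. The function name `simulate_pf_counts` and the Gamma(0.3, 1) hyperparameters are assumptions chosen for illustration; the priors used inside the package may differ.

```python
import numpy as np


def simulate_pf_counts(num_docs, num_words, num_topics, seed=0):
    """Simulate counts from a generic Poisson factorization model (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Document-topic intensities, shape (num_docs, num_topics); assumed Gamma prior.
    theta = rng.gamma(shape=0.3, scale=1.0, size=(num_docs, num_topics))
    # Topic-word intensities, shape (num_topics, num_words); same assumed prior.
    beta = rng.gamma(shape=0.3, scale=1.0, size=(num_topics, num_words))
    # The rate for document d and word v is sum_k theta[d, k] * beta[k, v].
    rate = theta @ beta
    counts = rng.poisson(rate)
    return counts, theta, beta


counts_sim, theta_sim, beta_sim = simulate_pf_counts(num_docs=100, num_words=500, num_topics=10)
print(counts_sim.shape)
```

Fitting a PF model can be read as inverting this simulation: given only the counts, infer plausible values for the two intensity matrices.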

Core Capabilities:

  • ✨ Stochastic Variational Inference (SVI) with mini-batch training
  • ✨ Transparent GPU acceleration via JAX
  • ✨ Reproducible results with seed control
  • ✨ Type hints and comprehensive API documentation
  • ✨ >70% test coverage with continuous integration
  • ✨ Clear error messages and input validation
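
As a rough illustration of how mini-batch training and seed control fit together, the sketch below draws mini-batches of document indices from a seeded NumPy generator and checks that two runs with the same seed see the same batches. This is not the package's internal sampler; `minibatch_indices` is a hypothetical helper written for this example.

```python
import numpy as np


def minibatch_indices(num_docs, batch_size, num_steps, seed):
    """Yield one array of document indices per training step."""
    rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        # Sample a batch of distinct documents for this step.
        yield rng.choice(num_docs, size=batch_size, replace=False)


run_a = list(minibatch_indices(num_docs=100, batch_size=32, num_steps=3, seed=42))
run_b = list(minibatch_indices(num_docs=100, batch_size=32, num_steps=3, seed=42))
same = all(np.array_equal(a, b) for a, b in zip(run_a, run_b))
print(same)  # True: identical seeds give identical batches
```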

Quick Start

Get started in 5 minutes:

import numpy as np
from scipy.sparse import csr_matrix
from poisson_topicmodels import PF

# Prepare data: document-term matrix and vocabulary
counts = csr_matrix(np.random.poisson(2, (100, 500)).astype(np.float32))
vocab = np.array([f'word_{i}' for i in range(500)])

# Initialize and train model
model = PF(counts, vocab, num_topics=10, batch_size=32)
params = model.train_step(num_steps=100, lr=0.01, random_seed=42)

# Extract results
topics, _ = model.return_topics()
top_words = model.return_top_words_per_topic(n=10)
print(f"Topics array shape: {topics.shape}")
print(f"Top words: {top_words[:3]}")

See examples/ directory for detailed notebooks.

Installation

From PyPI (recommended)

pip install poisson-topicmodels

GPU installs (opt-in)

Automatic GPU detection at install time is not reliable across macOS/Windows/Linux/cloud runtimes. Use explicit install targets:

NVIDIA GPU (Linux x86_64/aarch64, CUDA 12)

pip install "poisson-topicmodels[gpu-cuda12]"

Apple Silicon GPU (Metal)

pip install "poisson-topicmodels[gpu-metal]"

Run with:

JAX_PLATFORMS=METAL python your_script.py

AMD GPU (ROCm)

Install the package first, then follow the official JAX AMD instructions: JAX AMD GPU install guide. JAX's AMD install uses ROCm plugin wheels and environment-specific commands, so it is not encoded as a generic PyPI extra.

Other GPU Installations

For other setups, install JAX manually by following the official JAX installation guide.
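
After any of the installs above, it can help to confirm which backend JAX actually selected. The snippet below is a small check using `jax.default_backend()` and `jax.devices()`; the wrapper `describe_jax_backend` is a hypothetical helper, and it falls back to a message if JAX is not importable.

```python
def describe_jax_backend():
    """Return a short description of the backend JAX will use, if JAX is installed."""
    try:
        import jax
    except ImportError:
        return "jax is not installed"
    # default_backend() returns the platform name, for example "cpu" or "gpu".
    return f"backend={jax.default_backend()}, devices={jax.devices()}"


print(describe_jax_backend())
```

If the reported backend is `cpu` on a GPU machine, the GPU-specific install step most likely did not take effect.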

From Source

git clone https://github.com/BPro2410/poisson_topicmodels.git
cd poisson_topicmodels
pip install -e .

Development Setup

git clone https://github.com/BPro2410/poisson_topicmodels.git
cd poisson_topicmodels
pip install -e ".[dev]"
pytest tests/  # Verify installation

Requirements

  • Python ≥ 3.11
  • JAX 0.4.35 (GPU support via optional install targets above)
  • NumPyro ≥ 0.15.3
  • NumPy, SciPy, scikit-learn, pandas

See pyproject.toml for complete dependency list.

Documentation

Basic Usage Examples

1. Unsupervised Topic Discovery (PF)

from poisson_topicmodels import PF

model = PF(counts, vocab, num_topics=10, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)

# Extract topics
topics, topic_probs = model.return_topics()
top_words = model.return_top_words_per_topic(n=15)
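
For intuition about what a top-words listing represents, the sketch below ranks words per topic from a small topic-word weight matrix with `np.argsort`. It is independent of the package: `top_words_per_topic` and the demo arrays are hypothetical, and the return format of `return_top_words_per_topic` may differ.

```python
import numpy as np


def top_words_per_topic(topic_word, vocab, n=3):
    """Return the n highest-weighted words for each row of a (topics x words) matrix."""
    # Sort ascending, keep the last n columns, then reverse for descending order.
    top_idx = np.argsort(topic_word, axis=1)[:, -n:][:, ::-1]
    return [vocab[row].tolist() for row in top_idx]


vocab_demo = np.array(["carbon", "climate", "growth", "trade", "vote"])
topic_word_demo = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.2],  # hypothetical "environment" topic
    [0.0, 0.1, 0.7, 0.9, 0.3],  # hypothetical "economy" topic
])
print(top_words_per_topic(topic_word_demo, vocab_demo, n=2))
# [['carbon', 'climate'], ['trade', 'growth']]
```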

2. Guided Topic Modeling with Keywords (SPF)

from poisson_topicmodels import SPF

keywords = {
    0: ['climate', 'environment', 'carbon'],
    1: ['economy', 'growth', 'trade'],
}

model = SPF(counts, vocab, keywords, residual_topics=5, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)
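
Seed words can only guide a topic if they occur in the vocabulary, so it may be worth checking the `keywords` dictionary before training. The helper below is a hypothetical pre-check, not part of the package; how SPF itself treats out-of-vocabulary keywords is not stated here.

```python
import numpy as np


def split_keywords_by_vocab(keywords, vocab):
    """Split each topic's seed words into those present in vocab and those missing."""
    vocab_set = set(np.asarray(vocab).tolist())
    found, missing = {}, {}
    for topic_id, words in keywords.items():
        found[topic_id] = [w for w in words if w in vocab_set]
        missing[topic_id] = [w for w in words if w not in vocab_set]
    return found, missing


vocab_check = np.array(["climate", "carbon", "economy", "trade", "vote"])
keywords_check = {
    0: ["climate", "environment", "carbon"],
    1: ["economy", "growth", "trade"],
}
found, missing = split_keywords_by_vocab(keywords_check, vocab_check)
print(missing)  # {0: ['environment'], 1: ['growth']}
```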

3. Covariate Effects (CPF)

import numpy as np
from poisson_topicmodels import CPF

# Include document-level covariates
covariates = np.random.randn(100, 3)  # 100 documents, 3 covariates

model = CPF(counts, vocab, covariates, num_topics=10, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)

Custom Model Extension

The package is modular, so you can implement custom models by subclassing NumpyroModel. A short example:

from poisson_topicmodels import NumpyroModel
import numpyro
from numpyro import plate, sample
import numpyro.distributions as dist

class MyModel(NumpyroModel):
    def _model(self, Y_batch, d_batch):
        with plate("n", len(Y_batch)):
            mu = sample("mu", dist.Normal(0, 1))
            sample("obs", dist.Normal(mu, 1), obs=Y_batch)

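    def _guide(self, Y_batch, d_batch):
        # Variational parameters; the scale is constrained to stay positive.
        mu_loc = numpyro.param("mu_loc", 0.0)
        mu_scale = numpyro.param("mu_scale", 1.0, constraint=dist.constraints.positive)
        with plate("n", len(Y_batch)):
            sample("mu", dist.Normal(mu_loc, mu_scale))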

To implement a custom model, you only need to define the model and its guide (_model and _guide). The backbone of poisson-topicmodels handles training and inference.

Example Data

The repository includes data/10k_amazon.csv with ~10,000 Amazon product reviews for quick experimentation. See examples/01_getting_started.py for a complete walkthrough.

Docker Setup (Optional)

For a reproducible, isolated environment with JupyterLab:

# Build image
docker build -t topicmodels-jupyter .

# Run container (Linux/macOS)
docker run --rm -p 8888:8888 -v "$(pwd)":/workspace topicmodels-jupyter

# Then open http://localhost:8888 in your browser

Citation

If you use poisson-topicmodels in your research, please cite:

@software{topicmodels2026,
  title = {Poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference},
  author = {Prostmaier, Bernd and Grün, Bettina and Hofmarcher, Paul},
  year = {2026},
  url = {https://github.com/BPro2410/poisson_topicmodels},
}

See CITATION.cff for additional citation formats.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:

  • Reporting bugs
  • Submitting pull requests
  • Code style and testing requirements
  • Documentation standards

License

This project is licensed under the MIT License. See LICENSE for details.

Built with ❤️ for researchers and practitioners in computational social science and NLP
