Poisson topic modeling with Bayesian inference using JAX and NumPyro

poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference

poisson-topicmodels is a modern Python package for probabilistic topic modeling using Bayesian inference, built on JAX and NumPyro.

Package documentation

Full package documentation is available here.

Statement of Need

Traditional topic modeling packages (e.g., Gensim, scikit-learn's LDA) use older inference methods and lack flexibility for emerging research needs. poisson-topicmodels addresses key gaps:

  1. Modern Probabilistic Inference: Built on NumPyro, enabling automatic differentiation, probabilistic programming, and integration with cutting-edge Bayesian methods.

  2. Advanced Topic Models: Goes beyond LDA with guided topic discovery (keyword priors), covariate effects, ideal point estimation, and embeddings—all with principled Bayesian inference.

  3. GPU Acceleration: Leverages JAX for transparent GPU computation, essential for large-scale corpus analysis and enabling research that would be prohibitively slow on CPU.

  4. Scalability & Reproducibility: Optimized for mini-batch SVI training with built-in seed control for exact reproducibility—critical for research validation and publication.

  5. Research-Friendly API: Purpose-built for computational social science and NLP researchers who need interpretable, flexible models beyond black-box approaches.

Whether analyzing legislative text, social media discourse, or scientific abstracts, poisson-topicmodels enables researchers to extract interpretable semantic structure with confidence in results.

Features

poisson-topicmodels provides multiple topic modeling approaches:

Model | Use Case | Key Feature
Poisson Factorization (PF) | Unsupervised baseline | Fast, interpretable word-topic associations
Seeded PF (SPF) | Guided discovery | Incorporate domain knowledge via keyword priors
Covariate PF (CPF) | Covariate effects | Model topics influenced by document metadata
Covariate Seeded PF (CSPF) | Guided + covariates | Combine keyword guidance with external factors
Text-Based Ideal Points (TBIP) | Ideal point estimation | Estimate author positions from legislative/social text
Embedded Topic Models (ETM) | Modern embeddings | Integrate pre-trained word embeddings
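
For orientation, the generative idea behind Poisson factorization can be sketched in plain NumPy: each word count is drawn from a Poisson distribution whose rate is the inner product of a document's topic intensities and the topics' word intensities. The Gamma priors, their shape and scale values, and the names theta and beta are illustrative assumptions, not the package's actual parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
num_docs, num_words, num_topics = 100, 500, 10

# Assumed Gamma priors (shape 0.3, mean 1); chosen only for illustration.
theta = rng.gamma(shape=0.3, scale=1 / 0.3, size=(num_docs, num_topics))  # document-topic intensities
beta = rng.gamma(shape=0.3, scale=1 / 0.3, size=(num_topics, num_words))  # topic-word intensities

# Poisson rate for every (document, word) pair, then the simulated counts.
rate = theta @ beta
counts = rng.poisson(rate)

print(counts.shape)
```

Fitting a model reverses this process: given only counts, inference recovers plausible values for theta and beta.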

Core Capabilities:

  • ✨ Stochastic Variational Inference (SVI) with mini-batch training
  • ✨ Transparent GPU acceleration via JAX
  • ✨ Reproducible results with seed control
  • ✨ Type hints and comprehensive API documentation
  • ✨ >70% test coverage with continuous integration
  • ✨ Clear error messages and input validation

Quick Start

Get started in 5 minutes:

import numpy as np
from scipy.sparse import csr_matrix
from poisson_topicmodels import PF

# Prepare data: document-term matrix and vocabulary
counts = csr_matrix(np.random.poisson(2, (100, 500)).astype(np.float32))
vocab = np.array([f'word_{i}' for i in range(500)])

# Initialize and train model
model = PF(counts, vocab, num_topics=10, batch_size=32)
params = model.train_step(num_steps=100, lr=0.01, random_seed=42)

# Extract results
topics, _ = model.return_topics()
top_words = model.return_top_words_per_topic(n=10)
print(f"Shape of topics output: {topics.shape}")
print(f"Top words: {top_words[:3]}")

See examples/ directory for detailed notebooks.

Installation

From PyPI (recommended)

pip install poisson-topicmodels

From Source

git clone https://github.com/BPro2410/topicmodels_package.git
cd topicmodels_package
pip install -e .

Development Setup

git clone https://github.com/BPro2410/topicmodels_package.git
cd topicmodels_package
pip install -e ".[dev]"
pytest tests/  # Verify installation

Requirements

  • Python ≥ 3.11
  • JAX ≥ 0.4.35 (with optional GPU support)
  • NumPyro ≥ 0.15.3
  • NumPy, SciPy, scikit-learn, pandas

See pyproject.toml for complete dependency list.
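
To check an existing environment against the minimum versions listed above, a small standard-library script is enough. This is only a sketch: parse_version and check_requirements are hypothetical helper names, and the parser deliberately ignores pre-release and post-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

# Minimum versions copied from the Requirements list.
MINIMUMS = {"jax": (0, 4, 35), "numpyro": (0, 15, 3)}


def parse_version(text):
    """Turn '0.4.35' or '0.15.3.post1' into a tuple of up to three integers."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return a {package: status} report for each required package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = parse_version(installed) >= minimum
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


for name, status in check_requirements().items():
    print(f"{name}: {status}")
```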

Documentation

Basic Usage Examples

1. Unsupervised Topic Discovery (PF)

from poisson_topicmodels import PF

model = PF(counts, vocab, num_topics=10, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)

# Extract topics
topics, topic_probs = model.return_topics()
top_words = model.return_top_words_per_topic(n=15)
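
As an illustration of what "top words per topic" means, the snippet below ranks words by weight within each row of a topic-word matrix. It is a sketch and not the package's implementation: the matrix beta, the tiny vocabulary, and the helper top_words are all hypothetical.

```python
import numpy as np

vocab = np.array(["climate", "carbon", "trade", "growth", "tax"])

# Hypothetical topic-word weights: 2 topics (rows) x 5 words (columns).
beta = np.array([
    [0.90, 0.70, 0.10, 0.20, 0.05],
    [0.10, 0.05, 0.80, 0.60, 0.30],
])


def top_words(beta, vocab, n):
    """Return, for each topic (row), the n words with the largest weights."""
    order = np.argsort(-beta, axis=1)[:, :n]  # column indices sorted by descending weight
    return vocab[order]


result = top_words(beta, vocab, n=2)
print(result)
```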

2. Guided Topic Modeling with Keywords (SPF)

from poisson_topicmodels import SPF

keywords = {
    0: ['climate', 'environment', 'carbon'],
    1: ['economy', 'growth', 'trade'],
}

model = SPF(counts, vocab, keywords, residual_topics=5, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)
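
Seed words only help if they occur in the vocabulary. How SPF handles keywords that are missing from vocab is not described here, so it can be worth checking beforehand. The helper filter_keywords below is a hypothetical pre-processing step, not part of the package.

```python
import numpy as np


def filter_keywords(keywords, vocab):
    """Split seed words into those present in the vocabulary and those missing."""
    vocab_set = set(vocab.tolist())
    kept, missing = {}, {}
    for topic, words in keywords.items():
        kept[topic] = [w for w in words if w in vocab_set]
        missing[topic] = [w for w in words if w not in vocab_set]
    return kept, missing


# Toy vocabulary, invented for this example.
vocab = np.array(["climate", "carbon", "economy", "trade", "tax"])
keywords = {
    0: ["climate", "environment", "carbon"],
    1: ["economy", "growth", "trade"],
}

kept, missing = filter_keywords(keywords, vocab)
print("kept:", kept)
print("missing:", missing)
```

Reviewing the missing words often reveals tokenization mismatches, such as plural forms or terms removed by a stop-word list.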

3. Covariate Effects (CPF)

import numpy as np
from poisson_topicmodels import CPF

# Include document-level covariates (one row per document)
covariates = np.random.randn(100, 3)  # 100 documents, 3 covariates

model = CPF(counts, vocab, covariates, num_topics=10, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)
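
In practice, covariates usually come from document metadata rather than random numbers. The sketch below builds a numeric covariate matrix from a small pandas DataFrame by dummy-coding a categorical column and standardizing a numeric one. The column names and values are invented, and whether CPF expects standardized inputs, an intercept column, or a particular dtype is an assumption to verify against the package documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical metadata: one row per document.
meta = pd.DataFrame({
    "party": ["A", "B", "A", "C"],
    "year": [2019, 2020, 2021, 2022],
})

# Dummy-code the categorical column, dropping the first level as the reference category.
dummies = pd.get_dummies(meta["party"], prefix="party", drop_first=True, dtype=float)

# Standardize the numeric column to mean 0 and unit (sample) standard deviation.
year_std = (meta["year"] - meta["year"].mean()) / meta["year"].std()

design = pd.concat([dummies, year_std.rename("year_std")], axis=1)
covariates = design.to_numpy(dtype=np.float32)

print(list(design.columns))
print(covariates.shape)
```

The rows of covariates must stay aligned with the rows of the document-term matrix.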

Custom Model Extension

Thanks to the package's modular structure, you can implement your own models on top of poisson-topicmodels. A short example:

from poisson_topicmodels import NumpyroModel
import numpyro
from numpyro import plate, sample
import numpyro.distributions as dist

class MyModel(NumpyroModel):
    def _model(self, Y_batch, d_batch):
        # Generative model: one latent mean per observation in the batch
        with plate("n", len(Y_batch)):
            mu = sample("mu", dist.Normal(0, 1))
            sample("obs", dist.Normal(mu, 1), obs=Y_batch)

    def _guide(self, Y_batch, d_batch):
        # Variational family: Normal with learnable location and scale
        mu_loc = numpyro.param("mu_loc", 0.0)
        # The scale must stay positive during optimization
        mu_scale = numpyro.param("mu_scale", 1.0, constraint=dist.constraints.positive)
        with plate("n", len(Y_batch)):
            sample("mu", dist.Normal(mu_loc, mu_scale))

A custom model only needs to define the model and its guide. The poisson-topicmodels backbone handles training and inference.

Example Data

The repository includes data/10k_amazon.csv with ~10,000 Amazon product reviews for quick experimentation. See examples/01_getting_started.ipynb for a complete walkthrough.

Docker Setup (Optional)

For a reproducible, isolated environment with JupyterLab:

# Build image
docker build -t topicmodels-jupyter .

# Run container (Linux/macOS)
docker run --rm -p 8888:8888 -v "$(pwd)":/workspace topicmodels-jupyter

# Then open http://localhost:8888 in your browser

Citation

If you use poisson-topicmodels in your research, please cite:

@software{topicmodels2026,
  title = {Poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference},
  author = {Prostmaier, Bernd and Grün, Bettina and Hofmarcher, Paul},
  year = {2026},
  url = {https://github.com/BPro2410/topicmodels_package},
}

See CITATION.cff for additional citation formats.

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:

  • Reporting bugs
  • Submitting pull requests
  • Code style and testing requirements
  • Documentation standards

License

This project is licensed under the MIT License. See LICENSE for details.

Built with ❤️ for researchers and practitioners in computational social science and NLP
