Poisson topic modeling with Bayesian inference using JAX and NumPyro
poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference
poisson-topicmodels is a modern Python package for probabilistic topic modeling using Bayesian inference, built on JAX and NumPyro.
Statement of Need
Traditional topic modeling packages (e.g., Gensim, scikit-learn's LDA) use older inference methods and lack flexibility for emerging research needs. poisson-topicmodels addresses key gaps:
- Modern Probabilistic Inference: Built on NumPyro, enabling automatic differentiation, probabilistic programming, and integration with cutting-edge Bayesian methods.
- Advanced Topic Models: Goes beyond LDA with guided topic discovery (keyword priors), covariate effects, ideal point estimation, and embeddings, all with principled Bayesian inference.
- GPU Acceleration: Leverages JAX for transparent GPU computation, essential for large-scale corpus analysis and enabling research that would be prohibitively slow on CPU.
- Scalability & Reproducibility: Optimized for mini-batch SVI training with built-in seed control for exact reproducibility, which is critical for research validation and publication.
- Research-Friendly API: Purpose-built for computational social science and NLP researchers who need interpretable, flexible models beyond black-box approaches.
Whether analyzing legislative text, social media discourse, or scientific abstracts, poisson-topicmodels enables researchers to extract interpretable semantic structure with confidence in results.
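The seed-control idea listed above can be illustrated independently of the package. The following is a minimal NumPy sketch, not the package's internals: `iter_minibatches` is a hypothetical helper that draws document indices from a seeded generator, so two runs with the same seed visit identical mini-batches.

```python
import numpy as np


def iter_minibatches(num_docs, batch_size, num_steps, seed):
    """Yield `num_steps` arrays of document indices, reproducibly.

    Hypothetical helper for illustration; not part of poisson-topicmodels.
    """
    rng = np.random.default_rng(seed)  # all randomness flows from this seed
    for _ in range(num_steps):
        # Sample a batch of distinct documents for this step.
        yield rng.choice(num_docs, size=batch_size, replace=False)


run_a = [b.tolist() for b in iter_minibatches(100, 32, num_steps=5, seed=42)]
run_b = [b.tolist() for b in iter_minibatches(100, 32, num_steps=5, seed=42)]
run_c = [b.tolist() for b in iter_minibatches(100, 32, num_steps=5, seed=7)]

print(run_a == run_b)  # same seed: identical batch sequence
print(run_a == run_c)  # different seed: a different batch sequence
```

How the package derives its own mini-batches from `random_seed` is not shown here; the sketch only demonstrates why a single seed is enough to make a stochastic training loop repeatable.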
Features
poisson-topicmodels provides multiple topic modeling approaches:
| Model | Use Case | Key Feature |
|---|---|---|
| Poisson Factorization (PF) | Unsupervised baseline | Fast, interpretable word-topic associations |
| Seeded PF (SPF) | Guided discovery | Incorporate domain knowledge via keyword priors |
| Covariate PF (CPF) | Covariate effects | Model topics influenced by document metadata |
| Covariate Seeded PF (CSPF) | Guided + covariates | Combine keyword guidance with external factors |
| Text-Based Ideal Points (TBIP) | Ideal point estimation | Estimate author positions from legislative/social text |
| Embedded Topic Models (ETM) | Modern embeddings | Integrate pre-trained word embeddings |
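For readers new to Poisson factorization, the sketch below simulates data from its standard generative form, in which each count is Poisson-distributed with a rate equal to the inner product of a document's topic intensities and a word's topic loadings. The Gamma priors and their hyperparameters are illustrative assumptions and may differ from the priors used inside the package.

```python
import numpy as np

rng = np.random.default_rng(0)
num_docs, num_words, num_topics = 100, 500, 10

# Non-negative document-topic intensities and topic-word loadings.
# Gamma(shape=0.3, scale=1.0) is an assumed, illustrative prior.
theta = rng.gamma(shape=0.3, scale=1.0, size=(num_docs, num_topics))
beta = rng.gamma(shape=0.3, scale=1.0, size=(num_topics, num_words))

# Each entry of the rate matrix sums the topic contributions
# for one (document, word) pair.
rate = theta @ beta

# Observed document-term counts.
counts = rng.poisson(rate)

print(counts.shape)  # (100, 500)
```

The other models in the table can be read as variations on this structure, for example by changing the prior on `beta` for seeded words or by letting `theta` depend on document metadata.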
Core Capabilities:
- ✨ Stochastic Variational Inference (SVI) with mini-batch training
- ✨ Transparent GPU acceleration via JAX
- ✨ Reproducible results with seed control
- ✨ Type hints and comprehensive API documentation
- ✨ >70% test coverage with continuous integration
- ✨ Clear error messages and input validation
Quick Start
Get started in 5 minutes:
import numpy as np
from scipy.sparse import csr_matrix
from poisson_topicmodels import PF
# Prepare data: document-term matrix and vocabulary
counts = csr_matrix(np.random.poisson(2, (100, 500)).astype(np.float32))
vocab = np.array([f'word_{i}' for i in range(500)])
# Initialize and train model
model = PF(counts, vocab, num_topics=10, batch_size=32)
params = model.train_step(num_steps=100, lr=0.01, random_seed=42)
# Extract results
topics, _ = model.return_topics()
top_words = model.return_top_words_per_topic(n=10)
print(f"Found {topics.shape[1]} topics")
print(f"Top words: {top_words[:3]}")
See examples/ directory for detailed notebooks.
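The Quick Start uses random counts. With real text, the `counts` matrix and `vocab` array first have to be built from documents. The sketch below shows one way to do that with whitespace tokenization; `build_dtm` is a hypothetical helper, not part of the package, and a real pipeline would usually add stop-word removal and frequency filtering.

```python
from collections import Counter

import numpy as np
from scipy.sparse import csr_matrix


def build_dtm(docs):
    """Return (counts, vocab) from lowercased, whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    words = sorted({tok for toks in tokenized for tok in toks})
    index = {word: j for j, word in enumerate(words)}

    # Collect (row, column, value) triplets for the sparse matrix.
    rows, cols, vals = [], [], []
    for i, toks in enumerate(tokenized):
        for word, n in Counter(toks).items():
            rows.append(i)
            cols.append(index[word])
            vals.append(n)

    counts = csr_matrix(
        (np.array(vals, dtype=np.float32), (rows, cols)),
        shape=(len(docs), len(words)),
    )
    return counts, np.array(words)


docs = ["the cat sat", "the dog sat on the mat", "cat and dog"]
counts, vocab = build_dtm(docs)
print(counts.shape)    # (3, 7)
print(vocab.tolist())  # ['and', 'cat', 'dog', 'mat', 'on', 'sat', 'the']
```

The resulting `counts` and `vocab` have the same types as the ones passed to `PF` in the Quick Start.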
Installation
From PyPI (recommended)
pip install poisson-topicmodels
From Source
git clone https://github.com/BPro2410/topicmodels_package.git
cd topicmodels_package
pip install -e .
Development Setup
git clone https://github.com/BPro2410/topicmodels_package.git
cd topicmodels_package
pip install -e ".[dev]"
pytest tests/ # Verify installation
Requirements
- Python ≥ 3.11
- JAX ≥ 0.4.35 (with optional GPU support)
- NumPyro ≥ 0.15.3
- NumPy, SciPy, scikit-learn, pandas
See pyproject.toml for complete dependency list.
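Before training, it can help to confirm what is actually installed. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper that reports versions and does not compare them against the minimums listed above.

```python
import sys
from importlib import metadata


def installed_versions(packages=("jax", "numpyro", "numpy", "scipy")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


python_ok = sys.version_info >= (3, 11)
print(f"Python >= 3.11: {python_ok}")
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```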
Documentation
- API Reference – Complete model and method documentation
- User Guide – Detailed tutorials and workflows
- Examples – Jupyter notebooks demonstrating all features
- Contributing – How to contribute improvements
Basic Usage Examples
1. Unsupervised Topic Discovery (PF)
from poisson_topicmodels import PF
model = PF(counts, vocab, num_topics=10, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)
# Extract topics
topics, topic_probs = model.return_topics()
top_words = model.return_top_words_per_topic(n=15)
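Conceptually, extracting top words amounts to ranking vocabulary entries by their weight within each topic. The sketch below shows that ranking on a toy matrix. It assumes a topics-by-words layout and uses a hypothetical helper, `top_words_from_matrix`; it does not describe the layout returned by `return_topics()` or the internals of `return_top_words_per_topic()`.

```python
import numpy as np


def top_words_from_matrix(topic_word, vocab, n=3):
    """Return the n highest-weight vocabulary entries for each topic (row)."""
    # argsort is ascending, so take the last n columns and reverse them.
    top_idx = np.argsort(topic_word, axis=1)[:, -n:][:, ::-1]
    return vocab[top_idx]


vocab = np.array(["carbon", "trade", "climate", "growth", "tax"])
topic_word = np.array([
    [0.9, 0.1, 0.8, 0.2, 0.3],  # topic 0
    [0.1, 0.7, 0.2, 0.9, 0.4],  # topic 1
])
print(top_words_from_matrix(topic_word, vocab, n=2).tolist())
# [['carbon', 'climate'], ['growth', 'trade']]
```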
2. Guided Topic Modeling with Keywords (SPF)
from poisson_topicmodels import SPF
keywords = {
    0: ['climate', 'environment', 'carbon'],
    1: ['economy', 'growth', 'trade'],
}
model = SPF(counts, vocab, keywords, residual_topics=5, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)
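Seed words only help if they occur in the vocabulary, so it is worth checking them before training. The sketch below maps a keyword dictionary to (topic, vocabulary index) pairs and reports words that are missing. `keyword_indices` is a hypothetical helper; whether and how the package validates keywords internally is not covered here.

```python
import numpy as np


def keyword_indices(keywords, vocab):
    """Map {topic_id: [words]} to (topic_id, vocab_index) pairs plus missing words."""
    lookup = {word: j for j, word in enumerate(vocab.tolist())}
    pairs, missing = [], []
    for topic_id, words in keywords.items():
        for word in words:
            if word in lookup:
                pairs.append((topic_id, lookup[word]))
            else:
                missing.append(word)
    # reshape keeps a (0, 2) shape even when no keyword matches.
    return np.array(pairs, dtype=int).reshape(-1, 2), missing


vocab = np.array(["carbon", "climate", "economy", "growth", "tax"])
keywords = {
    0: ["climate", "environment", "carbon"],
    1: ["economy", "growth", "trade"],
}
pairs, missing = keyword_indices(keywords, vocab)
print(pairs.tolist())  # [[0, 1], [0, 0], [1, 2], [1, 3]]
print(missing)         # ['environment', 'trade']
```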
3. Covariate Effects (CPF)
from poisson_topicmodels import CPF
# Include document-level covariates
covariates = np.random.randn(100, 3) # 100 documents, 3 covariates
model = CPF(counts, vocab, covariates, num_topics=10, batch_size=64)
model.train_step(num_steps=500, lr=0.001, random_seed=42)
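Document metadata is often categorical (for example, party or product category) rather than numeric. The sketch below one-hot encodes such labels into a numeric matrix with one row per document. `one_hot` is a hypothetical helper, and the assumption that CPF accepts a plain numeric documents-by-covariates array is taken from the random example above; whether an intercept or reference category is expected is not specified here.

```python
import numpy as np


def one_hot(labels):
    """Encode categorical labels as a (num_docs, num_categories) 0/1 matrix."""
    categories, codes = np.unique(labels, return_inverse=True)
    design = np.zeros((len(labels), len(categories)), dtype=np.float32)
    # Set exactly one indicator per document.
    design[np.arange(len(labels)), codes] = 1.0
    return design, categories


party = np.array(["green", "liberal", "green", "conservative"])
covariates, categories = one_hot(party)
print(categories.tolist())  # ['conservative', 'green', 'liberal']
print(covariates.tolist())
# [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```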
Custom Model Extension
Thanks to its modular structure, poisson-topicmodels makes it straightforward to implement custom models. A short example follows.
from poisson_topicmodels import NumpyroModel
import numpyro
from numpyro import plate, sample
import numpyro.distributions as dist
class MyModel(NumpyroModel):
    def _model(self, Y_batch, d_batch):
        # One latent mean per observation in the batch
        with plate("n", len(Y_batch)):
            mu = sample("mu", dist.Normal(0, 1))
            sample("obs", dist.Normal(mu, 1), obs=Y_batch)

    def _guide(self, Y_batch, d_batch):
        # Variational parameters; the scale must stay positive during optimization
        mu_loc = numpyro.param("mu_loc", 0.0)
        mu_scale = numpyro.param("mu_scale", 1.0, constraint=dist.constraints.positive)
        with plate("n", len(Y_batch)):
            sample("mu", dist.Normal(mu_loc, mu_scale))
To implement a custom model, you only need to define the model and its guide; the poisson-topicmodels backbone handles training and inference.
Example Data
The repository includes data/10k_amazon.csv with ~10,000 Amazon product reviews for quick experimentation. See examples/01_getting_started.ipynb for a complete walkthrough.
Docker Setup (Optional)
For a reproducible, isolated environment with JupyterLab:
# Build image
docker build -t topicmodels-jupyter .
# Run container (Linux/macOS)
docker run --rm -p 8888:8888 -v "$(pwd)":/workspace topicmodels-jupyter
# Then open http://localhost:8888 in your browser
Citation
If you use poisson_topicmodels in your research, please cite:
@software{topicmodels2025,
  title = {Poisson-topicmodels: Probabilistic Topic Modeling with Bayesian Inference},
  author = {Prostmaier, Bernd and Grün, Bettina and Hofmarcher, Paul},
  year = {2025},
  url = {https://github.com/BPro2410/topicmodels_package},
}
See CITATION.cff for additional citation formats.
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines on:
- Reporting bugs
- Submitting pull requests
- Code style and testing requirements
- Documentation standards
License
This project is licensed under the MIT License. See LICENSE for details.
Support
- Issues & Bug Reports: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: ReadTheDocs
Built with ❤️ for researchers and practitioners in computational social science and NLP