Topica: fast, all-purpose topic modeling for Python — a Rust core for LDA, STM, and more
📖 Documentation: guides, a full API reference, worked examples, and a "Publishing in a social science journal" methodology track.
topica is a fast topic-modeling library for Python with more than a dozen models, built for social scientists who want to go from text data to publishable results in a single workflow. It brings together models and tools usually split across JVM software such as MALLET and R packages such as stm, and runs them on a parallel Rust core whose speed is competitive with the standard implementations. Every fit is reproducible from a fixed seed. Each model comes with the validation, covariate-effect, and reporting tools needed to meet the standards reviewers expect.
```bash
pip install topica  # pre-built abi3 wheels; no Rust toolchain needed
```
```python
from topica import LDA

model = LDA(num_topics=2, seed=42)
model.fit([["cat", "dog", "fish"]] * 15 + [["planet", "star", "moon"]] * 15, iterations=1000)

# top_words(n) yields, per topic, (word, weight) pairs; print just the words.
for i, words in enumerate(model.top_words(5)):
    print(f"Topic {i}:", " ".join(w for w, _ in words))
```
See the getting-started guide and the worked examples for end-to-end analyses.
Models
| Model | What it's for |
|---|---|
| LDA | Classic topics via fast collapsed Gibbs sampling (SparseLDA); optional multi-threaded and LightLDA alias samplers |
| DMR | Topics conditioned on document metadata (Dirichlet-multinomial regression) |
| LabeledLDA | Supervised topics tied to document labels |
| CTM | Correlated topics (logistic-normal) |
| STM | The Structural Topic Model: correlated topics with prevalence and content covariates |
| SAGE | Content-covariate topics: the same topic worded differently across groups |
| HDP | Nonparametric LDA that infers the number of topics |
| DTM | Dynamic topics that evolve across time slices |
| SupervisedLDA | Topics shaped to predict a per-document response |
| PT / GSDMM | Short-text models for tweets, survey answers, headlines |
| SeededLDA / KeyATM | Guided topics steered by seed words |
| PA / HLDA | Topic hierarchies (Pachinko allocation, nested CRP) |
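For readers new to collapsed Gibbs sampling, here is a minimal plain-Python sketch of the textbook sampler for LDA. It is not topica's implementation: it has none of SparseLDA's bucket decomposition, threading, or LightLDA alias tables, and the function name `gibbs_lda` and its defaults are hypothetical. It does show the two ideas the table relies on: each token's topic is resampled from its full conditional given all other assignments, and a fixed seed makes the whole run reproducible.

```python
import random


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=42):
    """Textbook collapsed Gibbs sampler for LDA (hypothetical helper, not topica's API).

    Returns (phi, theta, vocab) where phi[k][v] estimates p(word v | topic k)
    and theta[d][k] estimates p(topic k | document d).
    """
    rng = random.Random(seed)  # fixed seed -> identical result on every run
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens of each topic in each document
    n_kw = [[0] * V for _ in range(K)]  # occurrences of each word in each topic
    n_k = [0] * K                       # total tokens assigned to each topic
    z = []                              # current topic of every token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(K)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other tokens), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of the integrated-out parameters from the final sample.
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)] for k in topics]
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in topics]
        for d, doc in enumerate(docs)
    ]
    return phi, theta, vocab


if __name__ == "__main__":
    docs = [["cat", "dog", "fish"]] * 15 + [["planet", "star", "moon"]] * 15
    phi, theta, vocab = gibbs_lda(docs, num_topics=2)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:3]
        print(f"Topic {k}:", " ".join(vocab[v] for v in top))
```

The sampler is "collapsed" because φ and θ are integrated out during sampling and only recovered from the counts at the end; SparseLDA speeds up exactly the `weights` computation by splitting it into sparse buckets.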
Every model exposes the same interface: `fit(docs, …)`, then `topic_word` (φ), `doc_topic` (θ), `top_words(n)`, `transform(new_docs)`, and `save`/`load`. The variational models (CTM, STM, SupervisedLDA, DTM) parallelize across cores while staying bit-for-bit deterministic. Full guide: see the models documentation.
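Because every model exposes φ as a topics × vocabulary matrix and θ as a documents × topics matrix, summaries can be written once against those two arrays. A minimal plain-Python sketch, assuming φ rows and θ rows are probability vectors and that the vocabulary is available as a list in column order; the helper names `top_words_from_phi` and `dominant_topic` are hypothetical, not part of topica.

```python
from typing import List, Sequence, Tuple


def top_words_from_phi(
    phi: Sequence[Sequence[float]], vocab: Sequence[str], n: int
) -> List[List[Tuple[str, float]]]:
    """The n highest-weight (word, weight) pairs for each topic row of phi (topics x vocab)."""
    result = []
    for row in phi:
        # Highest weight first; sorted() is stable, so ties keep vocabulary order.
        ranked = sorted(range(len(row)), key=lambda v: row[v], reverse=True)[:n]
        result.append([(vocab[v], row[v]) for v in ranked])
    return result


def dominant_topic(theta: Sequence[Sequence[float]]) -> List[int]:
    """Index of the largest topic share in each document row of theta (docs x topics)."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]
```

With a fitted model you would pass `model.topic_word` and `model.doc_topic`, assuming they index like 2-D arrays; how topica exposes its vocabulary is not stated here, so `vocab` is left to the caller. The (word, weight) pair shape matches the quickstart's `for w, _ in words` loop.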
Diagnostics & analysis
These tools are model-agnostic: they work on any fitted model's `topic_word`/`doc_topic` matrices.
- Quality: `coherence` (`u_mass`, `c_v`, `c_uci`, `c_npmi`; computed in the Rust core), `exclusivity`, `topic_diversity`, `quality_frontier`
- Labeling: `label_topics` (prob / FREX / lift / score), `frex`, `relevance`, `find_thoughts`, `topic_table`, `summary`
- Validation: `word_intrusion`, `document_intrusion`, `bootstrap_stability`, `search_k`
- Comparison: `fighting_words` (weighted log-odds) for contrasting corpora
- stm toolkit: `estimate_effect` (method of composition, cluster-robust SEs, GLM links), `posterior_theta_samples`, `spline`, `interaction`, `one_hot`, `topic_correlation`
- Preprocessing: `tokenize`, `learn_phrases`/`apply_phrases`, `split_documents`, the `Corpus` class
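Two of the quality metrics listed above have short standard definitions. A minimal plain-Python sketch, following the usual formulations: topic diversity as the share of unique words across all topics' top words (Dieng, Ruiz & Blei 2020), and UMass coherence as in Mimno et al. (2011), with +1 smoothing and a sum over word pairs. These are not topica's `coherence` or `topic_diversity`; implementations differ in smoothing constants, in whether pair scores are summed or averaged, and in how many top words they use, so the numbers will not match the library's exactly.

```python
import math
from typing import Sequence


def topic_diversity(top_words: Sequence[Sequence[str]]) -> float:
    """Fraction of unique words among the top words of all topics (1.0 = no overlap)."""
    all_words = [w for topic in top_words for w in topic]
    return len(set(all_words)) / len(all_words)


def umass_coherence(top_words: Sequence[str], docs: Sequence[Sequence[str]]) -> float:
    """UMass coherence of one topic; top_words must be ordered most to least probable.

    Sums log((D(w_i, w_j) + 1) / D(w_j)) over pairs with j < i, where D counts the
    documents containing all of the given words. Every top word must occur in docs.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words: str) -> int:
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            co = doc_freq(top_words[i], top_words[j])
            score += math.log((co + 1) / doc_freq(top_words[j]))
    return score
```

Higher UMass means the topic's top words tend to appear in the same documents; the +1 keeps the logarithm finite for pairs that never co-occur.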
See the diagnostics and covariate-effects guides.
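The Comparison entry in the list names `fighting_words` (weighted log-odds). A minimal sketch of the weighted log-odds ratio with an informative Dirichlet prior from Monroe, Colaresi & Quinn (2008), the method usually called "fighting words". It is not topica's `fighting_words`: the function name is hypothetical, and using the pooled counts of both corpora as the prior, scaled to a total of `prior_scale`, is an assumption chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def weighted_log_odds(
    tokens_a: Sequence[str], tokens_b: Sequence[str], prior_scale: float = 10.0
) -> Dict[str, float]:
    """Z-scored log-odds ratio with an informative Dirichlet prior.

    Positive scores favour corpus A, negative scores favour corpus B.
    """
    y_a, y_b = Counter(tokens_a), Counter(tokens_b)
    pooled = y_a + y_b
    total = sum(pooled.values())
    n_a, n_b = sum(y_a.values()), sum(y_b.values())
    alpha0 = prior_scale
    scores = {}
    for w, count in pooled.items():
        # Prior pseudo-count for w: its pooled share of the total prior mass.
        alpha_w = alpha0 * count / total
        log_odds_a = math.log((y_a[w] + alpha_w) / (n_a + alpha0 - y_a[w] - alpha_w))
        log_odds_b = math.log((y_b[w] + alpha_w) / (n_b + alpha0 - y_b[w] - alpha_w))
        delta = log_odds_a - log_odds_b
        # Approximate variance of delta; dividing by its root gives a z-score.
        variance = 1.0 / (y_a[w] + alpha_w) + 1.0 / (y_b[w] + alpha_w)
        scores[w] = delta / math.sqrt(variance)
    return scores
```

The z-scoring is the point of the method: raw log-odds overstate differences for rare words, while dividing by the standard error shrinks them toward zero.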
Install from source
```bash
pip install maturin
git clone https://github.com/nealcaren/topica && cd topica
python -m venv .venv && source .venv/bin/activate
maturin develop --release --features python
```
Requires numpy >= 1.21. Use --release (the debug build is much slower).
Acknowledgements
Topica stands on a generation of open topic-modeling research and code. The LDA core binds David Mimno's RustMallet and reproduces MALLET's train output bit-for-bit; the other models are Rust ports or reimplementations, validated against their reference implementations:
- MALLET (McCallum): SparseLDA, DMR, hyperparameter optimization
- stm (Roberts, Stewart & Tingley): the Structural Topic Model, `estimateEffect`, `searchK`, FREX, spectral initialization, method of composition
- lda-c / ctm-c / dtm and hdp (Blei lab): the CTM, Dynamic Topic Model, and HDP reference implementations
- gensim: coherence measures and the `LdaSeqModel` DTM reference
- tomotopy (bab2min): API conventions (`summary`, short-text models)
- keyATM (Eshima, Imai & Sasaki): keyword-assisted topic models
- seededlda (Watanabe): seeded LDA
- LightLDA (Yuan et al.): the alias-table Metropolis-Hastings sampler
- GSDMM (Yin & Wang 2014): the movie-group-process mixture for short text
Underlying methods are credited to their authors in the documentation and the source. The SparseLDA scheme is Yao, Mimno & McCallum (KDD 2009).
License
Apache-2.0. Builds on RustMallet (Apache-2.0).
Download files
Source Distribution
Built Distributions
File details
Details for the file topica-0.1.1.tar.gz.
File metadata
- Download URL: topica-0.1.1.tar.gz
- Upload date:
- Size: 4.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef5380a561327ca2929e7f1e067f60b2b11aed420ad46f60fc2319747ea24432 |
| MD5 | 370df6e1091eee95b5283905b1901753 |
| BLAKE2b-256 | 090ce7a1a4aa92815a73ea07c51781e74a8c69f7bd118dea1bdc10d2d1acba5b |
Provenance
The following attestation bundles were made for topica-0.1.1.tar.gz:
Publisher: CI.yml on nealcaren/topica

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: topica-0.1.1.tar.gz
- Subject digest: ef5380a561327ca2929e7f1e067f60b2b11aed420ad46f60fc2319747ea24432
- Sigstore transparency entry: 1711957249
- Sigstore integration time:
- Permalink: nealcaren/topica@1f1fa3d39e695312265e14fe1528b6023336da16
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/nealcaren
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: CI.yml@1f1fa3d39e695312265e14fe1528b6023336da16
- Trigger Event: push
File details
Details for the file topica-0.1.1-cp39-abi3-win_amd64.whl.
File metadata
- Download URL: topica-0.1.1-cp39-abi3-win_amd64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.9+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e9c4c3eced6013529a8990937c3b9592a3fc7d4996d8b7eebb06a759ba2ad3f |
| MD5 | b3ff9cb5b58f4253026a61c90d4c0eb1 |
| BLAKE2b-256 | 7e2be15008886046239c9ecc8e2ebb0db469106049c45a39c4f64f31b2a6300d |
Provenance
The following attestation bundles were made for topica-0.1.1-cp39-abi3-win_amd64.whl:
Publisher: CI.yml on nealcaren/topica

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: topica-0.1.1-cp39-abi3-win_amd64.whl
- Subject digest: 8e9c4c3eced6013529a8990937c3b9592a3fc7d4996d8b7eebb06a759ba2ad3f
- Sigstore transparency entry: 1711957296
- Sigstore integration time:
- Permalink: nealcaren/topica@1f1fa3d39e695312265e14fe1528b6023336da16
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/nealcaren
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: CI.yml@1f1fa3d39e695312265e14fe1528b6023336da16
- Trigger Event: push
File details
Details for the file topica-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: topica-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 1.6 MB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c68e38122a4eea980c900034695a27a0760f551bd9e62170815a807ab736cdab |
| MD5 | 568004e9f932885f8da7e37ed7206880 |
| BLAKE2b-256 | 1463683349a562f211d01234ce8f1c2c12d41488720a9810993c4a3dfbb9b753 |
Provenance
The following attestation bundles were made for topica-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: CI.yml on nealcaren/topica

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: topica-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: c68e38122a4eea980c900034695a27a0760f551bd9e62170815a807ab736cdab
- Sigstore transparency entry: 1711957265
- Sigstore integration time:
- Permalink: nealcaren/topica@1f1fa3d39e695312265e14fe1528b6023336da16
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/nealcaren
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: CI.yml@1f1fa3d39e695312265e14fe1528b6023336da16
- Trigger Event: push
File details
Details for the file topica-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: topica-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5109bf1a65b6bb558d401d67d75478187b6bd59e32ec01fe08d93cc8e0f2df0c |
| MD5 | 36e3c1adbfaa5483b4d03c2aa2edb4cc |
| BLAKE2b-256 | 3343132e1cea250b6acd4ab83f0e489bc95b3da83b114633635390a8ca14eaf6 |
Provenance
The following attestation bundles were made for topica-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:
Publisher: CI.yml on nealcaren/topica

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: topica-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Subject digest: 5109bf1a65b6bb558d401d67d75478187b6bd59e32ec01fe08d93cc8e0f2df0c
- Sigstore transparency entry: 1711957323
- Sigstore integration time:
- Permalink: nealcaren/topica@1f1fa3d39e695312265e14fe1528b6023336da16
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/nealcaren
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: CI.yml@1f1fa3d39e695312265e14fe1528b6023336da16
- Trigger Event: push
File details
Details for the file topica-0.1.1-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: topica-0.1.1-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb0fff9d975075172e6979969849ba6bde8ffa4d122bf89e89fb06c2844db53f |
| MD5 | 2722abe295f262c5db8fc06b67fbeb00 |
| BLAKE2b-256 | bd74deb2e26d9cb5502f7000c177f9e6ec7fc2a39047bf34272c3bec5fd86a25 |
Provenance
The following attestation bundles were made for topica-0.1.1-cp39-abi3-macosx_11_0_arm64.whl:
Publisher: CI.yml on nealcaren/topica

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: topica-0.1.1-cp39-abi3-macosx_11_0_arm64.whl
- Subject digest: fb0fff9d975075172e6979969849ba6bde8ffa4d122bf89e89fb06c2844db53f
- Sigstore transparency entry: 1711957285
- Sigstore integration time:
- Permalink: nealcaren/topica@1f1fa3d39e695312265e14fe1528b6023336da16
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/nealcaren
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: CI.yml@1f1fa3d39e695312265e14fe1528b6023336da16
- Trigger Event: push
File details
Details for the file topica-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: topica-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.9+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee9d1c0e38569e76f696cc6fb53e67f6bce9e4990290b87e227e6e0c3fe340c5 |
| MD5 | 98705f28ac14ea476762f847db00ec0c |
| BLAKE2b-256 | b235e16b642dc2c137b287b0229982a60f1a04a4a496405a4fa13da2203073bb |
Provenance
The following attestation bundles were made for topica-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl:
Publisher: CI.yml on nealcaren/topica

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: topica-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl
- Subject digest: ee9d1c0e38569e76f696cc6fb53e67f6bce9e4990290b87e227e6e0c3fe340c5
- Sigstore transparency entry: 1711957343
- Sigstore integration time:
- Permalink: nealcaren/topica@1f1fa3d39e695312265e14fe1528b6023336da16
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/nealcaren
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: CI.yml@1f1fa3d39e695312265e14fe1528b6023336da16
- Trigger Event: push