Topica: fast, all-purpose topic modeling for Python — a Rust core for LDA, STM, and more

📖 Documentation: guides, a full API reference, worked examples, and a methodology track, "Publishing in a social science journal".

topica is a fast topic-modeling library for Python with more than a dozen models, built for social scientists who want to move from text data to publishable results in a single workflow. It brings together models and tools usually split across JVM software like MALLET and R packages like stm, and runs them on a parallel Rust core competitive with the standard implementations, with every fit reproducible from a fixed seed. Each model comes with the validation, covariate-effect, and reporting tools to meet the standards reviewers expect.

pip install topica            # once published; pre-built abi3 wheels, no Rust toolchain needed

from topica import LDA

model = LDA(num_topics=2, seed=42)
model.fit([["cat", "dog", "fish"]] * 15 + [["planet", "star", "moon"]] * 15, iterations=1000)

for i, words in enumerate(model.top_words(5)):
    print(f"Topic {i}:", " ".join(w for w, _ in words))

See the getting-started guide and the worked examples for end-to-end analyses.

Models

| Model | What it's for |
|---|---|
| LDA | Classic topics via fast collapsed-Gibbs (SparseLDA); optional multi-threaded and LightLDA alias samplers |
| DMR | Topics conditioned on document metadata (Dirichlet-multinomial regression) |
| LabeledLDA | Supervised topics tied to document labels |
| CTM | Correlated topics (logistic-normal) |
| STM | The Structural Topic Model: correlated topics with prevalence and content covariates |
| SAGE | Content-covariate topics: the same topic worded differently across groups |
| HDP | Nonparametric LDA that infers the number of topics |
| DTM | Dynamic topics that evolve across time slices |
| SupervisedLDA | Topics shaped to predict a per-document response |
| PT / GSDMM | Short-text models for tweets, survey answers, headlines |
| SeededLDA / KeyATM | Guided topics steered by seed words |
| PA / HLDA | Topic hierarchies (Pachinko, nested-CRP) |

Every model exposes the same shape: fit(docs, …), then topic_word (φ), doc_topic (θ), top_words(n), transform(new_docs), and save/load. The variational models (CTM/STM/SupervisedLDA/DTM) parallelize across cores while staying bit-for-bit deterministic. For full details, see the models guide.
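
To make the shared shape concrete, here is a minimal sketch of how a topic_word matrix (φ) relates to a top_words(n) result. It is plain Python and not topica's implementation: the helper name top_words_from_phi, the toy vocabulary, and the probabilities are all made up for illustration. It returns (word, probability) pairs, which is consistent with how the quick-start loop unpacks `w, _`.

```python
from typing import List, Sequence, Tuple


def top_words_from_phi(
    phi: Sequence[Sequence[float]], vocab: Sequence[str], n: int
) -> List[List[Tuple[str, float]]]:
    """For each topic row of phi, return its n most probable (word, prob) pairs.

    Hypothetical helper for illustration; not part of topica.
    """
    result = []
    for row in phi:
        if len(row) != len(vocab):
            raise ValueError("each phi row needs one probability per vocabulary word")
        # Rank the words of this topic from most to least probable.
        ranked = sorted(zip(vocab, row), key=lambda pair: pair[1], reverse=True)
        result.append(ranked[:n])
    return result


# Toy data (assumed): two topics over a six-word vocabulary.
vocab = ["cat", "dog", "fish", "planet", "star", "moon"]
phi = [
    [0.40, 0.30, 0.25, 0.02, 0.02, 0.01],  # an "animals" topic
    [0.01, 0.02, 0.02, 0.35, 0.32, 0.28],  # a "space" topic
]

for i, words in enumerate(top_words_from_phi(phi, vocab, 3)):
    print(f"Topic {i}:", " ".join(w for w, _ in words))
```

Because the diagnostics only need matrices of this kind, the same ranking logic applies to any model that exposes topic_word.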

Diagnostics & analysis

These tools are model-agnostic and work on any fitted model's topic_word/doc_topic:

  • Quality: coherence (u_mass, c_v, c_uci, c_npmi; computed in the Rust core), exclusivity, topic_diversity, quality_frontier
  • Labeling: label_topics (prob / FREX / lift / score), frex, relevance, find_thoughts, topic_table, summary
  • Validation: word_intrusion, document_intrusion, bootstrap_stability, search_k
  • Comparison: fighting_words (weighted log-odds) for contrasting corpora
  • stm toolkit: estimate_effect (method of composition, cluster-robust SEs, GLM links), posterior_theta_samples, spline, interaction, one_hot, topic_correlation
  • Preprocessing: tokenize, learn_phrases / apply_phrases, split_documents, the Corpus class
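
As a rough illustration of two of the quality measures named above, the sketch below computes a UMass-style coherence and a topic-diversity score in pure Python. It is not topica's Rust implementation. The function names and toy corpus are assumptions, and the coherence uses the summed pairwise form with +1 smoothing; other implementations average over pairs or smooth differently, so absolute values may not match the library's.

```python
import math
from typing import List, Sequence


def umass_coherence(top_words: Sequence[str], docs: Sequence[Sequence[str]]) -> float:
    """Sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs of top words.

    D counts documents containing the given word(s); top_words should be
    ordered from most to least probable. Hypothetical helper, not topica's API.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every one of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                raise ValueError(f"{top_words[l]!r} does not occur in docs")
            score += math.log((doc_freq(top_words[m], top_words[l]) + 1) / denom)
    return score


def topic_diversity(topics: Sequence[Sequence[str]]) -> float:
    """Share of unique words among all topics' top words (1.0 = no overlap)."""
    all_words: List[str] = [w for topic in topics for w in topic]
    if not all_words:
        raise ValueError("need at least one top word")
    return len(set(all_words)) / len(all_words)


# Toy corpus (assumed), mirroring the quick-start example.
docs = [["cat", "dog", "fish"]] * 3 + [["planet", "star", "moon"]] * 3

coherent = umass_coherence(["cat", "dog", "fish"], docs)
mixed = umass_coherence(["cat", "planet", "dog"], docs)
print(coherent > mixed)  # words that co-occur score higher than a mixed set
print(topic_diversity([["cat", "dog"], ["cat", "star"]]))  # 3 unique of 4 words
```

Both functions take only top-word lists and tokenized documents, which is what makes this kind of diagnostic independent of the model that produced the topics.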

See diagnostics and covariate effects.
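
The weighted log-odds comparison mentioned under Comparison can be sketched as follows. This is an illustrative pure-Python version of the z-scored log-odds ratio with a Dirichlet prior (the "fightin' words" approach of Monroe, Colaresi & Quinn), not topica's fighting_words function. The name weighted_log_odds, the prior_scale parameter, and the flat symmetric prior are assumptions; the library's signature and prior may differ.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def weighted_log_odds(
    corpus_a: Sequence[Sequence[str]],
    corpus_b: Sequence[Sequence[str]],
    prior_scale: float = 1.0,
) -> Dict[str, float]:
    """Return a z-score per word; positive means more typical of corpus_a.

    Assumes a flat Dirichlet prior of prior_scale pseudo-counts per word.
    Hypothetical helper for illustration only.
    """
    if prior_scale <= 0:
        raise ValueError("prior_scale must be positive")
    counts_a = Counter(w for doc in corpus_a for w in doc)
    counts_b = Counter(w for doc in corpus_b for w in doc)
    vocab = set(counts_a) | set(counts_b)
    if len(vocab) < 2:
        raise ValueError("need at least two distinct words")

    alpha = prior_scale
    alpha0 = alpha * len(vocab)  # total prior mass
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())

    scores = {}
    for w in vocab:
        ya, yb = counts_a[w], counts_b[w]
        # Smoothed log-odds of the word within each corpus.
        log_odds_a = math.log((ya + alpha) / (n_a + alpha0 - ya - alpha))
        log_odds_b = math.log((yb + alpha) / (n_b + alpha0 - yb - alpha))
        # Approximate variance of the difference, used to standardize it.
        variance = 1.0 / (ya + alpha) + 1.0 / (yb + alpha)
        scores[w] = (log_odds_a - log_odds_b) / math.sqrt(variance)
    return scores


# Toy corpora (assumed).
corpus_a = [["tax", "cut", "jobs"], ["tax", "jobs"]]
corpus_b = [["health", "care", "jobs"], ["health", "care"]]

scores = weighted_log_odds(corpus_a, corpus_b)
for word, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:8s} {z:+.2f}")
```

Dividing by the standard deviation is what keeps rare words from dominating the ranking, which is the main reason to prefer this over a raw frequency ratio.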

Install from source

pip install maturin
git clone https://github.com/nealcaren/topica && cd topica
python -m venv .venv && source .venv/bin/activate
maturin develop --release --features python

Requires numpy >= 1.21. Use --release (the debug build is much slower).

Acknowledgements

Topica stands on a generation of open topic-modeling research and code. The LDA core binds David Mimno's RustMallet and reproduces MALLET's train output bit-for-bit; the other models are Rust ports or reimplementations, validated against their reference implementations:

  • MALLET (McCallum): SparseLDA, DMR, hyperparameter optimization
  • stm (Roberts, Stewart & Tingley): the Structural Topic Model, estimateEffect, searchK, FREX, spectral initialization, method of composition
  • lda-c / ctm-c / dtm and hdp (Blei lab): the CTM, Dynamic Topic Model, and HDP samplers
  • gensim: coherence measures and the LdaSeqModel DTM reference
  • tomotopy (bab2min): API conventions (summary, short-text models)
  • keyATM (Eshima, Imai & Sasaki): keyword-assisted topic models
  • seededlda (Watanabe): seeded LDA
  • LightLDA (Yuan et al.): the alias-table Metropolis-Hastings sampler
  • GSDMM (Yin & Wang 2014): the movie-group-process mixture for short text

Underlying methods are credited to their authors in the documentation and the source. The SparseLDA scheme is Yao, Mimno & McCallum (KDD 2009).

License

Apache-2.0. Builds on RustMallet (Apache-2.0).

Download files

Release 0.1.1 provides one source distribution and five built distributions (abi3 wheels for CPython 3.9+). Each file has an attestation bundle from the publisher CI.yml on nealcaren/topica; attestation values reflect the state when the release was signed and may no longer be current.

Source Distribution

  • topica-0.1.1.tar.gz (4.4 MB), source. Uploaded using Trusted Publishing via twine/6.1.0 CPython/3.13.12.
    SHA256 ef5380a561327ca2929e7f1e067f60b2b11aed420ad46f60fc2319747ea24432
    MD5 370df6e1091eee95b5283905b1901753
    BLAKE2b-256 090ce7a1a4aa92815a73ea07c51781e74a8c69f7bd118dea1bdc10d2d1acba5b

Built Distributions

  • topica-0.1.1-cp39-abi3-win_amd64.whl (1.5 MB), CPython 3.9+, Windows x86-64. Uploaded using Trusted Publishing via twine/6.1.0 CPython/3.13.12.
    SHA256 8e9c4c3eced6013529a8990937c3b9592a3fc7d4996d8b7eebb06a759ba2ad3f
    MD5 b3ff9cb5b58f4253026a61c90d4c0eb1
    BLAKE2b-256 7e2be15008886046239c9ecc8e2ebb0db469106049c45a39c4f64f31b2a6300d
  • topica-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.6 MB), CPython 3.9+, manylinux: glibc 2.17+ x86-64
    SHA256 c68e38122a4eea980c900034695a27a0760f551bd9e62170815a807ab736cdab
    MD5 568004e9f932885f8da7e37ed7206880
    BLAKE2b-256 1463683349a562f211d01234ce8f1c2c12d41488720a9810993c4a3dfbb9b753
  • topica-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.5 MB), CPython 3.9+, manylinux: glibc 2.17+ ARM64
    SHA256 5109bf1a65b6bb558d401d67d75478187b6bd59e32ec01fe08d93cc8e0f2df0c
    MD5 36e3c1adbfaa5483b4d03c2aa2edb4cc
    BLAKE2b-256 3343132e1cea250b6acd4ab83f0e489bc95b3da83b114633635390a8ca14eaf6
  • topica-0.1.1-cp39-abi3-macosx_11_0_arm64.whl (1.4 MB), CPython 3.9+, macOS 11.0+ ARM64
    SHA256 fb0fff9d975075172e6979969849ba6bde8ffa4d122bf89e89fb06c2844db53f
    MD5 2722abe295f262c5db8fc06b67fbeb00
    BLAKE2b-256 bd74deb2e26d9cb5502f7000c177f9e6ec7fc2a39047bf34272c3bec5fd86a25
  • topica-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl (1.5 MB), CPython 3.9+, macOS 10.12+ x86-64
    SHA256 ee9d1c0e38569e76f696cc6fb53e67f6bce9e4990290b87e227e6e0c3fe340c5
    MD5 98705f28ac14ea476762f847db00ec0c
    BLAKE2b-256 b235e16b642dc2c137b287b0229982a60f1a04a4a496405a4fa13da2203073bb
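
The SHA256 digests listed above can be checked against a downloaded file before installing it. The following is a minimal sketch using only Python's standard hashlib module; the helper names sha256_of and verify_download are assumptions made for illustration, and the example assumes the source distribution was saved under its original file name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never fully held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Digest copied from the listing for topica-0.1.1.tar.gz.
expected = "ef5380a561327ca2929e7f1e067f60b2b11aed420ad46f60fc2319747ea24432"
sdist = Path("topica-0.1.1.tar.gz")  # assumed local download location

if sdist.exists():
    print("digest matches" if verify_download(str(sdist), expected) else "digest MISMATCH")
else:
    print(f"{sdist} not found; download it first")
```

pip can perform an equivalent check itself when a requirements file pins hashes and is installed with --require-hashes.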
