Fast LDA topic modeling — Python bindings for RustMallet
pyrmallet
Python bindings for RustMallet — a fast Rust implementation of the sparse Gibbs sampling LDA algorithm from MALLET, following the SparseLDA scheme of Yao, Mimno and McCallum (KDD 2009).
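As a rough illustration of what "sparse" means here: SparseLDA writes the unnormalized Gibbs weight for assigning a token of word type w in document d to topic t, (alpha_t + n_{t|d}) * (beta + n_{w|t}) / (beta*V + n_t), as the sum of three buckets. These are a smoothing-only bucket s, a document bucket r that is non-zero only for topics already used in the document, and a topic-word bucket q that is non-zero only for topics the word type is assigned to. The plain-Python sketch below is not RustMallet's code, and the function names are hypothetical. For clarity it builds dense lists. An efficient implementation would instead cache s and visit only the non-zero entries of r and q.

```python
import random


def sparse_lda_buckets(alpha, beta, vocab_size, doc_topic, word_topic, topic_totals):
    """Split the LDA Gibbs weights for one token into SparseLDA's s, r, q buckets.

    All counts exclude the token being resampled:
      doc_topic[t]    = n_{t|d}, tokens in this document assigned to topic t
      word_topic[t]   = n_{w|t}, tokens of this word type assigned to topic t
      topic_totals[t] = n_t, all tokens assigned to topic t
    """
    num_topics = len(alpha)
    denom = [beta * vocab_size + topic_totals[t] for t in range(num_topics)]
    # s: smoothing-only mass; depends on neither the document nor the word.
    s = [alpha[t] * beta / denom[t] for t in range(num_topics)]
    # r: document mass; non-zero only for topics already used in the document.
    r = [doc_topic[t] * beta / denom[t] for t in range(num_topics)]
    # q: topic-word mass; non-zero only for topics this word type is assigned to.
    q = [(alpha[t] + doc_topic[t]) * word_topic[t] / denom[t] for t in range(num_topics)]
    return s, r, q


def sample_topic(s, r, q, rng):
    """Draw a topic index with probability proportional to s[t] + r[t] + q[t]."""
    u = rng.random() * (sum(s) + sum(r) + sum(q))
    # Try q first: in SparseLDA it usually holds most of the mass.
    for bucket in (q, r, s):
        mass = sum(bucket)
        if u >= mass:
            u -= mass  # skip the whole bucket without scanning its entries
            continue
        for t, weight in enumerate(bucket):
            if u < weight:
                return t
            u -= weight
    # Floating-point round-off can leave u just past the last weight.
    return max(t for t in range(len(s)) if s[t] + r[t] + q[t] > 0)
```

Because the three buckets add up term by term to the dense weight, the draw has the same distribution as the standard collapsed Gibbs update. Only the amount of work per token changes.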
Built with PyO3 and maturin. There are two layers: a sklearn-compatible LatentDirichletAllocation class and a lower-level _rust_mallet extension module.
Install
```bash
pip install pyrmallet
```
sklearn-compatible API
LatentDirichletAllocation follows the scikit-learn estimator interface. It takes a list of raw text strings — tokenization and vocabulary building happen inside Rust.
```python
from pyrmallet import LatentDirichletAllocation

# A list of raw text strings, one per document
docs = ["the quick brown fox ...", "machine learning models ...", ...]

lda = LatentDirichletAllocation(n_components=20, max_iter=1000)
lda.fit(docs)

lda.components_                # ndarray [n_topics, n_vocab], rows sum to ~1
lda.doc_topic_distributions_   # ndarray [n_docs, n_topics]
lda.feature_names_in_          # vocabulary array
lda.n_features_in_             # vocabulary size
```
fit_transform() is also available and returns doc_topic_distributions_ directly.
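The "rows sum to ~1" note means that components_ and doc_topic_distributions_ hold normalized distributions. One standard way to turn Gibbs count matrices into such distributions is the smoothed point estimate sketched below. It is an assumption that pyrmallet normalizes exactly this way, since it may, for example, average over num_samples samples. The function name is hypothetical.

```python
def point_estimates(doc_topic_counts, topic_word_counts, alpha, beta):
    """Smoothed LDA point estimates from Gibbs count matrices.

    doc_topic_counts[d][k]  = tokens in document d assigned to topic k
    topic_word_counts[k][w] = tokens of word type w assigned to topic k
    alpha: per-topic Dirichlet parameters; beta: symmetric topic-word prior
    """
    vocab_size = len(topic_word_counts[0])
    alpha_sum = sum(alpha)
    # phi[k][w] = (n_kw + beta) / (n_k + V * beta): each row sums to 1
    phi = []
    for row in topic_word_counts:
        total = sum(row) + vocab_size * beta
        phi.append([(c + beta) / total for c in row])
    # theta[d][k] = (n_dk + alpha_k) / (N_d + sum(alpha)): each row sums to 1
    theta = []
    for row in doc_topic_counts:
        total = sum(row) + alpha_sum
        theta.append([(c + a) / total for c, a in zip(row, alpha)])
    return theta, phi
```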
Inferring topic distributions for new documents
After fit(), call transform() with any list of raw text strings. Tokens not seen during training are silently ignored.
```python
new_docs = ["natural language processing tasks ...", "deep reinforcement learning ..."]
theta = lda.transform(new_docs)  # ndarray [n_new_docs, n_topics]
```
The number of Gibbs iterations used for inference is controlled by n_inference_iter (default 50).
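Inference for new documents is fixed-phi Gibbs sampling: the topic-word distributions learned during training stay fixed, and only the new document's token assignments are resampled. The minimal sketch below assumes the document has already been tokenized and mapped to training-vocabulary ids, with None marking unseen tokens. The function and parameter names are hypothetical. It estimates theta from the final sweep instead of averaging samples, which may differ from what pyrmallet does.

```python
import random


def infer_theta(doc_word_ids, phi, alpha, n_iter=50, seed=42):
    """Fixed-phi Gibbs sampling for one new document.

    doc_word_ids: training-vocabulary ids of the document's tokens; None marks
        an out-of-vocabulary token, which is skipped.
    phi: phi[k][w], topic-word probabilities from training (held fixed); every
        in-vocabulary word should have non-zero probability under some topic.
    alpha: per-topic Dirichlet parameters
    """
    rng = random.Random(seed)
    num_topics = len(alpha)
    tokens = [w for w in doc_word_ids if w is not None]  # ignore unseen words
    counts = [0] * num_topics
    # Random initial topic assignment for every token.
    z = [rng.randrange(num_topics) for _ in tokens]
    for k in z:
        counts[k] += 1
    for _ in range(n_iter):
        for i, w in enumerate(tokens):
            counts[z[i]] -= 1  # remove this token's current assignment
            weights = [(counts[k] + alpha[k]) * phi[k][w] for k in range(num_topics)]
            z[i] = rng.choices(range(num_topics), weights=weights)[0]
            counts[z[i]] += 1
    total = len(tokens) + sum(alpha)
    return [(counts[k] + alpha[k]) / total for k in range(num_topics)]
```

Because phi is not updated, each document can be inferred independently. This is also why the per-document iteration count (n_inference_iter) can be much smaller than max_iter.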
Constructor parameters
| Parameter | Default | Description |
|---|---|---|
| n_components | 10 | Number of topics |
| max_iter | 1000 | Number of Gibbs sampling iterations |
| burn_in | 200 | Iterations before hyperparameter optimization starts |
| optimize_interval | 50 | Optimize alpha/beta every N iterations; 0 to disable |
| num_samples | 5 | Number of samples averaged for the final estimates |
| sample_interval | 25 | Iterations between samples |
| doc_topic_prior | value of n_components | Initial symmetric alpha, given as the sum over all topics |
| topic_word_prior | 0.01 | Initial beta per word |
| random_state | 42 | Random seed |
| n_inference_iter | 50 | Gibbs iterations per document during transform() |
| stopwords | None | List of words to exclude, or path to a stoplist file |
| min_doc_freq | 1 | Drop words appearing in fewer than this many documents |
| max_doc_fraction | 1.0 | Drop words appearing in more than this fraction of documents |
| verbose | False | Print log-likelihood progress during training |
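The stopwords, min_doc_freq and max_doc_fraction parameters prune the vocabulary before sampling. A rough sketch of that filtering is shown below. It uses a hypothetical lowercase-letters regex tokenizer, whereas pyrmallet's Rust tokenizer may split text differently. It also assumes that both frequency bounds are inclusive and that stopwords are given in lowercase. The function name is hypothetical.

```python
import re
from collections import Counter


def build_vocabulary(docs, stopwords=(), min_doc_freq=1, max_doc_fraction=1.0):
    """Return the sorted vocabulary after stopword and document-frequency filtering."""
    stop = set(stopwords)  # assumed to be lowercase
    doc_freq = Counter()
    for text in docs:
        # Hypothetical tokenizer: lowercase runs of ASCII letters.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        doc_freq.update(tokens - stop)  # count each word at most once per document
    max_docs = max_doc_fraction * len(docs)
    return sorted(
        word for word, df in doc_freq.items()
        if min_doc_freq <= df <= max_docs
    )
```

For example, with max_doc_fraction=0.9 a word that occurs in every document is dropped, which removes corpus-wide filler words without a stoplist.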
Low-level API
pyrmallet._rust_mallet exposes Corpus and TopicModel objects directly.
```python
from pyrmallet import _rust_mallet as rm

# docs and new_docs are lists of raw text strings, as in the examples above

# Build a corpus directly from strings (no file I/O)
stopwords = rm.load_stopwords("examples/english-stoplist.txt")
corpus = rm.Corpus.from_strings(
    docs,
    stopwords=stopwords,
    min_doc_freq=2,
)

# Or load from a file
corpus = rm.Corpus.from_text_file("docs.txt", stopwords=stopwords)
corpus = rm.Corpus.from_tsv_file(
    "docs.tsv", id_column=0, text_column=1,
    stopwords=stopwords,
)

# Save/load a preprocessed corpus
corpus.save("corpus.corp")
corpus = rm.Corpus.load("corpus.corp")

# Train
model = rm.train(corpus, num_topics=20, iterations=1000, verbose=True)

# Inspect results
model.top_words(n=10)        # List[List[str]], one word list per topic
model.topic_word_matrix()    # List[List[float]], shape [num_topics][num_types]
model.doc_topic_matrix()     # List[List[float]], shape [num_docs][num_topics]
model.log_likelihood(corpus)

# Infer topic distributions for new raw-text documents (fixed-phi Gibbs)
theta = model.infer_strings(new_docs, n_iter=50)  # List[List[float]], shape [n_docs][num_topics]

# Or infer from a pre-built count matrix (columns indexed by training vocabulary)
theta = model.infer(count_matrix, n_iter=50)  # List[List[float]]
```
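model.infer needs a count matrix whose columns are indexed by the training vocabulary. The sketch below builds one from already-tokenized documents as a list of lists of ints. It is an assumption that infer accepts this type, since it might require a different container. The vocabulary list must also be in the same order as the model's word-type ids, which this snippet cannot check. The helper name is hypothetical.

```python
def build_count_matrix(tokenized_docs, vocabulary):
    """Dense document-by-word count matrix whose columns follow `vocabulary` order.

    Tokens missing from the training vocabulary are skipped.
    """
    column = {word: j for j, word in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized_docs:
        row = [0] * len(vocabulary)
        for token in tokens:
            j = column.get(token)
            if j is not None:
                row[j] += 1
        matrix.append(row)
    return matrix
```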
Building from source
Requires uv and a Rust toolchain. From the repo root:
```bash
PATH="$HOME/.cargo/bin:$PATH" uv run --with maturin maturin develop
```
See the RustMallet README for the full project, including the standalone CLI tools.
Download files
Files for version 0.1.1:

| File | Size | Tags | Sigstore transparency entry |
|---|---|---|---|
| pyrmallet-0.1.1.tar.gz | 2.3 MB | Source | 1854534796 |
| pyrmallet-0.1.1-cp39-abi3-win_amd64.whl | 836.3 kB | CPython 3.9+, Windows x86-64 | 1854534829 |
| pyrmallet-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 1.1 MB | CPython 3.9+, manylinux: glibc 2.17+ x86-64 | 1854534818 |
| pyrmallet-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 1.1 MB | CPython 3.9+, manylinux: glibc 2.17+ ARM64 | 1854534890 |
| pyrmallet-0.1.1-cp39-abi3-macosx_11_0_arm64.whl | 937.3 kB | CPython 3.9+, macOS 11.0+ ARM64 | 1854534869 |
| pyrmallet-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl | 973.8 kB | CPython 3.9+, macOS 10.12+ x86-64 | 1854534851 |

File hashes

| File | SHA256 | MD5 | BLAKE2b-256 |
|---|---|---|---|
| pyrmallet-0.1.1.tar.gz | b8d7cf388f3890f0ecbb745b37e0585c71e773b1a75806ac665b718b7e8a6de1 | 00ef79a6554bfffe603e6f535fbe893b | 93ef51b4e6db269aafe20de190c8ab09f222ecd4603b6c08ba5b2d96eb3b873d |
| pyrmallet-0.1.1-cp39-abi3-win_amd64.whl | 796c5e21b5a5e452e072203983eba22df9b624573f8fb3279acb495aa46c6ec1 | f4f85761a13baeb102ebf1dd78934c8a | 0e54ce38e827565a1dd5b645a596e32dacea0b418f1f127efaaec788a40365b4 |
| pyrmallet-0.1.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 84435bc7b23487815df963e6a612746f81d617192369b70fc34f6dc756f03471 | 85cba2025d15cfc7db9c24d847e51805 | c95612420da916acb2e30e8d7d87ad784d07e8460f9472f5e8d3d1e89dbd9125 |
| pyrmallet-0.1.1-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 3e788ae437abf105b0a6eb79c1869143157b5c9c6a3f83e33c70ecb161212c72 | 27ffae8f421826cc43596175aae6326f | 28734b652f1b8f841e5ad3751a27d5f749ac37b600a4c2e0d6e06c95280f121b |
| pyrmallet-0.1.1-cp39-abi3-macosx_11_0_arm64.whl | 42e3cd9ff8dc25e4863f4d699ccb3862eeca6c140281c58030da5df018eb9131 | 19e6e0bef4d52c918916783b922e6b49 | d9c191f2a4ca959877cf595713527be5e6f321b8f5399e756e7a36f2b3053dac |
| pyrmallet-0.1.1-cp39-abi3-macosx_10_12_x86_64.whl | 25e870977c1c15098e43585ae9e167931ec68dce696791d67fc0606ad2a36324 | decb7d9a1b86a1d32d9779eea125afa7 | 5fa537db80aa821d421d96c62cc8850f669c0ccfadbf58c64da3a15d4b9b5c41 |

Provenance

All six files were uploaded via twine/6.1.0 CPython/3.13.12 using Trusted Publishing. Each file has an attestation bundle with the following details:
- Publisher: pypi.yml on mimno/RustMallet
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name and digest: the file name and SHA256 listed above
- Permalink: mimno/RustMallet@bc2c084d1de9763529904a41f340a040ab88f9fd
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/mimno (access: public)
- Token issuer: https://token.actions.githubusercontent.com
- Runner environment: github-hosted
- Publication workflow: pypi.yml@bc2c084d1de9763529904a41f340a040ab88f9fd
- Trigger event: push