A modular topic modeling toolkit with classical and neural models.
topicnova (import: topicnova)
topicnova is both the PyPI distribution name and the Python import name.
topicnova provides:
- Classical models: LDA, BERTopic
- Neural models: configurable VAE and SCHOLAR variants
- Data utilities: dataset loading, tokenization, dataloaders
- Evaluation utilities: coherence and topic diversity
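Topic diversity is commonly defined as the share of unique words among the top-k words of all topics, so 1.0 means no two topics share a top word. A minimal sketch of that common definition in plain Python; `topic_diversity` and its `top_k` default are hypothetical and not topicnova's own evaluation code.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 25) -> float:
    """Share of unique words among the top-k words of all topics.

    Hypothetical helper showing the common definition; topicnova's own
    implementation and its default top_k may differ.
    """
    # Keep only the first top_k words of each topic (assumed sorted by weight).
    truncated = [list(words[:top_k]) for words in topics]
    total = sum(len(words) for words in truncated)
    if total == 0:
        raise ValueError("topics contain no words")
    unique = {word for words in truncated for word in words}
    # The denominator equals K * top_k when every topic has >= top_k words.
    return len(unique) / total


topics = [
    ["game", "team", "season"],
    ["god", "church", "faith"],
    ["team", "player", "game"],
]
print(topic_diversity(topics, top_k=3))  # 7 unique words / 9 slots
```

Dividing by the number of words actually kept, rather than a fixed K * top_k, keeps the score in [0, 1] even if a topic has fewer than top_k words.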
Install
From PyPI (once published)
pip install topicnova
Local development with uv
uv sync --group dev
uv run python -m spacy download en_core_web_sm
Local-only mode (no wandb)
Set wandb=False (default). In this mode:
- no wandb session is initialized
- no network logging is attempted
- all artifacts stay under your local exp_path
Quickstart
from topicnova.run import run
model, summary = run(
    wandb=False,
    project_name="topic-models",
    wandb_path="./runs",
    dataset_name="20ng",
    data_path="./datasets",
    remove_labels=False,
    tpl=1,
    min_df=30,
    max_df=0.85,
    exp_path="./runs/exp1",
    device="cpu",
    model_name="lda",
    num_topics=20,
    sentence_transformer_name="all-MiniLM-L6-v2",
    doc_emb_dim=384,
    eps=1e-8,
    beta=2.0,
    alpha=0.01,
    batch_size=64,
    lr=1e-3,
    epochs=10,
    random_state=0,
    saved_data=None,
)
print(summary)
Auto mode (minimal input)
You can now call run() with only dataset information. The library will:
- infer preprocessing thresholds (min_df, max_df) from dataset statistics
- default to VAE-ECRTM (vae-...-ecrtm-lin-dir_rsvi-etm) and include labels/authors conditioning when metadata is available
- choose defaults for num_topics, epochs, batch_size, lr
- select the best device automatically (cuda:0 > mps > cpu) when device is omitted or set to "auto"
from topicnova.run import run
model, summary = run(
    dataset_name="20ng",
    data_path="./datasets",
    exp_path="./runs/auto_20ng",
)
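The device fallback order can be sketched as below. This assumes PyTorch is the backend (the cuda:0 and mps names suggest it, but this README does not say so); `resolve_device` is a hypothetical helper, not part of topicnova's API.

```python
from typing import Optional


def resolve_device(
    requested: Optional[str] = None,
    cuda_available: Optional[bool] = None,
    mps_available: Optional[bool] = None,
) -> str:
    """Return "cuda:0", "mps" or "cpu" following the documented priority.

    Hypothetical helper: an explicit device (anything other than None or
    "auto") is returned unchanged. Availability flags left as None are
    queried from PyTorch if it is installed, otherwise treated as False.
    """
    if requested not in (None, "auto"):
        return requested
    if cuda_available is None or mps_available is None:
        try:
            import torch
        except ImportError:
            torch = None
        if cuda_available is None:
            cuda_available = torch is not None and torch.cuda.is_available()
        if mps_available is None:
            # torch.backends.mps only exists in newer PyTorch releases.
            mps = getattr(getattr(torch, "backends", None), "mps", None)
            mps_available = mps is not None and mps.is_available()
    if cuda_available:
        return "cuda:0"
    if mps_available:
        return "mps"
    return "cpu"
```

Passing the availability flags explicitly, as in `resolve_device("auto", cuda_available=False, mps_available=True)`, makes the fallback order easy to check on a machine without a GPU.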
Config-based runs
uv run python experiments/run_from_template.py config/template.yaml
Important config keys:
- dataset_name: e.g. 20ng, ag_news, dbpedia, self, arxiv_cs
- data_path: directory where cached/loaded datasets are stored
- model_name: model selector (examples below)
- num_topics: set to null to infer from labels where applicable
- Performance:
  - amp, compile_model, compile_mode
  - num_workers, pin_memory, persistent_workers, prefetch_factor
  - matmul_precision, cudnn_benchmark
  - early_stopping, early_stopping_patience, early_stopping_min_delta
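Config files for several experiments can also be generated from Python. A minimal sketch, assuming the template is a flat mapping of the keys above (the actual layout of config/template.yaml is not shown here); `BASE_CONFIG`, `write_config` and all values are illustrative. It writes JSON, which YAML parsers also accept for plain strings, numbers, booleans and null, so no YAML library is needed.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Illustrative values only. Key names come from the list above; the types
# and defaults topicnova expects for them are assumptions.
BASE_CONFIG: Dict[str, Any] = {
    "dataset_name": "20ng",
    "data_path": "./datasets",
    "exp_path": "./runs/cfg_20ng",
    "model_name": "vae-lin-dir_pathwise-etm",
    "num_topics": None,  # written as null: infer from labels where applicable
    "num_workers": 4,
    "pin_memory": True,
    "early_stopping": True,
    "early_stopping_patience": 5,
}


def write_config(path: Path, **overrides: Any) -> Dict[str, Any]:
    """Merge overrides into a copy of BASE_CONFIG and write it to path.

    The file is JSON, which a YAML parser also accepts for simple scalar
    values like these.
    """
    config = {**BASE_CONFIG, **overrides}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `write_config(Path("config/ag_news.yaml"), dataset_name="ag_news")` followed by `uv run python experiments/run_from_template.py config/ag_news.yaml`; whether the template script accepts a file containing only these keys has not been verified.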
Hyperparameter tuning (random search)
uv run python experiments/tune_random_search.py config/template.yaml --trials 12 --metric auto
Outputs:
- per-trial logs in <exp_path>/trial_*
- consolidated results in <exp_path>/tuning_results.json
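Random search draws each trial's hyperparameters independently from a search space, trains once per draw, and keeps the best-scoring configuration; --trials sets the number of draws. A minimal sketch of the sampling and selection steps in plain Python. The search space, the log-uniform sampling of lr and the higher-is-better rule in `best_trial` are assumptions, not documented behavior of tune_random_search.py.

```python
import math
import random
from typing import Any, Dict, List, Sequence, Tuple, Union

# Hypothetical search space: lists are sampled uniformly, (low, high)
# tuples log-uniformly. tune_random_search.py may search different keys.
SEARCH_SPACE: Dict[str, Union[Sequence[Any], Tuple[float, float]]] = {
    "num_topics": [10, 20, 50],
    "batch_size": [32, 64, 128],
    "lr": (1e-4, 1e-2),
}


def sample_trials(n_trials: int, seed: int = 0) -> List[Dict[str, Any]]:
    """Draw n_trials independent configurations from SEARCH_SPACE."""
    rng = random.Random(seed)  # seeded so the same trials can be replayed
    trials = []
    for _ in range(n_trials):
        trial: Dict[str, Any] = {}
        for name, spec in SEARCH_SPACE.items():
            if isinstance(spec, tuple):
                low, high = spec
                trial[name] = math.exp(rng.uniform(math.log(low), math.log(high)))
            else:
                trial[name] = rng.choice(spec)
        trials.append(trial)
    return trials


def best_trial(results: List[Dict[str, Any]], metric: str) -> Dict[str, Any]:
    """Pick the result with the highest value for metric (higher is better)."""
    return max(results, key=lambda result: result[metric])
```

Each sampled dict would be merged into the base config for one trial; sampling lr on a log scale spreads trials evenly across orders of magnitude.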
Interactive topic visualization
Option A: generate during training
Set visualize: true in config (or pass visualize=True to run).
This writes:
<exp_path>/topics_interactive.html
If model outputs include authors/labels, the graph links:
- topic -> words
- topic -> authors (author-aware models)
- topic -> labels (SCHOLAR)
- label -> authors (when both are available)
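These links can be thought of as a plain edge list. A minimal sketch, assuming each topic's top words, authors and label are available as dictionaries keyed by topic index; the input shapes, node-id prefixes and `build_topic_graph` are hypothetical, not topicnova's internals. This README does not say how labels are matched to authors, so the sketch links a label to the authors of the topics that carry it.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def build_topic_graph(
    top_words: Dict[int, List[str]],
    topic_authors: Optional[Dict[int, List[str]]] = None,
    topic_labels: Optional[Dict[int, str]] = None,
) -> Set[Edge]:
    """Return (source, target) edges for the links listed above.

    Node ids are prefixed by kind so that, e.g., a word and an author
    with the same spelling remain different nodes.
    """
    topic_authors = topic_authors or {}
    topic_labels = topic_labels or {}
    edges: Set[Edge] = set()
    for topic_id, words in top_words.items():
        topic = f"topic:{topic_id}"
        edges.update((topic, f"word:{w}") for w in words)
        authors = topic_authors.get(topic_id, [])
        edges.update((topic, f"author:{a}") for a in authors)
        label = topic_labels.get(topic_id)
        if label is not None:
            edges.add((topic, f"label:{label}"))
            # Assumed rule: a label links to the authors of its topics.
            edges.update((f"label:{label}", f"author:{a}") for a in authors)
    return edges
```

An edge set like this can be passed to a graph library, for example networkx's `Graph.add_edges_from`, for layout and rendering.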
Clicking a topic node opens a details panel with:
- top words
- linked label/authors
- per-topic quantitative metrics (c_v, c_npmi) when available
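c_npmi scores a topic by averaging, over pairs of its top words, the normalized pointwise mutual information NPMI(wi, wj) = log(P(wi, wj) / (P(wi) P(wj))) / -log P(wi, wj), which ranges from -1 (never co-occur) to 1 (always co-occur). A minimal sketch that estimates the probabilities from whole-document co-occurrence for simplicity; common implementations such as gensim's CoherenceModel count co-occurrence in sliding windows and add smoothing, so these scores will not match the values reported by topicnova. `npmi_coherence` is a hypothetical name.

```python
import math
from itertools import combinations
from typing import Iterable, List, Sequence, Set


def npmi_coherence(top_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """Average NPMI over all pairs of a topic's top words.

    Probabilities are document frequencies: P(w) is the share of documents
    containing w, P(wi, wj) the share containing both.
    """
    docs: List[Set[str]] = [set(doc) for doc in documents]
    if not docs:
        raise ValueError("need at least one document")

    def prob(*words: str) -> float:
        return sum(all(w in doc for w in words) for doc in docs) / len(docs)

    scores = []
    for wi, wj in combinations(top_words, 2):
        p_i, p_j, p_ij = prob(wi), prob(wj), prob(wi, wj)
        if p_ij == 0.0:
            scores.append(-1.0)  # never co-occur: NPMI's lower bound
        elif p_ij == 1.0:
            scores.append(1.0)  # co-occur in every document: upper bound
        else:
            pmi = math.log(p_ij / (p_i * p_j))
            scores.append(pmi / -math.log(p_ij))
    if not scores:
        raise ValueError("need at least two top words")
    return sum(scores) / len(scores)


documents = [["a", "b"], ["a", "b"], ["c"], ["c", "d"]]
print(npmi_coherence(["a", "b"], documents))  # a and b always appear together
```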
Option B: load an existing experiment
from topicnova import visualize_experiment
fig = visualize_experiment("./runs/exp1", notebook=True)
Supported model names
- LDA: lda
- BERTopic: bertopic
- VAE family: vae-<flags>-<encoder>-<sampler>-<decoder>
- SCHOLAR family: scholar-<flags>-lin-<sampler>-<decoder>
Common tokens:
- Flags (optional): labels, authors, ecrtm
- Encoder: lin, context, llm (VAE); lin (SCHOLAR)
- Sampler: dir_pathwise, dir_rsvi
- Decoder: lin, etm
Examples:
- vae-lin-dir_pathwise-etm
- vae-context-dir_rsvi-lin
- scholar-labels-lin-dir_rsvi-lin
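Because sampler tokens use underscores rather than hyphens, a name can be split on "-": the last three tokens are the encoder, sampler and decoder, and any tokens between the family and those three are flags. A minimal parser sketch of that grammar; `parse_model_name` and `ModelSpec` are hypothetical, and topicnova's own selector may accept names (or flag orders) that this stricter sketch rejects, or the reverse.

```python
from dataclasses import dataclass, field
from typing import List

# Token sets taken from the "Common tokens" list above.
FLAGS = {"labels", "authors", "ecrtm"}
ENCODERS = {"vae": {"lin", "context", "llm"}, "scholar": {"lin"}}
SAMPLERS = {"dir_pathwise", "dir_rsvi"}
DECODERS = {"lin", "etm"}


@dataclass
class ModelSpec:
    family: str
    flags: List[str] = field(default_factory=list)
    encoder: str = ""
    sampler: str = ""
    decoder: str = ""


def parse_model_name(name: str) -> ModelSpec:
    """Split a model name into family, flags, encoder, sampler and decoder."""
    if name in ("lda", "bertopic"):
        return ModelSpec(family=name)
    family, *rest = name.split("-")
    if family not in ENCODERS or len(rest) < 3:
        raise ValueError(f"unrecognized model name: {name!r}")
    # The last three tokens are fixed; anything before them is a flag.
    *flags, encoder, sampler, decoder = rest
    unknown = set(flags) - FLAGS
    if unknown:
        raise ValueError(f"unknown flag(s) in {name!r}: {sorted(unknown)}")
    if encoder not in ENCODERS[family]:
        raise ValueError(f"encoder {encoder!r} is not valid for {family}")
    if sampler not in SAMPLERS or decoder not in DECODERS:
        raise ValueError(f"unknown sampler or decoder in {name!r}")
    return ModelSpec(family, flags, encoder, sampler, decoder)


print(parse_model_name("scholar-labels-lin-dir_rsvi-lin"))
```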
Custom datasets (dataset_name: self)
Place files in data_path:
- train.csv
- val.csv
- test.csv
Required columns:
- text
- label (optional, list-like string)
Optional:
- author
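A minimal sketch that writes the three files with Python's csv module. It assumes "list-like string" means a Python-style list literal such as ['sports', 'news'] (parsed back here with ast.literal_eval); the exact format topicnova expects is not specified above, and `write_split` and `read_labels` are hypothetical helpers.

```python
import ast
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

FIELDS = ["text", "label", "author"]


def write_split(path: Path, rows: List[Dict[str, str]]) -> None:
    """Write one split (train/val/test) as a CSV with a header row."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def read_labels(path: Path) -> List[List[str]]:
    """Parse the list-like label column back into Python lists."""
    with path.open(newline="", encoding="utf-8") as f:
        return [ast.literal_eval(row["label"]) for row in csv.DictReader(f)]


rows = [
    {"text": "The probe entered orbit around Jupiter.", "label": "['space']", "author": "alice"},
    {"text": "The match went to extra time.", "label": "['sports', 'news']", "author": "bob"},
]
# A temporary directory for demonstration; use your real data_path instead.
data_path = Path(tempfile.mkdtemp())
for split in ("train", "val", "test"):
    write_split(data_path / f"{split}.csv", rows)  # toy rows reused in every split
print(read_labels(data_path / "train.csv"))
```

The csv module quotes the label field automatically because it contains a comma. The directory is then used as data_path together with dataset_name="self".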
Development
uv sync --group dev
uv run pytest -q
uv run ruff check .
Build and publish
uv sync --group dev
uv run python -m build
uv run twine check dist/*
uv run twine upload dist/*
Recommended first release command sequence:
uv sync --group dev
uv run pytest -q
rm -rf dist
uv run python -m build
uv run twine check dist/*
uv run twine upload dist/topicnova-*.whl dist/topicnova-*.tar.gz
Note: as of February 9, 2026, the name tomo is already taken on PyPI, which is why this project publishes as topicnova.
See RELEASING.md for a release checklist.
Download files
Source Distribution: topicnova-0.3.0.tar.gz
Built Distribution: topicnova-0.3.0-py3-none-any.whl
File details
Details for the file topicnova-0.3.0.tar.gz.
File metadata
- Download URL: topicnova-0.3.0.tar.gz
- Upload date:
- Size: 320.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab5c7d8c8caa0f69921725b53cd39d0ac60a6da6fdef97f7116407966c3bca0e |
| MD5 | a561cb0c77a58c0e5e367947c0832259 |
| BLAKE2b-256 | 68208fbff8500918b2fb812139e170c6ec5df7309ff16ca6333598d0488a6fc4 |
File details
Details for the file topicnova-0.3.0-py3-none-any.whl.
File metadata
- Download URL: topicnova-0.3.0-py3-none-any.whl
- Upload date:
- Size: 43.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83e01fa3ef7256c5bc9e43fa9d0c9bdf9ed55a813c31788244f650e77ccfa227 |
| MD5 | 7ec5c64dc2c7d7eb2495abd7ccf01d03 |
| BLAKE2b-256 | 147022750b94f9a8855c79cc97e390a26e8cb4fea1c93fe1c4739c5d73197bdb |