A modular topic modeling toolkit with classical and neural models.

topicnova (import: topicnova)

topicnova is both the PyPI distribution name and the Python import name.

topicnova provides:

  • Classical and embedding-based models: LDA, BERTopic
  • Neural models: configurable VAE and SCHOLAR variants
  • Data utilities: dataset loading, tokenization, dataloaders
  • Evaluation utilities: coherence and topic diversity

Install

From PyPI (once published)

pip install topicnova

Local development with uv

uv sync --group dev
uv run python -m spacy download en_core_web_sm

Local-only mode (no wandb)

Set wandb=False (default). In this mode:

  • no wandb session is initialized
  • no network logging is attempted
  • all artifacts stay under your local exp_path

Quickstart

from topicnova.run import run

model, summary = run(
    wandb=False,
    project_name="topic-models",
    wandb_path="./runs",
    dataset_name="20ng",
    data_path="./datasets",
    remove_labels=False,
    tpl=1,
    min_df=30,
    max_df=0.85,
    exp_path="./runs/exp1",
    device="cpu",
    model_name="lda",
    num_topics=20,
    sentence_transformer_name="all-MiniLM-L6-v2",
    doc_emb_dim=384,
    eps=1e-8,
    beta=2.0,
    alpha=0.01,
    batch_size=64,
    lr=1e-3,
    epochs=10,
    random_state=0,
    saved_data=None,
)
print(summary)

Auto mode (minimal input)

You can call run() with only dataset information. The library will:

  • infer preprocessing thresholds (min_df, max_df) from dataset stats
  • default to VAE-ECRTM (vae-...-ecrtm-lin-dir_rsvi-etm) and include labels/authors conditioning when metadata is available
  • choose defaults for num_topics, epochs, batch_size, lr
  • select best device automatically (cuda:0 > mps > cpu) when device is omitted or set to "auto"

from topicnova.run import run

model, summary = run(
    dataset_name="20ng",
    data_path="./datasets",
    exp_path="./runs/auto_20ng",
)
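The cuda:0 > mps > cpu preference order described above can be sketched in plain Python. This helper is illustrative only; the name pick_device and its available argument are not part of the topicnova API:

```python
def pick_device(available):
    """Return the preferred device string, following the
    cuda:0 > mps > cpu order used by auto mode.

    `available` is a set of device strings detected on the machine;
    this is a sketch, not the library's internal implementation.
    """
    for dev in ("cuda:0", "mps", "cpu"):
        if dev in available:
            return dev
    return "cpu"  # safe fallback when nothing was detected
```

For example, pick_device({"mps", "cpu"}) yields "mps" on an Apple Silicon machine without CUDA.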

Config-based runs

uv run python experiments/run_from_template.py config/template.yaml

Important config keys:

  • dataset_name: e.g. 20ng, ag_news, dbpedia, self, arxiv_cs
  • data_path: directory where cached/loaded datasets are stored
  • model_name: model selector (examples below)
  • num_topics: set to null to infer from labels where applicable
  • Performance:
    • amp, compile_model, compile_mode
    • num_workers, pin_memory, persistent_workers, prefetch_factor
    • matmul_precision, cudnn_benchmark
    • early_stopping, early_stopping_patience, early_stopping_min_delta
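Putting the keys above together, a template might look like the following. This fragment is a sketch using only the keys listed in this README; consult config/template.yaml for the authoritative schema and defaults:

```yaml
# Illustrative config sketch -- values are examples, not recommendations.
dataset_name: 20ng
data_path: ./datasets
exp_path: ./runs/exp1
model_name: vae-lin-dir_rsvi-etm
num_topics: null          # null infers from labels where applicable

# Performance
amp: true
compile_model: false
num_workers: 4
pin_memory: true
persistent_workers: true
early_stopping: true
early_stopping_patience: 5
```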

Hyperparameter tuning (random search)

uv run python experiments/tune_random_search.py config/template.yaml --trials 12 --metric auto

Outputs:

  • per-trial logs in <exp_path>/trial_*
  • consolidated results in <exp_path>/tuning_results.json
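A small helper can pull the best trial out of tuning_results.json after a search finishes. The schema assumed here (a JSON list of per-trial dicts with a numeric "metric" field) is an illustration, not a documented format; adapt the field names to what your file actually contains:

```python
import json
from pathlib import Path

def best_trial(results_path, maximize=True):
    """Return the best trial record from a tuning_results.json file.

    Assumes the file holds a JSON list of per-trial dicts, each with a
    numeric "metric" entry (field names are assumptions, not the
    library's documented schema).
    """
    trials = json.loads(Path(results_path).read_text())
    pick = max if maximize else min
    return pick(trials, key=lambda t: t["metric"])
```

Pass maximize=False for metrics where lower is better.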

Interactive topic visualization

Option A: generate during training

Set visualize: true in config (or pass visualize=True to run). This writes:

  • <exp_path>/topics_interactive.html

If model outputs include authors/labels, the graph links:

  • topic -> words
  • topic -> authors (author-aware models)
  • topic -> labels (SCHOLAR)
  • label -> authors (when both are available)

Clicking a topic node opens a details panel with:

  • top words
  • linked label/authors
  • per-topic quantitative metrics (c_v, c_npmi) when available

Option B: load an existing experiment

from topicnova import visualize_experiment

fig = visualize_experiment("./runs/exp1", notebook=True)

Supported model names

  • LDA: lda
  • BERTopic: bertopic
  • VAE family: vae-<flags>-<encoder>-<sampler>-<decoder>
  • SCHOLAR family: scholar-<flags>-lin-<sampler>-<decoder>

Common tokens:

  • Flags: labels, authors, ecrtm (optional)
  • Encoder: lin, context, llm (VAE); lin (SCHOLAR)
  • Sampler: dir_pathwise, dir_rsvi
  • Decoder: lin, etm

Examples:

  • vae-lin-dir_pathwise-etm
  • vae-context-dir_rsvi-lin
  • scholar-labels-lin-dir_rsvi-lin
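The naming grammar above can be checked mechanically. The parser below is a sketch built solely from the token lists in this README; it is not part of the topicnova API:

```python
# Token sets copied from the lists above.
FLAGS = {"labels", "authors", "ecrtm"}
ENCODERS = {"lin", "context", "llm"}
SAMPLERS = {"dir_pathwise", "dir_rsvi"}
DECODERS = {"lin", "etm"}

def parse_model_name(name):
    """Split a vae-/scholar- style model name into its components."""
    parts = name.split("-")
    family, rest = parts[0], parts[1:]
    if family not in {"vae", "scholar"}:
        raise ValueError(f"unknown family: {family}")
    # Optional flags come right after the family.
    flags = []
    while rest and rest[0] in FLAGS:
        flags.append(rest.pop(0))
    if len(rest) != 3:
        raise ValueError(f"expected encoder-sampler-decoder, got: {rest}")
    encoder, sampler, decoder = rest
    if encoder not in ENCODERS or sampler not in SAMPLERS or decoder not in DECODERS:
        raise ValueError(f"unrecognized token in: {name}")
    return {"family": family, "flags": flags,
            "encoder": encoder, "sampler": sampler, "decoder": decoder}
```

For instance, parse_model_name("scholar-labels-lin-dir_rsvi-lin") reports the labels flag, a lin encoder, a dir_rsvi sampler, and a lin decoder.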

Custom datasets (dataset_name: self)

Place files in data_path:

  • train.csv
  • val.csv
  • test.csv

Required columns:

  • text
  • label (optional, list-like string)

Optional:

  • author
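A minimal self dataset can be produced with the standard library. The write_split helper and the sample rows below are illustrative; only the file names and column names come from this README:

```python
import csv
from pathlib import Path

def write_split(path, rows):
    """Write one split (train/val/test) with the columns listed above.

    "label" is stored as a list-like string, and "author" is optional;
    the sample rows here are made up for illustration.
    """
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label", "author"])
        writer.writeheader()
        writer.writerows(rows)

data_dir = Path("./datasets")  # matches data_path in the examples above
data_dir.mkdir(parents=True, exist_ok=True)
write_split(data_dir / "train.csv", [
    {"text": "Quantum computing basics.", "label": "['science']", "author": "alice"},
    {"text": "Playoff results and trades.", "label": "['sports']", "author": "bob"},
])
```

Repeat for val.csv and test.csv, then set dataset_name: self and point data_path at the directory.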

Development

uv sync --group dev
uv run pytest -q
uv run ruff check .

Build and publish

uv sync --group dev
uv run python -m build
uv run twine check dist/*
uv run twine upload dist/*

Recommended first release command sequence:

uv sync --group dev
uv run pytest -q
rm -rf dist
uv run python -m build
uv run twine check dist/*
uv run twine upload dist/topicnova-*.whl dist/topicnova-*.tar.gz

Note: as of February 9, 2026, the name tomo is already taken on PyPI, which is why this project publishes as topicnova.

See RELEASING.md for a release checklist.
