
Adaptive Topic and Cluster Number Determination via structured search


ATCND

Adaptive Topic and Cluster Number Determination via Structured Search over Sliding Ranges.

ATCND is a model-agnostic framework that determines the optimal number of topics (for LDA/NMF) or clusters (for K-Means) by treating K selection as a structured search problem over a user-specified integer range. Where an exhaustive grid search requires O(K_max - K_min) model evaluations, ATCND applies binary, golden-section, or ternary search to reduce this to O(log(K_max - K_min)) evaluations. When the quality score is unimodal in K, these searches find the same optimum as grid search at a fraction of the cost.
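The gap between linear and logarithmic cost can be seen with a generic textbook technique: binary search for the peak of a unimodal integer function, which compares f(mid) with f(mid + 1) and discards half of the range at each step. The sketch below is illustrative only and is not ATCND's implementation; the toy `score` function is hypothetical and stands in for fitting a model at a given K and computing a quality metric. A cache ensures each K is evaluated at most once, so the call count equals the number of model fits.

```python
from functools import lru_cache


def make_counted(f):
    """Wrap f so repeated K values are served from a cache and each new K is recorded."""
    calls = []

    @lru_cache(maxsize=None)
    def wrapped(k: int) -> float:
        calls.append(k)  # only runs on a cache miss
        return f(k)

    return wrapped, calls


def grid_search(f, k_min: int, k_max: int) -> int:
    """Evaluate every K in [k_min, k_max] and return the best one."""
    return max(range(k_min, k_max + 1), key=f)


def binary_peak_search(f, k_min: int, k_max: int) -> int:
    """Find the maximizer of a unimodal integer function by comparing f(mid) and f(mid + 1)."""
    lo, hi = k_min, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) < f(mid + 1):
            lo = mid + 1  # still climbing: the peak is to the right of mid
        else:
            hi = mid  # not climbing: assuming no plateaus, the peak is at mid or to its left
    return lo


def score(k: int) -> float:
    """Hypothetical unimodal stand-in for a model-quality metric, peaking at K = 8."""
    return -((k - 8) ** 2)


for search in (grid_search, binary_peak_search):
    f, calls = make_counted(score)
    best = search(f, 2, 30)
    print(f"{search.__name__}: K = {best}, evaluations = {len(calls)}")
```

With this toy score, grid search evaluates all 29 values in [2, 30], while the peak search returns K = 8 after 8 distinct evaluations; exact counts depend on the score function and on the search variant.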

Key Features

  • Four search strategies: binary, golden section, ternary, grid
  • Three model families: LDA, NMF, K-Means
  • Five quality metrics: silhouette, coherence, perplexity, reconstruction error, combined
  • Returns a ranked list of the top n_candidates values of K (handles plateaus, ties, and multiple optima)
  • 60-80% fewer model evaluations than grid search
  • CLI and Python API

Installation

pip install atcnd

From source:

git clone https://github.com/CodeOfMe/ATCND.git
cd ATCND
pip install .

Quick Start

Python API

from atcnd import ATCNDConfig, atcnd_search

# K-Means on numeric data
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=1000, n_features=50, centers=8, random_state=42)

config = ATCNDConfig(
    k_min=2, k_max=30,
    model_type="kmeans",
    search_strategy="binary",
    metric="silhouette",
    n_candidates=3,
)
result = atcnd_search(X=X, config=config)

print(f"Optimal K: {result.optimal_k}")
print(f"Best score: {result.optimal_score:.4f}")
print(f"Top candidates: {result.candidate_ks}")
print(f"Evaluations: {len(result.search_history)}")

Text data (LDA/NMF)

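from atcnd import ATCNDConfig, atcnd_search
from sklearn.datasets import fetch_20newsgroups

# Any list of raw document strings works; here, up to 2,000 non-empty 20 Newsgroups posts.
news = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
texts = [doc for doc in news.data if doc.strip()][:2000]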

# NMF with coherence metric
config = ATCNDConfig(
    k_min=2, k_max=20,
    model_type="nmf",
    search_strategy="binary",
    metric="coherence",
)
result = atcnd_search(texts=texts, config=config)

CLI

# Search with K-Means on synthetic data
atcnd search --model kmeans --strategy binary --k-min 2 --k-max 30

# Search with NMF
atcnd search --model nmf --strategy golden_section --metric silhouette

# JSON output
atcnd search --model kmeans --json

# Run benchmarks
atcnd benchmark --dataset blobs --k-min 2 --k-max 30

Search Strategies

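| Strategy | Complexity | Best for | Evaluations for K in [2, 30] |
| --- | --- | --- | --- |
| Grid | O(N) | Baseline comparison | 29 |
| Binary | O(log N) | Unimodal objectives | ~6 |
| Golden Section | O(log_phi N) | General objectives | ~7 |
| Ternary | O(log_{1.5} N) | Multi-step objectives | ~8 |

Here N = K_max - K_min + 1 is the number of candidate values of K (29 for the range [2, 30]).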
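Golden-section search shrinks the search bracket by a factor of about 0.618 (1/phi) per step and reuses one interior probe from the previous step, which is where the log_phi N term comes from. Below is a minimal integer-valued sketch of that textbook technique, assuming a unimodal score; it is not taken from ATCND, and the toy `score` function is hypothetical. Because the probes are rounded to integers, a probe is not always reused exactly, so a cache records each distinct K that is actually evaluated.

```python
import math
from functools import lru_cache

INV_PHI = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618


def golden_section_int(f, k_min: int, k_max: int) -> int:
    """Maximize a unimodal function f over the integers k_min..k_max (inclusive)."""
    a, b = k_min, k_max
    # For brackets wider than 4, the two rounded probes are guaranteed to be distinct.
    while b - a > 4:
        r = round((b - a) * INV_PHI)
        c, d = b - r, a + r  # interior probes with a < c < d < b
        if f(c) < f(d):
            a = c  # for a unimodal f, the maximum lies in [c, b]
        else:
            b = d  # for a unimodal f, the maximum lies in [a, d]
    # Finish the small remaining bracket exhaustively.
    return max(range(a, b + 1), key=f)


evaluated = []


@lru_cache(maxsize=None)
def score(k: int) -> float:
    """Hypothetical unimodal stand-in for a model-quality metric, peaking at K = 8."""
    evaluated.append(k)  # runs only on a cache miss, i.e. once per distinct K
    return -((k - 8) ** 2)


best = golden_section_int(score, 2, 30)
print(f"K = {best}, distinct evaluations = {len(evaluated)}")
```

For this toy score, the search returns K = 8 after 7 distinct evaluations over [2, 30], roughly 76% fewer than the 29 evaluations a grid search needs.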

Quality Metrics

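| Metric | LDA | NMF | K-Means | Description |
| --- | --- | --- | --- | --- |
| Silhouette | Yes | Yes | Yes | Inter-cluster separation vs. intra-cluster cohesion (higher is better) |
| Coherence (c_v) | Yes | Yes | No | Semantic coherence of each topic's top words (higher is better) |
| Perplexity | Yes | No | No | Exponentiated negative log-likelihood per word (lower is better) |
| Reconstruction | No | Yes | Yes | Frobenius-norm reconstruction error (NMF) or inertia (K-Means); lower is better |
| Combined | Yes | Yes | No | 0.5 * silhouette + 0.5 * coherence |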
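To make the K-Means rows concrete, the snippet below computes two of the listed quantities directly with scikit-learn rather than through ATCND: the silhouette score (higher is better) and the K-Means inertia, the within-cluster sum of squared distances (lower is better). It is illustrative only; ATCND's internal sign conventions, normalization, and model settings may differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Same synthetic data as the Quick Start example (8 true clusters).
X, _ = make_blobs(n_samples=1000, n_features=50, centers=8, random_state=42)

results = {}
for k in (4, 8, 12):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    sil = silhouette_score(X, km.labels_)  # in [-1, 1]; higher is better
    results[k] = (sil, km.inertia_)  # inertia: within-cluster sum of squares; lower is better
    print(f"K={k:2d}  silhouette={sil:.3f}  inertia={km.inertia_:.1f}")
```

Note that inertia generally keeps falling as K grows, so on its own it tends to favor the largest K in the range; silhouette, by contrast, can peak at an intermediate K.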

Multiple Optima

K is a discrete integer parameter, and several values of K may achieve the same or nearly the same quality score f(K) because of:

  • Ties: f(K) is exactly equal at different values of K
  • Plateaus: neighboring K values produce nearly identical scores
  • Multiple local maxima arising from different granularities of partitioning the data

ATCND returns candidate_ks (a ranked list) and candidate_scores alongside the single best optimal_k, enabling users to make informed decisions based on domain knowledge.
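One simple way to turn already-computed scores into a ranked candidate list is sketched below. The function `rank_candidates`, its `tol` parameter, and the tie-breaking rule (prefer the smaller K among equal scores) are hypothetical choices for illustration, not necessarily how ATCND builds candidate_ks and candidate_scores.

```python
def rank_candidates(
    history: dict[int, float], n_candidates: int = 3, tol: float = 1e-3
) -> tuple[list[tuple[int, float]], list[int]]:
    """Rank evaluated K values by score (best first), breaking exact ties by smaller K.

    Also returns every K whose score is within `tol` of the best score,
    i.e. the near-optimal plateau a user may want to inspect.
    """
    ranked = sorted(history.items(), key=lambda kv: (-kv[1], kv[0]))
    top = ranked[:n_candidates]
    best_score = ranked[0][1]
    near_optimal = sorted(k for k, s in history.items() if best_score - s <= tol)
    return top, near_optimal


# Hypothetical search history {K: score} containing an exact tie and a near-plateau.
history = {6: 0.41, 7: 0.452, 8: 0.452, 9: 0.4515, 12: 0.38}
top, near = rank_candidates(history)
print(top)   # [(7, 0.452), (8, 0.452), (9, 0.4515)]
print(near)  # [7, 8, 9]
```

Reporting the near-optimal set separately from the top-n list keeps plateaus visible even when they are wider than n_candidates.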

Development

# Install in development mode
pip install -e ".[dev]"

# Run tests
python -m pytest tests/

# Format code
black src/atcnd/ tests/

# Lint code
ruff check src/atcnd/ tests/

License

GNU General Public License v3.0 (GPLv3)



Download files


Source Distribution

atcnd-0.3.0.tar.gz (35.3 kB)

Uploaded Source

Built Distribution


atcnd-0.3.0-py3-none-any.whl (34.4 kB)

Uploaded Python 3

File details

Details for the file atcnd-0.3.0.tar.gz.

File metadata

  • Download URL: atcnd-0.3.0.tar.gz
  • Upload date:
  • Size: 35.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for atcnd-0.3.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | bce3162005b954af4fbe530f69b3811285366e946f919e864946eb048b7a2898 |
| MD5 | d2f2ceffb4353e55435936067d611362 |
| BLAKE2b-256 | 1d1e8716bafd9aa7cd4dee624eb60b7ac0609b99fa394070e9a2f307aaa2d1d9 |


File details

Details for the file atcnd-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: atcnd-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 34.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for atcnd-0.3.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 91dc48ed1fb10ac35021ff31e566f0546858785acceb8d25a965dd2f78a33b36 |
| MD5 | 25c04d5dd3cc96c661bbf16e67ff682c |
| BLAKE2b-256 | fe6f75def4f8dc20019e1406cb38d90bac7f931326f1097db4e874f63ff050b8 |

