Adaptive Topic and Cluster Number Determination via structured search

ATCND

Adaptive Topic and Cluster Number Determination via Structured Search over Sliding Ranges.

ATCND is a model-agnostic framework that determines the optimal number of topics (for LDA/NMF) or clusters (for K-Means) by treating the selection of K as a structured search problem over a user-specified integer range. An exhaustive grid search requires O(K_max - K_min) model evaluations. ATCND instead applies binary search, golden section search, or ternary search, which need only O(log(K_max - K_min)) evaluations while matching or exceeding the quality of grid search.
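To illustrate why a structured search needs so few evaluations, here is a minimal sketch of a binary search for the peak of a score over an integer range. It is not ATCND's implementation: the function name `binary_search_peak` and the toy score are hypothetical, and the sketch assumes the score is strictly unimodal (it rises to a single peak and then falls).

```python
from typing import Callable, Dict, Tuple


def binary_search_peak(
    score: Callable[[int], float], k_min: int, k_max: int
) -> Tuple[int, Dict[int, float]]:
    """Find the K in [k_min, k_max] that maximizes a strictly unimodal score.

    Hypothetical helper for illustration only; not part of the atcnd API.
    Returns the best K and a cache of every K that was actually evaluated.
    """
    if k_min > k_max:
        raise ValueError("k_min must not exceed k_max")

    cache: Dict[int, float] = {}

    def f(k: int) -> float:
        # Each distinct K is evaluated once; repeats are served from the cache.
        if k not in cache:
            cache[k] = score(k)
        return cache[k]

    lo, hi = k_min, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) < f(mid + 1):
            lo = mid + 1  # score still rising: the peak is to the right of mid
        else:
            hi = mid      # score flat or falling: the peak is at mid or to its left
    return lo, cache


# Toy objective with a single peak at K = 8 (stands in for a real model score).
best_k, evaluated = binary_search_peak(lambda k: -(k - 8) ** 2, k_min=2, k_max=30)
print(f"best K = {best_k}, models evaluated = {len(evaluated)} of 29")
```

Each iteration halves the remaining interval at the cost of at most two evaluations, which is where the logarithmic bound comes from. If the real score is noisy or has several peaks, the unimodality assumption fails and a search like this can settle on a local optimum.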

Key Features

  • Four search strategies: binary, golden section, ternary, grid
  • Three model families: LDA, NMF, K-Means
  • Five quality metrics: silhouette, coherence, perplexity, reconstruction error, combined
  • Returns a ranked list of the best candidate values of K (handles plateaus, ties, multiple optima)
  • 60-80% fewer model evaluations than grid search
  • CLI and Python API

Installation

pip install atcnd

From source:

git clone https://github.com/CodeOfMe/ATCND.git
cd ATCND
pip install .

Quick Start

Python API

from atcnd import ATCNDConfig, atcnd_search, print_topics

# K-Means on numeric data
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=1000, n_features=50, centers=8, random_state=42)

config = ATCNDConfig(
    k_min=2, k_max=30,
    model_type="kmeans",
    search_strategy="binary",
    metric="silhouette",
    n_candidates=3,
)
result = atcnd_search(X=X, config=config)

print(f"Optimal K: {result.optimal_k}")
print(f"Best score: {result.optimal_score:.4f}")
print(f"Top candidates: {result.candidate_ks}")
print(f"Evaluations: {len(result.search_history)}")

Text data (LDA/NMF)

from atcnd import ATCNDConfig, atcnd_search

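# Placeholder corpus: replace with your own documents. Repeating a single
# string yields a tiny vocabulary, too small for a meaningful topic search.
texts = ["your document texts here"] * 100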

# NMF with coherence metric
config = ATCNDConfig(
    k_min=2, k_max=20,
    model_type="nmf",
    search_strategy="binary",
    metric="coherence",
)
result = atcnd_search(texts=texts, config=config)

CLI

# Search with K-Means on synthetic data
atcnd search --model kmeans --strategy binary --k-min 2 --k-max 30

# Search with NMF
atcnd search --model nmf --strategy golden_section --metric silhouette

# JSON output
atcnd search --model kmeans --json

# Run benchmarks
atcnd benchmark --dataset blobs --k-min 2 --k-max 30

Search Strategies

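| Strategy       | Complexity     | Best for              | Evaluations (K in [2, 30]) |
|----------------|----------------|-----------------------|----------------------------|
| Grid           | O(N)           | Baseline comparison   | 29                         |
| Binary         | O(log N)       | Unimodal objectives   | ~6                         |
| Golden Section | O(log_phi N)   | General objectives    | ~7                         |
| Ternary        | O(log_{1.5} N) | Multi-step objectives | ~8                         |

N is the number of candidate values of K in the search range (29 for K in [2, 30]).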
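As a quick check on the figures above, the following sketch computes how many evaluations each strategy saves relative to grid search for K in [2, 30]. The counts are the approximate values from the table; the helper name `reduction` is an illustrative choice, not part of ATCND.

```python
import math

# Number of candidate K values in the range [2, 30].
n_candidates_in_range = 30 - 2 + 1

# Approximate evaluation counts quoted in the table (not measured here).
evaluations = {"grid": 29, "binary": 6, "golden_section": 7, "ternary": 8}


def reduction(evals: int, baseline: int) -> float:
    """Fraction of model evaluations saved relative to the baseline."""
    return 1.0 - evals / baseline


savings = {
    name: reduction(count, evaluations["grid"])
    for name, count in evaluations.items()
}

for name, saved in savings.items():
    print(f"{name:15s} {evaluations[name]:2d} evaluations, {saved:.0%} fewer than grid")

# Logarithms of the range size in the bases named in the Complexity column.
phi = (1 + math.sqrt(5)) / 2
log_counts = {
    "binary": math.log(n_candidates_in_range, 2),
    "golden_section": math.log(n_candidates_in_range, phi),
    "ternary": math.log(n_candidates_in_range, 1.5),
}
```

With these counts the three structured strategies save roughly 72% to 79% of the evaluations, which falls inside the 60-80% range quoted under Key Features. Actual counts depend on the data and on how ties and boundary points are handled.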

Quality Metrics

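| Metric          | LDA | NMF | K-Means | Description                                        |
|-----------------|-----|-----|---------|----------------------------------------------------|
| Silhouette      | Yes | Yes | Yes     | Inter-cluster separation vs intra-cluster cohesion |
| Coherence (c_v) | Yes | Yes | No      | Semantic coherence of top topic words              |
| Perplexity      | Yes | No  | No      | Exponentiated negative log-likelihood per word     |
| Reconstruction  | No  | Yes | Yes     | Frobenius norm (NMF) / inertia (K-Means)           |
| Combined        | Yes | Yes | No      | 0.5 * silhouette + 0.5 * coherence                 |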
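The combined metric and perplexity can be written out in a few lines. This is a sketch under stated assumptions, not ATCND's code: the function names and the `weight` parameter are hypothetical, the combined score uses the 0.5/0.5 weighting from the table, and perplexity follows its standard definition as the exponential of the negative average log-likelihood per word.

```python
import math


def combined_score(silhouette: float, coherence: float, weight: float = 0.5) -> float:
    """Weighted sum of silhouette and coherence (0.5 each by default).

    Hypothetical helper; `weight` is an illustrative parameter.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * silhouette + (1.0 - weight) * coherence


def perplexity(total_log_likelihood: float, n_words: int) -> float:
    """Standard perplexity: exp of the negative mean log-likelihood per word."""
    if n_words <= 0:
        raise ValueError("n_words must be positive")
    return math.exp(-total_log_likelihood / n_words)


# Example values chosen for illustration only.
score = combined_score(silhouette=0.4, coherence=0.6)

# A model that assigns probability 1/100 to each of 1000 words.
ppl = perplexity(total_log_likelihood=1000 * math.log(1 / 100), n_words=1000)
```

Silhouette and coherence are higher-is-better, while perplexity and reconstruction error are lower-is-better, so a search that maximizes a score would need to negate the latter two. How ATCND handles that sign convention internally is not stated here.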

Multiple Optima

K is a discrete integer parameter, so several values of K may achieve the same or nearly the same quality score f(K). Typical causes:

  • Exact equality in f(K) at different K values
  • Plateaus where nearby K values produce identical scores
  • Multiple local maxima from different data partitioning granularities

ATCND returns candidate_ks (a ranked list) and candidate_scores alongside the single best optimal_k, so users can choose among near-equivalent values of K using domain knowledge.
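One way to picture how ties and plateaus lead to a ranked candidate list is the sketch below. It is an assumption about the general idea, not ATCND's actual ranking logic: `rank_candidates` and the tolerance `tol` are hypothetical names, and the scores are made-up illustration values.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    scores: Dict[int, float], n_candidates: int = 3, tol: float = 1e-3
) -> Tuple[List[Tuple[int, float]], List[int]]:
    """Rank evaluated K values and report the plateau around the best score.

    Hypothetical helper, not part of the atcnd API.
    Returns (top, plateau): the n_candidates best (K, score) pairs, and every
    K whose score is within `tol` of the best one.
    """
    if not scores:
        raise ValueError("scores must not be empty")

    # Sort by score descending; break exact ties in favour of the smaller K.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    best_score = ranked[0][1]

    top = ranked[:n_candidates]
    plateau = sorted(k for k, s in scores.items() if best_score - s <= tol)
    return top, plateau


# Made-up scores: K = 7 and K = 8 tie exactly, K = 9 is nearly as good.
history = {6: 0.41, 7: 0.52, 8: 0.52, 9: 0.5195, 12: 0.30}
top, plateau = rank_candidates(history, n_candidates=3)

candidate_ks = [k for k, _ in top]
candidate_scores = [s for _, s in top]
```

Preferring the smaller K on an exact tie is a design choice that favours the simpler model; a different tie-break (for example, the larger K for finer-grained topics) may suit other applications better.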

Development

# Install in development mode
pip install -e ".[dev]"

# Run tests
python -m pytest tests/

# Format code
black src/atcnd/ tests/

# Lint code
ruff check src/atcnd/ tests/

License

GNU General Public License v3.0 (GPLv3)
