Adaptive Topic and Cluster Number Determination via structured search
ATCND
Adaptive Topic and Cluster Number Determination via Structured Search over Sliding Ranges.
ATCND is a model-agnostic framework that determines the optimal number of topics (for LDA/NMF) or clusters (for K-Means) by treating the selection of K as a structured search problem over a user-specified integer range [K_min, K_max]. An exhaustive grid search requires O(K_max - K_min) model evaluations. ATCND instead applies binary search, golden section search, or ternary search, which need only O(log(K_max - K_min)) evaluations, and aims to match the result of grid search when the quality score is well behaved (for example, unimodal) over the range.
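To make the evaluation savings concrete, here is a minimal sketch of an integer ternary search with memoization. It is not ATCND's implementation: the function `ternary_search_max` and the toy `toy_score` objective are hypothetical, and the sketch assumes the score is strictly unimodal in K (rising to a single peak, then falling).

```python
from typing import Callable, Dict, Tuple


def ternary_search_max(
    score: Callable[[int], float], k_min: int, k_max: int
) -> Tuple[int, Dict[int, float]]:
    """Return the K in [k_min, k_max] that maximizes a strictly unimodal score.

    Also returns the cache of evaluated K values, so the number of
    model evaluations can be inspected.
    """
    cache: Dict[int, float] = {}

    def evaluate(k: int) -> float:
        # Each K is scored at most once; in practice this is a model fit.
        if k not in cache:
            cache[k] = score(k)
        return cache[k]

    lo, hi = k_min, k_max
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if evaluate(m1) < evaluate(m2):
            lo = m1 + 1  # the peak lies strictly to the right of m1
        else:
            hi = m2 - 1  # the peak lies strictly to the left of m2
    # At most three candidates remain; pick the best one directly.
    best = max(range(lo, hi + 1), key=evaluate)
    return best, cache


def toy_score(k: int) -> float:
    # Hypothetical stand-in for a real quality metric, peaked at K = 8.
    return -float((k - 8) ** 2)


best_k, evaluated = ternary_search_max(toy_score, 2, 30)
print(f"best K = {best_k}, evaluations = {len(evaluated)} (grid would need 29)")
```

On a real objective with noise or several local maxima, a search like this can settle on a local optimum, which is one reason to inspect several ranked candidates rather than a single K.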
Key Features
- Four search strategies: binary, golden section, ternary, grid
- Three model families: LDA, NMF, K-Means
- Five quality metrics: silhouette, coherence, perplexity, reconstruction error, combined
- Returns a ranked set of top candidate values of K (handles plateaus, ties, multiple optima)
- 60 to 80% fewer model evaluations than grid search
- CLI and Python API
Installation
```bash
pip install atcnd
```
From source:
```bash
git clone https://github.com/CodeOfMe/ATCND.git
cd ATCND
pip install .
```
Quick Start
Python API
```python
from sklearn.datasets import make_blobs

from atcnd import ATCNDConfig, atcnd_search, print_topics

# K-Means on numeric data: 1000 samples, 50 features, 8 true clusters
X, _ = make_blobs(n_samples=1000, n_features=50, centers=8, random_state=42)

config = ATCNDConfig(
    k_min=2, k_max=30,
    model_type="kmeans",
    search_strategy="binary",
    metric="silhouette",
    n_candidates=3,
)
result = atcnd_search(X=X, config=config)

print(f"Optimal K: {result.optimal_k}")
print(f"Best score: {result.optimal_score:.4f}")
print(f"Top candidates: {result.candidate_ks}")
print(f"Evaluations: {len(result.search_history)}")
```
Text data (LDA/NMF)
```python
from atcnd import ATCNDConfig, atcnd_search

# Placeholder corpus: replace with your own list of document strings
texts = ["your document texts here"] * 100

# NMF with coherence metric
config = ATCNDConfig(
    k_min=2, k_max=20,
    model_type="nmf",
    search_strategy="binary",
    metric="coherence",
)
result = atcnd_search(texts=texts, config=config)
```
CLI
```bash
# Search with K-Means on synthetic data
atcnd search --model kmeans --strategy binary --k-min 2 --k-max 30

# Search with NMF
atcnd search --model nmf --strategy golden_section --metric silhouette

# JSON output
atcnd search --model kmeans --json

# Run benchmarks
atcnd benchmark --dataset blobs --k-min 2 --k-max 30
```
Search Strategies
| Strategy | Complexity | Best for | Evaluations (K in [2,30]) |
|---|---|---|---|
| Grid | O(N) | Baseline comparison | 29 |
| Binary | O(log N) | Unimodal objectives | ~6 |
| Golden Section | O(log_phi N) | General objectives | ~7 |
| Ternary | O(log_{1.5} N) | Multi-step objectives | ~8 |
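The approximate counts in the last column can be related to the complexity bounds with a little arithmetic. The snippet below only evaluates the logarithms; it does not run ATCND, and it assumes N is the number of candidate values of K in [2, 30], that is, 29. Actual counts depend on implementation details such as endpoint evaluations and caching, so they need not equal these bounds exactly.

```python
import math

k_min, k_max = 2, 30
n = k_max - k_min + 1  # assumption: N counts the candidate values of K (29)

phi = (1 + math.sqrt(5)) / 2  # golden ratio

# Order-of-magnitude estimates for each strategy's evaluation count.
estimates = {
    "grid": float(n),
    "binary": math.log2(n),
    "golden_section": math.log(n, phi),
    "ternary": math.log(n, 1.5),
}

for name, value in estimates.items():
    print(f"{name:>15}: {value:5.1f}")
```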
Quality Metrics
| Metric | LDA | NMF | K-Means | Description |
|---|---|---|---|---|
| Silhouette | Yes | Yes | Yes | Inter-cluster separation vs intra-cluster cohesion |
| Coherence (c_v) | Yes | Yes | No | Semantic coherence of top topic words |
| Perplexity | Yes | No | No | Exponentiated negative log-likelihood per word (lower is better) |
| Reconstruction | No | Yes | Yes | Frobenius-norm reconstruction error (NMF) / inertia (K-Means) |
| Combined | Yes | Yes | No | 0.5 * silhouette + 0.5 * coherence |
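As a rough illustration of two rows in the table, the sketch below writes out the standard perplexity definition and the equal-weight combination listed for the combined metric. The function names `perplexity` and `combined_score` are hypothetical and not part of the ATCND API, and the sketch assumes the two component scores are combined without any rescaling.

```python
import math


def perplexity(total_log_likelihood: float, n_words: int) -> float:
    """Standard perplexity: exp of the negative log-likelihood per word.

    Lower values indicate a better fit.
    """
    return math.exp(-total_log_likelihood / n_words)


def combined_score(silhouette: float, coherence: float) -> float:
    """Equal-weight combination, as given in the table: higher is better."""
    return 0.5 * silhouette + 0.5 * coherence


# A model assigning probability 1/4 to each of 10 words has perplexity of about 4.
print(perplexity(total_log_likelihood=10 * math.log(0.25), n_words=10))
print(combined_score(silhouette=0.4, coherence=0.6))
```

Because silhouette and perplexity point in opposite directions (higher versus lower is better), a search over K has to know which direction to optimize for the chosen metric.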
Multiple Optima
K is a discrete integer parameter. Multiple values of K may achieve the same or nearly the same quality score due to:
- Exact equality in f(K) at different K values
- Plateaus where nearby K values produce nearly identical scores
- Multiple local maxima from different data partitioning granularities
ATCND returns candidate_ks (a ranked list) and candidate_scores alongside the single best optimal_k, enabling users to make informed decisions based on domain knowledge.
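One way to picture a ranked candidate list is the sketch below, which sorts a history of evaluated K values by score and flags every K within a tolerance of the best. The helpers `rank_candidates` and `near_optimal`, the tolerance value, and the tie-break toward smaller K are all assumptions made for illustration; the source does not specify how ATCND ranks or breaks ties.

```python
from typing import Dict, List, Tuple


def rank_candidates(
    history: Dict[int, float], n_candidates: int = 3
) -> List[Tuple[int, float]]:
    """Return the top (K, score) pairs, best first.

    Ties are broken in favor of the smaller K (an assumption: simpler model).
    """
    ranked = sorted(history.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n_candidates]


def near_optimal(history: Dict[int, float], tol: float = 1e-3) -> List[int]:
    """Return every evaluated K whose score is within tol of the best score."""
    best = max(history.values())
    return sorted(k for k, s in history.items() if best - s <= tol)


# Hypothetical scores: K = 7 and K = 8 tie, K = 9 sits on the same plateau.
history = {4: 0.41, 7: 0.62, 8: 0.62, 9: 0.6195, 15: 0.30}

print(rank_candidates(history))
print(near_optimal(history))
```

When several K values fall within the tolerance, domain knowledge (for example, how many topics are interpretable) is usually a better tie-breaker than the score itself.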
Development
```bash
# Install in development mode
pip install -e ".[dev]"

# Run tests
python -m pytest tests/

# Format code
black src/atcnd/ tests/

# Lint code
ruff check src/atcnd/ tests/
```
License
GNU General Public License v3.0 (GPLv3)