Weighted family-resemblance clustering grounded in Wittgenstein PI §65-67 (prototype-free, sklearn-compatible).
family-resemblance
Weighted family-resemblance clustering grounded in Wittgenstein's Philosophical Investigations §65–67. Prototype-free. scikit-learn-compatible estimator.
Install
pip install family-resemblance
# optional extras:
pip install "family-resemblance[mcp]" # MCP schema-from-use helpers
pip install "family-resemblance[viz]" # matplotlib / seaborn plotting (roadmap)
pip install "family-resemblance[dev]" # tests, build, twine
pip install family-resemblance is enough to use WFRCluster, the
therapeutic describe() helper, and the fr CLI. The optional extras are
truly optional — the package imports cleanly without them.
Quickstart (Python)
import numpy as np
from family_resemblance import WFRCluster, describe
X = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
wfr = WFRCluster(eps=0.4, min_samples=2).fit(X)
print(wfr.labels_) # [0 0 1 1]
print(wfr.family_membership()) # ~[0.99 0.99 0.99 0.99]
for i, lab in enumerate(wfr.labels_):
    r = describe(int(lab), float(wfr.family_membership()[i]))
    print(r.description)
Quickstart (CLI)
fr version
fr cluster X.csv --eps 0.4 --min-samples 2
fr inspect X.csv --threshold 0.5
fr cluster emits {"labels": [...]}; fr inspect emits a list of
{i, label, confidence, boundary, description} records.
Why "family resemblance"?
In §65 of the Investigations Wittgenstein objects to the philosophical search for the essence shared by every member of a category. In §66 he walks through board games, card games, ball games, and Olympic games, and asks what is common to them all. His answer is that there is no such common feature, only a network of overlapping and criss-crossing similarities, which §67 names "family resemblances".
This library takes that picture as its data structure. A cluster has no
centre and no prototype: only a weighted aggregation of pairwise
feature-similarities (see core/wfr.py).
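As a rough illustration of what "a weighted aggregation of pairwise feature-similarities" can look like, here is a minimal standalone sketch. It is not the code in core/wfr.py: the exponential per-feature kernel and the helper names `feature_similarity`, `resemblance`, and `resemblance_matrix` are assumptions made for this example only.

```python
import math

def feature_similarity(a: float, b: float) -> float:
    """Similarity of two scalar feature values in (0, 1]; 1 means identical.

    The exponential kernel is an assumption of this sketch.
    """
    return math.exp(-abs(a - b))

def resemblance(x, y, weights):
    """Weighted average of per-feature similarities between two points.

    Weights are renormalised to sum to 1 before aggregation.
    """
    total = sum(weights)
    return sum(w / total * feature_similarity(a, b)
               for w, a, b in zip(weights, x, y))

def resemblance_matrix(points, weights):
    """All pairwise resemblances; no centroid or prototype is computed."""
    return [[resemblance(p, q, weights) for q in points] for p in points]

points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
R = resemblance_matrix(points, weights=[2.0, 1.0])
print(round(R[0][1], 3))  # 0.905 (close pair)
print(round(R[0][2], 3))  # 0.007 (distant pair)
```

The only object the sketch ever builds is the pairwise matrix `R`, so there is nothing in it that could play the role of a cluster centre.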
No language model is in the loop. Schema induction, family membership, and the therapeutic boundary check are purely mechanical (numpy, scikit-learn, and genson only). The package does not ship, call, or fine-tune an LLM.
Wittgenstein → API map
| PI § | Concept | API surface |
|---|---|---|
| §43 | meaning = use | UseTrace ([mcp] extra) |
| §65–67 | family resemblance, no shared essence | WFRCluster |
| §133 | honest, therapeutic limit | describe() / TherapeuticResponse |
| §201 | rule-following indeterminacy | DBSCAN noise label -1 |
| §243–315 | private-language argument | induce_with_confidence min_support hide gate |
How is this different from Prototypical Networks?
| | Prototypical Networks (Snell+ 2017) | family-resemblance |
|---|---|---|
| Cluster representation | mean / centroid | none |
| Membership test | nearest centroid | density-connected via pairwise resemblance |
| Boundary / low-confidence | hard nearest assignment | honest TherapeuticResponse (PI §133) |
| Wittgenstein integration | none | §65–67, §133, §201, §243–315 cited in API |
| sklearn-compatible | partial | yes (ClusterMixin + BaseEstimator) |
| Distance must be metric | yes | no — non-transitive resemblance is supported |
| Per-feature weighting | learned | user-supplied (renormalised); v0.2 will learn |
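The "density-connected" and "non-transitive resemblance" rows can be made concrete with a small DBSCAN-style sketch over a boolean resemblance relation. This is a generic illustration under stated assumptions, not the WFRCluster implementation: the function name `density_cluster` and the hand-written relation are hypothetical.

```python
from collections import deque

def density_cluster(similar, min_samples=2):
    """DBSCAN-style labelling over a boolean resemblance relation.

    similar[i][j] is True when points i and j resemble each other.
    The relation need not be transitive. Returns labels; -1 is noise.
    """
    n = len(similar)
    neighbours = [[j for j in range(n) if similar[i][j]] for i in range(n)]
    # A point is "core" if it resembles at least min_samples points (itself included).
    is_core = [len(nb) >= min_samples for nb in neighbours]
    labels = [-1] * n
    family = 0
    for start in range(n):
        if labels[start] != -1 or not is_core[start]:
            continue
        labels[start] = family
        queue = deque([start])
        while queue:
            i = queue.popleft()
            if not is_core[i]:
                continue  # border points join a family but do not expand it
            for j in neighbours[i]:
                if labels[j] == -1:
                    labels[j] = family
                    queue.append(j)
        family += 1
    return labels

# A resembles B, B resembles C, but A does not resemble C; D resembles no one else.
similar = [
    [True,  True,  False, False],
    [True,  True,  True,  False],
    [False, True,  True,  False],
    [False, False, False, True],
]
labels = density_cluster(similar, min_samples=2)
print(labels)  # [0, 0, 0, -1]
```

A and C end up in the same family through B even though they do not resemble each other directly, which is the overlapping-chain picture a nearest-centroid rule cannot express.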
Therapeutic mode (PI §133, §201)
describe() returns a boundary-aware response:
>>> describe(label=0, confidence=0.3, threshold=0.5).description
'Point assigned to family 0 with confidence 0.30 (< threshold 0.50). The
boundary is genuinely fuzzy (PI §65-67) and no centre defines the family.'
>>> describe(label=-1, confidence=0.0).description
'No family found for this point (DBSCAN noise label). Following PI §201,
no single rule decides its membership.'
This is the library's main contribution beyond "clustering with a different distance function". When the model cannot confidently classify, it does not invent a missing rule — it reports the limit honestly.
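To show the shape of such a boundary-aware response, here is a minimal sketch. It is not the library's describe(): the names `SketchResponse` and `sketch_describe`, the default threshold of 0.5, and the message wording are all assumptions; only the field names (label, confidence, boundary, description) follow the records listed in the CLI section.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SketchResponse:
    """Hypothetical stand-in for a boundary-aware response object."""
    label: int
    confidence: float
    boundary: bool
    description: str

def sketch_describe(label: int, confidence: float,
                    threshold: float = 0.5) -> SketchResponse:
    """Report the assignment, flagging noise and low-confidence cases."""
    if label == -1:
        # Noise: no family was found, so none is invented.
        return SketchResponse(label, confidence, True,
                              "No family found for this point.")
    if confidence < threshold:
        text = (f"Family {label}, confidence {confidence:.2f} "
                f"(< threshold {threshold:.2f}): boundary case.")
        return SketchResponse(label, confidence, True, text)
    return SketchResponse(label, confidence, False,
                          f"Family {label}, confidence {confidence:.2f}.")

print(sketch_describe(0, 0.3).description)
print(sketch_describe(-1, 0.0).description)
print(sketch_describe(1, 0.9).description)
```

The design point is that the low-confidence and noise branches return an explicit boundary flag rather than silently falling back to the nearest family.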
Optional [mcp] extra
The [mcp] extra exposes a tiny "schema from use" pipeline (PI §43).
Schemas are not emitted until a tool has been seen at least min_support
times — the private-language argument (PI §243–315) refuses to call a
single use a rule:
from family_resemblance._ext.mcp.inducer import induce_with_confidence
schema, conf = induce_with_confidence([{"x": 1}], min_support=3)
# schema is None, conf is 1/3 — refused to emit
schema, conf = induce_with_confidence(
    [{"x": 1}, {"x": 2}, {"x": 3}], min_support=3
)
# schema is a real JSON Schema, conf is 1.0
The FastMCP server in _ext/mcp/server.py is currently a skeleton; the
full induce-then-replay loop, SQLite-backed UseTrace persistence, and
SchemaCandidate.contradictions tracking ship in v0.2.
A runnable examples/mcp_translate_demo.py is roadmapped for v0.2
alongside the FastMCP transport wiring; until then the snippet above is
the reference demo for the [mcp] extra.
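The min_support gate described in this section can be approximated in a few lines of plain Python. The sketch below is a toy, not induce_with_confidence itself: the function name `gated_schema`, the confidence formula (observations divided by min_support, capped at 1.0), and the very small schema builder are assumptions chosen to reproduce the two outcomes quoted above.

```python
JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string"}

def gated_schema(samples, min_support=3):
    """Return (schema_or_None, confidence) for a list of observed dict payloads.

    No schema is emitted until at least min_support samples have been seen.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    confidence = min(1.0, len(samples) / min_support)
    if len(samples) < min_support:
        return None, confidence  # too few uses to call it a rule
    # Toy induction: keep only keys present in every sample.
    required = set(samples[0])
    for sample in samples[1:]:
        required &= set(sample)
    properties = {}
    for key in sorted(required):
        value_type = type(samples[0][key])
        # Unknown Python types get no type constraint.
        properties[key] = ({"type": JSON_TYPES[value_type]}
                           if value_type in JSON_TYPES else {})
    schema = {"type": "object", "properties": properties,
              "required": sorted(required)}
    return schema, confidence

print(gated_schema([{"x": 1}], min_support=3))
schema, conf = gated_schema([{"x": 1}, {"x": 2}, {"x": 3}], min_support=3)
print(schema, conf)
```

With one observation the sketch returns `None` and a confidence of 1/3; with three it returns a schema and 1.0, matching the behaviour the snippet above documents.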
Related work
- Rosch (1975) — classical prototype theory of categorisation, the position this library deliberately rejects by removing centres from clusters.
- arXiv:2601.01127 — Weighted Family Resemblance Clustering; the pairwise similarity formulation ported in core/wfr.py.
- LGDL by Marco Graziano — grammar-driven language-game framework. Complementary: LGDL fixes grammars up front, whereas family-resemblance lets families form from observed use.
License and data
The library is MIT-licensed (see LICENSE).
The data/ directory contains the Tractatus Logico-Philosophicus
(Project Gutenberg eBook #5740, US public domain, bilingual: Ogden's 1922
English translation alongside Wittgenstein's German original). See
data/PROVENANCE.md for full attribution and for the
fair-use policy (≤ 50 words per §, ≤ 250 words across the whole repo) that
governs quotations from Philosophical Investigations. The policy is
enforced automatically by
tests/test_provenance_policy.py.
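A word-budget check of this kind can be sketched as follows. This is not the content of tests/test_provenance_policy.py: the function name `check_quote_budget`, the input format (a mapping from section label to quoted text), and counting words by whitespace splitting are assumptions. The sample input uses placeholder words, not actual quotations.

```python
def check_quote_budget(quotes, per_section_limit=50, total_limit=250):
    """Return a list of human-readable violations of the quotation budget.

    quotes maps a section label (e.g. "§66") to the text quoted from it.
    Words are counted by whitespace splitting (an assumption of this sketch).
    """
    counts = {section: len(text.split()) for section, text in quotes.items()}
    violations = [
        f"{section}: {n} words > {per_section_limit}"
        for section, n in counts.items()
        if n > per_section_limit
    ]
    total = sum(counts.values())
    if total > total_limit:
        violations.append(f"total: {total} words > {total_limit}")
    return violations

# Placeholder text only: 60 words attributed to §66, 10 to §67.
quotes = {"§66": "word " * 60, "§67": "word " * 10}
violations = check_quote_budget(quotes)
print(violations)  # ['§66: 60 words > 50']
```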
Citation
@software{family_resemblance_0_1,
author = {runza},
title = {family-resemblance: weighted family-resemblance clustering},
year = 2026,
url = {https://github.com/hinanohart/family-resemblance},
}
Algorithmic inspiration: arXiv:2601.01127 (Weighted Family Resemblance Clustering). The §65–67 picture of a category held together by overlapping similarities, which this library applies to clustering, is of course Wittgenstein's.
Status
v0.1 alpha. See CHANGELOG.md for the roadmap.