thematic-analyser
Corpus-level inductive thematic analysis via multi-LLM consensus labelling — a member of the lens analyser family.
Most family members read one artefact for fixed signals. This one is the
family's first corpus-level, inductive member: it takes a whole corpus and
discovers a codebook. Like cite-sight it is auto_routable=False (a corpus
isn't implied by a file extension).
The method
Harvested from a parked research project (Unveiling Risks in AI Systems, Borck
& Thompson 2024 — see docs/method/). The novelty is not the topic model; it's
what happens to its output:
- Topics: a pluggable, optional topic model proposes candidate themes (BERTopic via the [topics] extra, or bring your own precomputed topics). Mirrors BERTopic's clustering/representation split.
- Independent: two or more coders (different LLMs) label each topic blind, no peeking.
- Critique: coders see each other's labels and argue over N rounds, revising toward the most defensible shared label.
- Resolve: converged label if they agree; otherwise the majority of the final round, flagged agreed=False for a human to settle.
- Reliability: Krippendorff's α (the [irr] extra) over the blind labels, the defensibility number. Percent-agreement fallback otherwise.
- Codebook: a flat set of themes the human groups into a hierarchy (apply_hierarchy), exportable to REFI-QDA for QualCoder/NVivo/ATLAS.ti.
The human sets the hierarchy; the machine does the labelling and the bookkeeping.
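The Resolve step above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's own code: the `Consensus` record and the `resolve` function are hypothetical names, labels are compared case-insensitively after trimming whitespace, and a tie in the majority vote falls to the label that appeared first.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Consensus:
    """Hypothetical result record; the package's own model may differ."""
    label: str
    agreed: bool


def resolve(final_round_labels: list[str]) -> Consensus:
    """Resolve one topic from the labels each coder gave in the final round."""
    if not final_round_labels:
        raise ValueError("need at least one coder label")
    # Assumption: labels match if they are equal ignoring case and outer whitespace.
    normalised = [label.strip().lower() for label in final_round_labels]
    counts = Counter(normalised)
    # most_common keeps first-seen order among equal counts, so ties go to the earliest label.
    winner, _ = counts.most_common(1)[0]
    # Report the winning label in the spelling its first coder used.
    original = next(
        label.strip() for label in final_round_labels
        if label.strip().lower() == winner
    )
    # The coders converged only if every one of them ended on the same label.
    return Consensus(label=original, agreed=len(counts) == 1)
```

A topic where the coders converge comes back with `agreed=True`; a split vote still yields the majority label, but with `agreed=False` so a human can settle it.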
Install
uv venv && uv pip install -e '../lens-contract' -e '.[dev]'
uv run pytest # offline smoke (stub coders, no API key)
uv pip install -e '.[topics]' # + fit topics from raw text (BERTopic)
uv pip install -e '.[llm]' # + real LLM coders (anthropic)
uv pip install -e '.[irr]' # + Krippendorff's alpha
uv pip install -e '.[documents]' # + .pdf/.docx ingestion via document-analyser
CLI
thematic-analyser corpus.txt # fit topics, stub coders, human summary
thematic-analyser corpus.txt --topics topics.json # skip fitting; use precomputed topics
thematic-analyser corpus/ --rounds 3 --json # directory of docs; JSON to stdout
thematic-analyser serve --port 8017 # HTTP API
thematic-analyser manifest # capability manifest
A bare positional argument runs analyse. --json prints the ThematicAnalysis model and
nothing else; diagnostics go to stderr.
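Because --json writes only the model to stdout, the output can be piped straight into a script. The sketch below assumes the JSON mirrors the Python attributes shown later in this README (`reliability`, and a `consensus` list whose items carry `label` and `agreed`); those field names are an assumption, and `unresolved_labels` is a hypothetical helper.

```python
import json


def unresolved_labels(payload: str) -> list[str]:
    """Return the labels of topics the coders did not agree on."""
    data = json.loads(payload)
    # Assumed shape: {"reliability": float, "consensus": [{"label": str, "agreed": bool}, ...]}
    return [
        item["label"]
        for item in data.get("consensus", [])
        if not item.get("agreed", False)
    ]


# Hand-written sample for illustration only, not real tool output.
sample = json.dumps({
    "reliability": 0.5,
    "consensus": [
        {"label": "role-play framing", "agreed": True},
        {"label": "prompt injection", "agreed": False},
    ],
})
print(unresolved_labels(sample))
```

In practice the payload would come from something like `thematic-analyser corpus/ --json`, read from stdin or captured with `subprocess.run`.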
Python
from thematic_analyser import ThematicAnalyser, LLMCoder
# Real two-model consensus (needs the [llm] extra + ANTHROPIC_API_KEY):
coders = [
    LLMCoder("claude", "claude-opus-4-8", context="jailbreak prompts"),
    LLMCoder("haiku", "claude-haiku-4-5-20251001", context="jailbreak prompts"),
]
result = ThematicAnalyser(coders, rounds=3).analyse("corpus.txt", topics="topics.json")
print(result.reliability) # Krippendorff's alpha on the blind labels
print([(c.label, c.agreed) for c in result.consensus])
Without coders it defaults to two offline stub coders so everything runs with no API key — that's what the test suite uses.
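To make the reliability number less of a black box, here is a pure-Python sketch of Krippendorff's α for nominal labels, alongside a simple percent-agreement measure. It is not how the package computes the value (the [irr] extra handles that); both function names are hypothetical, the input is one list of blind labels per topic, and percent agreement is taken here to mean the share of topics on which all coders gave the same label.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(units: list[list[str]]) -> float:
    """Share of topics on which every coder gave the same blind label."""
    return sum(len(set(labels)) == 1 for labels in units) / len(units)


def krippendorff_alpha_nominal(units: list[list[str]]) -> float:
    """Krippendorff's alpha for nominal data via the coincidence matrix."""
    coincidences: Counter = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a topic with a single label cannot be paired
        # Every ordered pair of labels from different coders counts 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)
    # Marginal totals n_c: how often each label takes part in a pairing.
    totals: Counter = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is ever used")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n * (n - 1)).
    return 1 - (n - 1) * observed / expected


blind = [["a", "a"], ["b", "b"], ["a", "b"], ["b", "b"]]
print(percent_agreement(blind), krippendorff_alpha_nominal(blind))
```

For the four-topic example, three of four topics match (0.75 percent agreement) while α is 8/15, lower because α discounts the agreement expected by chance.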
HTTP
thematic-analyser serve --port 8017
curl -F file=@corpus.txt -F rounds=3 'http://127.0.0.1:8017/analyse'
curl http://127.0.0.1:8017/health
GET /health, GET /manifest, POST /analyse (multipart corpus upload). The
HTTP face runs the cheap stub-coder default; the LLM tier and human-in-the-loop
curation live in the desktop app, which calls the Python surface directly.
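The curl upload above can also be issued from Python with only the standard library. This sketch reuses the `file` and `rounds` form fields and the port from the curl example; the helper names are hypothetical, and it assumes the endpoint answers with a JSON body.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(fields: dict[str, str], file_name: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode form fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode() + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def analyse(corpus_path: str, rounds: int = 3, base_url: str = "http://127.0.0.1:8017") -> dict:
    """POST a corpus file to /analyse and decode the response (assumed to be JSON)."""
    with open(corpus_path, "rb") as handle:
        data = handle.read()
    body, content_type = build_multipart(
        {"rounds": str(rounds)}, os.path.basename(corpus_path), data
    )
    request = urllib.request.Request(
        f"{base_url}/analyse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `analyse("corpus.txt")` sends the same request as the curl line.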
Status
v0.1 scaffold. Working offline path (corpus → topics → consensus → reliability →
codebook → REFI-QDA export). Seams still to flesh out: BERTopic fitting ([topics]),
real provider wiring beyond Anthropic, a full .qdpx writer, and the local
desktop curation app (forked from the debrief/insight-lens shell).