thematic-analyser

Corpus-level inductive thematic analysis via multi-LLM consensus labelling — a member of the lens analyser family.

Most family members read a single artefact and check it for fixed signals. This is the family's first corpus-level, inductive member: it takes a whole corpus and discovers a codebook from it. Like cite-sight, it sets auto_routable=False, because a file extension does not imply a corpus.

The method

Harvested from a parked research project (Unveiling Risks in AI Systems, Borck & Thompson 2024 — see docs/method/). The novelty is not the topic model; it's what happens to its output:

1. Topics: a pluggable, optional topic model proposes candidate themes (BERTopic via the [topics] extra, or bring your own precomputed topics). This mirrors BERTopic's clustering/representation split.
2. Independent: two or more coders (different LLMs) label each topic blind, without seeing each other's labels.
3. Critique: coders see each other's labels and argue over N rounds, revising toward the most defensible shared label.
4. Resolve: if the coders converge, their shared label is used; otherwise the majority label of the final round is used and flagged agreed=False for a human to settle.
5. Reliability: Krippendorff's α (the [irr] extra) computed over the blind labels. This is the defensibility number. Without the extra, it falls back to percent agreement.
6. Codebook: a flat set of themes that the human groups into a hierarchy (apply_hierarchy), exportable to REFI-QDA for QualCoder/NVivo/ATLAS.ti.
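The Independent, Critique and Resolve steps can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the package's implementation: the Coder callable signature, the Consensus dataclass and label_topic are hypothetical names, labels are compared as exact strings, and a tie in the final round goes to whichever label was seen first.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical coder signature: (topic keywords, peers' latest labels) -> label.
Coder = Callable[[Sequence[str], Sequence[str]], str]


@dataclass
class Consensus:
    label: str        # converged label, or the final-round majority
    agreed: bool      # False means a human should settle it
    blind: List[str]  # first-pass labels, kept for the reliability step


def label_topic(keywords: Sequence[str], coders: Sequence[Coder], rounds: int = 3) -> Consensus:
    # Independent: every coder labels with no peer labels visible.
    blind = [coder(keywords, []) for coder in coders]
    labels = list(blind)

    # Critique: each coder sees the others' latest labels and may revise.
    for _ in range(rounds):
        if len(set(labels)) == 1:
            break  # already converged
        labels = [
            coder(keywords, labels[:i] + labels[i + 1:])
            for i, coder in enumerate(coders)
        ]

    # Resolve: the shared label if converged, else the most common final label.
    counts = Counter(labels)
    label, _ = counts.most_common(1)[0]
    return Consensus(label=label, agreed=len(counts) == 1, blind=blind)


# Two toy coders standing in for LLMs (illustrative only).
def stubborn(keywords, peers):
    return "role play"


def flexible(keywords, peers):
    return peers[0] if peers else "persona"
```

Keeping the blind labels on the result matters because the reliability figure is computed over the first pass, before the coders have influenced each other.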

The human sets the hierarchy; the machine does the labelling and the bookkeeping.
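For the Reliability step, the two figures can be computed from the blind labels alone. The sketch below is a pure-Python illustration of nominal Krippendorff's α and of percent agreement. It is not the code path behind the [irr] extra, the function names are hypothetical, and it assumes labels have already been normalised so that equal themes are equal strings.

```python
from collections import Counter
from itertools import permutations
from typing import List, Optional, Sequence


def percent_agreement(units: Sequence[Sequence[str]]) -> Optional[float]:
    """Share of topics on which every coder gave the same blind label."""
    rated = [labels for labels in units if len(labels) >= 2]
    if not rated:
        return None
    return sum(len(set(labels)) == 1 for labels in rated) / len(rated)


def nominal_alpha(units: Sequence[Sequence[str]]) -> Optional[float]:
    """Krippendorff's alpha for nominal data; one inner list per topic."""
    # Coincidence matrix: each ordered pair of labels within a topic
    # contributes 1 / (m - 1), where m is the number of labels on that topic.
    coincidences: Counter = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a topic with one label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals: Counter = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return None

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return None  # only one label was ever used; alpha is undefined
    return 1 - observed / expected


# Four topics, two coders: they disagree on the third topic only.
blind: List[List[str]] = [["a", "a"], ["b", "b"], ["a", "b"], ["a", "a"]]
```

On this toy data the coders agree on three of four topics, yet α comes out lower than the raw agreement rate because it discounts the agreement expected by chance.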

Install

uv venv && uv pip install -e '../lens-contract' -e '.[dev]'
uv run pytest                       # offline smoke (stub coders, no API key)

uv pip install -e '.[topics]'       # + fit topics from raw text (BERTopic)
uv pip install -e '.[llm]'          # + real LLM coders (anthropic)
uv pip install -e '.[irr]'          # + Krippendorff's alpha
uv pip install -e '.[documents]'    # + .pdf/.docx ingestion via document-analyser

CLI

thematic-analyser corpus.txt                      # fit topics, stub coders, human summary
thematic-analyser corpus.txt --topics topics.json # skip fitting; use precomputed topics
thematic-analyser corpus/ --rounds 3 --json       # directory of docs; JSON to stdout
thematic-analyser serve --port 8017               # HTTP API
thematic-analyser manifest                        # capability manifest

A bare positional argument means analyse. --json prints the ThematicAnalysis model to stdout and nothing else; diagnostics go to stderr.
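Because --json keeps stdout clean, the CLI can be driven from a script and its output parsed directly. The helper below is a hedged sketch: run_json is a hypothetical name, and it makes no assumption about the fields of the ThematicAnalysis model beyond it being a single JSON document on stdout.

```python
import json
import subprocess
import sys
from typing import Any, Sequence


def run_json(command: Sequence[str]) -> Any:
    """Run a command, forward its diagnostics, and parse stdout as JSON."""
    completed = subprocess.run(
        list(command),
        capture_output=True,  # keep stdout (JSON) and stderr (diagnostics) apart
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    if completed.stderr:
        sys.stderr.write(completed.stderr)
    return json.loads(completed.stdout)


# Example call, assuming the CLI is installed and on PATH:
#   analysis = run_json(["thematic-analyser", "corpus/", "--rounds", "3", "--json"])
```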

Python

from thematic_analyser import ThematicAnalyser, LLMCoder

# Real two-model consensus (needs the [llm] extra + ANTHROPIC_API_KEY):
coders = [
    LLMCoder("claude", "claude-opus-4-8", context="jailbreak prompts"),
    LLMCoder("haiku",  "claude-haiku-4-5-20251001", context="jailbreak prompts"),
]
result = ThematicAnalyser(coders, rounds=3).analyse("corpus.txt", topics="topics.json")
print(result.reliability)            # Krippendorff's alpha on the blind labels
print([(c.label, c.agreed) for c in result.consensus])

If no coders are passed, it defaults to two offline stub coders, so everything runs with no API key. That is what the test suite uses.

HTTP

thematic-analyser serve --port 8017
curl -F file=@corpus.txt -F rounds=3 'http://127.0.0.1:8017/analyse'
curl http://127.0.0.1:8017/health

Endpoints: GET /health, GET /manifest, POST /analyse (multipart corpus upload). The HTTP interface runs the cheap stub-coder default; the LLM tier and human-in-the-loop curation live in the desktop app, which calls the Python surface directly.
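The curl upload can be reproduced from Python with the standard library alone. This is a sketch under assumptions: the form field names file and rounds are taken from the curl example, build_multipart and analyse are hypothetical helpers, and the response is assumed to be JSON.

```python
import json
import urllib.request
import uuid
from typing import Any, Dict, Tuple


def build_multipart(fields: Dict[str, Any], file_field: str,
                    filename: str, payload: bytes) -> Tuple[bytes, str]:
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: text/plain\r\n\r\n".encode()
        + payload
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def analyse(path: str, rounds: int = 3, base: str = "http://127.0.0.1:8017") -> Any:
    """POST a corpus file to /analyse on a running server."""
    with open(path, "rb") as handle:
        body, content_type = build_multipart(
            {"rounds": rounds}, "file", "corpus.txt", handle.read()
        )
    request = urllib.request.Request(
        f"{base}/analyse",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)  # assumes a JSON response body
```

With the server from the command above running, analyse("corpus.txt") would perform the same request as the curl line.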

Status

v0.1 scaffold. The offline path works end to end (corpus → topics → consensus → reliability → codebook → REFI-QDA export). Seams still to flesh out: BERTopic fitting ([topics]), real provider wiring beyond Anthropic, a full .qdpx writer, and the local desktop curation app (forked from the debrief/insight-lens shell).
