Corpus keyness, rank-turbulence divergence, and allotaxonographs
keyflux
Corpus keyness, rank-turbulence divergence, and allotaxonographs — in pure Python.
keyflux covers the whole comparison workflow that diachronic and comparative discourse analysis usually splits across tools and languages. It derives keywords and lockwords from a focus-versus-reference comparison using standard corpus-linguistic measures (log-likelihood for significance and log ratio for effect size, not just chi-square), compares the resulting ranked lists with rank-turbulence divergence (RTD), and renders the allotaxonograph: the rank-rank map plus the ranked list of the words that drove the shift. Figures are drawn with matplotlib, so no JavaScript runtime is needed.
The usual summary, "Jaccard overlap on the top-N keywords", is a single opaque number that discards rank order, everything below the cutoff, and any account of which words moved. keyflux replaces it with a transparent, pip-installable pipeline.
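To make the limitation of the top-N Jaccard summary concrete, here is a minimal sketch in plain Python. It does not use keyflux; the `jaccard` helper and the keyword lists are hypothetical and exist only for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two keyword lists, treated as unordered sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Hypothetical top-4 keyword lists, most distinctive word first.
top_2019 = ["climate", "carbon", "policy", "tax"]
top_2024_reordered = ["tax", "policy", "carbon", "climate"]   # same words, order reversed
top_2024_swapped = ["climate", "carbon", "policy", "market"]  # one word replaced

same_words_score = jaccard(top_2019, top_2024_reordered)
one_swap_score = jaccard(top_2019, top_2024_swapped)

print(same_words_score)  # a complete reversal of the ranking is invisible
print(one_swap_score)    # the score drops, but does not say which word moved
```

A fully reversed ranking scores as identical, and a changed score carries no record of the words behind it. That is the gap a rank-sensitive measure with per-word contributions is meant to close.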
Installation
uv add keyflux
Quickstart
from collections import Counter
from keyflux import Keyness, RankedList, rtd, allotaxonograph
# 1. Keyness: focus vs reference
focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
k = Keyness(focus, reference, measure="log_likelihood")
keywords = k.keywords(top=20)
lockwords = k.lockwords()
# 2. Rank-turbulence divergence between two ranked lists
r1 = RankedList.from_counts(focus, label="2019")
r2 = RankedList.from_counts(reference, label="2024")
result = rtd(r1, r2, alpha=1 / 3)
print(result.divergence)
# 3. Allotaxonograph (returns a matplotlib Figure)
fig = allotaxonograph(r1, r2, alpha=1 / 3, labels=("2019", "2024"))
fig.savefig("allotaxonograph.png")
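For readers who want to see what the two headline measures compute, the following sketch works through "climate" from the counts above by hand. It is independent of keyflux and assumes the two-term log-likelihood form that is common in corpus linguistics; the full four-cell statistic gives a slightly different value, and keyflux's own implementation, zero handling, and cutoffs may differ. The function names are illustrative.

```python
import math
from collections import Counter

focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})


def log_likelihood(a, b, n1, n2):
    """Two-term log-likelihood for a word seen a times in n1 tokens and b times in n2."""
    expected_1 = n1 * (a + b) / (n1 + n2)
    expected_2 = n2 * (a + b) / (n1 + n2)
    total = 0.0
    for observed, expected in ((a, expected_1), (b, expected_2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            total += observed * math.log(observed / expected)
    return 2 * total


def log_ratio(a, b, n1, n2):
    """Binary log of the ratio of relative frequencies (both counts must be > 0)."""
    return math.log2((a / n1) / (b / n2))


n_focus = sum(focus.values())          # 131 tokens
n_reference = sum(reference.values())  # 97 tokens

ll = log_likelihood(focus["climate"], reference["climate"], n_focus, n_reference)
lr = log_ratio(focus["climate"], reference["climate"], n_focus, n_reference)
print(f"climate: LL={ll:.2f}, log ratio={lr:.2f}")
```

A log-likelihood above 3.84 corresponds to p < 0.05 on the chi-square distribution with one degree of freedom. The log ratio counts how many doublings separate the two relative frequencies, which is why the two numbers answer different questions (is the difference reliable, and how large is it).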
Features
- Keyness measures: log-likelihood (Dunning), log ratio, Simple Maths, %DIFF, and chi-square (for contrast) — significance flagged against the chi-square thresholds
- Keywords and lockwords: positive / negative keywords plus the stable lockword zone
- Rank-turbulence divergence: tunable, rank-sensitive corpus comparison with per-type contributions and an explicit alpha-to-zero log limit
- Allotaxonograph: publication-quality two-panel matplotlib figure, no JS runtime
- Reproducibility records: every keyness result emits its reference, cutoffs, and measure
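The per-type contributions mentioned above can be sketched as follows. This is a minimal illustration of the core rank-turbulence term, |1/r1^alpha - 1/r2^alpha|^(1/(alpha+1)), without the constant prefactor or the normalisation (both rescale every term equally, so the ordering of contributions is unchanged). It does not call keyflux. The function name, the example ranks, and the `missing_rank` placeholder for words absent from one list are assumptions; a real implementation also has to handle tied ranks.

```python
def rank_turbulence_terms(ranks_1, ranks_2, alpha, missing_rank):
    """Unnormalised per-word rank-turbulence terms for two rank dictionaries."""
    if alpha <= 0:
        raise ValueError("this sketch only covers alpha > 0")
    terms = {}
    for word in set(ranks_1) | set(ranks_2):
        r1 = ranks_1.get(word, missing_rank)  # placeholder rank for absent words
        r2 = ranks_2.get(word, missing_rank)
        terms[word] = abs(r1 ** -alpha - r2 ** -alpha) ** (1 / (alpha + 1))
    return terms


# Hypothetical ranks (1 = most frequent) for two small lists.
ranks_2019 = {"the": 1, "climate": 2, "carbon": 3, "policy": 4}
ranks_2024 = {"the": 1, "market": 2, "climate": 3, "carbon": 4}

terms = rank_turbulence_terms(ranks_2019, ranks_2024, alpha=1 / 3, missing_rank=5)
for word, value in sorted(terms.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:8s} {value:.4f}")
```

Sorting the terms gives the kind of "which words drove the shift" list that sits beside the rank-rank map. The sketch requires alpha > 0; the alpha-to-zero case needs a separate logarithmic form and is not covered here.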
Documentation
Full documentation — quickstart, the keyness and allotaxonograph tutorials,
troubleshooting, and the complete API reference — is at
keyflux.readthedocs.io. The sources live in docs/.
Roadmap
Planned for the next iteration. The robustness items are analysed in detail in
PRE-MORTEM.md, and the open modelling choices are listed in
CHANGES_SUMMARY.md.
Robustness / API decisions
- Revisit the zero-cell floor default (0.5): it sets the effect size of every exclusive keyword and reorders the top of the list.
- Decide whether min_focus_freq/min_reference_freq should default asymmetrically (keep focus-exclusive keywords while demanding more reference evidence).
- Add Cohen's d (dispersion-aware effect size) once the corpus input can carry sub-corpus structure.
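The zero-cell floor item is easier to judge with numbers. The sketch below assumes the floor simply replaces a zero count before the log ratio is taken; that is one common convention, not a confirmed description of how keyflux applies it. The counts reuse the Quickstart corpora (131 focus tokens, 97 reference tokens), and the `log_ratio` helper is illustrative.

```python
import math


def log_ratio(focus_count, focus_total, ref_count, ref_total, floor=0.5):
    """Log ratio with zero counts replaced by `floor` (an assumed convention)."""
    f = focus_count if focus_count > 0 else floor
    r = ref_count if ref_count > 0 else floor
    return math.log2((f / focus_total) / (r / ref_total))


# "policy" is focus-exclusive: 9 occurrences in focus, none in reference.
policy_by_floor = {
    floor: log_ratio(9, 131, 0, 97, floor=floor) for floor in (1.0, 0.5, 0.25)
}

# "climate" occurs in both corpora, so the floor never touches it.
climate = log_ratio(30, 131, 3, 97)

for floor, value in policy_by_floor.items():
    print(f"floor={floor}: policy={value:.2f}  climate={climate:.2f}")
```

Each halving of the floor adds exactly one unit to the log ratio of every exclusive keyword, and in this toy case moving the floor from 1.0 to 0.5 is enough to lift "policy" above "climate". The default therefore decides the order at the top of the list, not just the scale.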
Proposed features
- RankedList.from_keyness(..., by="score"): rank by keyness score, not just frequency, so "compare the distinctive-word lists over time" is a one-liner.
- Optional self-contained interactive HTML+JS allotaxonograph export (an alpha slider), gated behind an extra so the core stays pure Python.
Maintenance
- Publish to PyPI and wire up ReadTheDocs.
Made by
keyflux is made by Crow Intelligence.
License
MIT
File details
Details for the file keyflux-0.1.1.tar.gz.
File metadata
- Download URL: keyflux-0.1.1.tar.gz
- Upload date:
- Size: 21.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06b53dbef4ba3e66df6c6911e7519832e7401ce9f2a71c18562f847be65fe684 |
| MD5 | 188f25d65364021d8dc1c8363a0e5eab |
| BLAKE2b-256 | 76c51c257d16f6fba2904a25da24b03bc1693f844a4b870c9601baadab53acb0 |
Provenance
The following attestation bundles were made for keyflux-0.1.1.tar.gz:
Publisher: publish.yml on crow-intelligence/keyflux
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: keyflux-0.1.1.tar.gz
  - Subject digest: 06b53dbef4ba3e66df6c6911e7519832e7401ce9f2a71c18562f847be65fe684
- Sigstore transparency entry: 2034750255
- Sigstore integration time:
- Permalink: crow-intelligence/keyflux@9760a9cd8e9c57c802b45524532077cf0da064e8
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/crow-intelligence
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9760a9cd8e9c57c802b45524532077cf0da064e8
- Trigger Event: release
File details
Details for the file keyflux-0.1.1-py3-none-any.whl.
File metadata
- Download URL: keyflux-0.1.1-py3-none-any.whl
- Upload date:
- Size: 27.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b8b4574a8704afb3c308421cec91d890098a55f3c4d845e2ddbbdab55887ad1 |
| MD5 | 62b649730370e008709d4f048663b79e |
| BLAKE2b-256 | f515cedf4fcc16343cd6b9f9ea5967d01437fb66c7c7e1267f1f286e70177be4 |
Provenance
The following attestation bundles were made for keyflux-0.1.1-py3-none-any.whl:
Publisher: publish.yml on crow-intelligence/keyflux
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: keyflux-0.1.1-py3-none-any.whl
  - Subject digest: 9b8b4574a8704afb3c308421cec91d890098a55f3c4d845e2ddbbdab55887ad1
- Sigstore transparency entry: 2034750813
- Sigstore integration time:
- Permalink: crow-intelligence/keyflux@9760a9cd8e9c57c802b45524532077cf0da064e8
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/crow-intelligence
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9760a9cd8e9c57c802b45524532077cf0da064e8
- Trigger Event: release