keyflux

Corpus keyness, rank-turbulence divergence, and allotaxonographs — in pure Python.

keyflux owns the whole comparison arc that diachronic and comparative discourse analysis usually splits across tools and languages. It derives keywords and lockwords from a focus-versus-reference comparison using proper corpus-linguistic measures (log-likelihood for significance, log ratio for effect size — not just chi-square), compares the resulting ranked lists with rank-turbulence divergence (RTD), and renders the allotaxonograph: the rank-rank map plus the ranked list of which exact words drove the shift. No JavaScript runtime — figures are matplotlib.

It replaces the usual "Jaccard overlap on the top-N keywords" summary — one opaque number that throws away rank, everything below the cutoff, and any account of which words moved — with a transparent, pip-installable pipeline.
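
To make the objection concrete, here is a minimal sketch of the top-N Jaccard summary using only the standard library. The helper name `top_n_jaccard` and the toy counts are illustrative assumptions, not part of keyflux.

```python
from collections import Counter


def top_n_jaccard(counts_a, counts_b, n):
    """Jaccard overlap between the n most frequent types of two count tables."""
    top_a = {word for word, _ in Counter(counts_a).most_common(n)}
    top_b = {word for word, _ in Counter(counts_b).most_common(n)}
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical corpora: the same three words lead both lists,
# but "climate" and "policy" have swapped ranks 1 and 3.
corpus_a = Counter({"climate": 30, "carbon": 20, "policy": 10, "market": 1})
corpus_b = Counter({"policy": 30, "carbon": 20, "climate": 10, "tax": 1})

overlap = top_n_jaccard(corpus_a, corpus_b, n=3)
print(overlap)  # 1.0: the reordering is invisible to the summary
```

The score reports perfect agreement even though the leading word changed, and it says nothing about "market" or "tax" below the cutoff.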

Installation

pip install keyflux

Or, in a uv-managed project:

uv add keyflux

Quickstart

from collections import Counter

from keyflux import Keyness, RankedList, rtd, allotaxonograph

# 1. Keyness: focus vs reference
focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
k = Keyness(focus, reference, measure="log_likelihood")
keywords = k.keywords(top=20)
lockwords = k.lockwords()

# 2. Rank-turbulence divergence between two ranked lists
r1 = RankedList.from_counts(focus, label="2019")
r2 = RankedList.from_counts(reference, label="2024")
result = rtd(r1, r2, alpha=1 / 3)
print(result.divergence)

# 3. Allotaxonograph (returns a matplotlib Figure)
fig = allotaxonograph(r1, r2, alpha=1 / 3, labels=("2019", "2024"))
fig.savefig("allotaxonograph.png")
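
For readers who want to see what step 2 measures, the following is a rough standalone sketch of rank-turbulence divergence, not keyflux's implementation. It uses the per-type term from Dodds et al. (2020), |r1^-alpha - r2^-alpha|^(1/(alpha+1)). The tie handling (average ranks, absent types tied at the bottom) and the normaliser (the same sum evaluated as if the two systems shared no types) are assumptions of this sketch, and `fractional_ranks` and `rtd_sketch` are hypothetical names. It expects non-empty count tables and alpha > 0.

```python
from collections import Counter


def fractional_ranks(counts):
    """Rank types by descending count; tied counts share their average rank."""
    ordered = sorted(((w, c) for w, c in counts.items() if c > 0),
                     key=lambda item: -item[1])
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        average = (i + 1 + j) / 2  # mean of positions i+1 .. j
        for word, _ in ordered[i:j]:
            ranks[word] = average
        i = j
    return ranks


def rtd_sketch(counts_1, counts_2, alpha=1 / 3):
    """Return (divergence, per-type contributions) for two count tables."""
    if alpha <= 0:
        raise ValueError("this sketch only covers alpha > 0")
    r1, r2 = fractional_ranks(counts_1), fractional_ranks(counts_2)
    n1, n2 = len(r1), len(r2)
    vocab = set(r1) | set(r2)
    # Types absent from a system are tied at the bottom of that system.
    last_1 = n1 + (len(vocab) - n1 + 1) / 2
    last_2 = n2 + (len(vocab) - n2 + 1) / 2

    def term(rank_a, rank_b):
        return abs(rank_a ** -alpha - rank_b ** -alpha) ** (1 / (alpha + 1))

    raw = {w: term(r1.get(w, last_1), r2.get(w, last_2)) for w in vocab}
    # Normaliser: the same sum if the two systems had no type in common.
    disjoint_1 = n1 + (n2 + 1) / 2
    disjoint_2 = n2 + (n1 + 1) / 2
    norm = (sum(term(r, disjoint_2) for r in r1.values())
            + sum(term(disjoint_1, r) for r in r2.values()))
    contributions = {w: value / norm for w, value in raw.items()}
    return sum(raw.values()) / norm, contributions


focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
divergence, contributions = rtd_sketch(focus, reference, alpha=1 / 3)
drivers = sorted(contributions, key=contributions.get, reverse=True)
print(divergence, drivers[:3])
```

With this normaliser, identical rankings score 0 and rankings with no shared types score 1. The per-type contributions sum to the divergence, which is what lets a wordshift list name the words that drove it.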

Features

  • Keyness measures: log-likelihood (Dunning), log ratio, Simple Maths, %DIFF, and chi-square (for contrast) — significance flagged against the chi-square thresholds
  • Keywords and lockwords: positive / negative keywords plus the stable lockword zone
  • Rank-turbulence divergence: tunable, rank-sensitive comparison of any two rankings (frequency, keyness score, …) with per-type contributions and an explicit alpha-to-zero log limit
  • Allotaxonographs: a two-panel view (allotaxonograph) and the full Dodds (2020) diamond (allotaxonometer) — rank-rank histogram, iso-divergence contours, wordshift — publication-quality matplotlib, no JS runtime
  • Reproducibility records: every keyness result emits its reference, cutoffs, and measure
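
As an illustration of the first bullet, the two headline measures can be computed by hand. This sketch uses the two-term log-likelihood form common in corpus linguistics (Rayson and Garside) and the binary log ratio of relative frequencies; keyflux may compute them differently (for example with a full four-cell table or a zero-cell floor), so treat the function names and details as assumptions. The counts are the Quickstart toy corpora, and 3.84 is the chi-square critical value for p < 0.05 at one degree of freedom.

```python
import math
from collections import Counter

focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
n_focus, n_ref = sum(focus.values()), sum(reference.values())


def log_likelihood(a, b, n_a, n_b):
    """Two-term G2 for a word seen a times in focus and b times in reference."""
    expected_a = n_a * (a + b) / (n_a + n_b)
    expected_b = n_b * (a + b) / (n_a + n_b)
    g2 = 0.0
    if a > 0:  # a zero cell contributes nothing to the sum
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def log_ratio(a, b, n_a, n_b):
    """Binary log of the relative-frequency ratio; needs both counts > 0."""
    return math.log2((a / n_a) / (b / n_b))


CRITICAL_P05 = 3.84  # chi-square, 1 degree of freedom, p < 0.05

for word in ("climate", "the"):
    a, b = focus[word], reference[word]
    ll = log_likelihood(a, b, n_focus, n_ref)
    lr = log_ratio(a, b, n_focus, n_ref)
    print(word, round(ll, 2), round(lr, 2), ll > CRITICAL_P05)
```

Here "climate" clears the threshold with a large positive log ratio, while "the" does not: the kind of stable, shared word that a lockword zone is meant to capture.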

Documentation

Full documentation — quickstart, the keyness and allotaxonograph tutorials, troubleshooting, and the complete API reference — is at keyflux.readthedocs.io. The sources live in docs/.

Research direction: comparing many rankings

Rank-turbulence divergence and the allotaxonograph are pairwise — they compare two rankings at a time. This is true of the whole allotaxonometry line, including the 2025 tooling suite (arXiv:2506.21808). But the questions we care about are often many-way: how does presidential vocabulary drift across all eleven eras at once? Which of a dozen speaker groups is the outlier? Comparing many rankings simultaneously is an open problem we intend to research and, eventually, support.

The nearest existing framework is rank aggregation: finding a consensus ranking that best agrees with a set of input rankings. The classic formulation is the Kemeny median (minimise total pairwise disagreement), which is NP-hard, with squared-distance and set-wise / k-wise generalisations (Kemeny aggregation; squared Kemeny; set-wise Kemeny). Candidate directions for keyflux: a pairwise RTD matrix (all-pairs divergence + clustering/MDS of systems), consensus-vs-each allotaxonographs (compare every ranking against an aggregate), and time-series flipbooks of successive allotaxonographs. If you work on this, we'd love to hear from you.
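
The all-pairs idea can be sketched without any commitment to a final design. The code below builds a symmetric divergence matrix over several systems and picks the one with the largest mean divergence from the rest. Everything here is hypothetical: `pairwise_matrix` and `outlier` are not keyflux functions, the era counts are invented, and the placeholder divergence is total variation distance, standing in for RTD so the block stays self-contained.

```python
from collections import Counter
from itertools import combinations


def total_variation(a, b):
    """Placeholder divergence: half the L1 distance between relative frequencies."""
    n_a, n_b = sum(a.values()), sum(b.values())
    vocab = set(a) | set(b)
    return 0.5 * sum(abs(a.get(w, 0) / n_a - b.get(w, 0) / n_b) for w in vocab)


def pairwise_matrix(systems, divergence):
    """Symmetric label-by-label matrix of divergences, zero on the diagonal."""
    labels = list(systems)
    matrix = {x: {y: 0.0 for y in labels} for x in labels}
    for x, y in combinations(labels, 2):
        matrix[x][y] = matrix[y][x] = divergence(systems[x], systems[y])
    return matrix


def outlier(matrix):
    """Label whose mean divergence from all other systems is largest."""
    others = len(matrix) - 1
    return max(matrix, key=lambda x: sum(matrix[x].values()) / others)


# Invented toy data: two similar eras and one that talks about other things.
eras = {
    "era_a": Counter({"the": 50, "economy": 10, "war": 5}),
    "era_b": Counter({"the": 48, "economy": 12, "war": 6}),
    "era_c": Counter({"the": 20, "climate": 25, "carbon": 15}),
}

matrix = pairwise_matrix(eras, total_variation)
print(outlier(matrix))  # era_c
```

Passing the divergence as a callable keeps the matrix code independent of the measure, so a pairwise RTD function could be swapped in, and the resulting matrix fed to clustering or MDS.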

Roadmap

Planned for the next iteration. The robustness items are analysed in detail in PRE-MORTEM.md, and the open modelling choices are listed in CHANGES_SUMMARY.md.

Robustness / API decisions

  • Revisit the zero-cell floor default (0.5): it sets the effect size of every exclusive keyword and reorders the top of the list.
  • Decide whether min_focus_freq / min_reference_freq should default asymmetrically (keep focus-exclusive keywords while demanding more reference evidence).
  • Add Cohen's d (dispersion-aware effect size) once the corpus input can carry sub-corpus structure.
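
The first item is easiest to see with numbers. In this sketch a zero count is replaced by the floor before the log ratio is taken. Whether keyflux floors the raw count in exactly this way is an assumption, and the words, counts, and corpus sizes are invented for illustration.

```python
import math


def floored_log_ratio(a, b, n_focus, n_ref, floor=0.5):
    """Log ratio of relative frequencies with zero counts replaced by a floor."""
    a_adj = a if a > 0 else floor
    b_adj = b if b > 0 else floor
    return math.log2((a_adj / n_focus) / (b_adj / n_ref))


N_FOCUS = N_REF = 1000  # hypothetical corpus sizes

# word -> (focus count, reference count); "policy" is focus-exclusive
candidates = {"policy": (9, 0), "emissions": (40, 1)}


def ranking(floor):
    """Candidate words ordered by floored log ratio, largest first."""
    return sorted(
        candidates,
        key=lambda w: floored_log_ratio(*candidates[w], N_FOCUS, N_REF, floor=floor),
        reverse=True,
    )


print(ranking(0.5))  # ['emissions', 'policy']
print(ranking(0.1))  # ['policy', 'emissions']
```

The score of "emissions" never changes, but the exclusive word's score is log2(9 / floor), so the choice of floor alone decides which word tops the list.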

Proposed features

  • Rank by any score, not just frequency (RankedList.from_scores) — compare keyword rankings, keyness scores, or any metric.
  • Comparing many rankings at once — see Research direction above.
  • Optional self-contained interactive HTML+JS allotaxonograph export (an alpha slider), gated behind an extra so the core stays pure Python.

Maintenance

  • Publish to PyPI and wire up ReadTheDocs.

Made by

keyflux is made by Crow Intelligence.

License

MIT
