
Corpus keyness, rank-turbulence divergence, and allotaxonographs



keyflux

Corpus keyness, rank-turbulence divergence, and allotaxonographs — in pure Python.

keyflux covers the whole comparison pipeline that diachronic and comparative discourse analysis usually splits across tools and languages. It derives keywords and lockwords from a focus-versus-reference comparison using established corpus-linguistic measures (log-likelihood for significance and log ratio for effect size, rather than chi-square alone), compares the resulting ranked lists with rank-turbulence divergence (RTD), and renders the allotaxonograph: the rank-rank map plus a ranked list of the words that drove the shift. No JavaScript runtime is needed; figures are drawn with matplotlib.
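For a single word, the two measures named above reduce to short formulas over four numbers: the word's frequency in the focus and reference corpora (a, b) and the two corpus sizes (c, d). The sketch below uses the textbook two-corpus forms, Dunning's G² and a binary log ratio with zero counts replaced by a 0.5 floor. The function names are hypothetical illustrations, not keyflux's API, and keyflux's exact handling of zero counts may differ.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G2 for one word.

    a, b: frequency of the word in the focus and reference corpora.
    c, d: total token counts of the focus and reference corpora.
    """
    total = a + b
    e1 = c * total / (c + d)  # expected focus frequency if there were no difference
    e2 = d * total / (c + d)  # expected reference frequency
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # the term 0 * ln(0) is taken to be 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


def log_ratio(a: int, b: int, c: int, d: int, floor: float = 0.5) -> float:
    """Binary log of the ratio of relative frequencies (focus over reference).

    A zero count is replaced by `floor` so that exclusive words get a finite value.
    """
    a_adj = a if a > 0 else floor
    b_adj = b if b > 0 else floor
    return math.log2((a_adj / c) / (b_adj / d))


if __name__ == "__main__":
    focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
    reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
    c, d = sum(focus.values()), sum(reference.values())
    for word in sorted(focus.keys() | reference.keys()):
        a, b = focus[word], reference[word]
        print(f"{word:8} G2={log_likelihood(a, b, c, d):6.2f}  LR={log_ratio(a, b, c, d):+.2f}")
```

The two numbers answer different questions: G² measures how much evidence there is for a difference, while the sign and size of the log ratio say which corpus favours the word and by how much (each +1 doubles the relative frequency).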

It replaces the usual "Jaccard overlap on the top-N keywords" summary — one opaque number that throws away rank, everything below the cutoff, and any account of which words moved — with a transparent, pip-installable pipeline.
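To make the criticism concrete, here is a minimal sketch of that summary statistic (the helper `top_n_jaccard` and the toy counts are hypothetical). It compares only the sets of top-N words, so a list whose top three words are completely reversed still scores a perfect 1.0, and moving the cutoff by one changes the answer.

```python
from collections import Counter


def top_n_jaccard(counts_a: Counter, counts_b: Counter, n: int) -> float:
    """Jaccard overlap of the two top-n word sets; order inside the top n is ignored."""
    top_a = {word for word, _ in counts_a.most_common(n)}
    top_b = {word for word, _ in counts_b.most_common(n)}
    union = top_a | top_b
    if not union:  # both lists empty: treat them as identical
        return 1.0
    return len(top_a & top_b) / len(union)


# Two lists whose top three words appear in exactly opposite order.
early = Counter({"climate": 50, "carbon": 40, "policy": 30, "market": 5})
late = Counter({"policy": 50, "carbon": 40, "climate": 30, "energy": 20})

print(top_n_jaccard(early, late, 3))  # 1.0: the complete reversal is invisible
print(top_n_jaccard(early, late, 4))  # 0.6: the score depends on where the cutoff falls
```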

Installation

```shell
uv add keyflux
```

Quickstart

```python
from collections import Counter

from keyflux import Keyness, RankedList, rtd, allotaxonograph

# 1. Keyness: focus vs reference
focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
k = Keyness(focus, reference, measure="log_likelihood")
keywords = k.keywords(top=20)
lockwords = k.lockwords()

# 2. Rank-turbulence divergence between two ranked lists
r1 = RankedList.from_counts(focus, label="2019")
r2 = RankedList.from_counts(reference, label="2024")
result = rtd(r1, r2, alpha=1 / 3)
print(result.divergence)

# 3. Allotaxonograph (returns a matplotlib Figure)
fig = allotaxonograph(r1, r2, alpha=1 / 3, labels=("2019", "2024"))
fig.savefig("allotaxonograph.png")
```
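Step 1 splits the vocabulary into positive keywords, negative keywords, and lockwords. One common way to express such a split is a rule over each word's significance (G²) and effect size (log ratio). The sketch below assumes a G² cutoff of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), a lockword band of |log ratio| ≤ 0.5, and a minimum frequency of 5 in both corpora; the `classify` function and all three defaults are illustrative assumptions, not keyflux's documented rules.

```python
from typing import Literal

Label = Literal["positive keyword", "negative keyword", "lockword", "other"]


def classify(
    ll: float,
    lr: float,
    freq_focus: int,
    freq_reference: int,
    ll_cutoff: float = 3.84,  # chi-square critical value, p < 0.05, 1 d.f.
    lock_band: float = 0.5,  # hypothetical: |log ratio| counted as "stable"
    min_freq: int = 5,  # hypothetical: lockwords must be attested in both corpora
) -> Label:
    """Sort one word into a keyness category from its G2 and log ratio."""
    if ll >= ll_cutoff:
        # Significant difference: the sign of the effect size gives the direction.
        return "positive keyword" if lr > 0 else "negative keyword"
    if abs(lr) <= lock_band and min(freq_focus, freq_reference) >= min_freq:
        # Not significant, small effect, and frequent enough in both corpora.
        return "lockword"
    return "other"
```

Requiring a minimum frequency in both corpora keeps rare words, whose small differences are never significant, from being reported as "stable".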

Features

  • Keyness measures: log-likelihood (Dunning), log ratio, Simple Maths, %DIFF, and chi-square (for contrast), with significance flagged against chi-square critical values
  • Keywords and lockwords: positive / negative keywords plus the stable lockword zone
  • Rank-turbulence divergence: rank-sensitive corpus comparison, tunable through alpha, with per-type contributions and an explicit logarithmic limit as alpha goes to zero
  • Allotaxonograph: publication-quality two-panel matplotlib figure, no JS runtime
  • Reproducibility records: every keyness result records the reference corpus, cutoffs, and measure that produced it
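Rank-turbulence divergence compares two ranked lists type by type. In the form given by Dodds et al., a type with ranks r1 and r2 contributes ((alpha + 1) / alpha) * |r1^(-alpha) - r2^(-alpha)|^(1 / (alpha + 1)), the sum is divided by its value for two lists that share no types, and as alpha goes to zero a contribution tends to |ln(r1 / r2)|. A minimal pure-Python sketch of that calculation follows. The function names are hypothetical, and the conventions for ties (mean rank) and for types missing from one list (a shared bottom rank) are assumptions that may differ from keyflux's implementation.

```python
import math
from collections import Counter
from collections.abc import Hashable, Mapping


def fractional_ranks(counts: Mapping[Hashable, int]) -> dict:
    """1-based ranks by descending count; tied types share the mean of their ranks."""
    ordered = sorted(((w, n) for w, n in counts.items() if n > 0), key=lambda kv: -kv[1])
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        for word, _ in ordered[i:j]:
            ranks[word] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def joint_ranks(counts1, counts2) -> dict:
    """(rank in list 1, rank in list 2) for every type in either list.

    Assumed convention: a type missing from one list gets that list's tied bottom rank.
    """
    r1, r2 = fractional_ranks(counts1), fractional_ranks(counts2)
    only1 = sum(1 for w in r1 if w not in r2)
    only2 = sum(1 for w in r2 if w not in r1)
    bottom1 = len(r1) + (only2 + 1) / 2  # rank in list 1 of types seen only in list 2
    bottom2 = len(r2) + (only1 + 1) / 2  # rank in list 2 of types seen only in list 1
    return {w: (r1.get(w, bottom1), r2.get(w, bottom2)) for w in r1.keys() | r2.keys()}


def delta(rank1: float, rank2: float, alpha: float) -> float:
    """Unnormalised contribution of one type."""
    if alpha == 0:
        return abs(math.log(rank1 / rank2))  # the alpha -> 0 limit
    diff = abs(rank1 ** -alpha - rank2 ** -alpha)
    return (alpha + 1) / alpha * diff ** (1 / (alpha + 1))


def rank_turbulence_divergence(counts1, counts2, alpha: float = 1 / 3):
    """Return (divergence in [0, 1], per-type contributions that sum to it)."""
    raw = {w: delta(a, b, alpha) for w, (a, b) in joint_ranks(counts1, counts2).items()}
    # Normaliser: the same sum after tagging the types so the two lists share none.
    disjoint = joint_ranks({(1, w): n for w, n in counts1.items()},
                           {(2, w): n for w, n in counts2.items()})
    norm = sum(delta(a, b, alpha) for a, b in disjoint.values())
    contributions = {w: v / norm for w, v in raw.items()}
    return sum(contributions.values()), contributions
```

Because the contributions are kept per type, sorting them answers "which exact words drove the shift" directly: on the quickstart counts, for example, `market` (absent from the focus list but second in the reference list) is the largest single contributor.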

Documentation

Full documentation — quickstart, the keyness and allotaxonograph tutorials, troubleshooting, and the complete API reference — is at keyflux.readthedocs.io. The sources live in docs/.

Roadmap

Planned for the next iteration. The robustness items are analysed in detail in PRE-MORTEM.md, and the open modelling choices are listed in CHANGES_SUMMARY.md.

Robustness / API decisions

  • Revisit the zero-cell floor default (0.5): it sets the effect size of every keyword that is absent from one corpus, so changing it can reorder the top of the list.
  • Decide whether min_focus_freq / min_reference_freq should default asymmetrically (keep focus-exclusive keywords while demanding more reference evidence).
  • Add Cohen's d (dispersion-aware effect size) once the corpus input can carry sub-corpus structure.
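The concern about the 0.5 floor can be seen on the quickstart counts. A word found only in the focus corpus has its log ratio set partly by the floor, so changing the floor moves it relative to words attested in both corpora. The sketch below ranks the focus words by a floored log ratio; the helper names and the alternative floor of 1.0 are illustrative assumptions, not keyflux settings. With a floor of 0.5 the focus-exclusive "policy" comes first; with 1.0 it drops to third.

```python
import math
from collections import Counter


def floored_log_ratio(a: int, b: int, c: int, d: int, floor: float) -> float:
    """log2 of (a/c) / (b/d), with a zero count replaced by `floor`."""
    a_adj = a if a > 0 else floor
    b_adj = b if b > 0 else floor
    return math.log2((a_adj / c) / (b_adj / d))


def rank_focus_words(focus: Counter, reference: Counter, floor: float) -> list:
    """Focus-corpus words ordered by log ratio, highest first."""
    c, d = sum(focus.values()), sum(reference.values())
    scores = {w: floored_log_ratio(focus[w], reference[w], c, d, floor) for w in focus}
    return sorted(scores, key=scores.get, reverse=True)


focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})

print(rank_focus_words(focus, reference, floor=0.5))  # ['policy', 'carbon', 'climate', 'the']
print(rank_focus_words(focus, reference, floor=1.0))  # ['carbon', 'climate', 'policy', 'the']
```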

Proposed features

  • RankedList.from_keyness(..., by="score") — rank by keyness score, not just frequency, so "compare the distinctive-word lists over time" is a one-liner.
  • Optional self-contained interactive HTML+JS allotaxonograph export (an alpha slider), gated behind an extra so the core stays pure Python.

Maintenance

  • Publish to PyPI and wire up ReadTheDocs.

Made by

keyflux is made by Crow Intelligence.

License

MIT


Download files


Source Distribution

keyflux-0.1.1.tar.gz (21.3 kB)

Uploaded Source

Built Distribution


keyflux-0.1.1-py3-none-any.whl (27.3 kB)

Uploaded Python 3

File details

Details for the file keyflux-0.1.1.tar.gz.

File metadata

  • Download URL: keyflux-0.1.1.tar.gz
  • Upload date:
  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for keyflux-0.1.1.tar.gz
Algorithm Hash digest
SHA256 06b53dbef4ba3e66df6c6911e7519832e7401ce9f2a71c18562f847be65fe684
MD5 188f25d65364021d8dc1c8363a0e5eab
BLAKE2b-256 76c51c257d16f6fba2904a25da24b03bc1693f844a4b870c9601baadab53acb0


Provenance

The following attestation bundles were made for keyflux-0.1.1.tar.gz:

Publisher: publish.yml on crow-intelligence/keyflux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file keyflux-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: keyflux-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 27.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for keyflux-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9b8b4574a8704afb3c308421cec91d890098a55f3c4d845e2ddbbdab55887ad1
MD5 62b649730370e008709d4f048663b79e
BLAKE2b-256 f515cedf4fcc16343cd6b9f9ea5967d01437fb66c7c7e1267f1f286e70177be4


Provenance

The following attestation bundles were made for keyflux-0.1.1-py3-none-any.whl:

Publisher: publish.yml on crow-intelligence/keyflux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
