Corpus keyness, rank-turbulence divergence, and allotaxonographs

keyflux

Corpus keyness, rank-turbulence divergence, and allotaxonographs — in pure Python.

keyflux covers the whole comparison workflow that diachronic and comparative discourse analysis usually splits across tools and languages. It derives keywords and lockwords from a focus-versus-reference comparison using established corpus-linguistic measures (log-likelihood for significance and log ratio for effect size, rather than chi-square alone). It then compares the resulting ranked lists with rank-turbulence divergence (RTD) and renders the allotaxonograph: the rank-rank map plus a ranked list of exactly which words drove the shift. Figures are drawn with matplotlib, so no JavaScript runtime is needed.
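
To make the distinction between significance and effect size concrete, here is a minimal pure-Python sketch of the two measures. It does not use keyflux. It assumes the two-term log-likelihood formula common in corpus linguistics and a base-2 log ratio of relative frequencies; the function names and the `CHI2_CRIT_P05` constant are illustrative, and keyflux's own implementation may differ in detail.

```python
import math
from collections import Counter

# Chi-square critical value at one degree of freedom for p < 0.05.
CHI2_CRIT_P05 = 3.84


def log_likelihood(a, b, n1, n2):
    """Two-term log-likelihood for a word seen a times in the focus
    corpus (n1 tokens) and b times in the reference corpus (n2 tokens)."""
    e1 = n1 * (a + b) / (n1 + n2)  # expected focus frequency
    e2 = n2 * (a + b) / (n1 + n2)  # expected reference frequency
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def log_ratio(a, b, n1, n2):
    """Binary log of the ratio of relative frequencies (both counts > 0)."""
    return math.log2((a / n1) / (b / n2))


focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
n1, n2 = sum(focus.values()), sum(reference.values())

for word in ("climate", "the"):
    ll = log_likelihood(focus[word], reference[word], n1, n2)
    lr = log_ratio(focus[word], reference[word], n1, n2)
    flag = "significant" if ll > CHI2_CRIT_P05 else "not significant"
    print(f"{word}: LL={ll:.2f} ({flag}), log ratio={lr:.2f}")
```

On these toy counts "climate" clears the threshold and has a large positive log ratio, while "the" stays below the threshold, which is the behaviour expected of a lockword candidate.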

It replaces the usual "Jaccard overlap on the top-N keywords" summary — one opaque number that throws away rank, everything below the cutoff, and any account of which words moved — with a transparent, pip-installable pipeline.
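
A small illustration of why set overlap is a weak summary, written in plain Python with made-up keyword lists (the words and years are illustrative only). Two top-N lists that contain the same words in reverse order get a perfect Jaccard score, even though every word changed rank.

```python
def jaccard(a, b):
    """Jaccard overlap of two lists, treated as sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


# Hypothetical top-4 keyword lists: same words, reversed order.
top_2019 = ["climate", "carbon", "policy", "energy"]
top_2024 = ["energy", "policy", "carbon", "climate"]

print(jaccard(top_2019, top_2024))

# Positions moved by each word (positive means it fell down the list).
rank_shift = {w: top_2024.index(w) - top_2019.index(w) for w in top_2019}
print(rank_shift)
```

The overlap score reports no change at all, while the per-word rank shifts show that the former top keyword is now last.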

Installation

uv add keyflux

or, with pip:

pip install keyflux

Quickstart

from collections import Counter

from keyflux import Keyness, RankedList, rtd, allotaxonograph

# 1. Keyness: focus vs reference
focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
k = Keyness(focus, reference, measure="log_likelihood")
keywords = k.keywords(top=20)
lockwords = k.lockwords()

# 2. Rank-turbulence divergence between two ranked lists
r1 = RankedList.from_counts(focus, label="2019")
r2 = RankedList.from_counts(reference, label="2024")
result = rtd(r1, r2, alpha=1 / 3)
print(result.divergence)

# 3. Allotaxonograph (returns a matplotlib Figure)
fig = allotaxonograph(r1, r2, alpha=1 / 3, labels=("2019", "2024"))
fig.savefig("allotaxonograph.png")

Features

  • Keyness measures: log-likelihood (Dunning), log ratio, Simple Maths, %DIFF, and chi-square (for contrast), with significance flagged against critical values of the chi-square distribution (for example 3.84 for p < 0.05 at one degree of freedom)
  • Keywords and lockwords: positive / negative keywords plus the stable lockword zone
  • Rank-turbulence divergence: tunable, rank-sensitive corpus comparison with per-type contributions and an explicit alpha-to-zero log limit
  • Allotaxonograph: publication-quality two-panel matplotlib figure, no JS runtime
  • Reproducibility records: every keyness result emits its reference, cutoffs, and measure
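
The per-type contributions and the alpha-to-zero limit mentioned above can be sketched in a few lines of plain Python. This is a simplified illustration, not keyflux's code: it computes the unnormalised per-type term of rank-turbulence divergence and leaves out the normalisation constant. It also assumes that a type missing from one list is placed just after that list's last rank; `rtd_term` and the rank dictionaries are hypothetical names.

```python
import math


def rtd_term(r1, r2, alpha):
    """Unnormalised RTD contribution of one type with ranks r1 and r2."""
    if alpha == 0:
        # Limit of the general expression as alpha goes to zero.
        return abs(math.log(r1 / r2))
    diff = abs(r1 ** -alpha - r2 ** -alpha)
    return (alpha + 1) / alpha * diff ** (1 / (alpha + 1))


# Frequency ranks derived from the Quickstart counts.
ranks_2019 = {"the": 1, "climate": 2, "carbon": 3, "policy": 4}
ranks_2024 = {"the": 1, "market": 2, "climate": 3, "carbon": 4}

# Assumption: a type absent from a list is ranked after its last entry.
missing_2019 = len(ranks_2019) + 1
missing_2024 = len(ranks_2024) + 1

alpha = 1 / 3
contributions = {
    word: rtd_term(
        ranks_2019.get(word, missing_2019),
        ranks_2024.get(word, missing_2024),
        alpha,
    )
    for word in sorted(set(ranks_2019) | set(ranks_2024))
}

# Types listed from largest to smallest contribution to the divergence.
for word, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {value:.3f}")
```

A type with the same rank in both lists contributes nothing, and larger values of alpha put more weight on changes near the top of the lists.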

Documentation

Full documentation — quickstart, the keyness and allotaxonograph tutorials, troubleshooting, and the complete API reference — is at keyflux.readthedocs.io. The sources live in docs/.

Roadmap

The items below are planned for the next iteration. The robustness items are analysed in detail in PRE-MORTEM.md, and the open modelling choices are listed in CHANGES_SUMMARY.md.

Robustness / API decisions

  • Revisit the zero-cell floor default (0.5): it sets the effect size of every exclusive keyword and reorders the top of the list.
  • Decide whether min_focus_freq / min_reference_freq should default asymmetrically (keep focus-exclusive keywords while demanding more reference evidence).
  • Add Cohen's d (dispersion-aware effect size) once the corpus input can carry sub-corpus structure.
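
Why the zero-cell floor matters can be shown with a short sketch. It assumes log ratio is the binary log of the ratio of relative frequencies and that a zero count is replaced by the floor before dividing; the `log_ratio` helper here is illustrative and is not keyflux's API. The counts are taken from the Quickstart.

```python
import math


def log_ratio(focus_freq, ref_freq, focus_total, ref_total, floor=0.5):
    """Log ratio with zero counts replaced by a small floor value."""
    f = focus_freq if focus_freq > 0 else floor
    r = ref_freq if ref_freq > 0 else floor
    return math.log2((f / focus_total) / (r / ref_total))


focus_total, ref_total = 131, 97  # token totals of the Quickstart counters

# "climate" occurs in both corpora, so the floor never touches it.
climate = log_ratio(30, 3, focus_total, ref_total)

# "policy" is focus-exclusive, so its score depends entirely on the floor.
policy_half = log_ratio(9, 0, focus_total, ref_total, floor=0.5)
policy_one = log_ratio(9, 0, focus_total, ref_total, floor=1.0)

print(f"climate: {climate:.2f}")
print(f"policy (floor 0.5): {policy_half:.2f}")
print(f"policy (floor 1.0): {policy_one:.2f}")
```

With a floor of 0.5 the exclusive word "policy" outranks "climate"; with a floor of 1.0 it drops below it. Each halving of the floor adds exactly one to the log ratio of every exclusive keyword.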

Proposed features

  • RankedList.from_keyness(..., by="score") — rank by keyness score, not just frequency, so "compare the distinctive-word lists over time" is a one-liner.
  • Optional self-contained interactive HTML+JS allotaxonograph export (an alpha slider), gated behind an extra so the core stays pure Python.

Maintenance

  • Publish to PyPI and wire up ReadTheDocs.

Made by

keyflux is made by Crow Intelligence.

License

MIT
