Corpus keyness, rank-turbulence divergence, and allotaxonographs

These details have been verified by PyPI

Project links

Repository

GitHub Statistics

Maintainers

zoltanvarju

These details have not been verified by PyPI

Project links

Project description


keyflux

Corpus keyness, rank-turbulence divergence, and allotaxonographs — in pure Python.

keyflux covers the whole comparison workflow that diachronic and comparative discourse analysis usually splits across several tools and programming languages. It derives keywords and lockwords from a focus-versus-reference comparison using established corpus-linguistic measures (log-likelihood for significance and log ratio for effect size, rather than chi-square alone), compares the resulting ranked lists with rank-turbulence divergence (RTD), and renders the allotaxonograph: the rank-rank map plus the ranked list of the exact words that drove the shift. No JavaScript runtime is required; figures are rendered with matplotlib.
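For readers new to the two measures, here is a minimal sketch of the textbook formulas, written independently of keyflux's own code: Dunning's G² over the full 2×2 table (this word vs. all other tokens, focus vs. reference) for significance, and Hardie's log ratio for effect size. The function names are hypothetical, and substituting 0.5 for zero frequencies in the log ratio is an assumption borrowed from the zero-cell floor default mentioned in the Roadmap. G² itself is unsigned, so the direction (keyword in focus or in reference) is read off the sign of the log ratio.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, n1: int, n2: int) -> float:
    """Dunning's G2 for a 2x2 table: word vs. other tokens, focus vs. reference."""
    total = n1 + n2
    word_total = a + b
    cells = (
        (a, n1 * word_total / total),                # word in focus
        (b, n2 * word_total / total),                # word in reference
        (n1 - a, n1 * (total - word_total) / total),  # other tokens in focus
        (n2 - b, n2 * (total - word_total) / total),  # other tokens in reference
    )
    # 0 * ln(0) is taken as 0, so empty cells contribute nothing.
    return 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)


def log_ratio(a: int, b: int, n1: int, n2: int, floor: float = 0.5) -> float:
    """Hardie's log ratio: binary log of the ratio of relative frequencies."""
    a = a if a > 0 else floor  # assumed zero-cell floor
    b = b if b > 0 else floor
    return math.log2((a / n1) / (b / n2))


focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
n1, n2 = sum(focus.values()), sum(reference.values())

for word in ("climate", "the", "market"):
    g2 = log_likelihood(focus[word], reference[word], n1, n2)
    lr = log_ratio(focus[word], reference[word], n1, n2)
    significant = g2 >= 3.84  # chi-square critical value, 1 d.f., p < 0.05
    print(f"{word:8s} G2={g2:6.2f} LR={lr:+.2f} significant={significant}")
```

Keeping significance and effect size separate is the point: on large corpora even tiny differences pass the G² threshold, so the log ratio is what tells you whether the difference is worth reading.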

It replaces the usual "Jaccard overlap on the top-N keywords" summary — one opaque number that throws away rank, everything below the cutoff, and any account of which words moved — with a transparent, pip-installable pipeline.
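A toy example of why a top-N Jaccard score hides change: two periods whose top three words are identical as a set but reversed in order still score a perfect 1.0, and a word sitting just below the cutoff is invisible. The counts and helper names are invented for illustration.

```python
def top_n(counts: dict, n: int) -> list:
    """Top-n types by descending count (ties broken alphabetically)."""
    return [t for t, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


def jaccard(a, b) -> float:
    """Set overlap: |A intersect B| / |A union B|; order inside the lists is ignored."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


period_1 = {"climate": 50, "carbon": 30, "policy": 20, "market": 5}
period_2 = {"policy": 50, "carbon": 30, "climate": 20, "energy": 19}

print(top_n(period_1, 3))  # ['climate', 'carbon', 'policy']
print(top_n(period_2, 3))  # ['policy', 'carbon', 'climate'] ("energy" just misses the cut)
print(jaccard(top_n(period_1, 3), top_n(period_2, 3)))  # 1.0
```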

Installation

```shell
uv add keyflux
```

Quickstart

```python
from collections import Counter

from keyflux import Keyness, RankedList, rtd, allotaxonograph

# 1. Keyness: focus vs reference
focus = Counter({"climate": 30, "carbon": 12, "the": 80, "policy": 9})
reference = Counter({"climate": 3, "carbon": 1, "the": 78, "market": 15})
k = Keyness(focus, reference, measure="log_likelihood")
keywords = k.keywords(top=20)
lockwords = k.lockwords()

# 2. Rank-turbulence divergence between two ranked lists
r1 = RankedList.from_counts(focus, label="2019")
r2 = RankedList.from_counts(reference, label="2024")
result = rtd(r1, r2, alpha=1 / 3)
print(result.divergence)

# 3. Allotaxonograph (returns a matplotlib Figure)
fig = allotaxonograph(r1, r2, alpha=1 / 3, labels=("2019", "2024"))
fig.savefig("allotaxonograph.png")
```

Features

Keyness measures: log-likelihood (Dunning), log ratio, Simple Maths, %DIFF, and chi-square (for contrast), with significance flagged against the standard chi-square critical values (1 degree of freedom)
Keywords and lockwords: positive / negative keywords plus the stable lockword zone
Rank-turbulence divergence: tunable, rank-sensitive comparison of any two rankings (frequency, keyness score, …) with per-type contributions and an explicit alpha-to-zero log limit
Allotaxonographs: a two-panel view (allotaxonograph) and the full Dodds (2020) diamond (allotaxonometer) — rank-rank histogram, iso-divergence contours, wordshift — publication-quality matplotlib, no JS runtime
Reproducibility records: every keyness result records the reference corpus, cutoffs, and measure used to produce it
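As a rough guide to what the RTD numbers mean, here is a minimal, self-contained sketch of rank-turbulence divergence following the Dodds (2020) definition: each type contributes (α+1)/α · |r₁^(−α) − r₂^(−α)|^(1/(α+1)), the α → 0 case becomes |ln(r₁/r₂)|, and the sum is normalised by its value for two systems that share no types, so identical rankings score 0 and disjoint ones score 1. This is not keyflux's code; the function names and the tie handling (types missing from one system tie for last place there and share the mean of those ranks) are assumptions of the sketch.

```python
import math


def tied_ranks(counts: dict) -> dict:
    """Rank 1 = largest count; tied types share the mean of the ranks they span."""
    items = sorted(counts.items(), key=lambda kv: -kv[1])
    ranks, i = {}, 0
    while i < len(items):
        j = i
        while j + 1 < len(items) and items[j + 1][1] == items[i][1]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[items[k][0]] = shared
        i = j + 1
    return ranks


def _contributions(c1: dict, c2: dict, alpha: float) -> dict:
    """Unnormalised per-type terms over the union of both systems."""
    types = set(c1) | set(c2)
    # A type absent from one system gets count 0 there, so it ties for last place.
    r1 = tied_ranks({t: c1.get(t, 0) for t in types})
    r2 = tied_ranks({t: c2.get(t, 0) for t in types})
    terms = {}
    for t in types:
        if alpha == 0:
            terms[t] = abs(math.log(r1[t] / r2[t]))  # alpha -> 0 log limit
        else:
            diff = abs(r1[t] ** -alpha - r2[t] ** -alpha)
            terms[t] = (alpha + 1) / alpha * diff ** (1 / (alpha + 1))
    return terms


def rank_turbulence_divergence(c1: dict, c2: dict, alpha: float = 1 / 3):
    """Return (divergence, per-type contributions); disjoint systems score 1."""
    raw = _contributions(c1, c2, alpha)
    # Normaliser: the same sum after relabelling so the systems share no types.
    disjoint = _contributions(
        {(1, t): v for t, v in c1.items()},
        {(2, t): v for t, v in c2.items()},
        alpha,
    )
    norm = sum(disjoint.values())
    return sum(raw.values()) / norm, {t: v / norm for t, v in raw.items()}


focus = {"climate": 30, "carbon": 12, "the": 80, "policy": 9}
reference = {"climate": 3, "carbon": 1, "the": 78, "market": 15}
d, contributions = rank_turbulence_divergence(focus, reference, alpha=1 / 3)
print(f"D = {d:.3f}")
# Wordshift-style list: types ordered by their share of D.
for word, share in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {share:.3f}")
```

On these counts "the" sits at rank 1 in both lists and contributes exactly 0, while "market" (rank 5 in the focus list, rank 2 in the reference) gives the largest single contribution: the wordshift idea in miniature. Lowering α spreads weight toward lower-ranked types; raising it concentrates the divergence on the top of both lists.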

Documentation

Full documentation — quickstart, the keyness and allotaxonograph tutorials, troubleshooting, and the complete API reference — is at keyflux.readthedocs.io. The sources live in docs/.

Research direction: comparing many rankings

Rank-turbulence divergence and the allotaxonograph are pairwise — they compare two rankings at a time. This is true of the whole allotaxonometry line, including the 2025 tooling suite (arXiv:2506.21808). But the questions we care about are often many-way: how does presidential vocabulary drift across all eleven eras at once? Which of a dozen speaker groups is the outlier? Comparing many rankings simultaneously is an open problem we intend to research and, eventually, support.

The nearest existing framework is rank aggregation: finding a consensus ranking that best agrees with a set of input rankings. The classic formulation is the Kemeny median (minimise total pairwise disagreement), which is NP-hard, with squared-distance and set-wise / k-wise generalisations (Kemeny aggregation; squared Kemeny; set-wise Kemeny). Candidate directions for keyflux: a pairwise RTD matrix (all-pairs divergence, followed by clustering/MDS of the systems), consensus-vs-each allotaxonographs (compare every ranking against an aggregate), and time-series flipbooks of successive allotaxonographs. If you work on this, we'd love to hear from you.
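To make the Kemeny median concrete, here is a brute-force sketch for a handful of items: it scores every permutation by its total Kendall tau distance (the number of item pairs ordered differently) to the input rankings and keeps the minimum. Enumerating all n! permutations is only feasible for tiny n, which is the practical face of the NP-hardness noted above. The function names and rankings are illustrative, and the inputs are assumed to be complete rankings of the same items.

```python
from itertools import combinations, permutations


def kendall_tau_distance(r1, r2) -> int:
    """Count the item pairs that the two rankings put in opposite order."""
    pos1 = {item: i for i, item in enumerate(r1)}
    pos2 = {item: i for i, item in enumerate(r2)}
    return sum(
        1
        for x, y in combinations(r1, 2)
        if (pos1[x] - pos1[y]) * (pos2[x] - pos2[y]) < 0
    )


def kemeny_median(rankings):
    """Brute force: try every ordering, keep the one with the smallest total distance."""
    items = rankings[0]
    return min(
        permutations(items),
        key=lambda candidate: sum(kendall_tau_distance(candidate, r) for r in rankings),
    )


rankings = [
    ["climate", "carbon", "policy"],
    ["climate", "policy", "carbon"],
    ["carbon", "climate", "policy"],
]
consensus = kemeny_median(rankings)
print(consensus)  # ('climate', 'carbon', 'policy')
```

Note that Kendall tau treats a swap at ranks 1–2 the same as a swap at ranks 99–100, whereas RTD weights the top of the list more heavily; a consensus-vs-each view built on RTD would inherit that rank sensitivity.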

Roadmap

Planned for the next iteration. The robustness items are analysed in detail in PRE-MORTEM.md, and the open modelling choices are listed in CHANGES_SUMMARY.md.

Robustness / API decisions

Revisit the zero-cell floor default (0.5): it sets the effect size of every exclusive keyword and reorders the top of the list.
Decide whether min_focus_freq / min_reference_freq should default asymmetrically (keep focus-exclusive keywords while demanding more reference evidence).
Add Cohen's d (dispersion-aware effect size) once the corpus input can carry sub-corpus structure.
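Why the floor matters can be seen with a small hypothetical example, using the log ratio with zero counts replaced by the floor (an assumption about how the floor is applied). With equal corpus sizes, a focus-only word seen 12 times outranks a shared word seen 40 vs. 2 times when the floor is 0.5 (log2 24 ≈ 4.58 vs. log2 20 ≈ 4.32), but falls below it when the floor is 1.0 (log2 12 ≈ 3.58).

```python
import math


def log_ratio(a: int, b: int, n1: int, n2: int, floor: float) -> float:
    """Log ratio of relative frequencies, replacing zero counts with `floor`."""
    a = a if a > 0 else floor
    b = b if b > 0 else floor
    return math.log2((a / n1) / (b / n2))


n1 = n2 = 10_000     # equal-sized corpora keep the arithmetic readable
shared = (40, 2)     # occurs in both corpora: score does not depend on the floor
exclusive = (12, 0)  # focus-only: score depends entirely on the floor

for floor in (0.5, 1.0):
    lr_shared = log_ratio(*shared, n1, n2, floor)
    lr_excl = log_ratio(*exclusive, n1, n2, floor)
    first = "exclusive" if lr_excl > lr_shared else "shared"
    print(f"floor={floor}: shared={lr_shared:.2f} exclusive={lr_excl:.2f} -> {first} ranks first")
```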

Proposed features

Rank by any score, not just frequency (RankedList.from_scores) — compare keyword rankings, keyness scores, or any metric.
Comparing many rankings at once — see Research direction above.
Optional self-contained interactive HTML+JS allotaxonograph export (an alpha slider), gated behind an extra so the core stays pure Python.

Maintenance

Publish to PyPI and wire up ReadTheDocs.

Made by

keyflux is made by Crow Intelligence.

License

MIT

Release history

0.2.0 (this version): Jul 1, 2026

0.1.2: Jul 1, 2026

0.1.1: Jul 1, 2026

0.1.0: Jul 1, 2026

Download files

Source Distribution

keyflux-0.2.0.tar.gz (24.9 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

keyflux-0.2.0-py3-none-any.whl (32.5 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file keyflux-0.2.0.tar.gz.

File metadata

Download URL: keyflux-0.2.0.tar.gz
Upload date: Jul 1, 2026
Size: 24.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for keyflux-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`589e3a5e8c3cdd6843ac36836a16e12b05f2b1755225db15ae24f6ce1c6a9d0e`
MD5	`7f9a133fa1d79d70c41e8357cbb0a55e`
BLAKE2b-256	`8d332574458ba96b7b2d397a5d296f2c6f2b530d9d08e09eb07908b05ec4bdde`

Provenance

The following attestation bundles were made for keyflux-0.2.0.tar.gz:

Publisher: publish.yml on crow-intelligence/keyflux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: keyflux-0.2.0.tar.gz
- Subject digest: 589e3a5e8c3cdd6843ac36836a16e12b05f2b1755225db15ae24f6ce1c6a9d0e
- Sigstore transparency entry: 2036240206
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: crow-intelligence/keyflux@20b2d316da42072ef874687f6886a6649e11722e
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/crow-intelligence
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@20b2d316da42072ef874687f6886a6649e11722e
- Trigger Event: release

File details

Details for the file keyflux-0.2.0-py3-none-any.whl.

File metadata

Download URL: keyflux-0.2.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 32.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for keyflux-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e82317f2449570b5a7033fc738d6c46a2034107edd4df97bab99a0c1c18125fe`
MD5	`4acf839de5b62c783fec9b27a037ecff`
BLAKE2b-256	`c8f447f531c2bfaef85aaadee54e6f0b3bfa6898aa24f40de6d566c15044eeb3`

Provenance

The following attestation bundles were made for keyflux-0.2.0-py3-none-any.whl:

Publisher: publish.yml on crow-intelligence/keyflux

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: keyflux-0.2.0-py3-none-any.whl
- Subject digest: e82317f2449570b5a7033fc738d6c46a2034107edd4df97bab99a0c1c18125fe
- Sigstore transparency entry: 2036240439
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: crow-intelligence/keyflux@20b2d316da42072ef874687f6886a6649e11722e
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/crow-intelligence
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@20b2d316da42072ef874687f6886a6649e11722e
- Trigger Event: release
