
Compute the keyness of 4-gram lexical bundles between a target corpus and a reference corpus.


keylexbundles

keylexbundles provides a Python function compute_keyness to compute 4-gram lexical bundle keyness between a target and a reference corpus.


Install and Usage

First, install keylexbundles with pip from a terminal:

pip install keylexbundles

Once installed, import the package in your Python code:

import keylexbundles

Once imported, call the function with two required arguments, target_path and reference_path, and an optional argument, output_file:

keylexbundles.compute_keyness(target_path="data/target/", reference_path="data/reference/", output_file="output.csv")

Argument        Description
target_path     Folder/directory with .txt files for the target corpus
reference_path  Folder/directory with .txt files for the reference corpus
output_file     (Optional) Name of the CSV file to save results to. Defaults to "output.csv".
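
Because both path arguments point to folders of .txt files, it can help to confirm that each folder actually contains texts before starting the analysis. The sketch below is one possible way to do that. The helpers count_txt_files and run_keyness are hypothetical names, not part of keylexbundles, and the check assumes the .txt files sit directly inside each folder (whether the package also reads subfolders is not stated here). The only library call used is compute_keyness with the three arguments described above.

```python
from pathlib import Path


def count_txt_files(folder):
    """Return the number of .txt files directly inside `folder`."""
    return sum(1 for p in Path(folder).glob("*.txt") if p.is_file())


def run_keyness(target_path, reference_path, output_file="output.csv"):
    """Hypothetical wrapper: check both corpus folders, then call compute_keyness."""
    for label, folder in (("target", target_path), ("reference", reference_path)):
        if count_txt_files(folder) == 0:
            raise FileNotFoundError(f"No .txt files found in {label} folder: {folder}")
    # Imported here so the folder check can run before the package is needed.
    import keylexbundles

    keylexbundles.compute_keyness(
        target_path=target_path,
        reference_path=reference_path,
        output_file=output_file,
    )
```

Calling run_keyness("data/target/", "data/reference/") would then behave like the direct call shown above, but fail early with a clear message if either folder is empty.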

Features

  • Accepts two folders that contain .txt files for a target corpus and a reference corpus
  • Extracts 4-gram bundles (contiguous sequences of 4 tokens)
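
To make "contiguous sequences of 4 tokens" concrete, here is a minimal sketch of 4-gram extraction. The tokenizer is an assumption made for illustration (lowercasing and keeping runs of letters, digits, and apostrophes); the tokenization that keylexbundles applies internally may differ, for example in how it treats punctuation or sentence boundaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def four_grams(tokens):
    """Return every contiguous 4-token sequence as a space-joined string."""
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


# Count the 4-grams in one short example text.
counts = Counter(four_grams(tokenize("On the other hand, on the other side.")))
```

A text with fewer than four tokens yields no bundles, and a text with n tokens yields n - 3 of them.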

Computes:

  • Whole-corpus frequency keyness: log-likelihood (G²) using raw whole-corpus token counts
  • Text dispersion keyness: log-likelihood (G²) using text dispersion (i.e., the number of texts in which a bundle occurs)
  • Mean text frequency keyness: Cohen's d (standardized difference of mean normalized per-text frequencies)
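
The three measures above can be sketched as follows. This is an illustration under stated assumptions, not the package's own code: log_likelihood uses the two-term form of G² that is common in corpus linguistics (observed versus expected counts in the target and in the reference), and cohens_d uses a pooled sample standard deviation. keylexbundles may use a different variant of either formula, and the function names here are hypothetical.

```python
import math
from statistics import mean, stdev


def log_likelihood(a, b, c, d):
    """Two-term G2 for counts a (target) and b (reference) out of totals c and d."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def cohens_d(target_values, reference_values):
    """Standardized mean difference using a pooled sample standard deviation."""
    n1, n2 = len(target_values), len(reference_values)
    s1, s2 = stdev(target_values), stdev(reference_values)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(target_values) - mean(reference_values)) / pooled
```

For whole-corpus frequency keyness, a and b would be the bundle's raw counts and c and d the corpus sizes in tokens. For text dispersion keyness, a and b would be the number of texts containing the bundle and c and d the number of texts in each corpus. Cohen's d takes the per-text normalized frequencies of the bundle in each corpus.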

Outputs a CSV with the metrics below, sorted primarily by text dispersion keyness and secondarily by raw whole-corpus token count in the target corpus.
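
The two-level sort can be reproduced on rows read from the output file, for instance after filtering them. The sketch below assumes descending order for both keys and assumes the CSV headers match the column names listed under Output CSV Columns; the three rows are invented values for illustration only.

```python
def sort_rows(rows):
    """Sort by dispersion keyness, then by raw target frequency (both descending)."""
    return sorted(
        rows,
        key=lambda r: (
            float(r["text dispersion keyness"]),
            float(r["raw frequency (target)"]),
        ),
        reverse=True,
    )


# Invented example rows, shaped like dictionaries from csv.DictReader.
rows = [
    {"lexical bundle": "bundle a", "text dispersion keyness": "5.0", "raw frequency (target)": "10"},
    {"lexical bundle": "bundle b", "text dispersion keyness": "12.5", "raw frequency (target)": "3"},
    {"lexical bundle": "bundle c", "text dispersion keyness": "5.0", "raw frequency (target)": "25"},
]
ordered = [r["lexical bundle"] for r in sort_rows(rows)]
```

Values are converted with float because csv.DictReader returns every field as a string.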


Output CSV Columns

Column                                Description
lexical bundle                        Lexical bundle
whole-corpus frequency keyness        Log-likelihood (G²) based on whole-corpus frequency
text dispersion keyness               Log-likelihood (G²) based on text dispersion
mean text frequency keyness           Cohen's d
raw frequency (target)                Raw token count in target corpus
normed frequency (target)             Token frequency per 1,000 words in target corpus
text dispersion (target)              Number of texts where bundle appears in target corpus
mean of normed frequency (target)     Mean per-text normalized frequency in target corpus
sd of normed frequency (target)       Standard deviation of per-text normalized frequency in target corpus
raw frequency (reference)             Raw token count in reference corpus
normed frequency (reference)          Token frequency per 1,000 words in reference corpus
text dispersion (reference)           Number of texts where bundle appears in reference corpus
mean of normed frequency (reference)  Mean per-text normalized frequency in reference corpus
sd of normed frequency (reference)    Standard deviation of per-text normalized frequency in reference corpus
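
The normed-frequency columns can be illustrated with a short sketch. It assumes that "words" means the token count of the corpus or text, and that the standard deviation is the sample (n - 1) version; the package may count words or compute the standard deviation differently. Both function names are hypothetical.

```python
from statistics import mean, stdev


def normed_frequency(count, n_words, per=1000):
    """Frequency of a bundle per `per` words."""
    return count * per / n_words


def per_text_summary(counts, lengths):
    """Return (mean, sd) of per-text normed frequencies for one bundle.

    counts[i] is the bundle's count in text i; lengths[i] is that text's word count.
    """
    normed = [normed_frequency(c, n) for c, n in zip(counts, lengths)]
    return mean(normed), stdev(normed)
```

For example, a bundle that occurs 3 times in a 1,500-word corpus has a normed frequency of 2 per 1,000 words.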

Citation

If you use keylexbundles in your research, please cite it as:
Larsson, T., Kim, T., & Egbert, J. (2025). Introducing and comparing two techniques for key lexical bundles analysis. Research Methods in Applied Linguistics, 4(3), 100245. https://doi.org/10.1016/j.rmal.2025.100245
