
Compute the keyness of lexical bundles (4-grams) in a target corpus relative to a reference corpus.

Project description

keylexbundles

keylexbundles provides a Python function, compute_keyness, that computes the keyness of 4-gram lexical bundles in a target corpus relative to a reference corpus.


Installation and Usage

First, install keylexbundles by running pip in a terminal:

pip install keylexbundles

Once installed, import the package in your Python code:

import keylexbundles

Once imported, call the function with two required arguments, target_path and reference_path, and one optional argument, output_file:

keylexbundles.compute_keyness(target_path="data/target/", reference_path="data/reference/", output_file="output.csv")

Argument         Description
target_path      Folder/directory with .txt files for the target corpus
reference_path   Folder/directory with .txt files for the reference corpus
output_file      (Optional) Name of the CSV file to save results to. Defaults to "output.csv".
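
Because both paths must point to folders of .txt files, it can help to check them before starting a run. The following is a minimal sketch of such a check. The helpers list_txt_files and run_keyness are hypothetical and not part of keylexbundles; only the compute_keyness call and its three arguments come from the description above. The sketch assumes the .txt files sit directly inside each folder, which this page does not state.

```python
from pathlib import Path


def list_txt_files(folder):
    """Return the sorted .txt files directly inside `folder` (hypothetical helper)."""
    path = Path(folder)
    if not path.is_dir():
        raise NotADirectoryError(f"{folder} is not a directory")
    return sorted(
        p for p in path.iterdir() if p.is_file() and p.suffix.lower() == ".txt"
    )


def run_keyness(target_path, reference_path, output_file="output.csv"):
    """Check both corpus folders, then call keylexbundles.compute_keyness."""
    for label, folder in (("target", target_path), ("reference", reference_path)):
        if not list_txt_files(folder):
            raise ValueError(f"No .txt files found in the {label} folder: {folder}")
    # Imported here so the folder checks can run even without the package installed.
    import keylexbundles

    keylexbundles.compute_keyness(
        target_path=target_path,
        reference_path=reference_path,
        output_file=output_file,
    )
```

Calling run_keyness("data/target/", "data/reference/") would then fail early with a clear message if either folder is missing or empty.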

Features

  • Accepts two folders that contain .txt files for a target corpus and a reference corpus
  • Extracts 4-gram bundles (contiguous sequences of 4 tokens)
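
To illustrate what "contiguous sequences of 4 tokens" means, here is a minimal sketch of 4-gram extraction. The tokenizer (lowercasing plus a simple regular expression) is an assumption made for this example; the tokenization keylexbundles actually uses is not described on this page and may differ.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed, simplified)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def four_grams(tokens):
    """Return every contiguous 4-token sequence as a space-joined string."""
    # Texts with fewer than 4 tokens yield an empty list.
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


text = "On the other hand, the results on the other hand differ."
counts = Counter(four_grams(tokenize(text)))
print(counts.most_common(1))  # [('on the other hand', 2)]
```

In this sketch, punctuation is dropped before the 4-grams are formed, so a bundle may span a comma. Whether bundles should cross punctuation or sentence boundaries is a design choice that this example does not settle.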

Computes:

  • Whole-corpus frequency keyness: log-likelihood (G²) using raw whole-corpus token counts
  • Text dispersion keyness: log-likelihood (G²) using text dispersion (i.e., the number of texts in which a bundle occurs)
  • Mean text frequency keyness: Cohen's d (standardized difference of mean normalized per-text frequencies)
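
The two log-likelihood measures differ only in what is counted: bundle tokens out of all tokens, or texts containing the bundle out of all texts. The sketch below uses the common two-term form of G², in which expected counts are derived from the combined proportion across both corpora. This is an assumption for illustration; the exact formula and any sign convention used by keylexbundles are not given on this page.

```python
import math


def log_likelihood(count_target, count_reference, total_target, total_reference):
    """Two-term log-likelihood (G2) for one item across two corpora."""
    combined = count_target + count_reference
    total = total_target + total_reference
    expected_target = total_target * combined / total
    expected_reference = total_reference * combined / total
    g2 = 0.0
    for observed, expected in (
        (count_target, expected_target),
        (count_reference, expected_reference),
    ):
        if observed > 0:  # a zero count contributes nothing (0 * ln 0 is taken as 0)
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Whole-corpus frequency keyness: bundle tokens vs. corpus size in tokens.
# The numbers below are illustrative, not real results.
frequency_keyness = log_likelihood(30, 10, 100_000, 100_000)

# Text dispersion keyness: texts containing the bundle vs. number of texts.
dispersion_keyness = log_likelihood(12, 4, 50, 50)
```

The same function serves both measures; only the inputs change.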

Outputs a CSV with the metrics below, sorted primarily by text dispersion keyness and secondarily by raw whole-corpus token count in the target corpus.
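
The two-level sort order can be sketched with a tuple sort key. The rows below are made-up illustrative values, not real results, and descending order for both keys is an assumption, since the sort direction is not stated here.

```python
# Illustrative rows only; the dictionary keys mirror the CSV column names.
rows = [
    {"lexical bundle": "on the other hand",
     "text dispersion keyness": 25.1, "raw frequency (target)": 40},
    {"lexical bundle": "in the case of",
     "text dispersion keyness": 8.3, "raw frequency (target)": 90},
    {"lexical bundle": "as a result of",
     "text dispersion keyness": 25.1, "raw frequency (target)": 55},
]

# Primary key: text dispersion keyness; secondary key: raw target frequency.
# Both are sorted in descending order (assumed).
sorted_rows = sorted(
    rows,
    key=lambda r: (r["text dispersion keyness"], r["raw frequency (target)"]),
    reverse=True,
)

for row in sorted_rows:
    print(row["lexical bundle"])
```

The raw frequency only matters when two bundles tie on text dispersion keyness, as the first two rows do in this example.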


Output CSV Columns

Column                                  Description
lexical bundle                          Lexical bundle (4-gram)
whole-corpus frequency keyness          Log-likelihood (G²) based on whole-corpus frequency
text dispersion keyness                 Log-likelihood (G²) based on text dispersion
mean text frequency keyness             Cohen's d based on mean per-text normalized frequency
raw frequency (target)                  Raw token count in target corpus
normed frequency (target)               Token frequency per 1,000 words in target corpus
text dispersion (target)                Number of texts where bundle appears in target corpus
mean of normed frequency (target)       Mean per-text normalized frequency in target corpus
sd of normed frequency (target)         Standard deviation of per-text normalized frequency in target corpus
raw frequency (reference)               Raw token count in reference corpus
normed frequency (reference)            Token frequency per 1,000 words in reference corpus
text dispersion (reference)             Number of texts where bundle appears in reference corpus
mean of normed frequency (reference)    Mean per-text normalized frequency in reference corpus
sd of normed frequency (reference)      Standard deviation of per-text normalized frequency in reference corpus
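
The mean, standard deviation, and Cohen's d columns are all built on per-text normalized frequencies. The sketch below shows one way these could be computed. It assumes normalization per 1,000 words at the text level, sample standard deviations, and the pooled-SD form of Cohen's d; none of these details is confirmed on this page, so treat the code as an illustration rather than the package's implementation.

```python
import math
import statistics


def normed_frequency(count, n_words, per=1000):
    """Frequency of a bundle in one text, normalized per `per` words."""
    return count / n_words * per


def cohens_d(target_values, reference_values):
    """Cohen's d using a pooled sample standard deviation (assumed variant)."""
    n1, n2 = len(target_values), len(reference_values)
    mean1 = statistics.mean(target_values)
    mean2 = statistics.mean(reference_values)
    var1 = statistics.variance(target_values)
    var2 = statistics.variance(reference_values)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        return 0.0  # no variation in either group; d is undefined, report 0
    return (mean1 - mean2) / pooled_sd


# (bundle count, text length in words) for each text; illustrative values only.
target_texts = [(2, 1000), (4, 1000), (6, 1000)]
reference_texts = [(1, 1000), (2, 1000), (3, 1000)]

target_normed = [normed_frequency(c, n) for c, n in target_texts]
reference_normed = [normed_frequency(c, n) for c, n in reference_texts]

d = cohens_d(target_normed, reference_normed)
print(round(statistics.mean(target_normed), 2), round(d, 2))
```

Because each text contributes one value regardless of its length, this measure is less sensitive to a few long texts with many occurrences than the whole-corpus frequency measure is.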

Citation

If you use keylexbundles in your research, please cite it as:
Larsson, T., Kim, T., & Egbert, J. (2025). Introducing and comparing two techniques for key lexical bundles analysis. Research Methods in Applied Linguistics, 4(3), 100245. https://doi.org/10.1016/j.rmal.2025.100245
