Compute keyness for lexical bundles (4-grams) between a target corpus and a reference corpus.
keylexbundles
keylexbundles provides a Python function compute_keyness to compute 4-gram lexical bundle keyness between a target and a reference corpus.
Install and Usage
First, install keylexbundles with pip from a terminal:
pip install keylexbundles
Once installed, import the package in your Python code:
import keylexbundles
Once imported, call the function with two required arguments, target_path and reference_path, and one optional argument, output_file:
keylexbundles.compute_keyness(target_path="data/target/", reference_path="data/reference/", output_file="output.csv")
| Argument | Description |
|---|---|
| target_path | Folder/directory with .txt files for the target corpus |
| reference_path | Folder/directory with .txt files for the reference corpus |
| output_file | (Optional) Name of the CSV file to save results to. Defaults to "output.csv". |
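Since both paths must point to folders of .txt files, a quick check before calling compute_keyness can catch a wrong path early. The following is a minimal sketch, not part of keylexbundles: `list_txt_files` is a hypothetical helper, and it only looks at files directly inside the folder, because this page does not say whether the package also searches subfolders.

```python
from pathlib import Path


def list_txt_files(folder):
    """Return the sorted .txt files directly inside `folder`.

    Hypothetical helper for illustration; not provided by keylexbundles.
    """
    path = Path(folder)
    if not path.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a folder")
    # Only top-level files are considered (an assumption, see the text above).
    return sorted(p for p in path.iterdir() if p.is_file() and p.suffix == ".txt")
```

For example, `len(list_txt_files("data/target/"))` gives the number of texts found in the target folder.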
Features
- Accepts two folders that contain .txt files for a target corpus and a reference corpus
- Extracts 4-gram bundles (contiguous sequences of 4 tokens)
- Computes three keyness measures:
  - Whole-corpus frequency keyness: log-likelihood (G²) using raw whole-corpus token counts
  - Text dispersion keyness: log-likelihood (G²) using text dispersion (i.e., the number of texts in which a bundle occurs)
  - Mean text frequency keyness: Cohen's d (standardized difference of mean normalized per-text frequencies)
- Outputs a CSV with the metrics below, sorted primarily by text dispersion keyness and secondarily by raw whole-corpus token count in the target corpus
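The measures listed above can be illustrated with a short sketch. This is not the package's source code. It assumes the common two-term log-likelihood formula and a pooled-standard-deviation Cohen's d, and the function names `four_grams`, `log_likelihood`, and `cohens_d` are hypothetical. The package's own tokenization and exact formulas may differ.

```python
import math
from statistics import mean, variance


def four_grams(tokens):
    """Return every contiguous sequence of 4 tokens as a tuple."""
    return [tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3)]


def log_likelihood(a, b, c, d):
    """Two-term G2 for `a` out of `c` (target) versus `b` out of `d` (reference).

    Whole-corpus frequency keyness: a, b are bundle counts; c, d are corpus sizes.
    Text dispersion keyness: a, b are numbers of texts containing the bundle;
    c, d are the numbers of texts in each corpus.
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    g2 = 0.0
    # A zero observed count contributes nothing to the sum.
    if a > 0:
        g2 += a * math.log(a / expected_target)
    if b > 0:
        g2 += b * math.log(b / expected_reference)
    return 2 * g2


def cohens_d(target_values, reference_values):
    """Cohen's d with a pooled standard deviation (assumed variant).

    Each argument is a list of per-text normalized frequencies
    and needs at least two values.
    """
    n1, n2 = len(target_values), len(reference_values)
    pooled_var = (
        (n1 - 1) * variance(target_values) + (n2 - 1) * variance(reference_values)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        return 0.0  # no variation in either corpus; d is undefined, report 0
    return (mean(target_values) - mean(reference_values)) / math.sqrt(pooled_var)
```

In this sketch G² is unsigned, so it shows how strongly the two corpora differ but not which corpus uses the bundle more. Cohen's d is positive when the bundle is, on average, more frequent per text in the target corpus.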
Output CSV Columns
| Column | Description |
|---|---|
| lexical bundle | Lexical bundle |
| whole-corpus frequency keyness | Log-likelihood (G²) based on whole-corpus frequency |
| text dispersion keyness | Log-likelihood (G²) based on text dispersion |
| mean text frequency keyness | Cohen's d |
| raw frequency (target) | Raw token count in target corpus |
| normed frequency (target) | Token frequency per 1,000 words in target corpus |
| text dispersion (target) | Number of texts where bundle appears in target corpus |
| mean of normed frequency (target) | Mean per-text normalized frequency in target corpus |
| sd of normed frequency (target) | Standard deviation of per-text normalized frequency in target corpus |
| raw frequency (reference) | Raw token count in reference corpus |
| normed frequency (reference) | Token frequency per 1,000 words in reference corpus |
| text dispersion (reference) | Number of texts where bundle appears in reference corpus |
| mean of normed frequency (reference) | Mean per-text normalized frequency in reference corpus |
| sd of normed frequency (reference) | Standard deviation of per-text normalized frequency in reference corpus |
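Once the CSV is written, it can be inspected with Python's standard csv module. The sketch below assumes the header row uses exactly the column names listed above. `top_bundles` is a hypothetical helper and is not part of keylexbundles.

```python
import csv


def top_bundles(csv_path, n=10, column="text dispersion keyness"):
    """Return the n bundles with the highest value in `column` as (bundle, value) pairs.

    Assumes the CSV header matches the column names documented above.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    # Values are read as strings, so convert to float before ranking.
    ranked = sorted(rows, key=lambda row: float(row[column]), reverse=True)
    return [(row["lexical bundle"], float(row[column])) for row in ranked[:n]]
```

For example, `top_bundles("output.csv", n=20, column="mean text frequency keyness")` would rank bundles by Cohen's d instead of text dispersion keyness.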
Citation
If you use keylexbundles in your research, please cite it as:
Larsson, T., Kim, T., & Egbert, J. (2025). Introducing and comparing two techniques for key lexical bundles analysis. Research Methods in Applied Linguistics, 4(3), 100245. https://doi.org/10.1016/j.rmal.2025.100245