Python Library for Language Change
LanguageChange
LanguageChange is a Python toolkit for exploring lexical semantic change across corpora and time. It bundles data loaders, embedding pipelines, alignment strategies, and evaluation utilities so you can go from raw corpora to change scores and visual analyses in a single workflow.
Key Features
- Ready-to-use benchmarks (SemEval 2020 Task 1, DWUG) plus helpers for your own corpora.
- Static and contextualised representation pipelines (count, PPMI, SVD, transformer-based) with caching.
- Alignment and comparison utilities (e.g. Orthogonal Procrustes) and standard change metrics such as PRT and APD.
- Plotting helpers for DWUG graphs and embeddings to inspect model behaviour visually.
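As a rough illustration of what the alignment and metric utilities compute, the NumPy sketch below implements Orthogonal Procrustes alignment and two usage-based change scores. It is not the library's API: all function names are hypothetical, and PRT is taken here as the cosine distance between the two period prototypes (some papers use the inverse of the cosine similarity instead).

```python
import numpy as np


def orthogonal_procrustes_align(source, target):
    """Rotate `source` onto `target`.

    Both matrices hold one row per word, with the same words in the same order.
    Solves min_R ||source @ R - target||_F subject to R being orthogonal.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between the rows of `a` and the rows of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def prt(usages_t1, usages_t2):
    """Prototype distance: cosine distance between the mean usage vectors."""
    proto_1 = usages_t1.mean(axis=0, keepdims=True)
    proto_2 = usages_t2.mean(axis=0, keepdims=True)
    return float(cosine_distance_matrix(proto_1, proto_2)[0, 0])


def apd(usages_t1, usages_t2):
    """Average pairwise cosine distance between usages from the two periods."""
    return float(cosine_distance_matrix(usages_t1, usages_t2).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Toy static spaces: 50 shared words, 8 dimensions, period 2 is a rotated copy.
    space_t1 = rng.normal(size=(50, 8))
    random_rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
    space_t2 = space_t1 @ random_rotation
    aligned_t1 = orthogonal_procrustes_align(space_t1, space_t2)
    print("max alignment error:", np.abs(aligned_t1 - space_t2).max())

    # Toy contextualised usages of one target word in two periods.
    usages_t1 = rng.normal(size=(30, 8))
    usages_t2 = rng.normal(loc=0.5, size=(40, 8))
    print("PRT:", prt(usages_t1, usages_t2))
    print("APD:", apd(usages_t1, usages_t2))
```

Alignment is only needed for independently trained static spaces, whose axes are arbitrary. Contextualised usage vectors from a single model already share one space, so PRT and APD can be applied to them directly.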
Installation
pip install languagechange
LanguageChange targets Python 3.8+ and depends on PyTorch, transformers, and several NLP/visualisation libraries. Installing inside a virtual environment is recommended.
Quickstart
from pathlib import Path
from languagechange.benchmark import SemEval2020Task1
from languagechange.models.representation.static import CountModel, PPMI, SVD
# Download the English SemEval 2020 Task 1 benchmark to the local cache
benchmark = SemEval2020Task1("EN")
corpus = benchmark.corpus1_lemma
artifacts = Path("artifacts")
artifacts.mkdir(exist_ok=True)
# Build a count-based space, transform it with PPMI, then reduce with SVD
count = CountModel(corpus, window_size=10, savepath=artifacts / "corpus1_count")
count.encode()
ppmi = PPMI(count, shifting_parameter=5, smoothing_parameter=0.75, savepath=artifacts / "corpus1_ppmi")
ppmi.encode()
svd = SVD(ppmi, dimensionality=100, gamma=1.0, savepath=artifacts / "corpus1_svd")
svd.encode()
svd.load()
print(svd["plane_nn"]) # vector for a target lemma
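To make the count, PPMI, and SVD steps more concrete, here is a small NumPy sketch of the underlying maths on a dense toy matrix. It is independent of the library: the function names are hypothetical, and the mapping of `shifting_parameter`, `smoothing_parameter`, and `gamma` to the shift, context-smoothing exponent, and singular-value exponent is an assumption based on the usual shifted-PPMI and truncated-SVD formulation, not something confirmed by the library's documentation.

```python
import numpy as np


def shifted_ppmi(counts, shift=5.0, alpha=0.75):
    """Shifted positive PMI with context-distribution smoothing.

    counts: word-by-context co-occurrence matrix (rows = target words).
    shift:  k in max(PMI - log k, 0); assumed analogue of `shifting_parameter`.
    alpha:  exponent applied to context counts; assumed analogue of
            `smoothing_parameter`.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    joint = counts / total
    word_prob = counts.sum(axis=1, keepdims=True) / total
    ctx_smoothed = counts.sum(axis=0, keepdims=True) ** alpha
    ctx_prob = ctx_smoothed / ctx_smoothed.sum()
    # Zero counts give log(0) or 0/0 here; those cells are masked out below.
    with np.errstate(divide="ignore", invalid="ignore"):
        shifted = np.log(joint / (word_prob * ctx_prob)) - np.log(shift)
        positive = np.maximum(shifted, 0.0)
    return np.where(counts > 0, positive, 0.0)


def svd_reduce(matrix, dimensionality=2, gamma=1.0):
    """Truncated SVD; word vectors are U_d * S_d ** gamma."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    d = min(dimensionality, s.size)
    return u[:, :d] * s[:d] ** gamma


if __name__ == "__main__":
    # Toy co-occurrence counts: 4 target words x 5 context words.
    toy_counts = np.array(
        [
            [10, 0, 3, 1, 0],
            [8, 1, 2, 0, 0],
            [0, 7, 0, 5, 2],
            [1, 6, 0, 4, 3],
        ]
    )
    ppmi_matrix = shifted_ppmi(toy_counts, shift=1.0, alpha=0.75)
    vectors = svd_reduce(ppmi_matrix, dimensionality=2, gamma=1.0)
    print(ppmi_matrix.round(2))
    print(vectors.shape)
```

With `gamma=1.0` the vectors keep the full singular-value weighting; smaller values flatten the spectrum, and `gamma=0.0` drops the weighting entirely. A real corpus would need sparse matrices rather than the dense arrays used here.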
More end-to-end walkthroughs live in the examples:
- Compare static representations across time slices.
- Visualise DWUG usage graphs to inspect annotator judgements.
Development Setup
Clone the repository and install an editable build with the project extras you need:
git clone https://github.com/ChangeIsKey/languagechange.git
cd languagechange
python -m venv .venv && source .venv/bin/activate
pip install -e .
Running the examples may require additional packages listed under each example directory.
Documentation
For more detailed information, read the API reference guide.
Citation
The library is under active development. If it supports your research, please cite it as:
@misc{languagechange,
title = {LanguageChange: A Python library for studying semantic change},
author = {{Change is Key!}},
year = {n.d.}
}
Credits
LanguageChange is developed by the Change is Key! team with support from Riksbankens Jubileumsfond (grant M21-0021). Contributions and feedback are very welcome; feel free to open issues or pull requests.
Download files
File details
Details for the file langchange-0.1.0.tar.gz.
File metadata
- Download URL: langchange-0.1.0.tar.gz
- Upload date:
- Size: 48.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c4e01005196b77245af29f3d28b752483f71c27adddfdf2312a4c183eca73ca6 |
| MD5 | ab9140e547a4ba4af07d46f75e97176f |
| BLAKE2b-256 | c02be4d860e2173c1f8e12655d698cc9f85b54a1c75b4e4a658b80e85c62c30e |
File details
Details for the file langchange-0.1.0-py3-none-any.whl.
File metadata
- Download URL: langchange-0.1.0-py3-none-any.whl
- Upload date:
- Size: 53.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e7266d933611db67d113eb615994a27a5204e37ce70637cf6c0bb85f992240bd |
| MD5 | 6a81985fe74d65aaf34271fde2e64d2a |
| BLAKE2b-256 | a09264a2a0e2a83e9cbaa068d0fb18f1f17c5e61d01f4d520b4c58085cdf5503 |