Python Library for Language Change

Project description

LanguageChange

LanguageChange is a Python toolkit for exploring lexical semantic change across corpora and time. It bundles data loaders, embedding pipelines, alignment strategies, and evaluation utilities so you can go from raw corpora to change scores and visual analyses in a single workflow.

Key Features

  • Ready-to-use benchmarks (SemEval 2020 Task 1, DWUG) plus helpers for your own corpora.
  • Static and contextualised representation pipelines (count, PPMI, SVD, transformer-based) with caching.
  • Alignment and comparison utilities (e.g. Orthogonal Procrustes) and standard change metrics such as PRT and APD.
  • Plotting helpers for DWUG graphs and embeddings to inspect model behaviour visually.
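
The alignment step and the two metrics named above can be sketched in plain NumPy. This is an illustration of the standard definitions, not the library's own implementation: the function names are hypothetical, PRT is taken to be the cosine distance between the mean usage vectors of the two periods, and APD the average pairwise cosine distance between usages from the two periods.

```python
import numpy as np


def orthogonal_procrustes_align(source, target):
    """Rotate `source` onto `target`; rows are the shared vocabulary in the same order."""
    # The SVD of the cross-covariance matrix yields the optimal orthogonal map.
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def prt(usages_t1, usages_t2):
    """Prototype distance: cosine distance between the mean usage vectors."""
    return cosine_distance(usages_t1.mean(axis=0), usages_t2.mean(axis=0))


def apd(usages_t1, usages_t2):
    """Average pairwise cosine distance between usages of the two periods."""
    a = usages_t1 / np.linalg.norm(usages_t1, axis=1, keepdims=True)
    b = usages_t2 / np.linalg.norm(usages_t2, axis=1, keepdims=True)
    return float(np.mean(1.0 - a @ b.T))


# Toy data: the second space is the first one under an arbitrary rotation.
rng = np.random.default_rng(0)
space_t2 = rng.normal(size=(50, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
space_t1 = space_t2 @ rotation.T
aligned_t1 = orthogonal_procrustes_align(space_t1, space_t2)
```

After alignment, vectors from the two periods live in a common space, so a per-word cosine distance between `aligned_t1` and `space_t2` can serve as a change score for static embeddings. PRT and APD are usually applied to sets of contextualised usage vectors, which need no alignment.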

Installation

pip install languagechange

LanguageChange targets Python 3.8+ and depends on PyTorch, transformers, and several NLP/visualisation libraries. Installing inside a virtual environment is recommended.
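
Before installing, it can help to confirm that the interpreter meets the stated version floor and to see which heavy dependencies are already present. The helper below is a hypothetical sketch, not part of the library. It assumes only that PyTorch and transformers are imported as `torch` and `transformers`.

```python
import importlib.util
import sys


def check_environment(required=("torch", "transformers")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required, found " + sys.version.split()[0])
    for name in required:
        # find_spec locates the package without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: " + name)
    return problems


for problem in check_environment():
    print(problem)
```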

Quickstart

from pathlib import Path
from languagechange.benchmark import SemEval2020Task1
from languagechange.models.representation.static import CountModel, PPMI, SVD

# Download the English SemEval 2020 Task 1 benchmark to the local cache
benchmark = SemEval2020Task1("EN")
corpus = benchmark.corpus1_lemma

artifacts = Path("artifacts")
artifacts.mkdir(exist_ok=True)

# Build a count-based space, transform it with PPMI, then reduce with SVD
count = CountModel(corpus, window_size=10, savepath=artifacts / "corpus1_count")
count.encode()

ppmi = PPMI(count, shifting_parameter=5, smoothing_parameter=0.75, savepath=artifacts / "corpus1_ppmi")
ppmi.encode()

svd = SVD(ppmi, dimensionality=100, gamma=1.0, savepath=artifacts / "corpus1_svd")
svd.encode()
svd.load()

print(svd["plane_nn"])  # vector for a target lemma
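
To make the parameters in the pipeline above more concrete, here is a small dense NumPy sketch of shifted PPMI followed by truncated SVD. It does not use the library. It assumes the parameters carry their common meanings from the count-based embedding literature: `shift` plays the role of `shifting_parameter` (log of the shift is subtracted from the PMI), `smoothing` the role of `smoothing_parameter` (context counts are raised to that power), and `gamma` weights the singular values. The function names are hypothetical.

```python
import numpy as np


def shifted_ppmi(counts, shift=5, smoothing=0.75):
    """Shifted PPMI with context-distribution smoothing on a dense count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_joint = counts / total
    p_word = counts.sum(axis=1, keepdims=True) / total
    # Raising context counts to a power below 1 dampens the PMI bias toward rare contexts.
    context = counts.sum(axis=0, keepdims=True) ** smoothing
    p_context = context / context.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_joint / (p_word * p_context))
    # Subtract log(shift), then clip undefined and negative cells to zero.
    shifted = pmi - np.log(shift)
    shifted[~np.isfinite(shifted)] = 0.0
    return np.maximum(shifted, 0.0)


def truncated_svd(matrix, dimensionality=2, gamma=1.0):
    """Keep the top singular directions, weighting singular values by `gamma`."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dimensionality] * s[:dimensionality] ** gamma


# Toy word-by-context co-occurrence counts (3 words, 3 contexts).
toy_counts = np.array([[10, 0, 2], [0, 8, 1], [3, 1, 6]])
toy_ppmi = shifted_ppmi(toy_counts, shift=1, smoothing=1.0)
toy_vectors = truncated_svd(toy_ppmi, dimensionality=2, gamma=1.0)
```

A real vocabulary needs sparse matrices and a sparse SVD solver; the dense version here only shows the arithmetic.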

More end-to-end walkthroughs live in the repository's examples.

Development Setup

Clone the repository and install an editable build with the project extras you need:

git clone https://github.com/ChangeIsKey/languagechange.git
cd languagechange
python -m venv .venv && source .venv/bin/activate
pip install -e .

Running the examples may require additional packages listed under each example directory.

Documentation

For more detailed information, read the API reference guide.

Citation

The library is under active development. If it supports your research, please cite it as:

@misc{languagechange,
  title = {LanguageChange: A Python library for studying semantic change},
  author = {{Change is Key!}},
  year = {n.d.}
}

Credits

LanguageChange is developed by the Change is Key! team with support from Riksbankens Jubileumsfond (grant M21-0021). Contributions and feedback are very welcome; feel free to open issues or pull requests.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langchange-0.1.0.tar.gz (48.5 kB)

Uploaded Source

Built Distribution

langchange-0.1.0-py3-none-any.whl (53.2 kB)

Uploaded Python 3

File details

Details for the file langchange-0.1.0.tar.gz.

File metadata

  • Download URL: langchange-0.1.0.tar.gz
  • Upload date:
  • Size: 48.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for langchange-0.1.0.tar.gz
Algorithm Hash digest
SHA256 c4e01005196b77245af29f3d28b752483f71c27adddfdf2312a4c183eca73ca6
MD5 ab9140e547a4ba4af07d46f75e97176f
BLAKE2b-256 c02be4d860e2173c1f8e12655d698cc9f85b54a1c75b4e4a658b80e85c62c30e

See more details on using hashes here.
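
A downloaded file can be checked against a published SHA256 digest with the standard library alone. The sketch below is an illustration; the helper names are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True when the file's SHA256 equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the file exists locally):
# matches_published_digest(
#     "langchange-0.1.0.tar.gz",
#     "c4e01005196b77245af29f3d28b752483f71c27adddfdf2312a4c183eca73ca6",
# )
```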

File details

Details for the file langchange-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: langchange-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 53.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for langchange-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e7266d933611db67d113eb615994a27a5204e37ce70637cf6c0bb85f992240bd
MD5 6a81985fe74d65aaf34271fde2e64d2a
BLAKE2b-256 a09264a2a0e2a83e9cbaa068d0fb18f1f17c5e61d01f4d520b4c58085cdf5503
