Detect semantic shifts in word embeddings over time

chronowords

Detect semantic shifts over time in word embeddings. Train small PPMI-based language models, create topic models using NMF, and analyze semantic changes using Procrustes alignment.
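The alignment step can be illustrated independently of chronowords. The following is a minimal sketch of orthogonal Procrustes alignment using only NumPy. It assumes two embedding matrices whose rows refer to the same words in the same order; the function names `procrustes_align` and `cosine_shift` and the toy data are hypothetical and are not part of the chronowords API.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target`.

    Returns the rotated source and the orthogonal matrix R that
    minimises the Frobenius norm of (source @ R - target).
    """
    # Classic closed-form solution: SVD of the cross-covariance matrix.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation


def cosine_shift(aligned, target):
    """Cosine distance between matching rows (one value per word)."""
    dot = np.sum(aligned * target, axis=1)
    norms = np.linalg.norm(aligned, axis=1) * np.linalg.norm(target, axis=1)
    return 1.0 - dot / norms


# Toy data (illustrative only): 50 "words" in 3 dimensions.
rng = np.random.default_rng(0)
emb_old = rng.normal(size=(50, 3))

# The later embedding is the earlier one under an arbitrary rotation ...
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
emb_new = emb_old @ q
# ... except for word 0, whose vector is flipped to mimic a semantic shift.
emb_new[0] = -emb_new[0]

aligned, rotation = procrustes_align(emb_old, emb_new)
shift = cosine_shift(aligned, emb_new)
print("largest shift at word index:", int(np.argmax(shift)))
```

After alignment, words whose usage is stable should sit close to their counterparts, while a word whose vector moved relative to the rest should show a larger cosine distance.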

Features

  • Memory-efficient word embedding training using Count-Min Sketch
  • Topic modeling with Non-negative Matrix Factorization
  • Temporal alignment of word embeddings using Procrustes analysis
  • Cython-optimized PPMI matrix computation
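To show why a Count-Min Sketch saves memory, here is a minimal pure-Python sketch of the data structure. It is an illustration of the general technique, not chronowords' implementation; the class name, the `width` and `depth` defaults, and the choice of BLAKE2b as the hash are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter with fixed memory (depth x width integers)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One independent hash per row, derived by salting with the row number.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


# Count adjacent word pairs without storing a dictionary of all pairs.
tokens = "the cat sat on the mat the cat".split()
sketch = CountMinSketch()
for left, right in zip(tokens, tokens[1:]):
    sketch.add(f"{left} {right}")

print(sketch.estimate("the cat"))
```

The estimate never undercounts, and it can overcount when different items collide in every row. Memory use depends only on `width` and `depth`, not on the number of distinct word pairs seen.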

Installation

pip install chronowords

Quick Start

from chronowords.algebra import SVDAlgebra
from chronowords.topics import TopicModel

# Train word embeddings.
# `your_corpus_iterator` is a placeholder for your own corpus iterable.
model = SVDAlgebra(n_components=300)
model.train(your_corpus_iterator)

# Find the words most similar to 'computer'
similar = model.most_similar('computer')
for word in similar:
    print(f"{word.word}: {word.similarity:.3f}")

# Create a topic model.
# `ppmi_matrix` and `vocabulary` are placeholders, not defined in this snippet.
topic_model = TopicModel(n_topics=10)
topic_model.fit(ppmi_matrix, vocabulary)
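The snippet above uses `ppmi_matrix` and `vocabulary` without showing where they come from. As a rough illustration, the sketch below computes a positive pointwise mutual information (PPMI) matrix from a small co-occurrence count matrix with NumPy. The `ppmi` helper and the toy counts are hypothetical; whether `TopicModel.fit` accepts a dense array in exactly this form is an assumption, so check the package documentation before relying on it.

```python
import numpy as np


def ppmi(counts):
    """Positive PMI for a word-by-context co-occurrence count matrix."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)

    # Counts expected if words and contexts were independent.
    expected = row_sums @ col_sums / total

    # PMI = log(observed / expected); zero counts yield -inf or nan here.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    pmi[~np.isfinite(pmi)] = 0.0

    # Keep only positive associations.
    return np.maximum(pmi, 0.0)


# Toy vocabulary and symmetric co-occurrence counts (illustrative only).
vocabulary = ["cat", "dog", "bank", "money"]
cooccurrence = np.array([
    [0, 4, 1, 0],
    [4, 0, 0, 1],
    [1, 0, 0, 3],
    [0, 1, 3, 0],
])

ppmi_matrix = ppmi(cooccurrence)
print(np.round(ppmi_matrix, 3))
```

Pairs that co-occur more often than chance, such as "cat" and "dog" here, receive a positive score. Pairs that co-occur at or below the chance level are clipped to zero.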

Documentation

Full documentation is available on Read the Docs.

Requirements

  • Python ≥ 3.10
  • NumPy
  • SciPy
  • scikit-learn
  • Cython

Contributing

Pull requests welcome. For major changes, open an issue first.

License

MIT
