Detect semantic shifts in word embeddings over time
chronowords
Detect semantic shifts over time in word embeddings. Train small PPMI-based language models, create topic models using NMF, and analyze semantic changes using Procrustes alignment.
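To illustrate the Procrustes step, here is a minimal NumPy sketch of orthogonal Procrustes alignment between two embedding matrices that share a vocabulary. It is not the chronowords implementation; the function names `procrustes_align` and `cosine_shift` are hypothetical and exist only for this example.

```python
import numpy as np


def procrustes_align(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Rotate `source` onto `target` using the orthogonal Procrustes solution.

    Both matrices hold one row per shared vocabulary word, in the same order.
    """
    # The SVD of the cross-covariance matrix gives the best rotation R = U @ Vt.
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation


def cosine_shift(aligned: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Return 1 - cosine similarity per row; larger values suggest more change."""
    num = np.sum(aligned * target, axis=1)
    denom = np.linalg.norm(aligned, axis=1) * np.linalg.norm(target, axis=1)
    return 1.0 - num / denom
```

Words whose vectors still point in different directions after alignment are candidates for semantic shift. The rotation only removes the arbitrary orientation difference between two independently trained spaces.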
Features
- Memory-efficient word embedding training using Count-Min Sketch
- Topic modeling with Non-negative Matrix Factorization
- Temporal alignment of word embeddings using Procrustes analysis
- Cython-optimized PPMI matrix computation
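For readers unfamiliar with Count-Min Sketch, the following is a small self-contained sketch of the general data structure, which counts items in fixed memory at the cost of occasional overestimates. The class below is hypothetical and written for illustration; the library's own counter may differ in hashing, sizing, and interface.

```python
import hashlib

import numpy as np


class CountMinSketch:
    """Approximate counter with fixed memory; estimates never undercount."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        self.table = np.zeros((depth, width), dtype=np.int64)

    def _columns(self, key: str) -> list[int]:
        # One salted hash per row; blake2b is chosen here only for determinism.
        cols = []
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            cols.append(int.from_bytes(digest, "little") % self.width)
        return cols

    def add(self, key: str, count: int = 1) -> None:
        for row, col in enumerate(self._columns(key)):
            self.table[row, col] += count

    def estimate(self, key: str) -> int:
        # The minimum over rows is the counter least inflated by collisions.
        return int(min(self.table[row, col]
                       for row, col in enumerate(self._columns(key))))
```

Memory use depends only on `width` and `depth`, not on how many distinct word pairs appear in the corpus, which is what makes this approach attractive for co-occurrence counting.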
Installation
pip install chronowords
Quick Start
from chronowords.algebra import SVDAlgebra
from chronowords.topics import TopicModel
# Train word embeddings
model = SVDAlgebra(n_components=300)
model.train(your_corpus_iterator)
# Find similar words
similar = model.most_similar('computer')
for word in similar:
    print(f"{word.word}: {word.similarity:.3f}")
# Create topic model (ppmi_matrix and vocabulary must be defined beforehand)
topic_model = TopicModel(n_topics=10)
topic_model.fit(ppmi_matrix, vocabulary)
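The quick start refers to a `ppmi_matrix` without showing what one contains. As a rough illustration, the sketch below computes positive pointwise mutual information from a dense word-by-context count matrix using plain NumPy. The function name `ppmi` is hypothetical, and this is not the library's Cython routine; how chronowords builds and expects this matrix is an assumption left open here.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive PMI for a word-by-context co-occurrence count matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    # Counts expected if words and contexts were independent.
    expected = row @ col / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts / expected)
    # Zero counts give -inf or nan; treat them as no association.
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)
```

Clipping negative values to zero keeps the matrix non-negative, which is also the input requirement for NMF.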
Links
- Documentation: https://chronowords.readthedocs.io/en/latest/
- PyPI: https://pypi.org/project/chronowords/
Requirements
- Python ≥ 3.10
- NumPy
- SciPy
- scikit-learn
- Cython
Contributing
Pull requests welcome. For major changes, open an issue first.
License
MIT
Made by
Built and maintained by Crow Intelligence.