Detect semantic shifts in word embeddings over time

chronowords

Detect semantic shifts over time in word embeddings. Train small PPMI-based language models, create topic models using NMF, and analyze semantic changes using Procrustes alignment.
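
To make the alignment idea concrete, here is a minimal sketch of orthogonal Procrustes alignment in plain NumPy. It does not use the chronowords API; the helper names `procrustes_align` and `cosine_shift`, the toy matrices, and the choice to fit the alignment on a set of anchor words are all illustrative assumptions.

```python
import numpy as np


def procrustes_align(source, target):
    """Return the orthogonal matrix R minimising ||source @ R - target||_F.

    Rows of both matrices must describe the same words in the same order.
    """
    # Standard SVD solution of the orthogonal Procrustes problem.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def cosine_shift(a, b):
    """Cosine distance between matching rows of a and b (0 means unchanged)."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a_unit * b_unit, axis=1)


rng = np.random.default_rng(0)
early = rng.normal(size=(5, 3))  # toy embeddings for period 1

# Period 2: the same space under an arbitrary orthogonal transform, which is
# what independently trained embedding models typically differ by.
transform, _ = np.linalg.qr(rng.normal(size=(3, 3)))
late = early @ transform
late[0] = rng.normal(size=3)  # word 0 genuinely changes meaning

# Fit the alignment on words assumed stable (rows 1..4), then score every word.
anchors = slice(1, None)
R = procrustes_align(early[anchors], late[anchors])
shift = cosine_shift(early @ R, late)
print(np.round(shift, 3))
```

After alignment the anchor words have a shift of essentially zero, while the word whose vector was replaced stands out with a clearly non-zero distance.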

Features

  • Memory-efficient word embedding training using Count-Min Sketch
  • Topic modeling with Non-negative Matrix Factorization
  • Temporal alignment of word embeddings using Procrustes analysis
  • Cython-optimized PPMI matrix computation
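
As an illustration of the first feature, the following is a small standard-library sketch of a Count-Min Sketch used to count word pairs in fixed memory. It is not the chronowords implementation; the class `CountMinSketch`, its `width` and `depth` parameters, and the hashing scheme are assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Approximate counter backed by a fixed depth x width table of integers."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One hash per row, made distinct by salting with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum is the tightest bound.
        return min(self.table[row][col] for row, col in self._cells(item))


sketch = CountMinSketch()
tokens = "the cat sat on the mat the cat slept".split()
for left, right in zip(tokens, tokens[1:]):
    sketch.add(f"{left} {right}")

print(sketch.estimate("the cat"))
```

The estimate never undercounts: it is at least the true count and exceeds it only when hash collisions occur in every row. Memory use depends on `width * depth`, not on the number of distinct pairs seen.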

Installation

pip install chronowords

Quick Start

from chronowords.algebra import SVDAlgebra
from chronowords.topics import TopicModel

# Train word embeddings
# (your_corpus_iterator is a placeholder for your own corpus iterator)
model = SVDAlgebra(n_components=300)
model.train(your_corpus_iterator)

# Find similar words
similar = model.most_similar('computer')
for word in similar:
    print(f"{word.word}: {word.similarity:.3f}")

# Create topic model
# (ppmi_matrix and vocabulary are placeholders for a PPMI matrix and its matching word list)
topic_model = TopicModel(n_topics=10)
topic_model.fit(ppmi_matrix, vocabulary)
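
The quick start uses `ppmi_matrix` and `vocabulary` without defining them. The sketch below shows, in plain NumPy, how a PPMI matrix and its vocabulary can be computed from a toy corpus. It is a generic illustration, not the library's own routine: the corpus, the `window` size, and the dense array format are assumptions, and the input types that `TopicModel.fit` actually expects are not stated here.

```python
import numpy as np

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
window = 2  # context words considered on each side of the target word

vocabulary = sorted({token for sentence in corpus for token in sentence})
index = {word: i for i, word in enumerate(vocabulary)}

# Symmetric co-occurrence counts within the window.
counts = np.zeros((len(vocabulary), len(vocabulary)))
for sentence in corpus:
    for i, word in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[index[word], index[sentence[j]]] += 1

total = counts.sum()
row_sums = counts.sum(axis=1, keepdims=True)
col_sums = counts.sum(axis=0, keepdims=True)

# PMI(w, c) = log( P(w, c) / (P(w) * P(c)) ).
# Unseen pairs give log(0) = -inf; PPMI clips everything below zero to zero.
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(counts * total / (row_sums * col_sums))
ppmi_matrix = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

print(vocabulary)
print(np.round(ppmi_matrix, 2))
```

Row `i` of `ppmi_matrix` corresponds to `vocabulary[i]`, which is the pairing a topic model needs in order to report topics as words rather than indices.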

Requirements

  • Python ≥ 3.10
  • NumPy
  • SciPy
  • scikit-learn
  • Cython

Contributing

Pull requests welcome. For major changes, open an issue first.

License

MIT

Made by

Built and maintained by Crow Intelligence.
