Skip to main content

Clustering russian words by multiple criterias

Project description

Russian Words Clusters

Russian Words Clusters offers a way to cluster russian words by criterias (by a common stem, by the closeness of vowels or consonants).

For now it supports verbs but was not built for clusterings words that may have different suffixes, as would be a noun and an adjective of a same stem.

It offers options:

  • to merge clusters built on different criterias
  • to input words pairs, useful for clustering verbs and their aspects

The clustering algorithm can be used either with the CLI or with the class's methods.

Simple example: clustering verbs by stem and vowel transformation

Content of file1:

выстрелить
отличать
застрелить
отличить

python3.8 clustering.py --input file1 --criterias STEM TRANS --merge

выстрелить
застрелить
отличать
отличить

Usage

CLI

The CLI clustering.py offers the possibility to cluster words and words pairs.

Classes

Classes in clustering.py can be called to cluster words. As for usage examples, you can refer to the code in the Main part of clustering.py, or to the code contained in the tests folder.

Project has a pip package: https://pypi.org/project/russian-words-clusters/
Once the package installed you can import classes into your Python code using from russianwords.clustering import *.

A more complex example: clustering words pairs

Content of file2:

посещать/посетить
разделять
разбираться/разобраться
выделиться/выделить
изменять/изменить
выделяться/выделять

python3.8 clustering.py --input file2 --criterias STEM TRANS --merge --are-pairs

посещать/посетить
разделять
выделяться/выделять
выделиться/выделить
разбираться/разобраться
изменять/изменить

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

RussianWordsClusters-0.0.10.tar.gz (7.2 kB view hashes)

Uploaded Source

Built Distribution

RussianWordsClusters-0.0.10-py3-none-any.whl (22.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page