
Clustering Russian words by multiple criteria

Project description

Russian Words Clusters

Russian Words Clusters offers a way to cluster Russian words by several criteria: by a common stem, or by the closeness of their vowels or consonants.
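
One way to picture what "closeness of vowels" could mean is to group words whose consonant skeletons are identical, so that the words differ only in their vowels, as отличать and отличить do. The following is a minimal sketch under that assumption and is not the package's actual algorithm. The vowel set and the helper names consonant_skeleton and cluster_by_skeleton are hypothetical.

```python
from collections import defaultdict

# Hypothetical vowel inventory used to build a "consonant skeleton".
RUSSIAN_VOWELS = set("аеёиоуыэюя")


def consonant_skeleton(word: str) -> str:
    """Return the word with all vowels removed (a naive vowel-closeness key)."""
    return "".join(ch for ch in word.lower() if ch not in RUSSIAN_VOWELS)


def cluster_by_skeleton(words):
    """Group words with identical consonant skeletons, preserving input order."""
    groups = defaultdict(list)
    for word in words:
        groups[consonant_skeleton(word)].append(word)
    return list(groups.values())


# The four verbs of file1 from the example further down.
words = ["выстрелить", "отличать", "застрелить", "отличить"]
print(cluster_by_skeleton(words))
```

On these four verbs the sketch yields three groups: выстрелить and застрелить stay apart because their prefixes differ in consonants, which is why a separate stem criterion is needed to bring them together.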

For now it supports verbs. It was not built for clustering words that may have different suffixes, such as a noun and an adjective sharing the same stem.

It offers options to:

  • merge clusters built on different criteria
  • input word pairs, which is useful for clustering verbs together with their aspectual counterparts (imperfective/perfective pairs)
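
Merging clusters built on different criteria can be thought of as taking the transitive union of their groupings: two words end up together if any criterion put them in the same cluster. Here is a minimal sketch using a union-find structure. It assumes each criterion's result is a list of word lists over the same vocabulary. The name merge_clusterings and the hand-written input clusterings are hypothetical, and the package may merge differently.

```python
def merge_clusterings(*clusterings):
    """Merge several clusterings: words grouped together by any criterion
    (directly or transitively) end up in the same merged cluster."""
    parent = {}

    def find(x):
        # Find the representative of x, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for clustering in clusterings:
        for cluster in clustering:
            members = list(cluster)
            for word in members:
                parent.setdefault(word, word)
            # Link every member of this cluster to its first member.
            for other in members[1:]:
                union(members[0], other)

    # Collect words by representative, in first-seen order.
    merged = {}
    for word in parent:
        merged.setdefault(find(word), []).append(word)
    return list(merged.values())


# Illustrative (hand-written) results of two criteria on the file1 verbs.
by_stem = [["выстрелить", "застрелить"], ["отличать"], ["отличить"]]
by_vowels = [["выстрелить"], ["застрелить"], ["отличать", "отличить"]]
merged = merge_clusterings(by_stem, by_vowels)
print(merged)
```

Union-find keeps the merge close to linear in the number of words, and it handles chains where criterion A links x to y and criterion B links y to z.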

The clustering algorithm can be used either through the CLI or through the methods of its classes.

Simple example: clustering verbs by stem and vowel transformation

Content of file1:

выстрелить
отличать
застрелить
отличить

Command:

python3.8 clustering.py --input file1 --criterias STEM TRANS --merge

Output:

выстрелить
застрелить
отличать
отличить

Usage

CLI

The clustering.py CLI can cluster both single words and word pairs.
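
For readers scripting around the tool, here is a hypothetical argparse parser that accepts only the flags used in the examples of this README (--input, --criterias, --merge, --are-pairs). It is not the source of clustering.py. Help texts, defaults, and whether any flag is required are assumptions, and the real CLI may accept further options or restrict the criterion names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring only the flags shown in this README.
    parser = argparse.ArgumentParser(description="Cluster Russian words by criteria.")
    parser.add_argument("--input", help="file with one word or word pair per line")
    parser.add_argument("--criterias", nargs="+", help="criteria to cluster by, e.g. STEM TRANS")
    parser.add_argument("--merge", action="store_true", help="merge clusters built on different criteria")
    # argparse exposes --are-pairs as args.are_pairs.
    parser.add_argument("--are-pairs", action="store_true", help="treat lines as slash-separated word pairs")
    return parser


args = build_parser().parse_args(
    ["--input", "file2", "--criterias", "STEM", "TRANS", "--merge", "--are-pairs"]
)
print(args.input, args.criterias, args.merge, args.are_pairs)
```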

Classes

The classes in clustering.py can be used directly to cluster words. For usage examples, refer to the code in the main section of clustering.py or to the code in the tests folder.

The project is available as a pip package: https://pypi.org/project/russian-words-clusters/
Once the package is installed, you can import the classes into your Python code with from russianwords.clustering import *.

A more complex example: clustering word pairs

Content of file2:

посещать/посетить
разделять
разбираться/разобраться
выделиться/выделить
изменять/изменить
выделяться/выделять

Command:

python3.8 clustering.py --input file2 --criterias STEM TRANS --merge --are-pairs

Output:

посещать/посетить
разделять
выделяться/выделять
выделиться/выделить
разбираться/разобраться
изменять/изменить
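
In the pairs example, a file may mix slash-separated pairs with single words (разделять has no partner). Here is a minimal sketch of reading that format. It assumes "/" is the only separator and that blank lines can be skipped. The helper parse_word_lines is hypothetical and is not part of the package.

```python
def parse_word_lines(lines):
    """Split each non-empty line on '/' into a tuple: a pair such as
    ('посещать', 'посетить') or a single word such as ('разделять',)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        entries.append(tuple(part.strip() for part in line.split("/")))
    return entries


# First three lines of file2 from the example above.
file2 = """посещать/посетить
разделять
разбираться/разобраться
"""
entries = parse_word_lines(file2.splitlines())
print(entries)
```

Keeping each pair as one tuple lets a word-level criterion be applied to either member while the pair is still reported as a single unit, as in the output above.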



Download files


Source Distribution

RussianWordsClusters-0.0.10.tar.gz (7.2 kB)


Built Distribution


RussianWordsClusters-0.0.10-py3-none-any.whl (22.9 kB)


File details

Details for the file RussianWordsClusters-0.0.10.tar.gz.

File metadata

  • Download URL: RussianWordsClusters-0.0.10.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2

File hashes

Hashes for RussianWordsClusters-0.0.10.tar.gz
Algorithm Hash digest
SHA256 ae816d73af642c6fd36fd49f5ab6fda47f84d9307832ee2f71bc44233db8f727
MD5 196a73873caa8f5396d4dfbad366c06e
BLAKE2b-256 1c03ca46f51f54485929bfad60ee5d1586f06e0fcb413e6e372a3abaeacead36


File details

Details for the file RussianWordsClusters-0.0.10-py3-none-any.whl.

File metadata

  • Download URL: RussianWordsClusters-0.0.10-py3-none-any.whl
  • Upload date:
  • Size: 22.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.2

File hashes

Hashes for RussianWordsClusters-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 dd7760ccbd4b59a21ea5e2fc7b49006b3be8f7c2f5d2d7071f03b18280d69f16
MD5 0b58085a246aecda6e40dc830645189f
BLAKE2b-256 96b2f2c1ea99ca1f8f60e3eef7669f411416ff39f228b74ff622eada854f18bd

