Clustering Russian words by multiple criteria
Project description
Russian Words Clusters
Russian Words Clusters offers a way to cluster Russian words by several criteria: by a common stem, or by the closeness of vowels or consonants.
For now it supports verbs. It was not built for clustering words that may have different suffixes, such as a noun and an adjective sharing the same stem.
It offers two options:
- to merge clusters built on different criteria
- to input word pairs, useful for clustering verbs and their aspects
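As an illustration of the merge option, here is a minimal sketch of how clusterings built on different criteria could be combined. It assumes that two clusters are merged whenever they share a word, which is implemented with a small union-find. The function name `merge_clusterings` and the per-criterion inputs are hypothetical and are not taken from the package.

```python
from collections import defaultdict


def merge_clusterings(*clusterings):
    """Merge clusterings: clusters that share a word end up in one cluster."""
    parent = {}

    def find(word):
        # Walk up to the representative, shortening the path on the way.
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]
            word = parent[word]
        return word

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for clustering in clusterings:
        for cluster in clustering:
            for word in cluster:
                # Link every word of a cluster to the cluster's first word.
                union(cluster[0], word)

    merged = defaultdict(list)
    for word in list(parent):
        merged[find(word)].append(word)
    return sorted(sorted(group) for group in merged.values())


# Hypothetical per-criterion results, written by hand for illustration.
by_stem = [["выстрелить", "застрелить"], ["отличать"], ["отличить"]]
by_vowels = [["отличать", "отличить"], ["выстрелить"], ["застрелить"]]

clusters = merge_clusterings(by_stem, by_vowels)
for cluster in clusters:
    print(" ".join(cluster))
```

Union-find keeps the merge transitive: if criterion A links two words and criterion B links one of them to a third, all three land in the same cluster.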
The clustering algorithm can be used either through the CLI or through the classes' methods.
Simple example: clustering verbs by stem and vowel transformation
Content of file1:
выстрелить
отличать
застрелить
отличить
python3.8 clustering.py --input file1 --criterias STEM TRANS --merge
выстрелить
застрелить
отличать
отличить
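To make the two criteria in this example more concrete, the sketch below shows one possible reading of them. It assumes that STEM can be approximated by stripping a verbal prefix and that TRANS can be approximated by comparing words with their vowels removed. Both rules, the short prefix list, and all function names are assumptions made for illustration; the package's actual rules may differ.

```python
from collections import defaultdict

VOWELS = set("аеёиоуыэюя")
# Hypothetical, deliberately short prefix list; Russian has many more prefixes.
PREFIXES = ("вы", "за", "от")


def stem_key(word):
    """Strip one known prefix so that prefixed variants share a key."""
    for prefix in PREFIXES:
        if word.startswith(prefix):
            return word[len(prefix):]
    return word


def consonant_key(word):
    """Drop vowels so that words differing only in vowels share a key."""
    return "".join(ch for ch in word if ch not in VOWELS)


def cluster_by(words, key):
    """Group words that produce the same key, keeping input order."""
    groups = defaultdict(list)
    for word in words:
        groups[key(word)].append(word)
    return list(groups.values())


words = ["выстрелить", "отличать", "застрелить", "отличить"]
by_stem = cluster_by(words, stem_key)
by_vowels = cluster_by(words, consonant_key)

print(by_stem)
print(by_vowels)
```

Under these assumptions, the prefix rule groups выстрелить with застрелить, and the vowel rule groups отличать with отличить, which matches the two clusters shown in the output above once the results are merged.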
Usage
CLI
The CLI clustering.py offers the possibility to cluster words and word pairs.
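If the CLI needs to be driven from another Python script, a thin wrapper around `subprocess` is one option. The sketch below only uses the flags that appear in this README (`--input`, `--criterias`, `--merge`, `--are-pairs`). The helper name `build_command` is hypothetical, and using the current interpreter via `sys.executable` instead of `python3.8` is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_path, criteria, merge=False, are_pairs=False,
                  script="clustering.py"):
    """Assemble the argument list for the clustering CLI."""
    command = [sys.executable, script, "--input", str(input_path),
               "--criterias", *criteria]
    if merge:
        command.append("--merge")
    if are_pairs:
        command.append("--are-pairs")
    return command


command = build_command("file1", ["STEM", "TRANS"], merge=True)

# Only run the CLI when both the script and the input file are present.
if Path("clustering.py").exists() and Path("file1").exists():
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    print(result.stdout)
```

Passing the arguments as a list avoids shell quoting issues with file names.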
Classes
Classes in clustering.py can be called to cluster words. For usage examples, refer to the code in the Main part of clustering.py, or to the code in the tests folder.
The project has a pip package: https://pypi.org/project/russian-words-clusters/
Once the package is installed, you can import the classes into your Python code using from russianwords.clustering import *.
A more complex example: clustering word pairs
Content of file2:
посещать/посетить
разделять
разбираться/разобраться
выделиться/выделить
изменять/изменить
выделяться/выделять
python3.8 clustering.py --input file2 --criterias STEM TRANS --merge --are-pairs
посещать/посетить
разделять
выделяться/выделять
выделиться/выделить
разбираться/разобраться
изменять/изменить
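The input format used in this example, one entry per line with an optional `/` between the two members of a pair, can be read with a few lines of Python. This is a minimal sketch of such a reader, not the package's own parser; the function name `parse_entries` is hypothetical.

```python
def parse_entries(lines):
    """Turn input lines into tuples: (word,) or (first, second)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entries.append(tuple(part.strip() for part in line.split("/")))
    return entries


file2 = """\
посещать/посетить
разделять
разбираться/разобраться
выделиться/выделить
изменять/изменить
выделяться/выделять
"""

entries = parse_entries(file2.splitlines())
pairs = [entry for entry in entries if len(entry) == 2]
singles = [entry[0] for entry in entries if len(entry) == 1]

print(pairs)
print(singles)
```

Keeping each pair as a tuple lets a clustering step treat both members as one unit, so a pair is never split across clusters.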
Hashes for RussianWordsClusters-0.0.10.tar.gz
Algorithm | Hash digest
---|---
SHA256 | ae816d73af642c6fd36fd49f5ab6fda47f84d9307832ee2f71bc44233db8f727
MD5 | 196a73873caa8f5396d4dfbad366c06e
BLAKE2b-256 | 1c03ca46f51f54485929bfad60ee5d1586f06e0fcb413e6e372a3abaeacead36
Hashes for RussianWordsClusters-0.0.10-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | dd7760ccbd4b59a21ea5e2fc7b49006b3be8f7c2f5d2d7071f03b18280d69f16
MD5 | 0b58085a246aecda6e40dc830645189f
BLAKE2b-256 | 96b2f2c1ea99ca1f8f60e3eef7669f411416ff39f228b74ff622eada854f18bd