A spelling similarity measure for cognate identification.
spsim is a Python 3 module that implements a spelling similarity measure for identifying cognates across languages, taking into account spelling differences that are characteristic of each language pair, as described in [Gomes2011].
Note: in the examples below, $ denotes the Bash prompt and a Linux, MacOs or similar *nix environment is assumed.
Install as usual:
$ pip3 install spsim
Example command line usage:
$ # first let's get some pairs of words that may be cognates: $ wget http://research.variancia.com/spsim/maybe_enpt.txt $ cat maybe_enpt.txt pharmacy farmácia arithmetic aritmética $ # If we don't give any example cognates, SpSim will be equivalent to $ # 1 - edit_distance / max_len_of_strings $ # Note that by default spsim matches accentuated characters, i.e. a == á $ echo "" > empty.txt $ spsim empty.txt maybe_enpt.txt pharmacy farmácia 0.5 arithmetic aritmética 0.8 $ now let's get some example cognates: $ wget http://research.variancia.com/spsim/examples_enpt.txt $ cat examples_enpt.txt alcohol álcool alpha alfa anomaly anomalia mathematics matemática methodology metodologia metric métrica morphine morfina photos fotos $ # by giving these examples to spsim, it will learn to ignore certain differences: $ spsim examples_enpt.txt maybe_enpt.txt pharmacy farmácia 1.0 arithmetic aritmética 1.0
|[Gomes2011]||Measuring Spelling Similarity for Cognate Identification, Luís Gomes and Gabriel Pereira Lopes in Progress in Artificial Intelligence, 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011, http://www.springerlink.com/content/gtl56j3l06906020/|
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
|File Name & Hash SHA256 Hash Help||Version||File Type||Upload Date|
(8.6 kB) Copy SHA256 Hash SHA256
|py3||Wheel||Jul 26, 2017|
(6.0 kB) Copy SHA256 Hash SHA256
|–||Source||Jul 26, 2017|