Abydos NLP/IR library
Abydos
Abydos is a library of phonetic algorithms, string distance measures & metrics, stemmers, and string fingerprinters including:
- Phonetic algorithms
Robert C. Russell’s Index
American Soundex
Refined Soundex
Daitch-Mokotoff Soundex
Kölner Phonetik
NYSIIS
Match Rating Algorithm
Metaphone
Double Metaphone
Caverphone
Alpha Search Inquiry System
Fuzzy Soundex
Phonex
Phonem
Phonix
SfinxBis
phonet
Standardized Phonetic Frequency Code
Statistics Canada
Lein
Roger Root
Oxford Name Compression Algorithm (ONCA)
Eudex phonetic hash
Haase Phonetik
Reth-Schek Phonetik
FONEM
Parmar-Kumbharana
Davidson’s Consonant Code
SoundD
PSHP Soundex/Viewex Coding
an early version of Henry Code
Norphone
Dolby Code
Phonetic Spanish
Spanish Metaphone
MetaSoundex
SoundexBR
NRL English-to-phoneme
Beider-Morse Phonetic Matching
- String distance metrics
Levenshtein distance
Optimal String Alignment distance
Levenshtein-Damerau distance
Hamming distance
Tversky index
Sørensen–Dice coefficient & distance
Jaccard similarity coefficient & distance
overlap similarity & distance
Tanimoto coefficient & distance
Minkowski distance & similarity
Manhattan distance & similarity
Euclidean distance & similarity
Chebyshev distance
cosine similarity & distance
Jaro distance
Jaro-Winkler distance (incl. the strcmp95 algorithm variant)
Longest common substring
Ratcliff-Obershelp similarity & distance
Match Rating Algorithm similarity
Normalized Compression Distance (NCD) & similarity
Monge-Elkan similarity & distance
Matrix similarity
Needleman-Wunsch score
Smith-Waterman score
Gotoh score
Length similarity
Prefix, Suffix, and Identity similarity & distance
Modified Language-Independent Product Name Search (MLIPNS) similarity & distance
Bag distance
Editex distance
Eudex distances
Sift4 distance
Baystat distance & similarity
Typo distance
Indel distance
Synoname
- Stemmers
the Lovins stemmer
the Porter and Porter2 (Snowball English) stemmers
Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish
CLEF German, German plus, and Swedish stemmers
Caumanns’ German stemmer
UEA-Lite Stemmer
Paice-Husk Stemmer
Schinke Latin stemmer
S stemmer
- String Fingerprints
string fingerprint
q-gram fingerprint
phonetic fingerprint
Pollock & Zamora’s skeleton key
Pollock & Zamora’s omission key
Cisłak & Grabowski’s occurrence fingerprint
Cisłak & Grabowski’s occurrence halved fingerprint
Cisłak & Grabowski’s count fingerprint
Cisłak & Grabowski’s position fingerprint
Synoname Toolcode
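As a rough illustration of what the most basic of the distance measures above computes, here is a minimal, self-contained Python sketch of Levenshtein distance using the standard two-row dynamic-programming table. It is not Abydos's own implementation. The function name `levenshtein` is a hypothetical choice for this example, and the library's documentation describes its actual API.

```python
def levenshtein(src, tar):
    """Return the Levenshtein distance between src and tar.

    Hypothetical helper for illustration, not Abydos's API.
    Each insertion, deletion, or substitution of one character costs 1.
    """
    # prev[j] holds the distance between the current prefix of src and tar[:j];
    # initially the prefix of src is empty, so reaching tar[:j] costs j insertions.
    prev = list(range(len(tar) + 1))
    for i, s_ch in enumerate(src, 1):
        curr = [i]  # turning src[:i] into "" needs i deletions
        for j, t_ch in enumerate(tar, 1):
            cost = 0 if s_ch == t_ch else 1
            curr.append(min(prev[j] + 1,          # delete s_ch
                            curr[j - 1] + 1,      # insert t_ch
                            prev[j - 1] + cost))  # substitute (or match for free)
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("ab", "ba"))           # 2
```

Plain Levenshtein distance scores the swap "ab" to "ba" as 2. The Optimal String Alignment and Levenshtein-Damerau distances listed above instead count an adjacent transposition as a single edit, giving 1.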
Installation
Required libraries:
Numpy
Six
Recommended libraries:
PylibLZMA (Python 2 only; used for the LZMA compression string distance metric)
To install Abydos (master) from GitHub source:
git clone https://github.com/chrislit/abydos.git --recursive
cd abydos
python setup.py install
If your default python command calls Python 2.7 but you want to install for Python 3, you may instead need to call:
python3 setup.py install
To install Abydos (latest release) from PyPI using pip:
pip install abydos
To install from conda-forge:
conda install abydos
It should run on Python 2.7 and Python 3.3-3.7.
Testing & Contributing
To run the whole test suite, just call tox:
tox
The tox setup has the following environments: py27, py36, doctest, py27-regression, py36-regression, pylint, pycodestyle, flake8, doc8, badges, docs, py27-fuzz, and py36-fuzz. So if you only want to generate the documentation (in HTML, EPUB, and PDF formats), just call:
tox -e docs
To run only the Flake8 checks and generate their reports, call:
tox -e flake8
Contributions such as bug reports, PRs, suggestions, and requests for new features are welcome through GitHub Issues and Pull Requests.
Release History
0.3.5 (2018-10-31) cantankerous carl
doi:10.5281/zenodo.1463204
Version 0.3.5 focuses on refactoring the whole project. The API itself remains largely the same as in previous versions, but under the hood the modules have been split up. Essentially no new features are added in this version (bug fixes aside).
Changes:
Refactored library and tests into smaller modules
Broke compression distances (NCD) out into separate functions
Adopted Black code style
Added pyproject.toml to use Poetry for packaging (but will continue using setuptools and setup.py for the present)
Minor bug fixes
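Normalized Compression Distance, whose compressor variants were split into separate functions in this release, is defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the length of s after compression. The sketch below is a minimal, hypothetical illustration that uses Python's standard zlib module as the compressor. The name `ncd_zlib` is invented for this example and is not Abydos's code.

```python
import zlib


def ncd_zlib(src, tar):
    """Return the Normalized Compression Distance of src and tar using zlib.

    Hypothetical helper for illustration, not Abydos's API.
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(s) is the length in bytes of s after compression.
    """
    x = src.encode("utf-8")
    y = tar.encode("utf-8")
    c_x = len(zlib.compress(x, 9))
    c_y = len(zlib.compress(y, 9))
    c_xy = len(zlib.compress(x + y, 9))
    # zlib output always includes a header, so the denominator is never 0;
    # float() keeps this a true division on Python 2 as well.
    return (c_xy - min(c_x, c_y)) / float(max(c_x, c_y))


a = "abc" * 50
b = "xyz" * 50
print(ncd_zlib(a, a) < ncd_zlib(a, b))  # True
```

Because real compressors add headers and are not ideal, the NCD of a string with itself is small but not exactly 0. Very short strings give noisy results for the same reason.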
0.3.0 (2018-10-15) carl
doi:10.5281/zenodo.1462443
Version 0.3.0 focuses on additional phonetic algorithms, but also adds numerous distance measures, fingerprints, and even a few stemmers. Another focus was getting everything (including the docs) to build again and moving to more standard modern tools (flake8, tox, etc.).
Changes:
Fixed implementation of Bag distance
Updated BMPM to version 3.10
Fixed Sphinx documentation on readthedocs.org
Split string fingerprints out of clustering into their own module
Added support for q-grams that skip n characters
- New phonetic algorithms:
Statistics Canada
Lein
Roger Root
Oxford Name Compression Algorithm (ONCA)
Eudex phonetic hash
Haase Phonetik
Reth-Schek Phonetik
FONEM
Parmar-Kumbharana
Davidson’s Consonant Code
SoundD
PSHP Soundex/Viewex Coding
an early version of Henry Code
Norphone
Dolby Code
Phonetic Spanish
Spanish Metaphone
MetaSoundex
SoundexBR
NRL English-to-phoneme
- New string fingerprints:
Cisłak & Grabowski’s occurrence fingerprint
Cisłak & Grabowski’s occurrence halved fingerprint
Cisłak & Grabowski’s count fingerprint
Cisłak & Grabowski’s position fingerprint
Synoname Toolcode
- New distance measures:
Minkowski distance & similarity
Manhattan distance & similarity
Euclidean distance & similarity
Chebyshev distance & similarity
Eudex distances
Sift4 distance
Baystat distance & similarity
Typo distance
Indel distance
Synoname
- New stemmers:
UEA-Lite Stemmer
Paice-Husk Stemmer
Schinke Latin stemmer
Eliminated ._compat submodule in favor of six
Transitioned from PEP8 to flake8, etc.
Phonetic algorithms now consistently use max_length=-1 to indicate that there should be no length limit
Added example notebooks in binder directory
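To illustrate the max_length=-1 convention, here is a minimal sketch of American Soundex in which a max_length of -1 disables the length limit. The function `soundex` and its signature are hypothetical and not taken from Abydos. This sketch also assumes that -1 turns off zero-padding as well as truncation. Soundex implementations differ on details such as the handling of Y, and this sketch treats Y like a vowel.

```python
# Hypothetical sketch of American Soundex; not Abydos's implementation.
# Letter-to-digit table for the coded consonants.
_SOUNDEX_CODES = {ch: digit
                  for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"),
                                         ("dt", "3"), ("l", "4"),
                                         ("mn", "5"), ("r", "6"))
                  for ch in letters}


def soundex(word, max_length=4):
    """Return the American Soundex code of word.

    max_length=-1 means no length limit: the code is neither truncated
    nor zero-padded (an assumption made for this sketch).
    """
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = [first.upper()]
    # The first letter's digit counts when collapsing duplicates (Pfister -> P236).
    prev = _SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # H and W are dropped and do not separate equal digits.
            continue
        # Letters with no digit (vowels, Y, non-English letters) reset prev,
        # so equal digits on either side of them are both kept.
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        prev = code
    encoded = "".join(result)
    if max_length == -1:
        return encoded
    # Pad with zeros, then truncate to max_length.
    return (encoded + "0" * max_length)[:max_length]


print(soundex("Ashcraft"))                 # A261
print(soundex("Ashcraft", max_length=-1))  # A2613
```

With the default limit, short codes are zero-padded ("Lee" gives L000). With max_length=-1, the same call returns just "L".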
0.2.0 (2015-05-27) berthold
Added Caumanns’ German stemmer
Added Lovins’ English stemmer
Updated Beider-Morse Phonetic Matching to 3.04
Added Sphinx documentation
0.1.1 (2015-05-12) albrecht
First Beta release to PyPI
Download files
Source Distribution: abydos-0.3.5.tar.gz
Built Distribution: abydos-0.3.5-py2.py3-none-any.whl
File details
Details for the file abydos-0.3.5.tar.gz
File metadata
- Download URL: abydos-0.3.5.tar.gz
- Size: 205.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | dbefc26ee985ca8387125af3e458699f8ac088735fc3e923bab0a34227bee902
MD5 | 8abe634f43588268409f87b3abbc56c0
BLAKE2b-256 | a7bd1003074e655d1fc3dc5d08917f28112ac6c2fa65ecf80229a30cba86fc5b
File details
Details for the file abydos-0.3.5-py2.py3-none-any.whl
File metadata
- Download URL: abydos-0.3.5-py2.py3-none-any.whl
- Size: 254.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 93ab542c9f8d5b6efe492e95b49b39957d5a79c7d4e3ede6d8228036ab96b3a1
MD5 | 172e9a2d57068c5e3d77b39886b88341
BLAKE2b-256 | 337d2307634d19145026cf8046164e4895f0f57af26e1df6f390a1853e9d1daa