Abydos NLP/IR library

Abydos

License: GPL v3

Abydos NLP/IR library
Copyright 2014-2018 by Christopher C. Little

Abydos is a library of phonetic algorithms, string distance measures & metrics, stemmers, and string fingerprinters, including:

  • Phonetic algorithms
    • Robert C. Russell’s Index

    • American Soundex

    • Refined Soundex

    • Daitch-Mokotoff Soundex

    • Kölner Phonetik

    • NYSIIS

    • Match Rating Algorithm

    • Metaphone

    • Double Metaphone

    • Caverphone

    • Alpha Search Inquiry System

    • Fuzzy Soundex

    • Phonex

    • Phonem

    • Phonix

    • SfinxBis

    • phonet

    • Standardized Phonetic Frequency Code

    • Statistics Canada

    • Lein

    • Roger Root

    • Oxford Name Compression Algorithm (ONCA)

    • Eudex phonetic hash

    • Haase Phonetik

    • Reth-Schek Phonetik

    • FONEM

    • Parmar-Kumbharana

    • Davidson’s Consonant Code

    • SoundD

    • PSHP Soundex/Viewex Coding

    • an early version of Henry Code

    • Norphone

    • Dolby Code

    • Phonetic Spanish

    • Spanish Metaphone

    • MetaSoundex

    • SoundexBR

    • NRL English-to-phoneme

    • Beider-Morse Phonetic Matching

  • String distance metrics
    • Levenshtein distance

    • Optimal String Alignment distance

    • Damerau-Levenshtein distance

    • Hamming distance

    • Tversky index

    • Sørensen–Dice coefficient & distance

    • Jaccard similarity coefficient & distance

    • overlap similarity & distance

    • Tanimoto coefficient & distance

    • Minkowski distance & similarity

    • Manhattan distance & similarity

    • Euclidean distance & similarity

    • Chebyshev distance

    • cosine similarity & distance

    • Jaro distance

    • Jaro-Winkler distance (incl. the strcmp95 algorithm variant)

    • Longest common substring

    • Ratcliff-Obershelp similarity & distance

    • Match Rating Algorithm similarity

    • Normalized Compression Distance (NCD) & similarity

    • Monge-Elkan similarity & distance

    • Matrix similarity

    • Needleman-Wunsch score

    • Smith-Waterman score

    • Gotoh score

    • Length similarity

    • Prefix, Suffix, and Identity similarity & distance

    • Modified Language-Independent Product Name Search (MLIPNS) similarity & distance

    • Bag distance

    • Editex distance

    • Eudex distances

    • Sift4 distance

    • Baystat distance & similarity

    • Typo distance

    • Indel distance

    • Synoname

  • Stemmers
    • the Lovins stemmer

    • the Porter and Porter2 (Snowball English) stemmers

    • Snowball stemmers for German, Dutch, Norwegian, Swedish, and Danish

    • CLEF German, German plus, and Swedish stemmers

    • Caumanns’ German stemmer

    • UEA-Lite Stemmer

    • Paice-Husk Stemmer

    • Schinke Latin stemmer

    • S stemmer

  • String Fingerprints
    • string fingerprint

    • q-gram fingerprint

    • phonetic fingerprint

    • Pollock & Zamora’s skeleton key

    • Pollock & Zamora’s omission key

    • Cisłak & Grabowski’s occurrence fingerprint

    • Cisłak & Grabowski’s occurrence halved fingerprint

    • Cisłak & Grabowski’s count fingerprint

    • Cisłak & Grabowski’s position fingerprint

    • Synoname Toolcode
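
To give a feel for what the first two categories above compute, here is a minimal, self-contained sketch of American Soundex and Levenshtein distance in plain Python. It does not call Abydos and is not Abydos's implementation; the names soundex_sketch and levenshtein_sketch and the length parameter are assumptions made for this illustration.

```python
# Illustrative only: plain-Python sketches, not Abydos's own code.

# Consonant groups used by American Soundex.
_SOUNDEX_GROUPS = {
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
_SOUNDEX_CODES = {
    ch: digit for digit, letters in _SOUNDEX_GROUPS.items() for ch in letters
}


def soundex_sketch(word, length=4):
    """Return a zero-padded American Soundex code of the given length."""
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this to "", so a repeat is coded again
    return (code + "0" * length)[:length]


def levenshtein_sketch(src, tar):
    """Return the minimum number of single-character edits turning src into tar."""
    previous_row = list(range(len(tar) + 1))
    for i, s in enumerate(src, 1):
        current_row = [i]
        for j, t in enumerate(tar, 1):
            current_row.append(
                min(
                    previous_row[j] + 1,             # deletion
                    current_row[j - 1] + 1,          # insertion
                    previous_row[j - 1] + (s != t),  # substitution or match
                )
            )
        previous_row = current_row
    return previous_row[-1]
```

With these definitions, soundex_sketch("Robert") and soundex_sketch("Rupert") both give R163, and levenshtein_sketch("kitten", "sitting") gives 3.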


Installation

Required libraries:

  • Numpy

  • Six

Recommended libraries:

  • PylibLZMA (Python 2 only; used for the LZMA-based compression distance metric)
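
For context, the compression-based metric referred to here is the Normalized Compression Distance from the list of distance measures. The sketch below applies the usual NCD formula using the lzma module from the Python 3 standard library rather than PylibLZMA. It is an illustration under that assumption, not Abydos's code, and the name ncd_lzma_sketch is hypothetical.

```python
import lzma


def ncd_lzma_sketch(src, tar):
    """Normalized Compression Distance between two byte strings, using LZMA."""
    c_src = len(lzma.compress(src))
    c_tar = len(lzma.compress(tar))
    c_both = len(lzma.compress(src + tar))
    # Inputs that share structure compress better together than apart.
    return (c_both - min(c_src, c_tar)) / max(c_src, c_tar)


text = b"the quick brown fox jumps over the lazy dog " * 20
other = bytes(range(256)) * 4
same_score = ncd_lzma_sketch(text, text)    # input compared with itself
diff_score = ncd_lzma_sketch(text, other)   # unrelated inputs
```

Comparing an input with itself should score lower than comparing it with unrelated data, although the compressed-size overhead on short inputs keeps the score from reaching exactly zero.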

To install Abydos (master) from GitHub source:

git clone https://github.com/chrislit/abydos.git --recursive
cd abydos
python setup.py install

If your default python command calls Python 2.7 but you want to install for Python 3, you may instead need to call:

python3 setup.py install

To install Abydos (latest release) from PyPI using pip:

pip install abydos

To install from conda-forge:

conda install -c conda-forge abydos

It should run on Python 2.7 and Python 3.3-3.7.

Testing & Contributing

To run the whole test suite, just call tox:

tox

The tox setup has the following environments: py27, py36, doctest, py27-regression, py36-regression, pylint, pycodestyle, flake8, doc8, badges, docs, py27-fuzz, & py36-fuzz. So if you only want to generate documentation (in HTML, EPUB, & PDF formats), just call:

tox -e docs

To run only Flake8 and generate its reports, call:

tox -e flake8

Contributions such as bug reports, PRs, suggestions, and desired new features are welcome through GitHub Issues & Pull Requests.

Release History

0.3.0 (2018-10-15)

  • Fixed implementation of Bag distance

  • Updated BMPM to version 3.10

  • Fixed Sphinx documentation on readthedocs.org

  • Split string fingerprints out of clustering into their own module

  • Added support for q-grams that skip n characters

  • New phonetic algorithms:
    • Statistics Canada

    • Lein

    • Roger Root

    • Oxford Name Compression Algorithm (ONCA)

    • Eudex phonetic hash

    • Haase Phonetik

    • Reth-Schek Phonetik

    • FONEM

    • Parmar-Kumbharana

    • Davidson’s Consonant Code

    • SoundD

    • PSHP Soundex/Viewex Coding

    • an early version of Henry Code

    • Norphone

    • Dolby Code

    • Phonetic Spanish

    • Spanish Metaphone

    • MetaSoundex

    • SoundexBR

    • NRL English-to-phoneme

  • New string fingerprints:
    • Cisłak & Grabowski’s occurrence fingerprint

    • Cisłak & Grabowski’s occurrence halved fingerprint

    • Cisłak & Grabowski’s count fingerprint

    • Cisłak & Grabowski’s position fingerprint

    • Synoname Toolcode

  • New distance measures:
    • Minkowski distance & similarity

    • Manhattan distance & similarity

    • Euclidean distance & similarity

    • Chebyshev distance & similarity

    • Eudex distances

    • Sift4 distance

    • Baystat distance & similarity

    • Typo distance

    • Indel distance

    • Synoname

  • New stemmers:
    • UEA-Lite Stemmer

    • Paice-Husk Stemmer

    • Schinke Latin stemmer

  • Eliminated ._compat submodule in favor of six

  • Transitioned from PEP8 to flake8, etc.

  • Phonetic algorithms now consistently use max_length=-1 to indicate that there should be no length limit

  • Added example notebooks in binder directory
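
The max_length=-1 convention noted in the list above can be pictured with a small helper. This is a hypothetical sketch of how such a sentinel might be interpreted, not Abydos's internal code; the name apply_max_length is an assumption made for this illustration.

```python
def apply_max_length(code, max_length=-1):
    """Truncate code to max_length characters; -1 means no length limit."""
    if max_length == -1:
        return code  # sentinel value: keep the full code
    return code[:max_length]
```

Under this sketch, apply_max_length("R16352") returns the full string, while apply_max_length("R16352", 4) returns R163.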

0.2.0 (2015-05-27)

  • Added Caumanns’ German stemmer

  • Added Lovins’ English stemmer

  • Updated Beider-Morse Phonetic Matching to 3.04

  • Added Sphinx documentation

0.1.1 (2015-05-12)

  • First Beta release to PyPI
