
Damerau-Levenshtein distance implemented in Rust for fast performance.

Project description

Rust implementation of the Damerau-Levenshtein distance

A Damerau-Levenshtein implementation written in Rust and packaged for Python. Use this package if you need to calculate a distance metric between lists of integers, lists of strings, or Unicode strings, and you need high performance.
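To show what the metric computes, here is a minimal pure-Python sketch of the restricted Damerau-Levenshtein distance, also known as optimal string alignment (OSA). It counts insertions, deletions, substitutions and transpositions of adjacent items, in O(len(a) × len(b)) time and memory. It is an illustration only: `osa_distance` is a hypothetical name, and whether this package computes the restricted variant or the unrestricted one is an assumption. The pair `"CA"`/`"ABC"` is a quick way to check, because the two variants disagree on it.

```python
from typing import Hashable, Sequence


def osa_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical reference sketch, not the package's implementation.
    """
    n, m = len(a), len(b)
    # d[i][j] = distance between the first i items of a and the first j items of b
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i items
    for j in range(m + 1):
        d[0][j] = j  # insert all j items
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent items
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


print(osa_distance([1, 2, 3], [1, 3]))  # 1
print(osa_distance("ABC", "ACB"))       # 1 (one adjacent transposition)
print(osa_distance("CA", "ABC"))        # 3 (OSA; the unrestricted variant gives 2)
```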

This package is based on the C implementation pyxDamerauLevenshtein.
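The package provides separate functions for integer lists and string lists. If your sequences hold other hashable items (tuples, enum members and so on), one option is to map them to shared integer IDs first and then pass the result to the package's `distance_int`. Equal items receive the same ID, so the edit distance is unchanged. The helper `encode_tokens` below is a hypothetical sketch, not part of the package:

```python
from typing import Hashable, Sequence


def encode_tokens(a: Sequence[Hashable], b: Sequence[Hashable]) -> tuple[list[int], list[int]]:
    """Map the items of both sequences to shared integer IDs (hypothetical helper)."""
    ids: dict[Hashable, int] = {}

    def encode(seq: Sequence[Hashable]) -> list[int]:
        # setdefault assigns the next free ID the first time an item is seen
        return [ids.setdefault(item, len(ids)) for item in seq]

    return encode(a), encode(b)


a_ids, b_ids = encode_tokens(["the", "quick", "fox"], ["the", "fox"])
print(a_ids, b_ids)  # [0, 1, 2] [0, 2]
```

The encoded lists can then be compared with `pyrsdameraulevenshtein.distance_int(a_ids, b_ids)`.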

Install

pip install pyrsdameraulevenshtein

Use

import pyrsdameraulevenshtein

# Lists of integers
distance = pyrsdameraulevenshtein.distance_int([1, 2, 3], [1, 3])
# distance = 1
normalized_distance = pyrsdameraulevenshtein.normalized_distance_int([1, 2, 3], [1, 3])
# normalized_distance ≈ 0.33
similarity = pyrsdameraulevenshtein.similarity_int([1, 2, 3], [1, 3])
# similarity ≈ 0.67

# Lists of strings
distance = pyrsdameraulevenshtein.distance_str(["A", "B", "C"], ["A", "C"])
# distance = 1
normalized_distance = pyrsdameraulevenshtein.normalized_distance_str(["A", "B", "C"], ["A", "C"])
# normalized_distance ≈ 0.33
similarity = pyrsdameraulevenshtein.similarity_str(["A", "B", "C"], ["A", "C"])
# similarity ≈ 0.67

# Unicode strings
distance = pyrsdameraulevenshtein.distance_unicode("ABC", "AC")
# distance = 1
normalized_distance = pyrsdameraulevenshtein.normalized_distance_unicode("ABC", "AC")
# normalized_distance ≈ 0.33
similarity = pyrsdameraulevenshtein.similarity_unicode("ABC", "AC")
# similarity ≈ 0.67
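The normalized values in the example are consistent with dividing the distance by the length of the longer input (1 / 3, about 0.33) and defining similarity as one minus that (2 / 3, about 0.67). This page does not state those definitions, so the sketch below assumes them; the helper names are hypothetical.

```python
def normalized_distance(distance: int, a_len: int, b_len: int) -> float:
    """Distance scaled to [0, 1] by the longer input length (assumed definition)."""
    longest = max(a_len, b_len)
    # Two empty inputs are identical, so treat their normalized distance as 0.
    return distance / longest if longest else 0.0


def similarity(distance: int, a_len: int, b_len: int) -> float:
    """Similarity as the complement of the normalized distance (assumed definition)."""
    return 1.0 - normalized_distance(distance, a_len, b_len)


# Distance 1 between "ABC" (length 3) and "AC" (length 2), as in the example.
print(round(normalized_distance(1, 3, 2), 2))  # 0.33
print(round(similarity(1, 3, 2), 2))           # 0.67
```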

Get started

  1. Create a Python virtual environment.
  2. Install the dependencies: pip install -r requirements.txt
  3. Build the Rust extension:
    1. Release build (full performance): maturin build --release, then pip install target/wheels/*.whl
    2. Development build: maturin develop
  4. Run the tests: python tests/DamerauLevenshteinTest.py
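The repository's own test file is not shown on this page. As a hypothetical illustration of step 4, a `unittest` module that checks the values from the Use section could look like this. It skips itself when the extension has not been built yet, and compares the floating-point results within a tolerance of 0.01 because the documented values are rounded.

```python
import unittest

try:
    import pyrsdameraulevenshtein as dl
except ImportError:  # extension not built or installed yet
    dl = None


@unittest.skipIf(dl is None, "pyrsdameraulevenshtein is not installed")
class ReadmeExamplesTest(unittest.TestCase):
    """Hypothetical tests for the documented examples (not the repository's own tests)."""

    def test_distance(self):
        self.assertEqual(dl.distance_int([1, 2, 3], [1, 3]), 1)
        self.assertEqual(dl.distance_str(["A", "B", "C"], ["A", "C"]), 1)
        self.assertEqual(dl.distance_unicode("ABC", "AC"), 1)

    def test_normalized_and_similarity(self):
        # The documented values are rounded, so allow a small tolerance.
        self.assertAlmostEqual(dl.normalized_distance_unicode("ABC", "AC"), 1 / 3, delta=0.01)
        self.assertAlmostEqual(dl.similarity_unicode("ABC", "AC"), 2 / 3, delta=0.01)
```

Saved under a hypothetical name such as tests/test_readme_examples.py, it can be run with python -m unittest discover -s tests.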

Performance

In a speed comparison with the C implementation pyxDamerauLevenshtein, the Rust implementation was about 3.7 times faster (0.0864 s vs. 0.3195 s for 100,000 pairs of 10-element integer lists):

import random
import time
import pyrsdameraulevenshtein
from pyxdameraulevenshtein import damerau_levenshtein_distance

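n = 100000  # number of list pairs
x = 10  # list length; values are drawn from range(x)
# counts=[x] * x lets each value appear up to x times in a list (requires Python 3.9+)
a_lists = [random.sample(range(x), k=x, counts=[x] * x) for _ in range(n)]
b_lists = [random.sample(range(x), k=x, counts=[x] * x) for _ in range(n)]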

tic = time.perf_counter()
for a, b in zip(a_lists, b_lists):
    result = pyrsdameraulevenshtein.distance_int(a, b)
toc = time.perf_counter()
print(f"{toc - tic:0.4f} seconds, RUST implementation")
# 0.0864 seconds, RUST implementation

tic = time.perf_counter()
for a, b in zip(a_lists, b_lists):
    result = damerau_levenshtein_distance(a, b)
toc = time.perf_counter()
print(f"{toc - tic:0.4f} seconds, Gold standard - pyxdameraulevenshtein implementation")
# 0.3195 seconds, Gold standard - pyxdameraulevenshtein implementation
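A single `perf_counter` measurement can vary from run to run, so the speed-up figure is best read as approximate. A slightly more robust sketch, assuming Python 3.9+ (needed for `random.sample(..., counts=...)`), fixes the random seed so both implementations see identical inputs and keeps the fastest of several repeats using the standard-library `timeit` module. `best_time` is a hypothetical helper, and the stand-in workload only keeps the sketch runnable without either package installed.

```python
import random
import timeit
from typing import Callable, Sequence


def best_time(func: Callable[[list, list], object], pairs: Sequence[tuple], repeat: int = 5) -> float:
    """Return the fastest of `repeat` runs of func over all (a, b) pairs, in seconds."""

    def run_all() -> None:
        for a, b in pairs:
            func(a, b)

    # Keep the minimum: slower runs mostly reflect interference from other processes.
    return min(timeit.repeat(run_all, number=1, repeat=repeat))


random.seed(0)  # fixed seed so every implementation sees the same inputs
x = 10
pairs = [
    (random.sample(range(x), k=x, counts=[x] * x), random.sample(range(x), k=x, counts=[x] * x))
    for _ in range(10_000)
]

# Stand-in workload; replace with pyrsdameraulevenshtein.distance_int
# or damerau_levenshtein_distance to compare the two implementations.
print(f"{best_time(lambda a, b: a == b, pairs):.4f} seconds")
```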


Download files

Download the file for your platform.

Source distribution (version 1.0.1):

pyrsdameraulevenshtein-1.0.1.tar.gz (17.7 kB)

Built distributions:

pyrsdameraulevenshtein-1.0.1-cp310-none-win_amd64.whl (129.7 kB): CPython 3.10, Windows x86-64

pyrsdameraulevenshtein-1.0.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB): CPython 3.10, manylinux (glibc 2.5+) x86-64

pyrsdameraulevenshtein-1.0.1-cp310-cp310-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (451.0 kB): CPython 3.10, macOS 10.9+ x86-64, macOS 11.0+ ARM64, macOS 10.9+ universal2

pyrsdameraulevenshtein-1.0.1-cp39-none-win_amd64.whl (129.7 kB): CPython 3.9, Windows x86-64

pyrsdameraulevenshtein-1.0.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB): CPython 3.9, manylinux (glibc 2.5+) x86-64

pyrsdameraulevenshtein-1.0.1-cp39-cp39-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (450.8 kB): CPython 3.9, macOS 10.9+ x86-64, macOS 11.0+ ARM64, macOS 10.9+ universal2
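Each wheel file name encodes the interpreter, ABI and platform it targets, following the wheel naming convention (PEP 427): `cp310` means CPython 3.10, and compressed tag sets such as the macOS platforms are joined with dots. Since only `cp39` and `cp310` wheels are listed, other Python versions would presumably fall back to the source distribution, which needs a Rust toolchain to build. The hypothetical `parse_wheel_filename` below splits a name into its tags; it does not handle optional build tags, and `packaging.utils.parse_wheel_filename` from the `packaging` library is a more complete option.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: list[str]
    abi: list[str]
    platform: list[str]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its tags (hypothetical helper, no build-tag support)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel name (build tag?): {filename}")
    name, version, py, abi, plat = parts
    # Compressed tag sets separate alternatives with ".".
    return WheelTags(name, version, py.split("."), abi.split("."), plat.split("."))


tags = parse_wheel_filename(
    "pyrsdameraulevenshtein-1.0.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl"
)
print(tags.python, tags.platform)  # ['cp310'] ['manylinux_2_5_x86_64', 'manylinux1_x86_64']
```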
