Fast string similarity library for Python: bindings for the Rust ELID library

ELID - Efficient Levenshtein and String Similarity Library

A fast, zero-dependency Rust library for computing string similarity metrics with bindings for Python, JavaScript (WASM), and C.

Algorithms

Algorithm               Type                Best For
Levenshtein             Edit distance       General-purpose comparison, spell checking
Normalized Levenshtein  Similarity (0-1)    When you need a percentage match
Jaro                    Similarity (0-1)    Short strings
Jaro-Winkler            Similarity (0-1)    Names and record linkage
Hamming                 Distance            Fixed-length strings, DNA, error codes
OSA                     Edit distance       Typo detection (counts transpositions)
SimHash                 LSH fingerprint     Database-queryable similarity, near-duplicate detection
Best Match              Composite (0-1)     When unsure which algorithm fits
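
To make the difference between the distance-type rows concrete, here is a minimal pure-Python reference sketch of OSA (optimal string alignment) and Hamming distance. It is not ELID's implementation; the function names `osa_distance` and `hamming_distance` are chosen for illustration and are not taken from the library.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(x != y for x, y in zip(a, b))
```

With this sketch, the swapped letters in "martha" and "marhta" cost one edit under OSA, whereas plain Levenshtein needs two substitutions for the same pair.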

Installation

Rust

[dependencies]
elid = "0.2.1"

Python

pip install elid

JavaScript (WASM)

npm install elid-wasm

C/C++

Build with cargo build --release --features ffi to get libelid.so and elid.h.

Quick Start

Rust

use elid::*;

// Edit distance
let distance = levenshtein("kitten", "sitting"); // 3

// Normalized similarity (0.0 to 1.0)
let similarity = normalized_levenshtein("hello", "hallo"); // 0.8

// Name matching
let similarity = jaro_winkler("Martha", "Marhta"); // 0.961

// SimHash for database queries
let hash = simhash("iPhone 14");
let sim = simhash_similarity("iPhone 14", "iPhone 15"); // ~0.92

// Find best match in a list
let candidates = vec!["apple", "application", "apply"];
let (idx, score) = find_best_match("app", &candidates);
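
The normalized score and the best-match lookup above can be pictured with a short reference sketch, written in Python for brevity. It assumes the normalized score is 1 - distance / max(len(a), len(b)), which agrees with the 0.8 shown for "hello" and "hallo", and it ranks candidates by that single score. ELID's Best Match is described as a composite, so its ranking may differ. The names `normalized_levenshtein` and `best_match` here are illustrative, not the library's API.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Memoized recursive edit distance; fine for short strings only."""

    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        cost = 0 if a[i - 1] == b[j - 1] else 1
        return min(dist(i - 1, j) + 1, dist(i, j - 1) + 1, dist(i - 1, j - 1) + cost)

    return dist(len(a), len(b))


def normalized_levenshtein(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest


def best_match(query: str, candidates: list[str]) -> tuple[int, float]:
    """Return (index, score) of the highest-scoring candidate; first one wins ties."""
    scores = [normalized_levenshtein(query, c) for c in candidates]
    return max(enumerate(scores), key=lambda pair: pair[1])
```

Under this single-metric assumption, "apple" and "apply" tie for the query "app", and the sketch returns the earlier index.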

Python

import elid

elid.levenshtein("kitten", "sitting")  # 3
elid.jaro_winkler("martha", "marhta")  # 0.961
elid.simhash_similarity("iPhone 14", "iPhone 15")  # 0.922
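
To check the 0.961 figure by hand, the following is a textbook Jaro-Winkler sketch in pure Python. It assumes the conventional parameters (prefix scale 0.1, prefix capped at 4 characters); whether ELID uses exactly these values is not stated above, and the parameter names `prefix_scale` and `max_prefix` are illustrative.

```python
def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are equal and at most `window` positions apart.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score boosted for a shared prefix of up to `max_prefix` characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a, b):
        if x != y or prefix == max_prefix:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)
```

For "martha" and "marhta" all six characters match with one transposition, giving a Jaro score of about 0.944; the shared prefix "mar" lifts it to about 0.961.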

JavaScript

import init, { levenshtein, jaroWinkler, simhashSimilarity } from 'elid-wasm';

await init();
levenshtein("kitten", "sitting");  // 3
jaroWinkler("martha", "marhta");   // 0.961
simhashSimilarity("iPhone 14", "iPhone 15");  // 0.922
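
The SimHash calls in the snippets above return a fingerprint and a similarity derived from it. The sketch below shows the general technique in Python under explicit assumptions: character trigrams as features, a 64-bit BLAKE2b hash per feature, and similarity defined as 1 minus the fraction of differing bits. ELID's actual feature extraction and hash function are not documented here, so this sketch will not reproduce the library's scores.

```python
import hashlib


def _features(text: str, n: int = 3) -> list[str]:
    """Assumed feature set: lowercase character n-grams."""
    lowered = text.lower()
    if len(lowered) < n:
        return [lowered]
    return [lowered[i:i + n] for i in range(len(lowered) - n + 1)]


def simhash(text: str, bits: int = 64) -> int:
    """Fingerprint in which each bit is a majority vote over the feature hashes."""
    votes = [0] * bits
    for feature in _features(text):
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=bits // 8).digest()
        value = int.from_bytes(digest, "big")
        for bit in range(bits):
            votes[bit] += 1 if (value >> bit) & 1 else -1
    return sum(1 << bit for bit in range(bits) if votes[bit] > 0)


def simhash_similarity(a: str, b: str, bits: int = 64) -> float:
    """1.0 for identical fingerprints, lower as more bits differ."""
    differing = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - differing / bits
```

Because the fingerprint is a plain integer, it can be stored in an indexed database column and compared by bit difference, which is what makes this family of hashes convenient for near-duplicate lookups.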

Configuration

Use SimilarityOpts for case-insensitive or whitespace-trimmed comparisons:

use elid::{levenshtein_with_opts, SimilarityOpts};

let opts = SimilarityOpts {
    case_sensitive: false,
    trim_whitespace: true,
    ..Default::default()
};
let distance = levenshtein_with_opts("  HELLO  ", "hello", &opts); // 0
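
One way to picture what these options do is as a normalization step applied to both inputs before the metric runs. The Python sketch below is a hypothetical mirror of the two fields shown above, not the library's Python API: the `Opts` class, its defaults, and the use of `str.lower` and `str.strip` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Opts:
    """Hypothetical stand-in for the two SimilarityOpts fields used above."""
    case_sensitive: bool = True    # assumed default
    trim_whitespace: bool = False  # assumed default


def normalize(text: str, opts: Opts) -> str:
    """Apply the option-driven preprocessing before any comparison."""
    if opts.trim_whitespace:
        text = text.strip()
    if not opts.case_sensitive:
        text = text.lower()
    return text


opts = Opts(case_sensitive=False, trim_whitespace=True)
left = normalize("  HELLO  ", opts)
right = normalize("hello", opts)
```

After normalization both inputs are the same string, which is why the distance in the Rust example is 0.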

Performance

  • Zero external dependencies for core algorithms
  • O(min(m,n)) space-optimized Levenshtein
  • 1.4M+ string comparisons per second (Python benchmarks)
  • ~96KB WASM binary
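
The O(min(m,n)) space figure for Levenshtein comes from keeping only two rows of the dynamic-programming table, sized to the shorter string. The following Python sketch shows that standard technique; it illustrates the idea and is not a port of ELID's Rust code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rows of length min(len(a), len(b)) + 1."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so the rows stay small
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]
```

Swapping the arguments is safe because the distance is symmetric. Running time is still proportional to m times n; only the memory footprint shrinks.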

Building

git clone https://github.com/ZachHandley/ELID.git
cd ELID

cargo build --release
cargo test
cargo bench
cargo run --example basic_usage

License

Dual-licensed under MIT or Apache-2.0 at your option.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

elid-0.2.1-cp312-cp312-manylinux_2_34_x86_64.whl (280.5 kB)

Uploaded: CPython 3.12, manylinux (glibc 2.34+), x86-64

File details

Details for the file elid-0.2.1-cp312-cp312-manylinux_2_34_x86_64.whl.

File hashes

Hashes for elid-0.2.1-cp312-cp312-manylinux_2_34_x86_64.whl
Algorithm    Hash digest
SHA256       09b65d6e1a4241a3fa99ff6def54d5d68d426fac7eebb290d46f51dc1bdbe14d
MD5          53bbca30db68e08c9d131deee5fc6a7d
BLAKE2b-256  36f2afcea7d750936c9b459c1dabb787ed0af0d60b159e8f89eb42349f3d9e15
