Fast string similarity library for Python - Rust bindings for ELID

ELID - Efficient Levenshtein and String Similarity Library

A fast, zero-dependency Rust library for computing string similarity metrics with bindings for Python, JavaScript (WASM), and C.

Algorithms

Algorithm               Type              Best For
Levenshtein             Edit distance     General-purpose comparison, spell checking
Normalized Levenshtein  Similarity (0-1)  When you need a percentage match
Jaro                    Similarity (0-1)  Short strings
Jaro-Winkler            Similarity (0-1)  Names and record linkage
Hamming                 Distance          Fixed-length strings, DNA, error codes
OSA                     Edit distance     Typo detection (counts transpositions)
SimHash                 LSH fingerprint   Database-queryable similarity, near-duplicate detection
Best Match              Composite (0-1)   When unsure which algorithm fits
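The difference between the Levenshtein and OSA rows is easiest to see on a swapped pair of letters. The following is a minimal pure-Python sketch of the textbook definitions, not ELID's implementation; the function names edit_distance and hamming are illustrative and are not part of the elid API.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Full-matrix edit distance.

    With transpositions=False this is plain Levenshtein; with
    transpositions=True an adjacent swap costs 1 (optimal string alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; only defined for equal lengths."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


print(edit_distance("recieve", "receive"))                       # 2
print(edit_distance("recieve", "receive", transpositions=True))  # 1
print(hamming("karolin", "kathrin"))                             # 3
```

The swapped "ie" costs two substitutions under Levenshtein but a single transposition under OSA, which is why OSA tends to suit typo detection.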

Installation

Rust

[dependencies]
elid = "0.1.0"

Python

pip install elid

JavaScript (WASM)

npm install elid-wasm

C/C++

Build with cargo build --release --features ffi to produce the shared library (libelid.so on Linux) and the elid.h header.

Quick Start

Rust

use elid::*;

// Edit distance
let distance = levenshtein("kitten", "sitting"); // 3

// Normalized similarity (0.0 to 1.0)
let similarity = normalized_levenshtein("hello", "hallo"); // 0.8

// Name matching
let similarity = jaro_winkler("Martha", "Marhta"); // 0.961

// SimHash for database queries
let hash = simhash("iPhone 14");
let sim = simhash_similarity("iPhone 14", "iPhone 15"); // ~0.92

// Find best match in a list
let candidates = vec!["apple", "application", "apply"];
let (idx, score) = find_best_match("app", &candidates);

Python

import elid

elid.levenshtein("kitten", "sitting")  # 3
elid.jaro_winkler("martha", "marhta")  # 0.961
elid.simhash_similarity("iPhone 14", "iPhone 15")  # 0.922
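For readers unfamiliar with SimHash, the sketch below shows the general idea behind a fingerprint-based similarity score: hash each feature, take a per-bit majority vote, and compare fingerprints by Hamming distance. It is an illustration only. The feature choice (lowercased character trigrams), the hash (BLAKE2b), and the 64-bit width are assumptions, so its scores will not match the 0.922 returned by elid.

```python
import hashlib


def simhash(text: str, n: int = 3, bits: int = 64) -> int:
    """Illustrative SimHash over character n-grams (not ELID's exact scheme)."""
    lowered = text.lower()
    # Strings shorter than n yield a single (short) feature.
    grams = [lowered[i:i + n] for i in range(max(1, len(lowered) - n + 1))]
    weights = [0] * bits
    for gram in grams:
        digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for bit in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit in range(bits):
        if weights[bit] > 0:
            fingerprint |= 1 << bit
    return fingerprint


def simhash_similarity(a: str, b: str, bits: int = 64) -> float:
    """1.0 for identical fingerprints, lower as more bits differ."""
    differing = bin(simhash(a, bits=bits) ^ simhash(b, bits=bits)).count("1")
    return 1.0 - differing / bits


print(simhash_similarity("iPhone 14", "iPhone 14"))  # 1.0
print(simhash_similarity("iPhone 14", "iPhone 15"))
```

Because the fingerprint is a plain integer, it can be stored in a database column and compared with bitwise operations, which is what makes this family of hashes database-queryable.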

JavaScript

import init, { levenshtein, jaroWinkler, simhashSimilarity } from 'elid-wasm';

await init();
levenshtein("kitten", "sitting");  // 3
jaroWinkler("martha", "marhta");   // 0.961
simhashSimilarity("iPhone 14", "iPhone 15");  // 0.922
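The 0.961 score quoted for martha/marhta in the examples above can be reproduced from the standard Jaro-Winkler definition. The sketch below is a plain reference version assuming the usual prefix scale of 0.1 and a maximum prefix of 4 characters; whether ELID uses exactly these parameters, or applies a boost threshold, is not stated in this README.

```python
def jaro(a: str, b: str) -> float:
    """Standard Jaro similarity in the range 0.0 to 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are within this many positions.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        lo, hi = max(0, i - window), min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in order; out-of-order pairs are half-transpositions.
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Jaro score boosted by the length of the common prefix (up to 4)."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


print(round(jaro_winkler("martha", "marhta"), 3))  # 0.961
```

Here all six characters match with one transposition, giving a Jaro score of about 0.944; the shared prefix "mar" then lifts it to about 0.961.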

Configuration

Use SimilarityOpts for case-insensitive or whitespace-trimmed comparisons:

use elid::{levenshtein_with_opts, SimilarityOpts};

let opts = SimilarityOpts {
    case_sensitive: false,
    trim_whitespace: true,
    ..Default::default()
};
let distance = levenshtein_with_opts("  HELLO  ", "hello", &opts); // 0
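This README does not show how the Python binding exposes these options. As a rough illustration of what the two flags do to the inputs before comparison, here is a pure-Python sketch. CompareOptions and normalize are hypothetical names, not part of elid, and the defaults shown are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CompareOptions:
    """Hypothetical stand-in for the Rust SimilarityOpts struct."""
    case_sensitive: bool = True     # assumed default
    trim_whitespace: bool = False   # assumed default


def normalize(text: str, opts: CompareOptions) -> str:
    """Apply the option flags to one input string before comparing."""
    if opts.trim_whitespace:
        text = text.strip()
    if not opts.case_sensitive:
        text = text.lower()
    return text


opts = CompareOptions(case_sensitive=False, trim_whitespace=True)
left = normalize("  HELLO  ", opts)
right = normalize("hello", opts)
print(left == right)  # True, so any edit distance between them is 0
```

Normalizing both inputs the same way and then calling an ordinary distance function gives the same result as the Rust example above: the two strings become identical, so the distance is 0.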

Performance

  • Zero external dependencies for core algorithms
  • Space-optimized Levenshtein that uses O(min(m, n)) memory for inputs of length m and n
  • 1.4M+ string comparisons per second (Python benchmarks)
  • ~96KB WASM binary
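The O(min(m, n)) space bound comes from keeping a single row of the dynamic-programming table, sized by the shorter string. The following pure-Python sketch shows that technique in its textbook form; it is not ELID's Rust code, and the normalization by the longer string's length is an assumption that happens to agree with the 0.8 shown for hello/hallo.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance using one row of len(shorter) + 1 integers."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so the row is as small as possible
    row = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        diagonal = row[0]  # d[i-1][0], needed for the substitution case
        row[0] = i
        for j, cb in enumerate(b, start=1):
            above = row[j]  # d[i-1][j], saved before it is overwritten
            row[j] = min(
                above + 1,               # deletion
                row[j - 1] + 1,          # insertion
                diagonal + (ca != cb),   # substitution or match
            )
            diagonal = above
    return row[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Similarity in the range 0.0 to 1.0, scaled by the longer input."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))          # 3
print(normalized_levenshtein("hello", "hallo"))  # 0.8
```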

Building

git clone https://forge.blackleafdigital.com/BlackLeafDigital/ELID.git
cd ELID

cargo build --release
cargo test
cargo bench
cargo run --example basic_usage

License

Dual-licensed under MIT or Apache-2.0 at your option.


Built Distribution

elid-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl (280.6 kB)

Uploaded for CPython 3.12, manylinux (glibc 2.34+), x86-64.


Hashes for elid-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl

Algorithm    Hash digest
SHA256       07d513f46a6fccdd7db6736530cf7a24f307455cb0db091edad74039b29dca58
MD5          7f4830918fa366891f82771780fe662b
BLAKE2b-256  d3def0e8d54bc8d381d89cc5e744ad5ab3d6ca78c435ddb2e535f6df50d4da17
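A downloaded wheel can be checked against the SHA256 digest listed above using only the Python standard library. This is a minimal sketch; the helper names sha256_of and matches_published are illustrative, and the file path is whatever location the wheel was saved to.

```python
import hashlib

# SHA256 digest published above for the 0.1.0 cp312 manylinux wheel.
EXPECTED_SHA256 = "07d513f46a6fccdd7db6736530cf7a24f307455cb0db091edad74039b29dca58"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str) -> bool:
    """True if the file at path has the published SHA256 digest."""
    return sha256_of(path) == EXPECTED_SHA256
```

For example, matches_published("elid-0.1.0-cp312-cp312-manylinux_2_34_x86_64.whl") should return True for an unmodified download in the current directory.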
