Fast string similarity library for Python - Python bindings for the Rust library ELID
ELID - Efficient Levenshtein and String Similarity Library
A fast, zero-dependency Rust library for computing string similarity metrics with bindings for Python, JavaScript (WASM), and C.
Algorithms
| Algorithm | Type | Best For |
|---|---|---|
| Levenshtein | Edit distance | General-purpose comparison, spell checking |
| Normalized Levenshtein | Similarity (0-1) | When you need a percentage match |
| Jaro | Similarity (0-1) | Short strings |
| Jaro-Winkler | Similarity (0-1) | Names and record linkage |
| Hamming | Distance | Fixed-length strings, DNA, error codes |
| OSA (Optimal String Alignment) | Edit distance | Typo detection (counts an adjacent transposition as a single edit) |
| SimHash | LSH fingerprint | Database-queryable similarity, near-duplicate detection |
| Best Match | Composite (0-1) | When unsure which algorithm fits |
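To make the difference between the edit-distance rows concrete, here is a minimal pure-Python reference sketch. It does not use ELID and is not ELID's implementation; the names `edit_distance` and `hamming` are chosen for illustration only. With `transpositions=False` the function computes plain Levenshtein distance, and with `transpositions=True` it computes OSA distance, where swapping two adjacent characters costs one edit instead of two.

```python
def edit_distance(a: str, b: str, transpositions: bool = False) -> int:
    """Levenshtein distance, or OSA distance when transpositions=True."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # OSA extension: an adjacent swap counts as a single edit.
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; defined only for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


print(edit_distance("form", "from"))                       # 2
print(edit_distance("form", "from", transpositions=True))  # 1
print(hamming("karolin", "kathrin"))                       # 3
```

The typo "form" for "from" is one keystroke slip, which is why OSA tends to suit typo detection better than plain Levenshtein.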
Installation
Rust
[dependencies]
elid = "0.2.1"
Python
pip install elid
JavaScript (WASM)
npm install elid-wasm
C/C++
Build with cargo build --release --features ffi to get libelid.so and elid.h.
Quick Start
use elid::*;
// Edit distance
let distance = levenshtein("kitten", "sitting"); // 3
// Normalized similarity (0.0 to 1.0)
let similarity = normalized_levenshtein("hello", "hallo"); // 0.8
// Name matching
let similarity = jaro_winkler("Martha", "Marhta"); // 0.961
// SimHash for database queries
let hash = simhash("iPhone 14");
let sim = simhash_similarity("iPhone 14", "iPhone 15"); // ~0.92
// Find best match in a list
let candidates = vec!["apple", "application", "apply"];
let (idx, score) = find_best_match("app", &candidates);
Python
import elid
elid.levenshtein("kitten", "sitting") # 3
elid.jaro_winkler("martha", "marhta") # 0.961
elid.simhash_similarity("iPhone 14", "iPhone 15") # 0.922
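For readers who want to see where the 0.961 figure comes from, the following is a pure-Python reference sketch of the textbook Jaro-Winkler definition. It is independent of ELID and may differ from ELID's implementation in edge cases; the prefix scale of 0.1 and the maximum prefix length of 4 are the conventional defaults and are assumed here.

```python
def jaro(a: str, b: str) -> float:
    """Textbook Jaro similarity in the range 0.0 to 1.0."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both matched sequences in order and count positions that disagree.
    out_of_order = 0
    j = 0
    for i, ca in enumerate(a):
        if a_matched[i]:
            while not b_matched[j]:
                j += 1
            if ca != b[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2  # transpositions
    return (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    score = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)


print(round(jaro_winkler("martha", "marhta"), 3))  # 0.961
```

For "martha" and "marhta" all six characters match with one transposition, giving a Jaro score of about 0.944; the shared prefix "mar" then lifts the result to about 0.961.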
JavaScript
import init, { levenshtein, jaroWinkler, simhashSimilarity } from 'elid';
await init();
levenshtein("kitten", "sitting"); // 3
jaroWinkler("martha", "marhta"); // 0.961
simhashSimilarity("iPhone 14", "iPhone 15"); // 0.922
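The SimHash calls above return a fingerprint and a similarity derived from it. As a rough illustration of the general technique, here is a pure-Python sketch. ELID's actual feature extraction and hash function are not described in this document, so the character trigrams and the BLAKE2b feature hash used below are assumptions, and the scores this sketch produces will not match ELID's.

```python
import hashlib


def simhash(text: str, ngram: int = 3, bits: int = 64) -> int:
    """SimHash fingerprint over character n-grams (feature choice is an assumption)."""
    if len(text) >= ngram:
        features = [text[i:i + ngram] for i in range(len(text) - ngram + 1)]
    else:
        features = [text]
    weights = [0] * bits
    for feature in features:
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each feature votes +1 or -1 on every bit position.
        for bit in range(bits):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def simhash_similarity(a: str, b: str, bits: int = 64) -> float:
    """1.0 minus the fraction of fingerprint bits that differ."""
    differing = bin(simhash(a, bits=bits) ^ simhash(b, bits=bits)).count("1")
    return 1.0 - differing / bits


print(simhash_similarity("iPhone 14", "iPhone 14"))  # 1.0
print(simhash_similarity("iPhone 14", "iPhone 15"))
```

Because the fingerprint is a fixed-width integer, it can be stored in an ordinary database column and compared by Hamming distance, which is what makes this family of hashes useful for near-duplicate lookups.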
Configuration
Use SimilarityOpts for case-insensitive or whitespace-trimmed comparisons:
use elid::{levenshtein_with_opts, SimilarityOpts};
let opts = SimilarityOpts {
case_sensitive: false,
trim_whitespace: true,
..Default::default()
};
let distance = levenshtein_with_opts(" HELLO ", "hello", &opts); // 0
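Conceptually, options like these amount to normalizing both inputs before a metric runs. The sketch below illustrates that idea in pure Python. `Opts`, `normalize`, and `compare_with_opts` are hypothetical names rather than ELID's Python API, the defaults shown are assumptions, and `difflib.SequenceMatcher` from the standard library stands in for a similarity metric.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable


@dataclass
class Opts:
    """Hypothetical stand-in for SimilarityOpts; defaults are assumed."""
    case_sensitive: bool = True
    trim_whitespace: bool = False


def normalize(text: str, opts: Opts) -> str:
    """Apply the requested preprocessing to one input string."""
    if opts.trim_whitespace:
        text = text.strip()
    if not opts.case_sensitive:
        text = text.lower()
    return text


def compare_with_opts(metric: Callable[[str, str], float], a: str, b: str, opts: Opts) -> float:
    """Normalize both strings, then hand them to any two-argument metric."""
    return metric(normalize(a, opts), normalize(b, opts))


def ratio(a: str, b: str) -> float:
    """Standard-library similarity in the range 0.0 to 1.0, used as a stand-in metric."""
    return SequenceMatcher(None, a, b).ratio()


relaxed = Opts(case_sensitive=False, trim_whitespace=True)
print(compare_with_opts(ratio, " HELLO ", "hello", relaxed))  # 1.0
print(compare_with_opts(ratio, " HELLO ", "hello", Opts()))   # 0.0
```

Keeping normalization separate from the metric means the same options can be reused across every algorithm without touching the algorithms themselves.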
Performance
- Zero external dependencies for core algorithms
- O(min(m,n)) space-optimized Levenshtein
- 1.4M+ string comparisons per second (Python benchmarks)
- ~96KB WASM binary
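The O(min(m,n)) space figure refers to a standard trick: the Levenshtein table is filled row by row, and each row depends only on the one before it, so two rows sized to the shorter string are enough. The following pure-Python sketch shows the technique; it is a reference illustration, not ELID's Rust code. Dividing by the longer string's length to get a 0 to 1 score is an assumption, though it agrees with the 0.8 result shown for "hello" and "hallo" in Quick Start.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using two rows of length min(len(a), len(b)) + 1."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so the rows stay small
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def normalized_levenshtein(a: str, b: str) -> float:
    """Similarity in 0.0 to 1.0, assuming normalization by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))          # 3
print(normalized_levenshtein("hello", "hallo"))  # 0.8
```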
Building
git clone https://github.com/ZachHandley/ELID.git
cd ELID
cargo build --release
cargo test
cargo bench
cargo run --example basic_usage
License
Dual-licensed under MIT or Apache-2.0 at your option.
File details
Details for the file elid-0.2.1-cp312-cp312-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: elid-0.2.1-cp312-cp312-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 280.5 kB
- Tags: CPython 3.12, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 09b65d6e1a4241a3fa99ff6def54d5d68d426fac7eebb290d46f51dc1bdbe14d |
| MD5 | 53bbca30db68e08c9d131deee5fc6a7d |
| BLAKE2b-256 | 36f2afcea7d750936c9b459c1dabb787ed0af0d60b159e8f89eb42349f3d9e15 |