Polars extension for string similarity
String Similarity Measures for Polars
This package provides Python bindings for computing string similarity measures directly on a Polars DataFrame. All measures are implemented in Rust and computed in parallel.
The following similarity measures are implemented:
- Levenshtein
- Jaro
- Jaro-Winkler
- Jaccard
- Sørensen-Dice
Each similarity measure returns a value normalized between 0.0 and 1.0 (inclusive), where 0.0 means the two strings are maximally different and 1.0 means they are maximally similar. If either input is null, the result is null.
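To make the normalization concrete, here is a minimal pure-Python sketch of a normalized Levenshtein similarity. It assumes the score is 1 minus the edit distance divided by the length of the longer string, and that two empty strings count as identical. That assumption is consistent with the example output further down, but it is not taken from the package's Rust source, and the function names are illustrative only.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0.0, 1.0] (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Assumption: two empty strings are treated as identical.
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_similarity("phillips", "philips"))  # one deletion over 8 characters
```

Under this assumption, "phillips" versus "philips" needs one edit across eight characters, giving 0.875.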
Installing the Library
With pip
pip install polars-strsim
From Source
To build and install this library from source, first make sure cargo is installed. You will also need maturin, which you can install with pip install 'maturin[patchelf]'.
polars-strsim can then be installed into your current Python environment by running maturin develop --release.
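After either installation route, a quick way to confirm that the package landed in the active environment is to ask Python for its installed version. The helper below is a hypothetical convenience written for this example, not part of polars-strsim; it relies only on the standard library's importlib.metadata.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the install succeeded, otherwise None.
print(installed_version("polars-strsim"))
```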
Using the Library
Input:
import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", "", "", None, None],
        "name_b": ["phillips", "philips", "phillips", "", "phillips", None],
    }
).with_columns(
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

with pl.Config(ascii_tables=True):
    print(df)
Output:
shape: (6, 7)
+----------+----------+-------------+----------+--------------+---------+---------------+
| name_a   | name_b   | levenshtein | jaro     | jaro_winkler | jaccard | sorensen_dice |
| ---      | ---      | ---         | ---      | ---          | ---     | ---           |
| str      | str      | f64         | f64      | f64          | f64     | f64           |
+=======================================================================================+
| phillips | phillips | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| phillips | philips  | 0.875       | 0.958333 | 0.975        | 0.875   | 0.933333      |
|          | phillips | 0.0         | 0.0      | 0.0          | 0.0     | 0.0           |
|          |          | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| null     | phillips | null        | null     | null         | null    | null          |
| null     | null     | null        | null     | null         | null    | null          |
+----------+----------+-------------+----------+--------------+---------+---------------+
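The "phillips" / "philips" row can be checked by hand with a small pure-Python sketch. Everything here is an assumption made for illustration: the Jaro-Winkler variant uses the common prefix scale of 0.1 with a prefix cap of 4, and Jaccard and Sørensen-Dice are computed over multisets of single characters, a tokenization inferred from the values in the table rather than from the package's Rust implementation. The function names mirror the library's only for readability.

```python
from collections import Counter


def jaro(a: str, b: str) -> float:
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are equal and within this distance of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not used_b[j] and b[j] == ch:
                used_a[i] = used_b[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq_a = [ch for ch, used in zip(a, used_a) if used]
    seq_b = [ch for ch, used in zip(b, used_b) if used]
    # Half the number of matched characters that appear in a different order.
    transpositions = sum(x != y for x, y in zip(seq_a, seq_b)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, scale: float = 0.1, max_prefix: int = 4) -> float:
    # Assumed parameters: scale 0.1 and a prefix capped at 4 characters.
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return base + prefix * scale * (1.0 - base)


def jaccard(a: str, b: str) -> float:
    # Assumed tokenization: multiset of single characters.
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    return 1.0 if union == 0 else sum((ca & cb).values()) / union


def sorensen_dice(a: str, b: str) -> float:
    # Same assumed tokenization as jaccard above.
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2 * sum((Counter(a) & Counter(b)).values()) / total


for fn in (jaro, jaro_winkler, jaccard, sorensen_dice):
    print(fn.__name__, round(fn("phillips", "philips"), 6))
```

With these assumptions, seven of the characters match with no transpositions, so Jaro is (7/8 + 7/7 + 7/7) / 3, and the shared four-character prefix "phil" lifts Jaro-Winkler to 0.975. The two multisets share 7 characters out of a union of 8, which gives the Jaccard and Sørensen-Dice values in the table.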