Polars extension for string similarity

String Similarity Measures for Polars

This package provides Python bindings for computing various string similarity measures directly on a Polars DataFrame. All similarity measures are implemented in Rust and computed in parallel.

The similarity measures that have been implemented are:

  • Levenshtein
  • Jaro
  • Jaro-Winkler
  • Jaccard
  • Sørensen-Dice

Each similarity measure returns a value normalized between 0.0 and 1.0 (inclusive), where 0.0 means the two strings are maximally different and 1.0 means they are maximally similar.
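To make the normalization concrete, here is a minimal pure-Python sketch of a normalized Levenshtein similarity. It is not the library's Rust implementation. The function name levenshtein_similarity and the choice to divide the edit distance by the length of the longer string are assumptions, chosen because they agree with the example output shown in "Using the Library".

```python
def levenshtein_similarity(a: str, b: str) -> float:
    """Normalized Levenshtein similarity in [0.0, 1.0] (illustrative sketch)."""
    # Two empty strings are treated as identical.
    if not a and not b:
        return 1.0

    # Classic dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    distance = previous[-1]

    # Assumed normalization: divide by the length of the longer string.
    return 1.0 - distance / max(len(a), len(b))


print(levenshtein_similarity("phillips", "philips"))  # one deletion out of 8 characters
```

Under this assumption, "phillips" and "philips" differ by one edit over a maximum length of 8, which gives 1 - 1/8 = 0.875.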

Installing the Library

With pip

pip install polars-strsim

From Source

To build and install this library from source, first ensure that cargo is installed. You will also need maturin, which you can install with pip install 'maturin[patchelf]'.

polars-strsim can then be installed in your current Python environment by running maturin develop --release

Using the Library

Input:

import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", ""        , "", None      , None],
        "name_b": ["phillips", "philips" , "phillips", "", "phillips", None],
    }
).with_columns(
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

with pl.Config(ascii_tables=True):
    print(df)

Output:

shape: (6, 7)
+----------+----------+-------------+----------+--------------+---------+---------------+
| name_a   | name_b   | levenshtein | jaro     | jaro_winkler | jaccard | sorensen_dice |
| ---      | ---      | ---         | ---      | ---          | ---     | ---           |
| str      | str      | f64         | f64      | f64          | f64     | f64           |
+=======================================================================================+
| phillips | phillips | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| phillips | philips  | 0.875       | 0.958333 | 0.975        | 0.875   | 0.933333      |
|          | phillips | 0.0         | 0.0      | 0.0          | 0.0     | 0.0           |
|          |          | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| null     | phillips | null        | null     | null         | null    | null          |
| null     | null     | null        | null     | null         | null    | null          |
+----------+----------+-------------+----------+--------------+---------+---------------+
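The second row of the output can be checked by hand. The sketch below is a pure-Python reference, not the library's Rust code. All function names are hypothetical, and the definitions are assumptions inferred from the numbers above: Jaccard and Sørensen-Dice computed over character multisets, and Jaro-Winkler with the conventional prefix scale of 0.1 and a maximum prefix length of 4.

```python
from collections import Counter


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index over character multisets (assumed definition)."""
    if not a and not b:
        return 1.0
    ca, cb = Counter(a), Counter(b)
    # Counter & and | give the multiset intersection and union.
    return sum((ca & cb).values()) / sum((ca | cb).values())


def sorensen_dice_similarity(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over character multisets (assumed definition)."""
    if not a and not b:
        return 1.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2 * shared / (len(a) + len(b))


def jaro_similarity(a: str, b: str) -> float:
    """Standard Jaro similarity."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (
        matches / len(a) + matches / len(b) + (matches - transpositions) / matches
    ) / 3


def jaro_winkler_similarity(a: str, b: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    jaro = jaro_similarity(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * scale * (1.0 - jaro)


for fn in (jaccard_similarity, sorensen_dice_similarity,
           jaro_similarity, jaro_winkler_similarity):
    print(fn.__name__, round(fn("phillips", "philips"), 6))
```

For "phillips" and "philips", the character multisets share 7 characters out of a union of 8, so Jaccard is 7/8 = 0.875 and Sørensen-Dice is 14/15 ≈ 0.933333. Jaro finds 7 matches with no transpositions, giving (7/8 + 1 + 1)/3 ≈ 0.958333, and the shared prefix "phil" lifts Jaro-Winkler to 0.975.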
