Polars extension for string similarity

String Similarity Measures for Polars

This package provides Python bindings for computing string similarity measures directly on a Polars DataFrame. All measures are implemented in Rust and computed in parallel.

The following similarity measures are implemented:

  • Levenshtein
  • Jaro
  • Jaro-Winkler
  • Jaccard
  • Sørensen-Dice

Each similarity measure returns a value normalized between 0.0 and 1.0 (inclusive), where 0.0 means the two strings are maximally different and 1.0 means they are maximally similar.
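To make the 0.0 to 1.0 scale concrete, here is a minimal pure-Python sketch of one common way to normalize Levenshtein distance: divide the edit distance by the length of the longer string and subtract the result from 1. The formula and the function names (levenshtein_distance, levenshtein_similarity) are assumptions for illustration, not the library's Rust implementation, although this normalization does agree with the Levenshtein values shown in the usage example in this document.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_similarity("phillips", "philips"))  # one deletion over 8 characters
```

Under this normalization, identical strings score 1.0, and an empty string compared with a non-empty one scores 0.0.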

Installing the Library

With pip

pip install polars-strsim

From Source

To build and install this library from source, first ensure that cargo is installed. You will also need maturin, which you can install with pip install 'maturin[patchelf]'.

You can then install polars-strsim into your current Python environment by running maturin develop --release.

Using the Library

Input:

import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", ""        , "", None      , None],
        "name_b": ["phillips", "philips" , "phillips", "", "phillips", None],
    }
).with_columns(
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

with pl.Config(ascii_tables=True):
    print(df)

Output:

shape: (6, 7)
+----------+----------+-------------+----------+--------------+---------+---------------+
| name_a   | name_b   | levenshtein | jaro     | jaro_winkler | jaccard | sorensen_dice |
| ---      | ---      | ---         | ---      | ---          | ---     | ---           |
| str      | str      | f64         | f64      | f64          | f64     | f64           |
+=======================================================================================+
| phillips | phillips | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| phillips | philips  | 0.875       | 0.958333 | 0.975        | 0.875   | 0.933333      |
|          | phillips | 0.0         | 0.0      | 0.0          | 0.0     | 0.0           |
|          |          | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| null     | phillips | null        | null     | null         | null    | null          |
| null     | null     | null        | null     | null         | null    | null          |
+----------+----------+-------------+----------+--------------+---------+---------------+
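As a sanity check on the second row of the output above, the sketch below computes Jaccard and Sørensen-Dice similarity over character multisets in plain Python. The text above does not say how the library tokenizes strings, so character multisets are an assumption here. They happen to reproduce the sample values for "phillips" and "philips" (0.875 and 0.933333). The function names are hypothetical and are not part of polars_strsim.

```python
from collections import Counter


def jaccard_similarity(a: str, b: str) -> float:
    """Multiset Jaccard: |A ∩ B| / |A ∪ B| over character counts (assumed tokenization)."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())  # per-character maximum of the counts
    if union == 0:
        return 1.0  # both strings empty
    return sum((ca & cb).values()) / union  # per-character minimum of the counts


def sorensen_dice_similarity(a: str, b: str) -> float:
    """Multiset Sørensen-Dice: 2 * |A ∩ B| / (|A| + |B|) over character counts."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # both strings empty
    return 2 * sum((Counter(a) & Counter(b)).values()) / total


print(jaccard_similarity("phillips", "philips"))        # 7 shared of 8 in the union
print(sorensen_dice_similarity("phillips", "philips"))  # 2 * 7 / (8 + 7)
```

The two strings share seven characters when counted with multiplicity, which gives 7/8 for Jaccard and 14/15 for Sørensen-Dice.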

