
Polars extension for string similarity

String Similarity Measures for Polars

This package provides Python bindings for computing various string similarity measures directly on a Polars DataFrame. All similarity measures are implemented in Rust and computed in parallel.

The similarity measures that have been implemented are:

  • Levenshtein
  • Jaro
  • Jaro-Winkler
  • Jaccard
  • Sørensen-Dice
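Jaccard and Sørensen-Dice compare two collections of tokens, and which tokens this package uses is not stated here. The pure-Python sketch below assumes multisets (bags) of single characters, because that assumption reproduces the phillips/philips values in the example output further down (0.875 and 0.933333). It illustrates the formulas only and is not the package's Rust implementation; jaccard_chars and sorensen_dice_chars are hypothetical names.

```python
from collections import Counter


def jaccard_chars(a: str, b: str) -> float:
    """Jaccard similarity over character multisets: |A ∩ B| / |A ∪ B|.

    Assumption: tokens are single characters, counted with multiplicity.
    """
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())  # per-character maximum of the two counts
    if union == 0:
        return 1.0  # both strings empty: treated as identical
    inter = sum((ca & cb).values())  # per-character minimum of the two counts
    return inter / union


def sorensen_dice_chars(a: str, b: str) -> float:
    """Sørensen-Dice similarity over character multisets: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # both strings empty: treated as identical
    inter = sum((Counter(a) & Counter(b)).values())
    return 2 * inter / total


print(jaccard_chars("phillips", "philips"))                  # 0.875
print(round(sorensen_dice_chars("phillips", "philips"), 6))  # 0.933333
print(jaccard_chars("", "phillips"), jaccard_chars("", ""))  # 0.0 1.0
```

Treating two empty strings as identical (1.0) matches the empty/empty row of the example output.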

Each similarity measure returns a value normalized to the range 0.0 to 1.0 (inclusive), where 0.0 indicates that the strings are maximally different and 1.0 that they are maximally similar. If either input is null, the result is null.
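As a sketch of what this normalization can look like for Levenshtein, assume the edit distance is divided by the length of the longer string and subtracted from 1. That assumption is consistent with the example output further down: "phillips" and "philips" differ by one edit over eight characters, giving 1 - 1/8 = 0.875. The functions below are hypothetical pure-Python helpers, not part of polars-strsim, and None stands in for a Polars null.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), two-row DP."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a, b):
    """1 - distance / length of the longer string; None if either input is None."""
    if a is None or b is None:
        return None  # mirror the null propagation shown in the example output
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(normalized_levenshtein("phillips", "philips"))  # 0.875
print(normalized_levenshtein("", "phillips"))         # 0.0
```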

Installing the Library

With pip

pip install polars-strsim

From Source

To build and install this library from source, first make sure Cargo (the Rust package manager) is installed. You will also need maturin, which you can install with: pip install 'maturin[patchelf]'

polars-strsim can then be installed into your current Python environment by running: maturin develop --release

Using the Library

Input:

import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", ""        , "", None      , None],
        "name_b": ["phillips", "philips" , "phillips", "", "phillips", None],
    }
).with_columns(
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

with pl.Config(ascii_tables=True):
    print(df)

Output:

shape: (6, 7)
+----------+----------+-------------+----------+--------------+---------+---------------+
| name_a   | name_b   | levenshtein | jaro     | jaro_winkler | jaccard | sorensen_dice |
| ---      | ---      | ---         | ---      | ---          | ---     | ---           |
| str      | str      | f64         | f64      | f64          | f64     | f64           |
+=======================================================================================+
| phillips | phillips | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| phillips | philips  | 0.875       | 0.958333 | 0.975        | 0.875   | 0.933333      |
|          | phillips | 0.0         | 0.0      | 0.0          | 0.0     | 0.0           |
|          |          | 1.0         | 1.0      | 1.0          | 1.0     | 1.0           |
| null     | phillips | null        | null     | null         | null    | null          |
| null     | null     | null        | null     | null         | null    | null          |
+----------+----------+-------------+----------+--------------+---------+---------------+
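The phillips/philips row can be checked by hand. Below is a pure-Python sketch of the textbook Jaro and Jaro-Winkler definitions. The match window of max(len(a), len(b)) // 2 - 1, the prefix scale of 0.1 and the four-character prefix cap are the standard parameters and are assumptions about this package, though they do reproduce 0.958333 and 0.975. This variant applies the prefix bonus unconditionally; some definitions only apply it when the Jaro score exceeds 0.7. The function names are hypothetical, not the package's API.

```python
def jaro(a: str, b: str) -> float:
    """Textbook Jaro similarity between two strings."""
    if not a and not b:
        return 1.0  # two empty strings are identical
    if not a or not b:
        return 0.0  # nothing can match an empty string
    # Characters match only if equal and no farther apart than this window.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: half the number of positions where the matched
    # characters, read in order, disagree.
    a_seq = [c for c, m in zip(a, a_matched) if m]
    b_seq = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the shared prefix (at most max_prefix chars)."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)


print(round(jaro("phillips", "philips"), 6))          # 0.958333
print(round(jaro_winkler("phillips", "philips"), 6))  # 0.975
```

Here seven of the characters match with no transpositions, so Jaro is (7/8 + 7/7 + 7/7) / 3, and the shared prefix "phil" adds 4 × 0.1 × (1 - 0.958333).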

Download files

Download the file for your platform.

Source Distribution

  • polars_strsim-0.2.4.tar.gz (37.2 kB)

Built Distributions

  • polars_strsim-0.2.4-cp38-abi3-win_amd64.whl (3.7 MB): CPython 3.8+, Windows x86-64
  • polars_strsim-0.2.4-cp38-abi3-win32.whl (3.4 MB): CPython 3.8+, Windows x86
  • polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_x86_64.whl (4.7 MB): CPython 3.8+, musllinux (musl 1.2+) x86-64
  • polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_i686.whl (4.9 MB): CPython 3.8+, musllinux (musl 1.2+) i686
  • polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_armv7l.whl (4.7 MB): CPython 3.8+, musllinux (musl 1.2+) ARMv7l
  • polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_aarch64.whl (4.4 MB): CPython 3.8+, musllinux (musl 1.2+) ARM64
  • polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.6 MB): CPython 3.8+, manylinux (glibc 2.17+) x86-64
  • polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (4.9 MB): CPython 3.8+, manylinux (glibc 2.17+) i686
  • polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (4.5 MB): CPython 3.8+, manylinux (glibc 2.17+) ARMv7l
  • polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (4.3 MB): CPython 3.8+, manylinux (glibc 2.17+) ARM64
  • polars_strsim-0.2.4-cp38-abi3-macosx_11_0_arm64.whl (3.7 MB): CPython 3.8+, macOS 11.0+ ARM64
  • polars_strsim-0.2.4-cp38-abi3-macosx_10_12_x86_64.whl (3.9 MB): CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file polars_strsim-0.2.4.tar.gz.

File metadata

  • Download URL: polars_strsim-0.2.4.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.8.2

File hashes

Hashes for polars_strsim-0.2.4.tar.gz
Algorithm Hash digest
SHA256 20871a8743ac3ab1326fa7f5d56e375b022c9cd6c5f8b6c51d5634a5907e7884
MD5 4f8101e3b1baed04b597406f89e32b6e
BLAKE2b-256 609dd0b51d3991fc2ff0f1d2c6d0c6eb03a1ad8afeb5a35632ec2f6066194a5a

See more details on using hashes here.

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-win_amd64.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 f23f1fdd25c5cb040e226b030dedd6787d7947d8b511a135efa2b2e8a1838081
MD5 d90b4a3e94ef3ef1cf59aa6c5d41315d
BLAKE2b-256 039f748e7063b319087daaf5f8cee405d72c5a37effeeb5751165f79c04dabe2

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-win32.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-win32.whl
Algorithm Hash digest
SHA256 1440438e7adc4510ab33147ccbc4a711a7971cde2d4cf6c0af33b6c3246761de
MD5 4954ad5b2cd0b1ad1d2a83fdabba0602
BLAKE2b-256 d6590177c53004f278dfec25ad239091f41bf458aab56420811feb49fa0bb638

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 4d7209c76a7b05d7e3f110f5a55b0da72ef22706ec9330c80ab35dd011c33e99
MD5 7e45c24385b462e250ca7a2268551075
BLAKE2b-256 04968f7b959f726755086703eb3a0370e528072a7e204fb8389499f009690a7c

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_i686.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 addf4062b41ab8b37fafb694f63b5aa1d3224521fba84004858fa66be7d1d606
MD5 eaf362ae41cdb9a8b291e750ba2c1014
BLAKE2b-256 86dd6e89be7026418150d70154bea479fdbcfdca608b2d685d35a4933c1e8a22

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 ad163a646161dcb24f30a2591dd5d678abc2aa2885b01ad756cad30708e17f26
MD5 91f3902d421e6ec93907d082d5dfe959
BLAKE2b-256 f0a26d4647e6219a63bd200f705ca46d63c562f494acd071bf82d1a7d25f7591

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 d208b52e12ddfdb96a4341c822947e765b471a90806e000f4ac2a7e6b1cfb661
MD5 3f0ca61306c3aed8fa96d4e3398e6cd2
BLAKE2b-256 780e98fa74fccb35f1e6664999709095bcc6dba3dde50f70aec89835d98bf2e0

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ccdcad56fb19c4a84b862819044a378b98fc12c5c41fb835b5144999f7304c10
MD5 99856705512ba00d5b575bcb0ec27b2b
BLAKE2b-256 0f4e930e5dc1e1083591c2bd440df54c3dacdaff1bcad10dce8475a5f36189a0

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 d21bead7ec1238fc4d00acef715f6d35c00d9248566414c1469a5360212026c1
MD5 8748e1962db74a04bc7fc79d629dde95
BLAKE2b-256 481265699f4b9343dcbf75de50fc2b30729db71f2b5d733f25cf9e484a164761

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 aea8997fedd254cf4e32ef990a2b7543fc7b4d1282103a067c55db574b96dfb8
MD5 0879d80a9428ff2feeca5a1cb6da9e6e
BLAKE2b-256 db585d9401b75e8e08204a9f4c9d253337a6623ac9cd8c27f2f5359238e9cb97

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 25250aa3840730e705c87461f42675d22be35d49c7783658ae8cbab5c8d5885a
MD5 86d37801150ae86391c5290a2bf37634
BLAKE2b-256 e58e4a1f7a666a743fa9e18f8df202e32c440b6fd25020ece4ceb0e8d0c97194

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bdeebaea11a0a59680a3282d4a0fc69723c9833110c669f0f6b01c75d442c5dd
MD5 0275f09efd45b54cac3387f5fb15a20a
BLAKE2b-256 1ec75f16c102e82e8a211af3554ee7f98d9f0b89f3381e7811a666ac9e89d887

File details

Details for the file polars_strsim-0.2.4-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for polars_strsim-0.2.4-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 495a0fe1166442b4f6146edc3f18c3dc943ae006d94cd0f7f12540753857c162
MD5 73587b96a2bbdb6a6119d8bf98221761
BLAKE2b-256 7fc24c82abcd3b30d8a2f073d5629f582dc3855dac64492532df3de8a1388858
