Polars extension for string similarity

String Similarity Measures for Polars

This package provides Python bindings for computing string similarity measures directly on a Polars DataFrame. All of the measures are implemented in Rust and computed in parallel.

The similarity measures that have been implemented are:

  • Levenshtein
  • Jaro
  • Jaro-Winkler
  • Jaccard
  • Sørensen-Dice

Each measure returns a similarity score normalized to the range 0.0 to 1.0 (inclusive), where 0.0 means the two strings are maximally different and 1.0 means they are maximally similar. As the example output further down shows, a null in either input yields a null result.
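To make the normalization concrete, here is a minimal pure-Python sketch of a normalized Levenshtein similarity. It assumes the score is computed as one minus the edit distance divided by the length of the longer string. That assumption is consistent with the `phillips`/`philips` value in the usage example (0.875), but the source does not state the formula, and the function names here are illustrative rather than part of the package.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        # Two empty strings are treated as identical.
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_similarity("phillips", "philips"))  # 0.875
print(levenshtein_similarity("", "phillips"))         # 0.0
print(levenshtein_similarity("", ""))                 # 1.0
```

Dividing by the longer length keeps the score inside [0, 1], because the edit distance can never exceed that length.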

Installing the Library

With pip

pip install polars-strsim

From Source

To build and install this library from source, first make sure cargo is installed. You will also need maturin, which you can install with pip install 'maturin[patchelf]'.

You can then install polars-strsim into your current Python environment by running maturin develop --release

Using the Library

Input:

import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", ""        , "", None      , None],
        "name_b": ["phillips", "philips" , "phillips", "", "phillips", None],
    }
).with_columns(
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

print(df)

Output:

shape: (6, 7)
┌──────────┬──────────┬─────────────┬──────────┬──────────────┬─────────┬───────────────┐
│ name_a   ┆ name_b   ┆ levenshtein ┆ jaro     ┆ jaro_winkler ┆ jaccard ┆ sorensen_dice │
│ ---      ┆ ---      ┆ ---         ┆ ---      ┆ ---          ┆ ---     ┆ ---           │
│ str      ┆ str      ┆ f64         ┆ f64      ┆ f64          ┆ f64     ┆ f64           │
╞══════════╪══════════╪═════════════╪══════════╪══════════════╪═════════╪═══════════════╡
│ phillips ┆ phillips ┆ 1.0         ┆ 1.0      ┆ 1.0          ┆ 1.0     ┆ 1.0           │
│ phillips ┆ philips  ┆ 0.875       ┆ 0.958333 ┆ 0.975        ┆ 0.875   ┆ 0.933333      │
│          ┆ phillips ┆ 0.0         ┆ 0.0      ┆ 0.0          ┆ 0.0     ┆ 0.0           │
│          ┆          ┆ 1.0         ┆ 1.0      ┆ 1.0          ┆ 1.0     ┆ 1.0           │
│ null     ┆ phillips ┆ null        ┆ null     ┆ null         ┆ null    ┆ null          │
│ null     ┆ null     ┆ null        ┆ null     ┆ null         ┆ null    ┆ null          │
└──────────┴──────────┴─────────────┴──────────┴──────────────┴─────────┴───────────────┘
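The jaccard and sorensen_dice values in the output above can be reproduced by hand if both measures are computed over multisets of characters. This is an assumption inferred from the numbers (7/8 = 0.875 and 14/15 ≈ 0.933333 for `phillips` vs. `philips`), not something the source states. The sketch below uses only the standard library, and its function names are illustrative.

```python
from collections import Counter


def jaccard_similarity(a: str, b: str) -> float:
    """Multiset Jaccard: |A ∩ B| / |A ∪ B| over character counts (assumed)."""
    if not a and not b:
        # Two empty strings are treated as identical.
        return 1.0
    counts_a, counts_b = Counter(a), Counter(b)
    intersection = sum((counts_a & counts_b).values())  # per-character minimum
    union = sum((counts_a | counts_b).values())         # per-character maximum
    return intersection / union


def sorensen_dice_similarity(a: str, b: str) -> float:
    """Multiset Sørensen-Dice: 2 * |A ∩ B| / (|A| + |B|) (assumed)."""
    if not a and not b:
        return 1.0
    intersection = sum((Counter(a) & Counter(b)).values())
    return 2 * intersection / (len(a) + len(b))


print(jaccard_similarity("phillips", "philips"))        # 0.875
print(round(sorensen_dice_similarity("phillips", "philips"), 6))  # 0.933333
```

Under this reading the two strings share 7 characters, their multiset union has 8, and their combined length is 15. Plain character sets would not fit the table: both strings use the same five distinct letters, so a set-based Jaccard would be 1.0.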
