Polars extension for string similarity

String Similarity Measures for Polars

This package provides Python bindings for computing several string similarity measures directly on a Polars DataFrame. All measures are implemented in Rust and computed in parallel.

The similarity measures that have been implemented are:

  • Levenshtein
  • Jaro
  • Jaro-Winkler
  • Jaccard
  • Sørensen-Dice

Each similarity measure returns a value normalized to the range 0.0 to 1.0 (inclusive), where 0.0 means the inputs are maximally different and 1.0 means they are maximally similar. If either input is null, the result is null.
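To make the normalization concrete, here is a minimal pure-Python sketch of a normalized Levenshtein similarity. It assumes the common normalization 1 - distance / max(len(a), len(b)); that formula is an assumption, not taken from the package's Rust source, and the function names levenshtein_distance and normalized_levenshtein are hypothetical, not part of the polars-strsim API. With that assumption the sketch reproduces the levenshtein column of the usage example's output, including the conventions that two empty strings score 1.0 and that nulls propagate.

```python
from typing import Optional


def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein(a: Optional[str], b: Optional[str]) -> Optional[float]:
    """Hypothetical helper: 1 - distance / length of the longer string."""
    if a is None or b is None:
        return None  # mirror the null propagation in the example output
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(normalized_levenshtein("phillips", "philips"))  # 0.875
```

"phillips" and "philips" differ by one deletion, so the score is 1 - 1/8 = 0.875, while an empty string against "phillips" needs 8 edits and scores 0.0.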

Installing the Library

With pip

pip install polars-strsim

From Source

To build and install this library from source, first make sure you have Cargo (the Rust package manager) installed. You will also need maturin, which you can install with pip install 'maturin[patchelf]'.

polars-strsim can then be installed into your active Python (virtual) environment by running maturin develop --release.

Using the Library

Input:

import polars as pl
from polars_strsim import levenshtein, jaro, jaro_winkler, jaccard, sorensen_dice

# Example pairs covering identical, near-identical, empty and null inputs
df = pl.DataFrame(
    {
        "name_a": ["phillips", "phillips", ""        , "", None      , None],
        "name_b": ["phillips", "philips" , "phillips", "", "phillips", None],
    }
).with_columns(
    # Each function compares the two named columns row by row and yields a Float64 column
    levenshtein=levenshtein("name_a", "name_b"),
    jaro=jaro("name_a", "name_b"),
    jaro_winkler=jaro_winkler("name_a", "name_b"),
    jaccard=jaccard("name_a", "name_b"),
    sorensen_dice=sorensen_dice("name_a", "name_b"),
)

print(df)

Output:

shape: (6, 7)
┌──────────┬──────────┬─────────────┬──────────┬──────────────┬─────────┬───────────────┐
│ name_a   ┆ name_b   ┆ levenshtein ┆ jaro     ┆ jaro_winkler ┆ jaccard ┆ sorensen_dice │
│ ---      ┆ ---      ┆ ---         ┆ ---      ┆ ---          ┆ ---     ┆ ---           │
│ str      ┆ str      ┆ f64         ┆ f64      ┆ f64          ┆ f64     ┆ f64           │
╞══════════╪══════════╪═════════════╪══════════╪══════════════╪═════════╪═══════════════╡
│ phillips ┆ phillips ┆ 1.0         ┆ 1.0      ┆ 1.0          ┆ 1.0     ┆ 1.0           │
│ phillips ┆ philips  ┆ 0.875       ┆ 0.958333 ┆ 0.975        ┆ 0.875   ┆ 0.933333      │
│          ┆ phillips ┆ 0.0         ┆ 0.0      ┆ 0.0          ┆ 0.0     ┆ 0.0           │
│          ┆          ┆ 1.0         ┆ 1.0      ┆ 1.0          ┆ 1.0     ┆ 1.0           │
│ null     ┆ phillips ┆ null        ┆ null     ┆ null         ┆ null    ┆ null          │
│ null     ┆ null     ┆ null        ┆ null     ┆ null         ┆ null    ┆ null          │
└──────────┴──────────┴─────────────┴──────────┴──────────────┴─────────┴───────────────┘
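The second row (phillips vs. philips) is the only one where the measures disagree. The sketch below checks it against textbook definitions under stated assumptions: Jaro with the usual match window of max(len(a), len(b)) // 2 - 1, Jaro-Winkler with prefix scale 0.1 and a common prefix capped at 4 characters, and Jaccard and Sørensen-Dice computed over character multisets (per-character counts). These choices are assumptions picked because they reproduce 0.958333, 0.975, 0.875 and 0.933333 for this pair; the package's Rust implementation may tokenize or special-case inputs differently, and all function names here are hypothetical.

```python
from collections import Counter


def jaro(a: str, b: str) -> float:
    """Jaro similarity: matches within a window, penalized by transpositions."""
    if not a and not b:
        return 1.0  # two empty strings, as in the example table
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        # Look for an unused equal character in b within the match window.
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters in the order they occur in each string.
    a_seq = [c for c, used in zip(a, a_matched) if used]
    b_seq = [c for c, used in zip(b, b_matched) if used]
    # Half the number of positions where the matched sequences disagree.
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    m = matches
    return (m / len(a) + m / len(b) + (m - transpositions) / m) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro boosted by the length of the common prefix (capped at max_prefix)."""
    sim = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return sim + prefix * prefix_scale * (1.0 - sim)


def jaccard_multiset(a: str, b: str) -> float:
    """|A & B| / |A | B| over character counts (Counter & is min, | is max)."""
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    if union == 0:
        return 1.0
    return sum((ca & cb).values()) / union


def sorensen_dice_multiset(a: str, b: str) -> float:
    """2 * |A & B| / (|A| + |B|) over character counts."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return 2 * sum((Counter(a) & Counter(b)).values()) / total


if __name__ == "__main__":
    for fn in (jaro, jaro_winkler, jaccard_multiset, sorensen_dice_multiset):
        # prints 0.958333, 0.975, 0.875 and 0.933333 respectively
        print(fn.__name__, round(fn("phillips", "philips"), 6))
```

For this pair, 7 of the 8 characters of "phillips" match with no transpositions, so Jaro is (7/8 + 7/7 + 7/7) / 3, and the shared prefix "phil" lifts Jaro-Winkler to 0.975. For comparison, Jaccard over sets of character bigrams, another common choice, would give 6/7 ≈ 0.857 here, which does not match the table. Some Jaro-Winkler implementations only apply the prefix boost when the Jaro score exceeds 0.7; that makes no difference for this row.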

Download files

Download the file for your platform.

Source distribution

  • polars_strsim-0.2.1.tar.gz (31.1 kB)

Built distributions (all for CPython 3.8+)

  • polars_strsim-0.2.1-cp38-abi3-win_amd64.whl (3.0 MB): Windows x86-64
  • polars_strsim-0.2.1-cp38-abi3-win32.whl (2.7 MB): Windows x86
  • polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_x86_64.whl (3.9 MB): musllinux, musl 1.2+ x86-64
  • polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_i686.whl (3.9 MB): musllinux, musl 1.2+ i686
  • polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_armv7l.whl (3.9 MB): musllinux, musl 1.2+ ARMv7l
  • polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_aarch64.whl (3.7 MB): musllinux, musl 1.2+ ARM64
  • polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB): manylinux, glibc 2.17+ x86-64
  • polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (4.1 MB): manylinux, glibc 2.17+ i686
  • polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (3.8 MB): manylinux, glibc 2.17+ ARMv7l
  • polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB): manylinux, glibc 2.17+ ARM64
  • polars_strsim-0.2.1-cp38-abi3-macosx_11_0_arm64.whl (2.9 MB): macOS 11.0+ ARM64
  • polars_strsim-0.2.1-cp38-abi3-macosx_10_12_x86_64.whl (3.1 MB): macOS 10.12+ x86-64

File details

Details for the file polars_strsim-0.2.1.tar.gz.

File metadata

  • Download URL: polars_strsim-0.2.1.tar.gz
  • Size: 31.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.7.1

File hashes

Hashes for polars_strsim-0.2.1.tar.gz
Algorithm Hash digest
SHA256 496212536aa7829f93f3b3cc2e627d1b3ad789c5f252f1b337cb00bd09457f9c
MD5 ec03aa206c71f9d32a369ae9feb8d6a5
BLAKE2b-256 5ef421e34373e6bc8c45fdcff5d26866a05b75a6afaad07e77b2aec75eacf28b
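To check a downloaded file against these digests, a short sketch using Python's standard hashlib module follows. The constant holds the SHA256 digest listed above for polars_strsim-0.2.1.tar.gz; the function names and the local file path are illustrative assumptions, not part of any published tooling.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for polars_strsim-0.2.1.tar.gz
EXPECTED_SDIST_SHA256 = "496212536aa7829f93f3b3cc2e627d1b3ad789c5f252f1b337cb00bd09457f9c"


def sha256_of_file(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex: str = EXPECTED_SDIST_SHA256) -> bool:
    """Compare the file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.lower()
```

For example, matches_expected("polars_strsim-0.2.1.tar.gz") returns True only if the archive in the current directory is byte-for-byte the published one. pip can enforce the same check at install time through --hash=sha256:<digest> entries in a requirements file, together with --require-hashes.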

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-win_amd64.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 0ec5d185cfc24831921de3788f571b968332651cc4ab56d3de7135400149660e
MD5 e3bbb499aa25f6fe75a98fb9907ce233
BLAKE2b-256 c6f94041e576ea278068f5ef55dafb2bb1189fdb7afd5d0f49580571fd6d34b9

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-win32.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-win32.whl
Algorithm Hash digest
SHA256 ce7713ff8587e034436449c18819f18a6e330634f4f76d92e006db82c1a2b65e
MD5 4336e1dd4e0ffe1fd2724963417b1137
BLAKE2b-256 9e0d59f3a19d8ecc4d6a4b55f3797628a2376e982380ac922868801b32e26bd6

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_x86_64.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 1c05926ea8d5d32e8d030307c4dcfab3ee87718f9f8f74aadbb6d944658739cd
MD5 ca26e7b0ad931dbe972a5fd47ca4991e
BLAKE2b-256 57c4e96c9fa473239a6d1986e1c9d2cc84a1e44d562c38c1f29a70d8e1d7fcce

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_i686.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_i686.whl
Algorithm Hash digest
SHA256 3e5f9767f2572048ca050cae0894434c26817e4c8f4edc53ccfcf9662408a5d2
MD5 0eec433afef7ff397417af6ec60bd9c6
BLAKE2b-256 e6faa668cbdcd3e87b5f97c3b0224295dc96737406ae01f9b21e79c6a8c80c21

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_armv7l.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_armv7l.whl
Algorithm Hash digest
SHA256 974ba8300a2628514d15baf85e1990801a624d3192a0930d10e90d206053cd53
MD5 7bd9926c955c92a0f6d4588abe45ebfa
BLAKE2b-256 8df01f54207006eaed2bbf0465007bb14750dd25de28ea950262ba387784ae9b

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_aarch64.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-musllinux_1_2_aarch64.whl
Algorithm Hash digest
SHA256 e571a8fcc56a0a38f69d36408deb464a0bf6396df63e726582afb9dd3562793b
MD5 fd6d3d59fa15bada344dbc5dcb0e5bc3
BLAKE2b-256 2b41dae6911e306a022a58c634c65b3e736c9b06895cc8ece911c30b9c53c424

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c7921b8d95363347dea3fb2925b99ce9862c81bd63bb93028cd192c08acb1994
MD5 846905c1b0ec3a8b392109f8e78c989a
BLAKE2b-256 5bfe59d6e8b92215d6751b5d7d18e68199423986f8f12f849a8af9c6b0f2e059

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl
Algorithm Hash digest
SHA256 3e352f4db95dd44a137dcfe46f13b64ce204803811b69be9e27c6ff784fd2332
MD5 4eb356bac5b33c3ccb995391b19e06a8
BLAKE2b-256 4e10d52dfcd33c7ad8f92206c605a8427681ffafd170351a163849258bf722e1

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 dd499e2f0f2312dc04e211d87d37bbf4c91a91392143668f5519def3b75bab8f
MD5 38f31b1cc331b9df756013c46ea44839
BLAKE2b-256 c8efbf9e7a18577b4351987860bfe3510d9123bf6c1d9b2e19462ab05921bc77

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 01bdfc7db4983058b4c89109f6a8315ac17d79236331924a8889e621f46d06ee
MD5 e4468d74a4d51e272c5c530a912c63a5
BLAKE2b-256 6466beadb060f7e372b2a46365f7b635e666ff97eeff847dc7413d1a1542717b

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 eba93ac34b243be4999ab53597ea564924c37b22678ca9978cd12ecf7a69f475
MD5 3003d9794b300823bdc38e538ab9b260
BLAKE2b-256 ab79101ef6e1d355bd1da872c46181f787cd9f762f1f9ab7c26b6fbca4798e12

File details

Details for the file polars_strsim-0.2.1-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for polars_strsim-0.2.1-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 23dacf284bf07da0cdd6e7ab6014eb6647db921b2057bdd83775924bffe9e338
MD5 397d3290cd1605d74014103872fc262b
BLAKE2b-256 53a4210f7ba7b87b12f008400f1402c552a13291d8396c9a38e8322b06708744
