A library to match and compare strings.

Project description

Stringmatch

Yet another small, lightweight string matching library written in Python, based on the Levenshtein distance and the Levenshtein Python C Extension.
Inspired by seatgeek/thefuzz, which did not quite fit my needs; I am building this library primarily for myself.
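As a rough illustration of what "based on the Levenshtein distance" means for the scores quoted in this README, here is a pure-Python sketch of a 0-100 similarity ratio. It assumes the scores are the normalised indel similarity that `Levenshtein.ratio` reports (a Levenshtein variant that allows only insertions and deletions), scaled to 100 and rounded. The helper names `lcs_length`, `indel_distance` and `ratio` are illustrative, not the stringmatch API; the outputs agree with the ratio examples in the usage section.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, computed with a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_distance(a: str, b: str) -> int:
    """Edit distance when only insertions and deletions are allowed."""
    return len(a) + len(b) - 2 * lcs_length(a, b)


def ratio(a: str, b: str) -> int:
    """0-100 similarity (assumed to mirror Levenshtein.ratio * 100, rounded)."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * (total - indel_distance(a, b)) / total)


print(ratio("searchlib", "searchlib"))  # 100
print(ratio("searchlib", "srechlib"))   # 82
print(ratio("test", "th test"))         # 73
```

For "searchlib" and "srechlib" the longest common subsequence has 7 characters, so the indel distance is 9 + 8 - 14 = 3 and the ratio is 14 / 17, roughly 0.82.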

Requirements
Installation
Usage
Links

Requirements

Python 3.9 or later.

Installation

Via pip:

pip install stringmatch

Via git:

pip install -U git+https://github.com/atomflunder/stringmatch

Usage

from stringmatch import Match, Ratio, Strings

# Basic usage:
Match().match("searchlib", "srchlib")               # returns True
Match().match("searchlib", "something else")        # returns False

# Matching lists:
searches = ["searchli", "searhli", "search", "lib", "whatever", "s"]
Match().get_best_match("searchlib", searches)       # returns "searchli"
Match().get_best_matches("searchlib", searches)     # returns ['searchli', 'searhli', 'search']

# Ratios:
Ratio().ratio("searchlib", "searchlib")             # returns 100
Ratio().ratio("searchlib", "srechlib")              # returns 82
ratios = ["searchlib", "srechlib"]
Ratio().ratio_list("searchlib", ratios)             # returns [100, 82]

# Modify strings:
Strings().latinise("Héllö, world!")                 # returns "Hello, world!"
Strings().remove_punctuation("wh'at;, ever")        # returns "what ever"
Strings().only_letters("Héllö, world!")             # returns "Hll world"
Strings().ignore_case("test test!", lower=False)    # returns "TEST TEST!"

You can pass in additional arguments for the Match() functions to customise your search further:

`score=int`

The score cutoff for matching, by default set to 70.

Match().match("searchlib", "srechlib", score=85)    # returns False
Match().match("searchlib", "srechlib", score=70)    # returns True
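A minimal sketch of how a cutoff like `score` turns a ratio into a yes/no answer. It assumes a pair matches when its ratio is at least the cutoff (whether stringmatch compares with `>=` or `>` is an assumption here) and uses an illustrative indel-based ratio, assumed to mirror `Levenshtein.ratio`. The `match` function is a hypothetical stand-in for `Match().match`, not the library's code.

```python
def ratio(a: str, b: str) -> int:
    """Illustrative 0-100 score: 2 * LCS / (len(a) + len(b)), rounded."""
    if not a and not b:
        return 100
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return round(200 * prev[-1] / (len(a) + len(b)))


def match(a: str, b: str, score: int = 70) -> bool:
    """Hypothetical stand-in for Match().match: True once the ratio reaches the cutoff."""
    return ratio(a, b) >= score


print(ratio("searchlib", "srechlib"))             # 82
print(match("searchlib", "srechlib", score=85))   # False
print(match("searchlib", "srechlib", score=70))   # True
```

Because "searchlib" and "srechlib" score 82, any cutoff up to 82 accepts the pair and anything above it rejects it.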

`limit=int`

The maximum number of matches to return. Only available for Match().get_best_matches(). By default set to 5.

searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
Match().get_best_matches("limit 5", searches, limit=2)  # returns ["limit 5", "limit 4"]
Match().get_best_matches("limit 5", searches, limit=1)  # returns ["limit 5"]
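A sketch of how `score` and `limit` can combine when ranking a list: score every candidate, drop those under the cutoff, sort by score and keep the first `limit` entries. The function name mirrors `Match().get_best_matches()`, but this is an illustrative reimplementation; the indel-based `ratio`, the `>=` comparison and tie-breaking by input order are assumptions, chosen because they reproduce the examples in this README.

```python
def ratio(a: str, b: str) -> int:
    """Illustrative 0-100 score: 2 * LCS / (len(a) + len(b)), rounded."""
    if not a and not b:
        return 100
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return round(200 * prev[-1] / (len(a) + len(b)))


def get_best_matches(query: str, candidates: list[str], score: int = 70, limit: int = 5) -> list[str]:
    """Hypothetical ranking: filter by cutoff, sort by score, truncate to limit."""
    scored = [(candidate, ratio(query, candidate)) for candidate in candidates]
    kept = [pair for pair in scored if pair[1] >= score]
    # list.sort is stable even with reverse=True, so ties keep their input order
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in kept[:limit]]


searches = ["searchli", "searhli", "search", "lib", "whatever", "s"]
print(get_best_matches("searchlib", searches))       # ['searchli', 'searhli', 'search']

limits = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
print(get_best_matches("limit 5", limits, limit=2))  # ['limit 5', 'limit 4']
```

Every "limit N" candidate scores the same against "limit 5" except the exact match, so the stable sort is what keeps "limit 4" ahead of "limit 3" in this sketch.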

`latinise=bool`

Replaces special Unicode characters with their Latin alphabet equivalents. By default turned off.

Match().match("séärçh", "search", latinise=True)    # returns True
Match().match("séärçh", "search", latinise=False)   # returns False
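One common way to latinise text with only the standard library is to decompose each character with Unicode NFKD normalisation and drop the combining marks. This is a sketch of that technique, not necessarily how stringmatch implements `latinise`.

```python
import unicodedata


def latinise(text: str) -> str:
    """Strip diacritics: decompose with NFKD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(latinise("séärçh"))         # search
print(latinise("Héllö, world!"))  # Hello, world!
```

Letters that have no decomposition, such as "ß", "ø" or "æ", pass through this sketch unchanged; handling them would need an explicit transliteration table.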

`ignore_case=bool`

Ignores case while searching. By default turned off.

Match().match("test", "TEST", ignore_case=True)     # returns True
Match().match("test", "TEST", ignore_case=False)    # returns False
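Ignoring case usually means normalising both strings to one case before scoring. A minimal sketch of that step, modelled on the `Strings().ignore_case(..., lower=False)` example in the usage section; the function is illustrative, not the library's code. `str.casefold()` is shown as a stricter alternative to `str.lower()` for non-ASCII text.

```python
def ignore_case(text: str, lower: bool = True) -> str:
    """Normalise case: lowercase by default, uppercase when lower=False."""
    return text.lower() if lower else text.upper()


query, candidate = "test", "TEST"
print(query == candidate)                            # False
print(ignore_case(query) == ignore_case(candidate))  # True
print(ignore_case("test test!", lower=False))        # TEST TEST!

# casefold() folds more aggressively than lower(), e.g. the German sharp s
print("Straße".lower() == "STRASSE".lower())         # False
print("Straße".casefold() == "STRASSE".casefold())   # True
```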

`remove_punctuation=bool`

Removes commonly used punctuation symbols like .,;:!? from the strings. Be careful when using this: if you pass in a string made up only of punctuation symbols, you will get an EmptySearchException. By default turned off.

Match().match("test,---....", "test", remove_punctuation=True)  # returns True
Match().match("test,---....", "test", remove_punctuation=False) # returns False
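A sketch of punctuation removal with `str.translate`, plus a guard for the empty-search case described above. It assumes "punctuation" means the ASCII set in `string.punctuation`; `prepare_query` is a hypothetical helper, and it raises `ValueError` as a stand-in for stringmatch's own EmptySearchException.

```python
import string

# Translation table that deletes every ASCII punctuation character
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def remove_punctuation(text: str) -> str:
    return text.translate(_PUNCTUATION)


def prepare_query(text: str) -> str:
    """Hypothetical pre-processing step: refuse to search with an empty string."""
    cleaned = remove_punctuation(text)
    if not cleaned.strip():
        # stringmatch raises EmptySearchException here; ValueError is a stand-in
        raise ValueError("nothing left to search after removing punctuation")
    return cleaned


print(remove_punctuation("wh'at;, ever"))  # what ever
print(prepare_query("test,---...."))       # test
```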

`only_letters=bool`

Removes every character that is not in the Latin alphabet, making it a more extreme version of remove_punctuation. The same caveat applies: be careful when using it, or you might get an EmptySearchException. By default turned off.

Match().match("»»ᅳtestᅳ►", "test", only_letters=True)   # returns True
Match().match("»»ᅳtestᅳ►", "test", only_letters=False)  # returns False
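A minimal filter in the spirit of `only_letters`. It keeps ASCII letters and plain spaces, an assumption based on the usage example where "Héllö, world!" becomes "Hll world" with its space intact; it is not the library's implementation.

```python
import string

# Characters that survive the filter: ASCII letters plus the plain space
_ALLOWED = set(string.ascii_letters + " ")


def only_letters(text: str) -> str:
    """Keep ASCII letters and spaces; drop everything else."""
    return "".join(ch for ch in text if ch in _ALLOWED)


print(only_letters("»»ᅳtestᅳ►"))     # test
print(only_letters("Héllö, world!"))  # Hll world
```

Accented letters are removed rather than converted here; running a latinise step first would keep them as plain e and o.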

`scorer=str`

The scoring algorithm to use. The available options are "levenshtein", "jaro" and "jaro_winkler". Different algorithms produce different scores for the same pair of strings. By default set to "levenshtein".

Match().match("test", "th test", scorer="levenshtein")  # returns True (score = 73)
Match().match("test", "th test", scorer="jaro_winkler") # returns False (score = 60)
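For comparison with the Levenshtein ratio, here is a pure-Python sketch of the Jaro and Jaro-Winkler similarities. It follows the textbook definitions (matching window of max(len) / 2 - 1, half-transpositions, a common prefix of up to 4 characters weighted 0.1) plus the widely used rule that the prefix bonus only applies when the Jaro score exceeds 0.7. That threshold is an assumption about stringmatch's backend, but with it the sketch reproduces the score of 60 quoted above for "test" and "th test".

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    # A character matches an equal, unused character within the window
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1, boost_threshold: float = 0.7) -> float:
    """Jaro score plus a bonus for a shared prefix (up to 4 characters)."""
    sim = jaro(s1, s2)
    if sim <= boost_threshold:
        return sim  # assumed: no prefix bonus for weak matches
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_weight * (1 - sim)


print(round(100 * jaro_winkler("test", "th test")))  # 60
print(f"{jaro_winkler('MARTHA', 'MARHTA'):.4f}")      # 0.9611
```

Jaro rewards characters that appear in roughly the same positions, so "test" against "th test" finds only the two "t" characters within the window, while the indel ratio credits all four letters as a common subsequence. That is why the two scorers disagree on this pair.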

Project details

Release history

0.14.6 (Dec 28, 2024)
0.14.5 (Nov 15, 2024)
0.14.4 (Sep 24, 2024)
0.14.3 (Oct 2, 2023)
0.14.2 (Jun 6, 2023)
0.14.1 (Oct 12, 2022)
0.14.0 (Jul 29, 2022)
0.13.0 (Jul 25, 2022)
0.12.5 (Jul 9, 2022)
0.12.4 (Jul 6, 2022)
0.12.3 (Jul 5, 2022)
0.12.2 (Jul 5, 2022)
0.12.1 (Jul 5, 2022)
0.12.0 (Jul 5, 2022)
0.11.1 (Jun 29, 2022)
0.11.0 (Jun 28, 2022)
0.10.13 (May 4, 2022)
0.10.12 (May 4, 2022)
0.10.11 (May 3, 2022)
0.10.10 (May 3, 2022)
0.10.9 (May 3, 2022)
0.10.8 (May 2, 2022)
0.10.7 (May 2, 2022)
0.10.6 (May 2, 2022)
0.10.5 (May 2, 2022)
0.10.4 (May 2, 2022)
0.10.3 (May 2, 2022)
0.10.2 (May 1, 2022)
0.10.1 (May 1, 2022)
0.10.0 (May 1, 2022)
0.9.0 (May 1, 2022)
0.8.1 (May 1, 2022)
0.8.0 (May 1, 2022)
0.7.1 (Apr 30, 2022)
0.7.0 (Apr 30, 2022)
0.6.6 (Apr 30, 2022)
0.6.5 (Apr 30, 2022)
0.6.4 (Apr 29, 2022)
0.6.3 (Apr 28, 2022)
0.6.2 (Apr 28, 2022)
0.6.1 (Apr 28, 2022)
0.6.0 (Apr 27, 2022)
0.5.0 (Apr 27, 2022)
0.4.1 (Apr 26, 2022)
0.4.0 (Apr 26, 2022)
0.3.1 (Apr 26, 2022)
0.3.0 (Apr 26, 2022), this version

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

stringmatch-0.3.0-py3-none-any.whl (12.5 kB)

Uploaded Apr 26, 2022 Python 3

File details

Details for the file stringmatch-0.3.0-py3-none-any.whl.

File metadata

Download URL: stringmatch-0.3.0-py3-none-any.whl
Upload date: Apr 26, 2022
Size: 12.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.0 CPython/3.9.2

File hashes

Hashes for stringmatch-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c3890bcc87de32fb0aee90af1e7b120b1dc1dc5fdeb8a90568d08c8860792b9c`
MD5	`40976d2c1f629b1dc9db424f53be5fd9`
BLAKE2b-256	`08ed71f35e5fc8fbf6fa7f0a3781121264d2d2014e27fbb115a34b463988cf1a`
