A library to match and compare strings.

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python :: 3.9
- Python :: 3.10

Project description

Stringmatch

Yet another small, lightweight string matching library written in Python, based on the Levenshtein distance and the Levenshtein Python C Extension.
It is inspired by seatgeek/thefuzz, which did not quite fit my needs, so I am building this library primarily for myself.

Requirements
Installation
Basic Usage
Advanced Usage
- Keyword Arguments
- Scoring Algorithms
Links

Requirements

Python 3.9 or later.

Installation

Install the latest stable version with pip:

pip install stringmatch

Or install the newest version via git (this might be unstable or unfinished):

pip install -U git+https://github.com/atomflunder/stringmatch

Basic Usage

Matching

The match functions let you compare two strings and check whether they are "similar enough" to each other, or get the best match(es) from a list of strings:

from stringmatch import Match

match = Match()

# Checks if the strings are similar.
match.match("searchlib", "srchlib")           # returns True
match.match("searchlib", "something else")    # returns False

# Returns the best match(es) found in the list.
searches = ["searchli", "searhli", "search", "lib", "whatever", "s"]
match.get_best_match("searchlib", searches)   # returns "searchli"
match.get_best_matches("searchlib", searches) # returns ['searchli', 'searhli', 'search']

Ratios

You can get the "ratio of similarity" between strings like this:

from stringmatch import Ratio

ratio = Ratio()

# Getting the ratio between the two strings.
ratio.ratio("searchlib", "searchlib")   # returns 100
ratio.ratio("searchlib", "srechlib")    # returns 82

# Getting the ratio between the first string and the list of strings at once.
searches = ["searchlib", "srechlib"]
ratio.ratio_list("searchlib", searches) # returns [100, 82]
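
The scores shown above are consistent with a normalised "indel" similarity, where the score is twice the length of the longest common subsequence divided by the combined length of both strings, scaled to 0-100. The following is a minimal standard-library sketch of that idea. It is an assumption about how such a score can be derived, not the library's actual implementation, and the names lcs_length and similarity are hypothetical.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Hypothetical 0-100 score: 2 * LCS / (len(a) + len(b)), rounded."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # sketch choice: two empty strings count as identical
    return round(100 * 2 * lcs_length(a, b) / total)


print(similarity("searchlib", "searchlib"))  # 100
print(similarity("searchlib", "srechlib"))   # 82
```

For "searchlib" and "srechlib" the longest common subsequence has 7 characters, so the score is 14 / 17, which rounds to 82.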

Matching & Ratios

You can also get both the match and the ratio together in a tuple using these functions:

from stringmatch import Match

match = Match()
searches = ["test", "nope", "tset"]

match.match_with_ratio("searchlib", "srechlib")       # returns (True, 82)
match.get_best_match_with_ratio("test", searches)     # returns ("test", 100)
match.get_best_matches_with_ratio("test", searches)   # returns [("test", 100), ("tset", 75)]

Strings

This is primarily meant for internal usage, but you can also use this library to modify strings:

from stringmatch import Strings

strings = Strings()

strings.latinise("Héllö, world!")               # returns "Hello, world!"
strings.remove_punctuation("wh'at;, ever")      # returns "what ever"
strings.only_letters("Héllö, world!")           # returns "Hll world"
strings.ignore_case("test test!", lower=False)  # returns "TEST TEST!"
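
To illustrate what these helpers do, here is a rough standard-library sketch of comparable transformations. It assumes Unicode decomposition for latinising, string.punctuation for the punctuation set, and a simple regular expression for letter filtering. The library may use different rules, so these are stand-ins with the same names rather than its real code.

```python
import re
import string
import unicodedata


def latinise(text: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_punctuation(text: str) -> str:
    """Delete ASCII punctuation (assumed set: string.punctuation)."""
    return text.translate(str.maketrans("", "", string.punctuation))


def only_letters(text: str) -> str:
    """Keep only ASCII letters and spaces (assumed rule)."""
    return re.sub(r"[^a-zA-Z ]", "", text)


def ignore_case(text: str, lower: bool = True) -> str:
    """Normalise case, to lower case by default or upper case otherwise."""
    return text.lower() if lower else text.upper()


print(latinise("Héllö, world!"))              # Hello, world!
print(remove_punctuation("wh'at;, ever"))     # what ever
print(only_letters("Héllö, world!"))          # Hll world
print(ignore_case("test test!", lower=False)) # TEST TEST!
```

Note that decomposition only handles characters that have a base letter plus accent. Letters without such a decomposition pass through this sketch unchanged.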

Advanced Usage

Keyword Arguments

You can pass in additional arguments for the Match() functions to customise your search further:

score=70
The score cutoff for matching, by default set to 70.

match.match("searchlib", "srechlib", score=85)    # returns False
match.match("searchlib", "srechlib", score=70)    # returns True

limit=5
The maximum number of matches to return. Only available for Match().get_best_matches(). If you want to return every match, set this to 0. By default this is set to 5.

searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
match.get_best_matches("limit 5", searches, limit=2)  # returns ["limit 5", "limit 4"]
match.get_best_matches("limit 5", searches, limit=1)  # returns ["limit 5"]
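
The way score and limit interact can be sketched as follows: score every candidate, drop those below the cutoff, sort the rest from best to worst, then truncate. This sketch uses difflib.SequenceMatcher from the standard library as a stand-in scorer, which is not the scorer the library uses. The function names, the ">=" comparison at the cutoff, and the tie-breaking by input order are all assumptions.

```python
from difflib import SequenceMatcher


def score_pair(query: str, candidate: str) -> int:
    """Stand-in 0-100 similarity score based on difflib, not on the library."""
    return round(100 * SequenceMatcher(None, query, candidate).ratio())


def best_matches(query: str, candidates: list[str], score: int = 70, limit: int = 5) -> list[str]:
    """Hypothetical helper: filter by cutoff, sort by score, apply the limit."""
    scored = [(candidate, score_pair(query, candidate)) for candidate in candidates]
    kept = [pair for pair in scored if pair[1] >= score]
    # list.sort is stable, so candidates with equal scores keep their input order.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    names = [candidate for candidate, _ in kept]
    # A limit of 0 means "return everything that passed the cutoff".
    return names if limit == 0 else names[:limit]


searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
print(best_matches("limit 5", searches, limit=2))  # ['limit 5', 'limit 4']
print(best_matches("limit 5", searches, limit=1))  # ['limit 5']
```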

latinise=False
Replaces special Unicode characters with their Latin alphabet equivalents. By default turned off.

match.match("séärçh", "search", latinise=True)    # returns True
match.match("séärçh", "search", latinise=False)   # returns False

ignore_case=False
Ignores case when comparing the strings. By default turned off.

match.match("test", "TEST", ignore_case=True)     # returns True
match.match("test", "TEST", ignore_case=False)    # returns False

remove_punctuation=False
Removes commonly used punctuation symbols from the strings, like .,;:!? and so on. By default turned off.

match.match("test,---....", "test", remove_punctuation=True)  # returns True
match.match("test,---....", "test", remove_punctuation=False) # returns False

only_letters=False
Removes every character that is not in the Latin alphabet, a more extreme version of remove_punctuation. By default turned off.

match.match("»»ᅳtestᅳ►", "test", only_letters=True)   # returns True
match.match("»»ᅳtestᅳ►", "test", only_letters=False)  # returns False

Scoring Algorithms

You can pass in different scoring algorithms when initialising the Match() and Ratio() classes.
The available options are: "levenshtein", "jaro", "jaro_winkler". The default is "levenshtein".
Different algorithms can score the same pair of strings differently, so a pair that matches under one scorer may not match under another:

levenshtein_matcher = Match(scorer="levenshtein")
jaro_winkler_matcher = Match(scorer="jaro_winkler")

levenshtein_matcher.match("test", "th test")  # returns True (score = 73)
jaro_winkler_matcher.match("test", "th test") # returns False (score = 60)
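
To show why a Jaro-based scorer can rate this pair lower, here is a minimal pure-Python sketch of the textbook Jaro and Jaro-Winkler similarities. It assumes the common Winkler variant in which the prefix bonus is only applied when the Jaro score exceeds 0.7. Whether the library uses exactly this variant is an assumption, and the function names are hypothetical.

```python
def jaro(a: str, b: str) -> float:
    """Textbook Jaro similarity in the range 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    a_seq = [ch for ch, hit in zip(a, a_hit) if hit]
    b_seq = [ch for ch, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_weight: float = 0.1, boost_threshold: float = 0.7) -> float:
    """Jaro score plus a bonus for a shared prefix of up to four characters."""
    score = jaro(a, b)
    if score <= boost_threshold:
        return score  # assumed variant: no prefix bonus for weak matches
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


print(round(100 * jaro_winkler("test", "th test")))  # 60
```

Because of the narrow matching window, only the two "t" characters of "test" find a partner in "th test", which gives a Jaro score of about 0.595. That is below the default cutoff of 70, in line with the example above.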

Release history

0.14.3

Oct 2, 2023

0.14.2

Jun 6, 2023

0.14.1

Oct 12, 2022

0.14.0

Jul 29, 2022

0.13.0

Jul 25, 2022

0.12.5

Jul 9, 2022

0.12.4

Jul 6, 2022

0.12.3

Jul 5, 2022

0.12.2

Jul 5, 2022

0.12.1

Jul 5, 2022

0.12.0

Jul 5, 2022

0.11.1

Jun 29, 2022

0.11.0

Jun 28, 2022

0.10.13

May 4, 2022

0.10.12

May 4, 2022

0.10.11

May 3, 2022

0.10.10

May 3, 2022

0.10.9

May 3, 2022

0.10.8

May 2, 2022

0.10.7

May 2, 2022

0.10.6

May 2, 2022

0.10.5

May 2, 2022

0.10.4

May 2, 2022

0.10.3

May 2, 2022

0.10.2

May 1, 2022

0.10.1

May 1, 2022

0.10.0

May 1, 2022

0.9.0

May 1, 2022

0.8.1

May 1, 2022

0.8.0

May 1, 2022

0.7.1

Apr 30, 2022

0.7.0

Apr 30, 2022

0.6.6

Apr 30, 2022

0.6.5

Apr 30, 2022

0.6.4

Apr 29, 2022

0.6.3

Apr 28, 2022

0.6.2

Apr 28, 2022

0.6.1

Apr 28, 2022

0.6.0

Apr 27, 2022

This version

0.5.0

Apr 27, 2022

0.4.1

Apr 26, 2022

0.4.0

Apr 26, 2022

0.3.1

Apr 26, 2022

0.3.0

Apr 26, 2022

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

stringmatch-0.5.0-py3-none-any.whl (13.5 kB)

Uploaded Apr 27, 2022 Python 3

Hashes for stringmatch-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ebd93ade7266f28abcf7d0da7a44b7d24b1343263af88a4f69d9beaf167dc47`
MD5	`0a594e66991fb20ab7e2b823c0d43f5a`
BLAKE2b-256	`20475b07a348c5fbdbf640be519ff46af1de947db16441c0ed3535b255be5210`