A library to match and compare strings.


Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python :: 3.9
- Python :: 3.10

Project description

stringmatch

stringmatch is a small, lightweight string matching library written in Python, based on the Levenshtein distance and the Levenshtein Python C Extension.
It is inspired by libraries like seatgeek/thefuzz, which did not quite fit my needs, so I am building this library primarily for myself.

Disclaimer: This library is still in an alpha development phase! Changes may be frequent and breaking changes can occur! It is recommended to update frequently to minimise bugs and maximise features.

🎯 Key Features
📋 Requirements
⚙️ Installation
🔨 Basic Usage
🛠️ Advanced Usage
- Keyword Arguments
- Scoring Algorithms
🌟 Contributing
🔗 Links
⚠️ License

Key Features

This library matches and compares strings to each other, based mainly, though not exclusively, on the Levenshtein distance.
What makes stringmatch special compared to other libraries with similar functions:

💨 Lightweight, straightforward and easy to use
⚡ Extremely fast, up to 10x faster than comparable libraries
🧰 Allows for highly customisable searches
📚 Lots of utility functions to make your life easier
🌍 Handles special Unicode characters, such as emojis or characters from other languages, like ジャパニーズ

Requirements

Python 3.9 or later.

Installation

Install the latest stable version with pip:

pip install stringmatch

Or install the newest version via git (might be unstable or unfinished):

pip install -U git+https://github.com/atomflunder/stringmatch

Basic Usage

Matching

The match functions allow you to compare 2 strings and check if they are "similar enough" to each other, or get the best match(es) from a list of strings:

from stringmatch import Match

match = Match()

# Checks if the strings are similar:
match.match("stringmatch", "strngmach")         # returns True
match.match("stringmatch", "something else")    # returns False

# Returns the best match(es) found in the list:
searches = ["stringmat", "strinma", "strings", "mtch", "whatever", "s"]
match.get_best_match("stringmatch", searches)   # returns "stringmat"
match.get_best_matches("stringmatch", searches) # returns ["stringmat", "strinma"]
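To make the idea concrete, here is a minimal pure-Python sketch of threshold matching. It is not stringmatch's implementation: it assumes a match means a similarity ratio at or above the cutoff (70 by default, as described under `score` in Keyword Arguments), that candidates with equal scores keep their original order, and that a `limit` of 0 or None returns every match. The similarity is an LCS-based ratio assumed to behave like the numbers shown under Ratios, and the helper names `similarity`, `is_match` and `best_matches` are hypothetical.

```python
from typing import Optional


def similarity(a: str, b: str) -> int:
    """LCS-based similarity from 0 (nothing shared) to 100 (identical)."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # two empty strings count as identical
    # Length of the longest common subsequence, keeping one DP row at a time.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return round(100 * 2 * prev[-1] / total)


def is_match(a: str, b: str, score: int = 70) -> bool:
    """Hypothetical helper: True if the similarity reaches the cutoff."""
    return similarity(a, b) >= score


def best_matches(query: str, candidates: list[str], score: int = 70,
                 limit: Optional[int] = 5) -> list[str]:
    """Hypothetical helper: candidates that match, best first."""
    hits = [(c, similarity(query, c)) for c in candidates]
    hits = [(c, r) for c, r in hits if r >= score]
    # list.sort() is stable, so equal scores keep their original order.
    hits.sort(key=lambda pair: pair[1], reverse=True)
    if limit:  # 0 or None returns every match
        hits = hits[:limit]
    return [c for c, _ in hits]


searches = ["stringmat", "strinma", "strings", "mtch", "whatever", "s"]
print(is_match("stringmatch", "strngmach"))       # True
print(is_match("stringmatch", "something else"))  # False
print(best_matches("stringmatch", searches))      # ['stringmat', 'strinma']
```

Here "strings" scores 67 and falls just below the default cutoff, which is why only two results come back.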

Ratios

The "ratio of similarity" describes how similar the strings are to each other. It ranges from 100 being an exact match to 0 being something completely different.
You can get the ratio between strings like this:

from stringmatch import Ratio

ratio = Ratio()

# Getting the ratio between the two strings:
ratio.ratio("stringmatch", "stringmatch")   # returns 100
ratio.ratio("stringmatch", "strngmach")     # returns 90
ratio.ratio("stringmatch", "eh")            # returns 15

# Getting the ratio between the first string and the list of strings at once:
searches = ["stringmatch", "strngmach", "eh"]
ratio.ratio_list("stringmatch", searches)   # returns [100, 90, 15]
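The numbers above are consistent with the normalised indel distance used by python-Levenshtein's `ratio()`: with L the combined length of both strings and d the minimum number of single-character insertions and deletions (so a substitution effectively costs 2), the ratio is 100 * (L - d) / L. The sketch below assumes that formula and rounds to the nearest integer; `indel_distance` and `ratio` are illustrative names, not stringmatch's code.

```python
def indel_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1])  # characters agree: no edit needed
            else:
                # Delete ch_a or insert ch_b; there is no substitution step.
                curr.append(1 + min(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio(a: str, b: str) -> int:
    """Similarity from 0 to 100; rounding to an integer is an assumption."""
    total = len(a) + len(b)
    if total == 0:
        return 100  # two empty strings are treated as identical
    return round(100 * (total - indel_distance(a, b)) / total)


searches = ["stringmatch", "strngmach", "eh"]
print([ratio("stringmatch", s) for s in searches])  # [100, 90, 15]
```

For "stringmatch" vs "strngmach", L = 20 and d = 2 (delete "i" and "t"), giving 100 * 18 / 20 = 90.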

Matching & Ratios

You can also get both the match and the ratio together in a tuple using these functions:

from stringmatch import Match

match = Match()

match.match_with_ratio("stringmatch", "strngmach")    # returns (True, 90)

searches = ["test", "nope", "tset"]
match.get_best_match_with_ratio("test", searches)     # returns ("test", 100)
match.get_best_matches_with_ratio("test", searches)   # returns [("test", 100), ("tset", 75)]

Distances

Instead of the ratio, you can also get the Levenshtein distance between strings directly. The bigger the distance, the more different the strings:

from stringmatch import Distance

distance = Distance()

distance.distance("kitten", "sitting")      # returns 3

searches = ["sitting", "kitten"]
distance.distance_list("kitten", searches)  # returns [3, 0]
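The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. Below is a minimal dynamic-programming sketch in pure Python for illustration only; stringmatch itself relies on the Levenshtein C extension, and `levenshtein` here is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric, so keep the DP row short
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (free if the characters match)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))                           # 3
print([levenshtein("kitten", s) for s in ["sitting", "kitten"]])  # [3, 0]
```

Only two rows of the table are kept in memory, so the extra space grows with the shorter string rather than with the product of both lengths.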

Strings

This is primarily meant for internal usage, but you can also use this library to modify strings:

from stringmatch import Strings

strings = Strings()

strings.latinise("Héllö, world!")               # returns "Hello, world!"
strings.remove_punctuation("wh'at;, ever")      # returns "what ever"
strings.only_letters("Héllö, world!")           # returns "Hll world"
strings.ignore_case("test test!", lower=False)  # returns "TEST TEST!"
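These transformations can be roughly approximated with the standard library alone. The sketch below is a stand-in under stated assumptions, not stringmatch's code: the function names mirror the `Strings` methods for readability but are standalone, hypothetical helpers. `latinise` here only strips accents via Unicode NFKD decomposition, so unlike the library's `latinise` it will not turn `Ǽ` into `AE` or transliterate `ノース`; `remove_punctuation` only strips the ASCII symbols in `string.punctuation`; and `only_letters` keeps ASCII letters and spaces.

```python
import re
import string
import unicodedata


def latinise(text: str) -> str:
    # Decompose accented letters (é -> e + combining accent), then drop the accents.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_punctuation(text: str) -> str:
    # Delete every ASCII punctuation character listed in string.punctuation.
    return text.translate(str.maketrans("", "", string.punctuation))


def only_letters(text: str) -> str:
    # Keep ASCII letters and spaces; everything else is removed.
    return re.sub(r"[^A-Za-z ]", "", text)


def ignore_case(text: str, lower: bool = True) -> str:
    # Fold to lower case by default, or to upper case when lower=False.
    return text.lower() if lower else text.upper()


print(latinise("Héllö, world!"))               # Hello, world!
print(remove_punctuation("wh'at;, ever"))      # what ever
print(only_letters("Héllö, world!"))           # Hll world
print(ignore_case("test test!", lower=False))  # TEST TEST!
```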

Advanced Usage

Keyword Arguments

You can pass these optional keyword arguments to the methods of Match() and Ratio() to customise your search further:

`score`

Type	Default	Description
Integer	70	The score cutoff for matching. Only available for `Match()` functions.

# Example:

match.match("stringmatch", "strngmach", score=95)    # returns False
match.match("stringmatch", "strngmach", score=70)    # returns True

`limit`

Type	Default	Description
Integer	5	The limit of how many matches to return. If you want to return every match, set this to 0 or None. Only available for the `get_best_matches()` function.

# Example:

searches = ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0", "something else"]

# returns ["limit 5", "limit 4"]
match.get_best_matches("limit 5", searches, limit=2)

# returns ["limit 5"]
match.get_best_matches("limit 5", searches, limit=1)

# returns ["limit 5", "limit 4", "limit 3", "limit 2", "limit 1", "limit 0"]
match.get_best_matches("limit 5", searches, limit=None)

`latinise`

Type	Default	Description
Boolean	False	Replaces special Unicode characters with their Latin alphabet equivalents. Examples: `Ǽ` -> `AE`, `ノース` -> `nosu`

# Example:

match.match("séärçh", "search", latinise=True)    # returns True
match.match("séärçh", "search", latinise=False)   # returns False

`ignore_case`

Type	Default	Description
Boolean	False	Ignores differences between upper and lower case while searching.

# Example:

match.match("test", "TEST", ignore_case=True)     # returns True
match.match("test", "TEST", ignore_case=False)    # returns False

`remove_punctuation`

Type	Default	Description
Boolean	False	Removes commonly used punctuation symbols from the strings, like `.,;:!?` and so on.

# Example:

match.match("test,---....", "test", remove_punctuation=True)  # returns True
match.match("test,---....", "test", remove_punctuation=False) # returns False

`only_letters`

Type	Default	Description
Boolean	False	Removes every character that is not in the Latin alphabet, a more extreme version of `remove_punctuation`.

# Example:

match.match("»»ᅳtestᅳ►", "test", only_letters=True)   # returns True
match.match("»»ᅳtestᅳ►", "test", only_letters=False)  # returns False

Scoring Algorithms

You can pass in different scoring algorithms when initialising the Match() and Ratio() classes.
The available options are: LevenshteinScorer, JaroScorer, JaroWinklerScorer.

Speaking generally, the Jaro Scorer will be the fastest, focussing on the characters the strings have in common.
The Jaro-Winkler Scorer slightly modifies the Jaro Scorer to prioritise characters at the start of the string.
The Levenshtein Scorer will most likely produce the best results, focussing on the number of edits needed to get from one string to the other.

The default scorer is set to LevenshteinScorer.

from stringmatch import Match, LevenshteinScorer, JaroWinklerScorer

lev_matcher = Match(scorer=LevenshteinScorer)
jw_matcher = Match(scorer=JaroWinklerScorer)

lev_matcher.match_with_ratio("test", "th test") # returns (True, 73)
jw_matcher.match_with_ratio("test", "th test")  # returns (False, 60)
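For intuition about how the Jaro-based scorers differ, here is a textbook pure-Python sketch of Jaro and Jaro-Winkler similarity on a 0.0 to 1.0 scale. It is not the code stringmatch uses; the prefix weight of 0.1, the four-character prefix cap and the rule of applying the prefix boost only when the Jaro similarity exceeds 0.7 are common conventions assumed here. With that threshold, the sketch gives 60 for "test" vs "th test" after scaling to 100 and rounding, in line with the example above.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: shared characters within a window, penalised for transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are at most this far apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both sets of matched characters in order; mismatches are half-transpositions.
    half_transpositions = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                half_transpositions += 1
            j += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1,
                 boost_threshold: float = 0.7) -> float:
    """Jaro similarity boosted by a shared prefix of up to four characters."""
    sim = jaro(s1, s2)
    if sim <= boost_threshold:
        return sim  # assumed convention: weak matches get no prefix boost
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * prefix_weight * (1 - sim)


print(round(100 * jaro_winkler("test", "th test")))  # 60
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))    # 0.9611
```

Because "th test" shares only a one-character prefix with "test" and its Jaro similarity stays below the threshold, the Winkler adjustment does not help here, whereas the Levenshtein-based ratio rewards the intact "test" substring.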

Contributing

Contributions to this library are always appreciated! If you have any sort of feedback, or are interested in contributing, head on over to the Contributing Guidelines.
Additionally, if you like this library, leaving a star and spreading the word would be appreciated a lot!
Thanks in advance for taking the time to do so.

License

This project is licensed under the MIT License.


Release history

- 0.14.3 (Oct 2, 2023)
- 0.14.2 (Jun 6, 2023)
- 0.14.1 (Oct 12, 2022)
- 0.14.0 (Jul 29, 2022)
- 0.13.0 (Jul 25, 2022)
- 0.12.5 (Jul 9, 2022)
- 0.12.4 (Jul 6, 2022)
- 0.12.3 (Jul 5, 2022)
- 0.12.2 (Jul 5, 2022)
- 0.12.1 (Jul 5, 2022)
- 0.12.0 (Jul 5, 2022)
- 0.11.1 (Jun 29, 2022)
- 0.11.0 (Jun 28, 2022)
- 0.10.13 (May 4, 2022)
- 0.10.12 (May 4, 2022)
- 0.10.11 (May 3, 2022)
- 0.10.10 (May 3, 2022)
- 0.10.9 (May 3, 2022)
- 0.10.8 (May 2, 2022)
- 0.10.7 (May 2, 2022)
- 0.10.6 (May 2, 2022)
- 0.10.5 (May 2, 2022)
- 0.10.4 (May 2, 2022)
- 0.10.3 (May 2, 2022)
- 0.10.2 (May 1, 2022)
- 0.10.1 (May 1, 2022)
- 0.10.0 (May 1, 2022)
- 0.9.0 (May 1, 2022)
- 0.8.1 (May 1, 2022, this version)
- 0.8.0 (May 1, 2022)
- 0.7.1 (Apr 30, 2022)
- 0.7.0 (Apr 30, 2022)
- 0.6.6 (Apr 30, 2022)
- 0.6.5 (Apr 30, 2022)
- 0.6.4 (Apr 29, 2022)
- 0.6.3 (Apr 28, 2022)
- 0.6.2 (Apr 28, 2022)
- 0.6.1 (Apr 28, 2022)
- 0.6.0 (Apr 27, 2022)
- 0.5.0 (Apr 27, 2022)
- 0.4.1 (Apr 26, 2022)
- 0.4.0 (Apr 26, 2022)
- 0.3.1 (Apr 26, 2022)
- 0.3.0 (Apr 26, 2022)

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

stringmatch-0.8.1-py3-none-any.whl (12.9 kB)

Uploaded May 1, 2022 for Python 3

Hashes for stringmatch-0.8.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`02bcaac87a6d2c29c24b532e82a578726cc64410106a60af06fab0c0fa1936f0`
MD5	`45dc9594ff509d5058f131ea631365b5`
BLAKE2b-256	`796445baa424adece94124d85f6b614238059b5370366c42d4b9c3654fc2ea82`

stringmatch 0.8.1
