A versatile comparison library for Python
Compairer Module
Overview
The compairer module is a powerful, flexible Python library for comparing various types of data. It provides a suite of comparison methods for strings, vectors, and custom objects, along with utilities for normalization, statistical analysis, and detailed explanations of comparison results.
Features
- Multiple comparison methods (Levenshtein, Jaccard, Cosine, etc.)
- Support for string and vector comparisons
- Customizable comparison methods
- Normalization and statistical utilities
- Detailed explanations for comparison results
- Batch comparison capabilities
- Incremental comparison for streaming data
Installation
pip install compairer
Quick Start
from compairer import compare
ref = "hello"
targets = ["hello", "hola", "bonjour"]
results = compare(ref, targets)
print(f"Most similar word to '{ref}' is {results.top()}. All scores: {results.scores}")
Core Components
Comparison Methods
The module includes several built-in comparison methods:
- String comparisons: Levenshtein, Jaccard, Cosine, Fuzzy, Regex
- Vector comparisons: Euclidean, Manhattan, Cosine, Jaccard
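For reference, the vector metrics named above have standard definitions. The following is a minimal pure-Python sketch of them; the function names are hypothetical and are not part of compairer's API. Two assumptions to flag: for Jaccard on vectors this sketch uses the weighted (Ruzicka) form for non-negative values, whereas compairer may use a set or binary-vector formulation, and how compairer turns distances into similarity scores is not documented here, so `distance_to_similarity` is just one common convention.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance; math.dist raises ValueError on a length mismatch."""
    return math.dist(a, b)


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def weighted_jaccard(a: Sequence[float], b: Sequence[float]) -> float:
    """Ruzicka similarity for non-negative vectors: sum of minima over sum of maxima."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    denominator = sum(max(x, y) for x, y in zip(a, b))
    if denominator == 0:
        return 1.0  # two all-zero vectors are treated as identical
    return sum(min(x, y) for x, y in zip(a, b)) / denominator


def distance_to_similarity(distance: float) -> float:
    """Assumed convention: map a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + distance)


print(euclidean_distance([1, 2, 3], [4, 6, 3]), manhattan_distance([1, 2, 3], [4, 6, 3]))
```

Euclidean and Manhattan are distances (lower means more alike), while cosine and Jaccard are similarities (higher means more alike), which is why some mapping like `distance_to_similarity` is needed before mixing them in one ranking.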
Example:
from compairer import compare
ref, target = "hello", "hola"
levenshteinScore = compare(ref, target, method="levenshtein")
jaccardScore = compare(ref, target, method="jaccard")
print(f"Levenshtein similarity: {levenshteinScore}")
print(f"Jaccard similarity: {jaccardScore}")
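To make the two string scores above concrete, here is a minimal pure-Python sketch of them. It assumes Levenshtein similarity is the edit distance rescaled by the longer string's length and that Jaccard compares character sets; compairer's actual normalization and tokenization are not documented on this page and may differ. The function names are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when chars match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Shared characters divided by all distinct characters."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


print(levenshtein_similarity("hello", "hola"), jaccard_similarity("hello", "hola"))
```

Under these definitions, "hello" vs "hola" needs 3 edits, giving a Levenshtein similarity of 1 - 3/5 = 0.4, while the character-set Jaccard score is 3/5 = 0.6 ({h, l, o} shared out of {h, e, l, o, a}).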
Custom Comparison Methods
You can create custom comparison methods:
from compairer.methods import CustomComparison
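def myCompareFunc(ref, target):
    # Jaccard similarity over the two inputs' character sets
    union = set(ref) | set(target)
    if not union:  # two empty inputs: avoid dividing by zero
        return 1.0
    return len(set(ref) & set(target)) / len(union)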
customMethod = CustomComparison(myCompareFunc)
score = customMethod("hello", "hola")
print(f"Custom similarity: {score}")
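How `CustomComparison` works internally is not described here. As a hypothetical sketch, a wrapper of this kind could store the function and enforce a score range when called. `FunctionComparison` and `bigram_dice` are illustrative names, not part of compairer.

```python
from typing import Callable, Optional


class FunctionComparison:
    """Hypothetical wrapper that turns a plain scoring function into a comparison object."""

    def __init__(self, func: Callable[[str, str], float], name: Optional[str] = None):
        self.func = func
        self.name = name or func.__name__

    def __call__(self, ref: str, target: str) -> float:
        score = float(self.func(ref, target))
        # Enforce a [0, 1] similarity contract so custom scores stay comparable with built-in ones.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{self.name} returned {score}; expected a value in [0, 1]")
        return score


def bigram_dice(ref: str, target: str) -> float:
    """Dice coefficient over adjacent character pairs (bigrams)."""
    a = {ref[i:i + 2] for i in range(len(ref) - 1)}
    b = {target[i:i + 2] for i in range(len(target) - 1)}
    if not a and not b:
        # Strings shorter than two characters have no bigrams; fall back to equality.
        return 1.0 if ref == target else 0.0
    return 2 * len(a & b) / (len(a) + len(b))


dice = FunctionComparison(bigram_dice)
print(dice.name, dice("night", "nacht"))  # bigram_dice 0.25
```

Validating the range in the wrapper, rather than trusting every user function, means a buggy custom scorer fails loudly instead of silently distorting a ranking.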
Batch Comparisons
Compare multiple targets against a reference:
from compairer import compare
ref = "hello"
targets = ["hello", "hola", "bonjour", "ciao"]
results = compare(ref, targets)
print(f"Similarities: {results.scores}")
print(f"Most similar: {results.top()}")
print(f"Top 2 similar: {results.top(2)}")
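The results object's internals are not documented here. One plausible shape, sketched below with hypothetical names (`BatchResult`, `batch_compare`, `char_jaccard`), is a mapping from target to score plus a `top(n)` that selects the n best with `heapq.nlargest`. It assumes, as the example above suggests, that `top()` returns a single item and `top(n)` returns a list.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List


def char_jaccard(a: str, b: str) -> float:
    """Toy scorer: Jaccard similarity of the two strings' character sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


@dataclass
class BatchResult:
    """Hypothetical container for one-reference-vs-many-targets scores."""
    scores: Dict[str, float]

    def top(self, n: int = 1):
        """Best target when n == 1 (None if empty), otherwise a list of the n best."""
        best = heapq.nlargest(n, self.scores, key=self.scores.__getitem__)
        if n == 1:
            return best[0] if best else None
        return best


def batch_compare(ref: str, targets: List[str], scorer: Callable[[str, str], float]) -> BatchResult:
    # Duplicate targets collapse into a single dictionary entry.
    return BatchResult({target: scorer(ref, target) for target in targets})


results = batch_compare("hello", ["hello", "hola", "bonjour", "ciao"], char_jaccard)
print(results.top(), results.top(2))
```

`heapq.nlargest` keeps only an n-element heap, so selecting a few top matches from a long target list avoids sorting every score.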
Chained Operations
Perform multiple operations in a chain:
from compairer import compare
ref = "hello"
targets = ["hello", "hola", "bonjour", "ciao", "hi"]
results = (compare(ref, targets)
           .normalize()
           .filter(threshold=0.5)
           .top(3))
print(f"Top 3 similar words (similarity > 0.5): {results}")
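As a hypothetical sketch of how such a chain can work, each step below returns a new `Scores` object (an illustrative name, not compairer's class). It assumes `normalize()` means min-max rescaling and `filter()` keeps scores strictly above the threshold, matching the "> 0.5" in the message above; compairer's actual definitions may differ. The input numbers are illustrative only.

```python
from typing import Dict, List


class Scores:
    """Hypothetical chainable score container; every step returns a new object."""

    def __init__(self, scores: Dict[str, float]):
        self.scores = dict(scores)

    def normalize(self) -> "Scores":
        """Min-max rescale to [0, 1]; if all scores are equal, map them to 1.0."""
        if not self.scores:
            return Scores({})
        low, high = min(self.scores.values()), max(self.scores.values())
        if high == low:
            return Scores({k: 1.0 for k in self.scores})
        return Scores({k: (v - low) / (high - low) for k, v in self.scores.items()})

    def filter(self, threshold: float) -> "Scores":
        """Keep entries whose score is strictly above threshold."""
        return Scores({k: v for k, v in self.scores.items() if v > threshold})

    def top(self, n: int) -> List[str]:
        """Keys of the n highest scores, best first."""
        return sorted(self.scores, key=self.scores.__getitem__, reverse=True)[:n]


raw = Scores({"hello": 1.0, "hola": 0.6, "hi": 0.2, "bonjour": 0.1})
print(raw.normalize().filter(threshold=0.5).top(3))
```

Note that after min-max rescaling the best score is always 1.0 and the worst 0.0, so a threshold applied after `normalize()` acts relative to the batch rather than as an absolute similarity cutoff.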
Incremental Comparisons
For streaming data, or to update comparison scores as new data arrives:
from compairer.models import IncrementalComparison
from compairer.methods import LevenshteinComparison
incComp = IncrementalComparison(LevenshteinComparison(), initialRef="hello")
newData = ["hola", "bonjour", "ciao"]
for data in newData:
    score = incComp.update(data)
    print(f"Updated score after comparing with '{data}': {score}")
print(f"Final score: {incComp.getCurrentScore()}")
print(f"Comparison history: {incComp.getHistory()}")
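This page does not define what `update()` returns or how `getCurrentScore()` aggregates. The sketch below assumes each update scores the new item against the fixed reference, records it in the history, and returns a running mean of all scores so far; `RunningComparison` is a hypothetical name, not compairer's class.

```python
from typing import Callable, List, Tuple


class RunningComparison:
    """Hypothetical incremental comparer: fixed reference, running mean of scores."""

    def __init__(self, scorer: Callable[[str, str], float], reference: str):
        self.scorer = scorer
        self.reference = reference
        self.history: List[Tuple[str, float]] = []
        self._mean = 0.0

    def update(self, item: str) -> float:
        """Score item against the reference and return the updated running mean."""
        score = self.scorer(self.reference, item)
        self.history.append((item, score))
        # Incremental mean: new_mean = old_mean + (x - old_mean) / count, so no re-summing.
        self._mean += (score - self._mean) / len(self.history)
        return self._mean

    @property
    def current_score(self) -> float:
        return self._mean


fixed = {"hola": 0.6, "bonjour": 0.1, "ciao": 0.2}  # illustrative per-item scores
inc = RunningComparison(lambda ref, item: fixed[item], reference="hello")
for word in fixed:
    print(word, inc.update(word))
print(inc.current_score, inc.history)
```

Updating the mean in place keeps each step O(1), which suits unbounded streams; the history list is kept only for inspection and could be capped or dropped.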
Explanation Generation
Get detailed explanations for comparison results:
from compairer import compare
ref, target = "hello", "hola"
result = compare(ref, target, method="levenshtein")
explanation = result.explain()
print(explanation)
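The format of `explain()` output is not shown here. One way to derive a human-readable explanation is from the edit operations between the two strings. The sketch below uses the standard library's `difflib.SequenceMatcher`, a variant of Ratcliff/Obershelp matching rather than Levenshtein, so its steps are readable but not guaranteed to be a minimal edit sequence. `explain_diff` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import List


def explain_diff(ref: str, target: str) -> List[str]:
    """Describe, step by step, how ref can be turned into target."""
    matcher = SequenceMatcher(None, ref, target)
    steps: List[str] = []
    # Each opcode is (tag, i1, i2, j1, j2): ref[i1:i2] corresponds to target[j1:j2].
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            steps.append(f"keep {ref[i1:i2]!r}")
        elif tag == "replace":
            steps.append(f"replace {ref[i1:i2]!r} with {target[j1:j2]!r}")
        elif tag == "delete":
            steps.append(f"delete {ref[i1:i2]!r}")
        else:  # tag == "insert"
            steps.append(f"insert {target[j1:j2]!r}")
    # ratio() is 2*M/T: M matched characters, T total characters in both strings.
    steps.append(f"similarity ratio: {matcher.ratio():.2f}")
    return steps


for step in explain_diff("hello", "hola"):
    print(step)
```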
Advanced Usage
Using Type Hints
The module supports type hinting for better code clarity:
from compairer import compare
from typing import List
def findBestMatch(reference: str, candidates: List[str]) -> str:
    result = compare(reference, candidates)
    return result.top()
bestMatch = findBestMatch("hello", ["hola", "bonjour", "ciao"])
print(f"Best match: {bestMatch}")
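Type hints become more useful when the scoring function itself is a typed parameter. The sketch below, with hypothetical names (`Scorer`, `prefix_similarity`, `find_best_match`), uses a `Callable` alias for any scorer and returns `Optional[str]` so an empty candidate list is handled explicitly instead of raising.

```python
from typing import Callable, Optional, Sequence

# A scorer maps (reference, candidate) to a similarity; higher means closer.
Scorer = Callable[[str, str], float]


def prefix_similarity(a: str, b: str) -> float:
    """Toy scorer: length of the shared prefix relative to the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / longest


def find_best_match(reference: str, candidates: Sequence[str], scorer: Scorer) -> Optional[str]:
    """Return the highest-scoring candidate, or None when there are no candidates."""
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: scorer(reference, candidate))


print(find_best_match("hello", ["hola", "help", "ciao"], prefix_similarity))
```

A static checker such as mypy can then flag a scorer with the wrong signature at the call site, before any comparison runs.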
Context Managers for Batch Comparisons
Use a context manager to run a batch of comparisons against a single reference:
from compairer import compare
from contextlib import contextmanager
@contextmanager
def batchCompare(ref, method="levenshtein"):
    comparer = compare(ref, method=method)
    try:
        yield comparer
    finally:
        print("Batch comparison completed")
ref = "hello"
with batchCompare(ref) as comparer:
    result1 = comparer("hola")
    result2 = comparer("bonjour")
    result3 = comparer("ciao")
print(f"Results: {result1}, {result2}, {result3}")
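It is not documented here whether `compare(ref, method=...)` called without targets returns a reusable callable, so the sketch below shows the same pattern without depending on it: a generator-based context manager (via `contextlib.contextmanager`) binds the reference once, records each result, and reports in a `finally` clause that runs even if the block raises. `batch_session` and `char_jaccard` are hypothetical names.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


def char_jaccard(a: str, b: str) -> float:
    """Toy scorer: Jaccard similarity of the two strings' character sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


@contextmanager
def batch_session(ref: str, scorer: Callable[[str, str], float]) -> Iterator[Callable[[str], float]]:
    """Yield a one-argument comparer bound to ref; summarize the batch on exit."""
    results: Dict[str, float] = {}

    def compare_one(target: str) -> float:
        score = scorer(ref, target)
        results[target] = score  # remembered for the summary below
        return score

    try:
        yield compare_one
    finally:
        # Runs on normal exit and also when the with-block raises.
        print(f"Batch comparison completed: {len(results)} target(s) scored")


with batch_session("hello", char_jaccard) as comparer:
    scores = [comparer(word) for word in ("hola", "bonjour", "ciao")]
print(scores)
```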
Decorators for Comparison Caching
Cache the results of expensive comparisons:
from functools import lru_cache
from compairer import compare
@lru_cache(maxsize=100)
def cachedCompare(ref, target, method="levenshtein"):
    return compare(ref, target, method=method)
# First call will compute the result
result1 = cachedCompare("hello", "hola")
# Second call will retrieve from cache
result2 = cachedCompare("hello", "hola")
print(f"Results: {result1}, {result2}")
Contributing
Contributions are welcome! Please check out our Contribution Guidelines for details on how to get started.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
Source Distribution
Built Distribution
File details
Details for the file compairer-0.1.1.tar.gz.
File metadata
- Download URL: compairer-0.1.1.tar.gz
- Upload date:
- Size: 12.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | f562a0e289624d481960b06c962c68d338d7deeb42b20138afc0451bcc84b8ed
MD5 | aee961965406eeab22a0e6a8294d965e
BLAKE2b-256 | c1562bdb18c2f281fe870c22aff4b3e96fe1110f7eb9899db64027d852b50f00
File details
Details for the file compairer-0.1.1-py3-none-any.whl.
File metadata
- Download URL: compairer-0.1.1-py3-none-any.whl
- Upload date:
- Size: 13.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9300ea6d0663069d5c3b64d0aa2cc3d54a9aadd57f8b7d3cdf5ea741795b6320
MD5 | 15e025bdf4e0cf115a08972ec604b99e
BLAKE2b-256 | 3803b3e97a429ae3dd15455ccef2fcc8281a71ac9dc58d3411707ee570313d87