A Python library for validating and comparing text data using bytearrays.

Similator

Similator is a Python library for text validation and comparison at the byte level. It offers customizable similarity thresholds, case-sensitive or case-insensitive comparison, and an optional caching mechanism, which suits tasks that need precise text matching and validation.

🚀 Features

  • Byte-Level Text Validation and Comparison: Text operations run on bytearrays rather than on Python strings.
  • Customizable Similarity Search: Set a threshold so that only sufficiently similar matches in your dataset are returned.
  • Automatic Caching: Enable caching to store and reuse search results, which speeds up repeated searches.
  • Match-Weighted Scoring: The scoring system rewards larger and more significant matches.
  • Case Sensitivity Options: Choose between case-sensitive and case-insensitive operations.
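
Similator's actual scoring formula is not described here, so the following is only a minimal sketch of how a byte-level, optionally case-insensitive score that favors larger matches could work. It relies on the standard-library difflib module instead of Similator itself; the names to_bytes and block_score, and the squared-block weighting, are assumptions made for illustration and are not part of Similator's API.

```python
from difflib import SequenceMatcher


def to_bytes(text, encoding="utf-8", case_sensitive=False):
    """Encode text to a bytearray, lowercasing first when case is ignored."""
    if not case_sensitive:
        text = text.lower()
    return bytearray(text, encoding)


def block_score(a, b, encoding="utf-8", case_sensitive=False):
    """Hypothetical score in [0, 1] that weights each matching block by size squared.

    Squaring the block size means one long shared run counts for more than
    several short runs of the same total length.
    """
    left = bytes(to_bytes(a, encoding, case_sensitive))
    right = bytes(to_bytes(b, encoding, case_sensitive))
    if not left and not right:
        return 1.0
    matcher = SequenceMatcher(None, left, right, autojunk=False)
    weighted = sum(block.size ** 2 for block in matcher.get_matching_blocks())
    return weighted / max(len(left), len(right)) ** 2


print(block_score("Hello", "hello"))  # 1.0 (case ignored by default)
print(block_score("hello", "hell"))   # 0.64 (one 4-byte block out of 5 bytes)
```

The numbers this sketch produces are on a different scale from Similator's own scores and should not be compared with them.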

📦 Installation

Install Similator with pip:

pip install similator

🌟 Quickstart Guide

The steps below cover initialization, searching, and direct comparison:

1. Import and Initialize

from similator import TextSimilator, ValidData

# Example data
valid_strings = ["Hello", "World", "Text", "Example", "Python"]

# Initialize ValidData
valid_data_instance = ValidData(valid_strings, encoding='utf-8', case_sensitive=False)

# Initialize TextSimilator with ValidData
text_similator = TextSimilator(valid_data_instance, encoding='utf-8', case_sensitive=False)

2. Perform a Search

Search for a string within the valid data with a similarity threshold:

search_value = "hello"
results = text_similator.search(search_value, threshold=0.85)
print(results)
# Output: [Score(value='hello', points=2.0)]
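
To make the role of the threshold concrete, here is a rough sketch of a threshold-filtered search written with the standard library only. It does not use Similator's scoring; search_above_threshold is a hypothetical helper, and difflib's ratio (a value between 0 and 1) stands in for whatever score the library computes internally.

```python
from difflib import SequenceMatcher


def search_above_threshold(query, candidates, threshold=0.85, encoding="utf-8"):
    """Return (candidate, ratio) pairs whose ratio meets the threshold, best first.

    Comparison is case-insensitive and runs on the encoded bytes.
    """
    query_bytes = query.lower().encode(encoding)
    hits = []
    for candidate in candidates:
        candidate_bytes = candidate.lower().encode(encoding)
        ratio = SequenceMatcher(
            None, query_bytes, candidate_bytes, autojunk=False
        ).ratio()
        if ratio >= threshold:
            hits.append((candidate, ratio))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


valid_strings = ["Hello", "World", "Text", "Example", "Python"]
print(search_above_threshold("hello", valid_strings))
# [('Hello', 1.0)]
```

Raising the threshold narrows the result list to near-exact matches, while lowering it admits looser candidates.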

3. Compare Two Strings

Directly compare two strings to obtain a similarity score:

value1 = "hello"
value2 = "hell"
similarity_score = text_similator.compare(value1, value2)
print(similarity_score)
# Output: 1.94

Advanced Usage

Enabling Caching for Repeated Searches

If your application involves repeated searches with similar queries, you can enable caching to improve performance:

# Enable caching with a maximum size of 50 cached results
text_similator_with_cache = TextSimilator(valid_data_instance, auto_cached=True, max_cache_size=50)

# Perform a search; the result is stored in the cache
results_cached = text_similator_with_cache.search("python", threshold=0.9)
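
For readers curious about what a size-limited result cache involves, the following is a minimal sketch under stated assumptions. BoundedCache and cached_search are hypothetical names, and the least-recently-used eviction policy is an assumption; how Similator actually evicts entries once max_cache_size is reached is not documented here.

```python
from collections import OrderedDict


class BoundedCache:
    """Hypothetical cache that evicts the least recently used entry when full."""

    def __init__(self, max_size=50):
        self.max_size = max_size
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._items)


def cached_search(cache, query, threshold, search_fn):
    """Look up (query, threshold) in the cache before calling search_fn."""
    key = (query, threshold)
    cached = cache.get(key)
    if cached is not None:
        return cached
    result = search_fn(query, threshold)
    cache.put(key, result)
    return result
```

Keying on both the query and the threshold matters in this sketch, because the same query can yield different results at different thresholds.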

Exporting and Loading Cached Data

You can export the cache to a file and reload it later for persistent storage:

# Export the current cache to a JSON file
text_similator_with_cache.memory.export_memory("cache.json")

# Load the cache from a JSON file
text_similator_with_cache.memory.load_memory("cache.json")
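
As a rough illustration of what persisting a cache to JSON involves, the sketch below round-trips a plain dictionary through a file using only the standard library. The functions export_cache and load_cache and the shape of the stored data are assumptions; the file layout that export_memory actually writes is not described here and may differ.

```python
import json
import os
import tempfile


def export_cache(cache, path):
    """Write a dict of query -> results to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle, ensure_ascii=False, indent=2)


def load_cache(path):
    """Read a previously exported cache back into a dict."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical cache contents: each query maps to a list of [value, points] pairs.
cache = {"python": [["python", 2.0]]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cache.json")
    export_cache(cache, path)
    restored = load_cache(path)

print(restored == cache)  # True
```

JSON has no tuple type and only allows string keys, so a cache keyed on anything else would need a conversion step before export.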

🛠️ Contributing

Contributions are welcome! If you have any ideas, suggestions, or issues, feel free to open an issue or submit a pull request.

📝 License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

