Library for Matching Statistics using Suffix Tree
Project description
matching-statistics
The MatchingStatistics library computes matching statistics for a pattern string against an input string using a suffix tree. It efficiently determines the position and length of the longest exact match for each suffix of the pattern within the input string.
Installation
pip install matching-statistics
Usage
from matching_statistics.MatchingStatistics import MatchingStatistics
# Initialize with input string
input_string = "your_input_string_here"
matching_statistics = MatchingStatistics(input_string)
# Compute matching statistics for a pattern
input_pattern = "your_pattern_here"
ms_table = matching_statistics.get_matching_statistics_table(input_pattern)
Comparison of Implementations
Naive Implementation
The naive approach computes matching statistics by directly comparing the pattern with substrings of the input string. The time complexity of this approach is O(pattern_length * pattern_length * input_string_length). This arises because for each position in the input string, we potentially compare up to the length of the pattern.
Suffix Tree Implementation
The suffix tree implementation leverages advanced data structures to achieve efficient pattern matching statistics computation. Specifically, it operates in linear time relative to the length of the pattern, O(pattern_length). This efficiency stems from the preprocessing step that constructs the suffix tree from the input string. Once constructed, the tree allows rapid lookup and comparison of patterns against suffixes of the input string.
Performance Comparison
The performance difference between the two approaches can be significant for large input strings and patterns. While the naive method's complexity grows quadratically with the pattern length and linearly with the input string length, the suffix tree method ensures that the time complexity remains linear in relation to the pattern length. Therefore, for longer patterns or large input strings, the suffix tree implementation provided by matching-statistics
offers considerable performance advantages.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file matching_statistics-0.3.0.tar.gz
.
File metadata
- Download URL: matching_statistics-0.3.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a8745177ca3e3f813cc94f3763892b5cd8e90d34379d71afb5311e194a6af4f8 |
|
MD5 | a66802083136937cefb32f970290df31 |
|
BLAKE2b-256 | 915c79eb07fb791b2d1eb2e398c90cad093d5c84093e224d6e92c99843592b6e |
File details
Details for the file matching_statistics-0.3.0-py3-none-any.whl
.
File metadata
- Download URL: matching_statistics-0.3.0-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.12
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 468f4b44b4fa16c8b1b7132b49c39dd3e350bfe24aad094ec63af55793fd5663 |
|
MD5 | 8afa66e85ed00e715ba97c2a7526e464 |
|
BLAKE2b-256 | aa8e7aaef7ab327c0d31faec1b8999a3938a8883e5eae83cec80ab9a537fe8f7 |