Skip to main content

Library for Matching Statistics using Suffix Tree

Project description

matching-statistics

The MatchingStatistics library computes matching statistics for a pattern string against an input string using a suffix tree. It efficiently determines the position and length of the longest exact match for each suffix of the pattern within the input string.

Installation

pip install matching-statistics

Usage

from matching_statistics.MatchingStatistics import MatchingStatistics

# Initialize with input string
input_string = "your_input_string_here"
matching_statistics = MatchingStatistics(input_string)

# Compute matching statistics for a pattern
input_pattern = "your_pattern_here"
ms_table = matching_statistics.get_matching_statistics_table(input_pattern)

Comparison of Implementations

Naive Implementation

The naive approach computes matching statistics by directly comparing the pattern with substrings of the input string. The time complexity of this approach is O(pattern_length * pattern_length * input_string_length). This arises because for each position in the input string, we potentially compare up to the length of the pattern.

Suffix Tree Implementation

The suffix tree implementation leverages advanced data structures to achieve efficient pattern matching statistics computation. Specifically, it operates in linear time relative to the length of the pattern, O(pattern_length). This efficiency stems from the preprocessing step that constructs the suffix tree from the input string. Once constructed, the tree allows rapid lookup and comparison of patterns against suffixes of the input string.

Performance Comparison

The performance difference between the two approaches can be significant for large input strings and patterns. While the naive method's complexity grows quadratically with the pattern length and linearly with the input string length, the suffix tree method ensures that the time complexity remains linear in relation to the pattern length. Therefore, for longer patterns or large input strings, the suffix tree implementation provided by matching-statistics offers considerable performance advantages.

Time Complexity Comparison

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

matching_statistics-0.3.0.tar.gz (4.2 kB view details)

Uploaded Source

Built Distribution

matching_statistics-0.3.0-py3-none-any.whl (5.1 kB view details)

Uploaded Python 3

File details

Details for the file matching_statistics-0.3.0.tar.gz.

File metadata

  • Download URL: matching_statistics-0.3.0.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for matching_statistics-0.3.0.tar.gz
Algorithm Hash digest
SHA256 a8745177ca3e3f813cc94f3763892b5cd8e90d34379d71afb5311e194a6af4f8
MD5 a66802083136937cefb32f970290df31
BLAKE2b-256 915c79eb07fb791b2d1eb2e398c90cad093d5c84093e224d6e92c99843592b6e

See more details on using hashes here.

File details

Details for the file matching_statistics-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for matching_statistics-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 468f4b44b4fa16c8b1b7132b49c39dd3e350bfe24aad094ec63af55793fd5663
MD5 8afa66e85ed00e715ba97c2a7526e464
BLAKE2b-256 aa8e7aaef7ab327c0d31faec1b8999a3938a8883e5eae83cec80ab9a537fe8f7

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page