This package provides a Greedy String Tiling (GST) calculation.
GST Calculation
GST is a module that computes Greedy String Tiling matches between two token sequences, as described in "String Similarity via Greedy String Tiling and Running Karp-Rabin Matching" (Wise, 1993) - https://www.researchgate.net/publication/262763983_String_Similarity_via_Greedy_String_Tiling_and_Running_Karp-Rabin_Matching
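For readers unfamiliar with the algorithm, the following is a minimal sketch of basic Greedy String Tiling, without the Running Karp-Rabin speed-up. It is not this package's implementation: the function name greedy_string_tiling and its tuple-based return value are hypothetical and chosen only for illustration.

```python
def greedy_string_tiling(seq_1, seq_2, minimal_match=3):
    """Illustrative sketch: return (tiles, total), each tile is (pos_1, pos_2, length)."""
    if minimal_match < 1:
        raise ValueError("minimal_match must be at least 1")
    marked_1 = [False] * len(seq_1)
    marked_2 = [False] * len(seq_2)
    tiles = []
    while True:
        longest = minimal_match
        candidates = []
        # Phase 1: collect the longest matches made only of unmarked tokens.
        for i in range(len(seq_1)):
            for j in range(len(seq_2)):
                k = 0
                while (i + k < len(seq_1) and j + k < len(seq_2)
                       and not marked_1[i + k] and not marked_2[j + k]
                       and seq_1[i + k] == seq_2[j + k]):
                    k += 1
                if k > longest:
                    longest, candidates = k, [(i, j, k)]
                elif k == longest:
                    candidates.append((i, j, k))
        if not candidates:
            break
        # Phase 2: mark each candidate as a tile unless an earlier tile overlaps it.
        for i, j, k in candidates:
            if any(marked_1[i:i + k]) or any(marked_2[j:j + k]):
                continue
            marked_1[i:i + k] = [True] * k
            marked_2[j:j + k] = [True] * k
            tiles.append((i, j, k))
    return tiles, sum(length for _, _, length in tiles)


tiles, total = greedy_string_tiling([1, 2, 3, 4, 5], [3, 4, 5, 6, 7], minimal_match=3)
print(tiles, total)
```

Each pass looks for the longest remaining common run, so long tiles are always placed before short ones. The search stops once no unmarked run of at least minimal_match tokens is left.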
Installation
Using pip via PyPI
pip install gst-calculation
Using pip via GitHub
pip install git+https://github.com/tomytw/gst-calculation.git@0.1.2
Usage
Importing the package
>>> from gst_calculation import gst
You can calculate the GST of two sequences of numbers or strings, or of any other objects that can be compared for equality (their class must define an __eq__ method).
The result is a list with two elements:
- Tile information (position and length of matched tiles)
- Total score
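To illustrate the equality requirement above, here is a sketch of a custom token class. The Token class and its kind/text fields are hypothetical and not part of this package. It assumes, as the text above states, that tokens are compared with ==.

```python
class Token:
    """Hypothetical token: two tokens are equal when their kinds match."""

    def __init__(self, kind, text):
        self.kind = kind
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, Token):
            return NotImplemented
        return self.kind == other.kind

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash(self.kind)


tokens_sequence_1 = [Token("NAME", "total"), Token("OP", "="), Token("NUMBER", "0")]
tokens_sequence_2 = [Token("NAME", "count"), Token("OP", "="), Token("NUMBER", "1")]

# The texts differ, but the sequences compare equal token by token.
print(tokens_sequence_1 == tokens_sequence_2)

# Sequences like these could then be passed to the package, for example:
# gst.calculate(tokens_sequence_1, tokens_sequence_2, minimal_match=3)
```

Comparing only the token kind is one possible design: it lets renamed identifiers still match, which can be useful when comparing source code.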
GST on Numbers List
>>> tokens_sequence_1 = [1,2,3,4,5]
>>> tokens_sequence_2 = [3,4,5,6,7]
>>> gst.calculate(tokens_sequence_1, tokens_sequence_2, minimal_match=3)
[[{'token_1_position': 2, 'token_2_position': 0, 'length': 3, 'score': 3}], 3]
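The sketch below shows one way to read that result. It copies the output above as a literal instead of calling the package. It assumes the positions are 0-based indexes into each sequence, which is consistent with the output shown.

```python
tokens_sequence_1 = [1, 2, 3, 4, 5]
tokens_sequence_2 = [3, 4, 5, 6, 7]

# Result copied from the number-list example above.
result = [[{'token_1_position': 2, 'token_2_position': 0, 'length': 3, 'score': 3}], 3]

# First element: list of tiles. Second element: total score.
tiles, total_score = result

for tile in tiles:
    start_1 = tile['token_1_position']
    start_2 = tile['token_2_position']
    length = tile['length']
    # Slice out the matched tokens from each sequence.
    matched_1 = tokens_sequence_1[start_1:start_1 + length]
    matched_2 = tokens_sequence_2[start_2:start_2 + length]
    print(matched_1, matched_2)

print(total_score)
```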
GST on Strings List
>>> tokens_sequence_1 = ['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
>>> tokens_sequence_2 = ['the', 'lazy', 'dog', 'jumps', 'over', 'the', 'quick', 'brown', 'fox']
>>> gst.calculate(tokens_sequence_1, tokens_sequence_2, minimal_match=3)
[[{'token_1_position': 0, 'token_2_position': 5, 'length': 4, 'score': 4},
{'token_1_position': 6, 'token_2_position': 0, 'length': 3, 'score': 3}],
7]
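Token lists like the ones above usually come from raw text. The following is a minimal sketch of one way to produce them. The tokenize helper is hypothetical and not part of this package, and lowercasing and dropping punctuation are assumptions that may not suit every use case.

```python
import re


def tokenize(text):
    """Hypothetical helper: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


tokens_sequence_1 = tokenize("The quick brown fox jumps over the lazy dog.")
tokens_sequence_2 = tokenize("The lazy dog jumps over the quick brown fox.")
print(tokens_sequence_1)
print(tokens_sequence_2)
```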