Skip to main content

super fast cpp implementation of longest common subsequence

Project description

pylcs

The original repository stop maintenance. This is a transfer version

pylcs is a super fast c++ library which adopts dynamic programming(DP) algorithm to solve two classic LCS problems as below .

The longest common subsequence problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences).

The longest common substring problem is to find the longest string (or strings) that is a substring (or are substrings) of two or more strings.

Levenshtein distance, aka edit distance is also supported. Emm...forget the package name. Example usage is in tests.

We also support Chinese(or any UTF-8) string.

Install

To install, simply do pip install pylcs to pull down the latest version from PyPI.

Python code example

import pylcs

#  finding the longest common subsequence length of string A and string B
A = 'We are shannonai'
B = 'We like shannonai'
pylcs.lcs_sequence_length(A, B)
"""
>>> pylcs.lcs_sequence_length(A, B)
14
"""

#  finding alignment from string A to B
A = 'We are shannonai'
B = 'We like shannonai'
res = pylcs.lcs_sequence_idx(A, B)
''.join([B[i] for i in res if i != -1])
"""
>>> res
[0, 1, 2, -1, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> ''.join([B[i] for i in res if i != -1])
'We e shannonai'
"""

#  finding the longest common subsequence length of string A and a list of string B
A = 'We are shannonai'
B = ['We like shannonai', 'We work in shannonai', 'We are not shannonai']
pylcs.lcs_sequence_of_list(A, B)
"""
>>> pylcs.lcs_sequence_of_list(A, B)
[14, 14, 16]
"""

# finding the longest common substring length of string A and string B
A = 'We are shannonai'
B = 'We like shannonai'
pylcs.lcs_string_length(A, B)
"""
>>> pylcs.lcs_string_length(A, B)
11
"""

#  finding alignment from string A to B
A = 'We are shannonai'
B = 'We like shannonai'
res = pylcs.lcs_string_idx(A, B)
''.join([B[i] for i in res if i != -1])
"""
>>> res
[-1, -1, -1, -1, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> ''.join([B[i] for i in res if i != -1])
'e shannonai'
"""

#  finding the longest common substring length of string A and a list of string B
A = 'We are shannonai'
B = ['We like shannonai', 'We work in shannonai', 'We are not shannonai']
pylcs.lcs_string_of_list(A, B)
"""
>>> pylcs.lcs_string_of_list(A, B)
[11, 10, 10]
"""

#  finding the weighted edit distance from string A to B
pylcs.edit_distance("aaa", "aba")
pylcs.edit_distance("aaa", "aba", {'a': {'b': 2.0}})
pylcs.edit_distance("", "aa", {'': {'a': 0.5}})
#  weight['']['a'] means inserting a char 'a' costs 0.5
#  similarly, weight['a'][''] means the score of deleting a char 'a'
"""
>>> pylcs.edit_distance("aaa", "aba")
1
>>> pylcs.edit_distance("aaa", "aba", {'a': {'b': 2.0}})
2.0
>>> pylcs.edit_distance("", "aa", {'': {'a': 0.5}})
1.0
"""

#  finding edit distance alignment from string A to B
pylcs.edit_distance_idx("aaa", "aba")
pylcs.edit_distance_idx("aaa", "aba", {'a': {'b': 3}})
pylcs.edit_distance_idx("aa", "aabb", {'a': {'a': 2, 'b': 0}})
"""
>>> pylcs.edit_distance_idx("aaa", "aba")
[0, 1, 2]
>>> pylcs.edit_distance_idx("aaa", "aba", {'a': {'b': 3}})
[0, -1, 2]
>>> pylcs.edit_distance_idx("aa", "aabb", {'a': {'a': 2, 'b': 0}})
[2, 3]
"""

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pylcs-0.0.7.tar.gz (5.1 kB view hashes)

Uploaded source

Built Distribution

pylcs-0.0.7-cp37-cp37m-win_amd64.whl (75.9 kB view hashes)

Uploaded cp37

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page