Super fast C++ implementation of longest common subsequence

Project description

pylcs

The original repository is no longer maintained. This is a transferred version.

pylcs is a super fast C++ library that uses a dynamic programming (DP) algorithm to solve the two classic LCS problems below.

The longest common subsequence problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences).
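As an illustration of the DP recurrence behind this problem, here is a minimal pure-Python sketch. It is not pylcs's C++ implementation; the function name `lcs_length` is made up for this example, and only the length is computed.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Illustrative sketch only (hypothetical helper, not part of pylcs).
    Uses the classic O(len(a) * len(b)) DP with two rolling rows.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous prefix of a
    for ca in a:
        cur = [0]  # LCS with an empty prefix of b is 0
        for j, cb in enumerate(b, 1):
            if ca == cb:
                # Extend the best subsequence of both shorter prefixes.
                cur.append(prev[j - 1] + 1)
            else:
                # Drop the last element of a or of b, whichever is better.
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


print(lcs_length('We are shannonai', 'We like shannonai'))
```

The two strings are the same ones used in the Python code example in this README, so the printed value can be compared with the result shown there for `pylcs.lcs_sequence_length`.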

The longest common substring problem is to find the longest string (or strings) that is a substring (or are substrings) of two or more strings.
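The substring variant differs from the subsequence one in that matched characters must be contiguous. A minimal pure-Python sketch of that DP follows; the name `longest_common_substring_length` is hypothetical and this is not the library's own code.

```python
def longest_common_substring_length(a, b):
    """Length of the longest contiguous run shared by a and b.

    Illustrative sketch only (hypothetical helper, not part of pylcs).
    cur[j] holds the length of the common suffix of the current prefix
    of a and b[:j]; the answer is the largest such value seen.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            # A mismatch resets the run; a match extends the diagonal run.
            run = prev[j - 1] + 1 if ca == cb else 0
            cur.append(run)
            best = max(best, run)
        prev = cur
    return best


print(longest_common_substring_length('We are shannonai', 'We like shannonai'))
```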

Levenshtein distance, also known as edit distance, is supported as well, despite the package name. Example usage is in the tests.

Chinese (or any UTF-8) strings are also supported.

Colorful visualization: since 0.1.0, you can visualize the LCS result with colored output.
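For readers unfamiliar with edit distance, the following pure-Python sketch shows the standard DP with optional per-character weights. The nested-dict weight layout follows the convention described in the Python code example of this README (`weight['']['a']` for inserting `'a'`, `weight['a']['']` for deleting it). The function name `weighted_edit_distance`, the default cost of 1, and the zero cost for equal characters are assumptions of this sketch, not confirmed behaviour of pylcs.

```python
def weighted_edit_distance(a, b, weight=None):
    """Edit distance from a to b with optional per-character costs.

    Illustrative sketch only (hypothetical helper, not part of pylcs).
    weight[x][y] is the cost of replacing x with y; '' stands for
    "no character", so weight[''][y] is an insertion and weight[x]['']
    a deletion. Missing entries cost 1 (an assumption of this sketch).
    """
    weight = weight or {}

    def cost(x, y):
        return weight.get(x, {}).get(y, 1)

    # Row for the empty prefix of a: insert every character of b so far.
    prev = [0]
    for cb in b:
        prev.append(prev[-1] + cost('', cb))

    for ca in a:
        cur = [prev[0] + cost(ca, '')]  # delete every character of a so far
        for j, cb in enumerate(b, 1):
            substitute = prev[j - 1] + (0 if ca == cb else cost(ca, cb))
            delete = prev[j] + cost(ca, '')
            insert = cur[j - 1] + cost('', cb)
            cur.append(min(substitute, delete, insert))
        prev = cur
    return prev[-1]


print(weighted_edit_distance("aaa", "aba"))
print(weighted_edit_distance("", "aa", {'': {'a': 0.5}}))
```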
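Why character-level handling matters for UTF-8 text can be seen in a small pure-Python sketch. It does not call pylcs; `lcs_length` is a hypothetical helper defined here, and the claim that a UTF-8-aware library should compare characters rather than bytes is an assumption about the intent of the sentence above.

```python
def lcs_length(a, b):
    """LCS length of two sequences (str or bytes); illustrative helper."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


s1, s2 = "你", "们"  # two different characters

# Compared as characters, the strings have nothing in common.
print(lcs_length(s1, s2))

# Compared as raw UTF-8 bytes, they share a lead byte (0xE4),
# which would be reported as a spurious one-unit match.
print(lcs_length(s1.encode("utf-8"), s2.encode("utf-8")))
```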

Install

To install, run pip install pylcs to pull the latest version from PyPI.

Python code example

import pylcs

#  finding the longest common subsequence length of string A and string B
A = 'We are shannonai'
B = 'We like shannonai'
pylcs.lcs_sequence_length(A, B)
"""
>>> pylcs.lcs_sequence_length(A, B)
14
"""

#  finding the subsequence alignment from string A to B:
#  res[i] is the index in B matched to A[i], or -1 if A[i] is unmatched
A = 'We are shannonai'
B = 'We like shannonai'
res = pylcs.lcs_sequence_idx(A, B)
''.join([B[i] for i in res if i != -1])
"""
>>> res
[0, 1, 2, -1, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> ''.join([B[i] for i in res if i != -1])
'We e shannonai'
"""

#  finding the longest common subsequence length of string A and each string in list B
A = 'We are shannonai'
B = ['We like shannonai', 'We work in shannonai', 'We are not shannonai']
pylcs.lcs_sequence_of_list(A, B)
"""
>>> pylcs.lcs_sequence_of_list(A, B)
[14, 14, 16]
"""

# finding the longest common substring length of string A and string B
A = 'We are shannonai'
B = 'We like shannonai'
pylcs.lcs_string_length(A, B)
"""
>>> pylcs.lcs_string_length(A, B)
11
"""

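#  finding the substring alignment from string A to B:
#  res[i] is the index in B matched to A[i], or -1 if A[i] is outside the common substring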
A = 'We are shannonai'
B = 'We like shannonai'
res = pylcs.lcs_string_idx(A, B)
''.join([B[i] for i in res if i != -1])
"""
>>> res
[-1, -1, -1, -1, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> ''.join([B[i] for i in res if i != -1])
'e shannonai'
"""

#  finding the longest common substring length of string A and each string in list B
A = 'We are shannonai'
B = ['We like shannonai', 'We work in shannonai', 'We are not shannonai']
pylcs.lcs_string_of_list(A, B)
"""
>>> pylcs.lcs_string_of_list(A, B)
[11, 10, 10]
"""

#  finding the weighted edit distance from string A to B
pylcs.edit_distance("aaa", "aba")
pylcs.edit_distance("aaa", "aba", {'a': {'b': 2.0}})
pylcs.edit_distance("", "aa", {'': {'a': 0.5}})
#  weight['']['a'] means inserting a char 'a' costs 0.5
#  similarly, weight['a'][''] means the score of deleting a char 'a'
"""
>>> pylcs.edit_distance("aaa", "aba")
1
>>> pylcs.edit_distance("aaa", "aba", {'a': {'b': 2.0}})
2.0
>>> pylcs.edit_distance("", "aa", {'': {'a': 0.5}})
1.0
"""

#  finding edit distance alignment from string A to B
pylcs.edit_distance_idx("aaa", "aba")
pylcs.edit_distance_idx("aaa", "aba", {'a': {'b': 3}})
pylcs.edit_distance_idx("aa", "aabb", {'a': {'a': 2, 'b': 0}})
"""
>>> pylcs.edit_distance_idx("aaa", "aba")
[0, 1, 2]
>>> pylcs.edit_distance_idx("aaa", "aba", {'a': {'b': 3}})
[0, -1, 2]
>>> pylcs.edit_distance_idx("aa", "aabb", {'a': {'a': 2, 'b': 0}})
[2, 3]
"""
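The index lists returned by the `*_idx` functions above can be read as a mapping from positions in A to positions in B. The sketch below works on the `lcs_sequence_idx` result copied from the example above rather than calling pylcs, so it runs without the package installed; the helper names `matched_pairs` and `unmatched_chars` are made up for illustration.

```python
A = 'We are shannonai'
B = 'We like shannonai'
# Result of pylcs.lcs_sequence_idx(A, B) as shown in the example above.
res = [0, 1, 2, -1, -1, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]


def matched_pairs(idx):
    """(position in A, position in B) for every matched character."""
    return [(i, j) for i, j in enumerate(idx) if j != -1]


def unmatched_chars(a, idx):
    """Characters of A that have no counterpart in B."""
    return ''.join(a[i] for i, j in enumerate(idx) if j == -1)


pairs = matched_pairs(res)
print(len(pairs))                           # number of matched characters
print(all(A[i] == B[j] for i, j in pairs))  # matched positions hold equal characters
print(unmatched_chars(A, res))              # the part of A left out of the match
```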

Since 0.1.0, you can produce a visual comparison with colored output. Use coloring_match_sequence to color s1 and s2 according to a match list, for example:

s1, s2 = "abcdefghijklmnopq", "-c-fgh-kl-nop-q"
match_list = pylcs.lcs_sequence_idx(s1, s2)
colored_s1, colored_s2 = pylcs.coloring_match_sequence(match_list, s1, s2, 11, 11, "#2266ff", "#2266ff", t=1)
print(colored_s1, colored_s2)
colored_s1, colored_s2 = pylcs.coloring_match_sequence(match_list, s1, s2, 11, 11, "#2266ff", "#2266ff", t=2)
print(colored_s1, colored_s2)
colored_s1, colored_s2 = pylcs.coloring_match_sequence(match_list, s1, s2, 11, 11, "#2266ff", "#2266ff", t=3)
print(colored_s1, colored_s2)

s1, s2 = "How does this string edit to s2?", "How similar is this string to s1?"
match_list = pylcs.edit_distance_idx(s1, s2)
colored_s1, colored_s2 = pylcs.coloring_match_sequence(match_list, s1, s2, 4, 4, 230, 230, t=2)
print(colored_s1, colored_s2, sep='\n')

Note that the colored output uses ANSI escape codes; see https://en.wikipedia.org/wiki/ANSI_escape_code.

The ANSI codes may not work in the win32 command line.
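To give an idea of what ANSI coloring looks like, here is a minimal sketch that wraps text in standard SGR escape sequences. It is not taken from pylcs: the helper names are hypothetical, and treating an integer such as 230 as a 256-color index and a string such as "#2266ff" as a 24-bit hex color is an assumption based on the arguments used in the example above.

```python
RESET = "\x1b[0m"  # SGR 0: reset all attributes


def colorize_256(text, index):
    """Foreground color from the 256-color palette (SGR 38;5;n)."""
    return f"\x1b[38;5;{index}m{text}{RESET}"


def colorize_hex(text, hex_color):
    """24-bit foreground color from a '#rrggbb' string (SGR 38;2;r;g;b)."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[k:k + 2], 16) for k in (0, 2, 4))
    return f"\x1b[38;2;{r};{g};{b}m{text}{RESET}"


# On a terminal that understands ANSI codes these print colored text;
# elsewhere the raw escape sequences may be shown instead.
print(colorize_256("matched", 230))
print(colorize_hex("matched", "#2266ff"))
```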

Download files

Source Distribution

pylcs-0.1.1.tar.gz (11.3 kB, Source)

Built Distributions

pylcs-0.1.1-cp311-cp311-win_amd64.whl (80.1 kB, CPython 3.11, Windows x86-64)

pylcs-0.1.1-cp310-cp310-win_amd64.whl (79.0 kB, CPython 3.10, Windows x86-64)

pylcs-0.1.1-cp39-cp39-win_amd64.whl (79.0 kB, CPython 3.9, Windows x86-64)

pylcs-0.1.1-cp38-cp38-win_amd64.whl (78.9 kB, CPython 3.8, Windows x86-64)

pylcs-0.1.1-cp37-cp37m-win_amd64.whl (79.2 kB, CPython 3.7m, Windows x86-64)

pylcs-0.1.1-cp36-cp36m-win_amd64.whl (79.4 kB, CPython 3.6m, Windows x86-64)

pylcs-0.1.1-cp35-cp35m-win_amd64.whl (80.1 kB, CPython 3.5m, Windows x86-64)

File details

Details for the file pylcs-0.1.1.tar.gz.

File metadata

  • Download URL: pylcs-0.1.1.tar.gz
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1.tar.gz
Algorithm Hash digest
SHA256 632c69235d77cda0ba524d82796878801d2f46131fc59e730c98767fc4ce1307
MD5 8beaf2bd1a15267c2d94e5b4094222f2
BLAKE2b-256 7e7f9ca900387de8f3d3658dbf7d0aba96aa2a69f4f5329c83af9be98dc7307d

See more details on using hashes here.

File details

Details for the file pylcs-0.1.1-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: pylcs-0.1.1-cp311-cp311-win_amd64.whl
  • Size: 80.1 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 9ff06e037c54056cb67d6ef5ad946c0360afeff7d43be67ce09e55201ecc15cc
MD5 4c135dedd97c641396bb316b7b1ac386
BLAKE2b-256 596a4e8c1552eb3c128033d4c3bc19b5bf52758924fe4ae455e0c9e958ea6109


File details

Details for the file pylcs-0.1.1-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: pylcs-0.1.1-cp310-cp310-win_amd64.whl
  • Size: 79.0 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 7b8adea6b41dff27332c967533ec3c42a5e94171be778d6f01f0c5cee82e7604
MD5 41bd2540bd25b36e07249aed567db379
BLAKE2b-256 6941ef10e08997b841c7608e84a2af29c01b77a682a406befe1aec54ebda7cd4


File details

Details for the file pylcs-0.1.1-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: pylcs-0.1.1-cp39-cp39-win_amd64.whl
  • Size: 79.0 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 0f4c82fad8c0429abef9e98fb98904459c4f5f9fb9b6ce20e0df0841a6a48a54
MD5 9b294dc0cc249aacc649220b9afde6c2
BLAKE2b-256 ef0876a999d81e5c109c24d0a61d4ae36f798ba9f7fb96e60e27748b81b3d900


File details

Details for the file pylcs-0.1.1-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: pylcs-0.1.1-cp38-cp38-win_amd64.whl
  • Size: 78.9 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 954495f1c164ccb722b835e7028783f8a38d85ed5f6ff7b9d50143896c6cff9b
MD5 9360561250e55af08e116c4910cab615
BLAKE2b-256 1b5ec53dfa7326f56ead3f245e2d4aef345a4ca90e874208ea98d95f47b7ed3e


File details

Details for the file pylcs-0.1.1-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: pylcs-0.1.1-cp37-cp37m-win_amd64.whl
  • Size: 79.2 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 db52d55cfdf813af974bcc164aedbd29274da83086877bf05778aa7fbf777f7f
MD5 537081a4e4d7ced32f6f1603f5c1d07c
BLAKE2b-256 eb816d245dc86d09bba16296994d4427b2b8de61a9ab610d187e5d911854eb84


File details

Details for the file pylcs-0.1.1-cp36-cp36m-win_amd64.whl.

File metadata

  • Download URL: pylcs-0.1.1-cp36-cp36m-win_amd64.whl
  • Size: 79.4 kB
  • Tags: CPython 3.6m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1-cp36-cp36m-win_amd64.whl
Algorithm Hash digest
SHA256 b6c43b63e20048f8fec7e122fbc08c238940a0ee5302bf84a70db22c7f8cc836
MD5 0b067e8d346e195687be4782caed38c2
BLAKE2b-256 c5d8a79f12133056f7c34368fc372b4008bb29c6a64c360328e3b8d34d36af08


File details

Details for the file pylcs-0.1.1-cp35-cp35m-win_amd64.whl.

File metadata

  • Download URL: pylcs-0.1.1-cp35-cp35m-win_amd64.whl
  • Size: 80.1 kB
  • Tags: CPython 3.5m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.9

File hashes

Hashes for pylcs-0.1.1-cp35-cp35m-win_amd64.whl
Algorithm Hash digest
SHA256 d2ebf340aa180d841939d9ec1168dfd072992dda1d48148ceb07b65b1ab62ffa
MD5 6caf88aab3a2c12cf41e0dcc4fca91bb
BLAKE2b-256 119379e3758162cf0af8875fd0d775379b2b65e510b232636e889c487f3fb629

