general-sam-py
Python bindings for general-sam and some utilities.
The suffix automaton of abcbc; image from the OI Wiki article on suffix automata (后缀自动机).
Usage
GeneralSAM
from general_sam import GeneralSAM
sam = GeneralSAM.construct_from_bytes(b'abcbc')
state = sam.get_root_state()
state.feed_bytes(b'cbc')
assert state.is_accepting()
state = sam.get_root_state()
state.feed_bytes(b'bcb')
assert not state.is_accepting()
from general_sam import GeneralSAM
sam = GeneralSAM.construct_from_chars('abcbc')
state = sam.get_root_state()
state.feed_chars('b')
assert not state.is_accepting()
state.feed_chars('c')
assert state.is_accepting()
state.feed_chars('bc')
assert state.is_accepting()
state.feed_chars('bc')
assert not state.is_accepting() and state.is_nil()
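To make the accepting and nil checks above concrete, here is a minimal pure-Python sketch of a single-string suffix automaton. It follows the textbook construction and is not the implementation used by general-sam; the names `SuffixAutomaton`, `walk`, and `is_accepting` below are hypothetical and are not part of the `general_sam` API. In this sketch a state is accepting when the string fed so far is a suffix of the source string, and falling off the automaton (returned as `None`) plays the role of the nil state.

```python
from typing import Dict, List, Optional


class SuffixAutomaton:
    """Textbook suffix automaton of one string (illustrative sketch only)."""

    def __init__(self, text: str) -> None:
        # Per-state data: transitions, suffix link, length of the longest string.
        self.trans: List[Dict[str, int]] = [{}]
        self.link: List[int] = [-1]
        self.max_len: List[int] = [0]
        last = 0
        for ch in text:
            last = self._extend(last, ch)
        # States on the suffix-link path from the final state represent suffixes.
        # In this sketch the root is accepting too (the empty string is a suffix).
        self.accepting = [False] * len(self.trans)
        state = last
        while state != -1:
            self.accepting[state] = True
            state = self.link[state]

    def _new_state(self, max_len: int, link: int, trans: Dict[str, int]) -> int:
        self.trans.append(dict(trans))
        self.link.append(link)
        self.max_len.append(max_len)
        return len(self.trans) - 1

    def _extend(self, last: int, ch: str) -> int:
        cur = self._new_state(self.max_len[last] + 1, -1, {})
        p = last
        while p != -1 and ch not in self.trans[p]:
            self.trans[p][ch] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
            return cur
        q = self.trans[p][ch]
        if self.max_len[p] + 1 == self.max_len[q]:
            self.link[cur] = q
            return cur
        # Split q: the clone keeps only the shorter strings of q.
        clone = self._new_state(self.max_len[p] + 1, self.link[q], self.trans[q])
        while p != -1 and self.trans[p].get(ch) == q:
            self.trans[p][ch] = clone
            p = self.link[p]
        self.link[q] = clone
        self.link[cur] = clone
        return cur

    def walk(self, s: str, state: Optional[int] = 0) -> Optional[int]:
        """Feed s starting from state; None means the walk left the automaton."""
        for ch in s:
            if state is None:
                break
            state = self.trans[state].get(ch)
        return state

    def is_accepting(self, state: Optional[int]) -> bool:
        return state is not None and self.accepting[state]


sketch = SuffixAutomaton('abcbc')
state = sketch.walk('b')          # 'b' is a substring but not a suffix
print(sketch.is_accepting(state))
state = sketch.walk('c', state)   # 'bc' is a suffix
print(sketch.is_accepting(state))
```

Feeding incrementally from a previously returned state mirrors the repeated `feed_chars` calls in the example above.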
from general_sam import GeneralSAM, GeneralSAMState, construct_trie_from_chars
trie, _ = construct_trie_from_chars(['hello', 'Chielo'])
sam = GeneralSAM.construct_from_trie(trie)
def fetch_state(s: str) -> GeneralSAMState:
    state = sam.get_root_state()
    state.feed_chars(s)
    return state
assert fetch_state('lo').is_accepting()
assert fetch_state('ello').is_accepting()
assert fetch_state('elo').is_accepting()
state = fetch_state('el')
assert not state.is_accepting() and not state.is_nil()
state = fetch_state('bye')
assert not state.is_accepting() and state.is_nil()
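The three outcomes in the trie example (accepting, non-accepting but not nil, and nil) can be read off a brute-force reference model. The sketch below is an assumption about the semantics, inferred from the assertions above rather than taken from the general-sam documentation, and `classify` is a hypothetical helper, not a library function: a query is treated as accepting when it is a suffix of some word, as non-accepting when it is only a substring, and as nil otherwise.

```python
from typing import Iterable


def classify(words: Iterable[str], query: str) -> str:
    """Brute-force model of feeding `query` from the root state."""
    words = list(words)
    # Every suffix of every word (including the empty suffix).
    suffixes = {w[i:] for w in words for i in range(len(w) + 1)}
    # Every substring of every word.
    substrings = {
        w[i:j]
        for w in words
        for i in range(len(w) + 1)
        for j in range(i, len(w) + 1)
    }
    if query in suffixes:
        return 'accepting'
    if query in substrings:
        return 'non-accepting'
    return 'nil'


for q in ['lo', 'ello', 'elo', 'el', 'bye']:
    print(q, classify(['hello', 'Chielo'], q))
```

This enumeration is quadratic in the word lengths and is only meant for checking small cases by hand; the automaton answers the same question in time proportional to the query length.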
VocabPrefixAutomaton
from general_sam import VocabPrefixAutomaton, CountInfo
vocab = ['歌曲', '聆听歌曲', '播放歌曲', '歌词', '查看歌词']
automaton = VocabPrefixAutomaton(vocab, bytes_or_chars='chars')
# NOTE: the bounds in CountInfo are indices into the sorted vocab:
_ = ['播放歌曲', '查看歌词', '歌曲', '歌词', '聆听歌曲']
# Tokens: 一起 | 聆 | 听 | 歌 (fed in reverse order, last token first)
state = automaton.get_root_state()
# feed 歌
cnt_info = automaton.prepend_feed(state, '歌')
assert cnt_info is not None and cnt_info == CountInfo(
    str_cnt=2, tot_cnt_lower=2, tot_cnt_upper=4
)
selected_idx = automaton.get_order_slice(cnt_info)
assert frozenset(selected_idx) == {0, 3}
selected_vocab = [vocab[i] for i in selected_idx]
assert frozenset(selected_vocab) == {'歌曲', '歌词'}
# feed 听
cnt_info = automaton.prepend_feed(state, '听')
assert cnt_info is None
assert not state.is_nil()
# feed 聆
cnt_info = automaton.prepend_feed(state, '聆')
assert cnt_info is not None and cnt_info == CountInfo(
    str_cnt=1, tot_cnt_lower=4, tot_cnt_upper=5
)
selected_idx = automaton.get_order_slice(cnt_info)
assert frozenset(selected_idx) == {1}
selected_vocab = [vocab[i] for i in selected_idx]
assert frozenset(selected_vocab) == {'聆听歌曲'}
# feed 一起
assert not state.is_nil()
cnt_info = automaton.prepend_feed(state, '一起')
assert state.is_nil()
# Tokens: 来 | 查看 | 歌词 (fed in reverse order, last token first)
state = automaton.get_root_state()
# feed 歌词
cnt_info = automaton.prepend_feed(state, '歌词')
assert cnt_info is not None and cnt_info == CountInfo(
    str_cnt=1, tot_cnt_lower=3, tot_cnt_upper=4
)
selected_idx = automaton.get_order_slice(cnt_info)
assert frozenset(selected_idx) == {3}
selected_vocab = [vocab[i] for i in selected_idx]
assert frozenset(selected_vocab) == {'歌词'}
# feed 查看
cnt_info = automaton.prepend_feed(state, '查看')
assert cnt_info is not None and cnt_info == CountInfo(
    str_cnt=1, tot_cnt_lower=1, tot_cnt_upper=2
)
selected_idx = automaton.get_order_slice(cnt_info)
assert frozenset(selected_idx) == {4}
selected_vocab = [vocab[i] for i in selected_idx]
assert frozenset(selected_vocab) == {'查看歌词'}
# feed 来
assert not state.is_nil()
cnt_info = automaton.prepend_feed(state, '来')
assert state.is_nil()
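Judging from the assertions above, each CountInfo describes the contiguous range of the sorted vocab whose entries start with the text accumulated so far, and get_order_slice maps that range back to indices in the original vocab. A pure-Python sketch of that bookkeeping, using binary search instead of an automaton, is shown below. `PrefixRange` and `prefix_range` are hypothetical names introduced for illustration; they are not part of `general_sam`, and the library's actual internals may differ.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Optional, Sequence, Tuple


class PrefixRange(NamedTuple):
    """Hypothetical stand-in mirroring the fields of CountInfo."""
    str_cnt: int
    tot_cnt_lower: int
    tot_cnt_upper: int


def prefix_range(
    vocab: Sequence[str], prefix: str
) -> Optional[Tuple[PrefixRange, List[int]]]:
    # order[k] is the original index of the k-th entry of the sorted vocab.
    order = sorted(range(len(vocab)), key=lambda i: vocab[i])
    sorted_vocab = [vocab[i] for i in order]
    # Entries sharing a prefix are contiguous and start at the insertion point.
    lower = bisect_left(sorted_vocab, prefix)
    upper = lower
    while upper < len(sorted_vocab) and sorted_vocab[upper].startswith(prefix):
        upper += 1
    if lower == upper:
        return None  # no vocab entry starts with this prefix
    return PrefixRange(upper - lower, lower, upper), order[lower:upper]


vocab = ['歌曲', '聆听歌曲', '播放歌曲', '歌词', '查看歌词']
print(prefix_range(vocab, '歌'))
print(prefix_range(vocab, '听歌'))
```

The sketch re-sorts the vocab on every call to stay short; precomputing the sorted order once would be the obvious refinement for repeated queries.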
License
- © 2023 Chielo Newctle <ChieloNewctle@gmail.com>
- © 2023 ModelTC Team
This project is licensed under either of the Apache License, Version 2.0, or the MIT license, at your option.
The SPDX license identifier for this project is MIT OR Apache-2.0.