A high-performance, multi-functional word matcher
Project description
PyO3 binding for the Matcher Rust implementation.
Installation
pip install matcher_py
Usage
- Python usage is in the test.ipynb file.
- msgspec is used to serialize the matcher config. You can also use ormsgpack or another msgpack serialization library; all the types are defined in extension_types.py. For performance reasons, I recommend msgspec.
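The examples in the following sections combine `SimpleMatchType` members with `|` and use the result as a config value or dictionary key. As a rough illustration of why that can work, here is a minimal sketch with a hypothetical stand-in enum. The member names and values below are invented for the example and are not taken from `extension_types.py`; that the real type behaves like an `IntFlag` is an assumption.

```python
from enum import IntFlag


class DemoMatchType(IntFlag):
    """Hypothetical stand-in for SimpleMatchType; values are illustrative only."""

    NONE = 1
    FANJIAN = 2
    DELETE = 4
    NORMALIZE = 8


# Combining members with | yields a single flag value that is also an int.
combined = DemoMatchType.FANJIAN | DemoMatchType.DELETE

# Because it is a hashable int, it can serve as a dictionary key in a config.
config = {combined: {1: "hello"}, DemoMatchType.NONE: {2: "world"}}

print(int(combined))                      # 6
print(DemoMatchType.DELETE in combined)   # True
```

Since the combined value is a plain integer underneath, a msgpack encoder can write it like any other integer.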
Matcher
import msgspec
import numpy as np

from matcher_py import Matcher, SimpleMatcher  # type: ignore
from matcher_py.extension_types import MatchTableType, SimpleMatchType, MatchTable

msgpack_encoder = msgspec.msgpack.Encoder()

matcher = Matcher(
    msgpack_encoder.encode(
        {
            "test": [
                MatchTable(
                    table_id=1,
                    match_table_type=MatchTableType.Simple,
                    simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    word_list=["蔔", "你好"],
                    exemption_simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    exemption_word_list=[],
                )
            ]
        }
    )
)

matcher.is_match(r"卜")
matcher.word_match(r"你,好")
matcher.word_match_as_string("你好")
matcher.batch_word_match_as_string(["你好", "你好", "你真棒"])

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object")
)
matcher.numpy_word_match_as_string(text_array)

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object")
)
matcher.numpy_word_match_as_string(text_array, inplace=True)
text_array
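In the Matcher example, the word list contains `你好` while one of the queries is `你,好`. One plausible reading is that the delete/normalize step strips punctuation before comparison. The sketch below illustrates that idea in plain Python. `delete_normalize` and `contains_word` are hypothetical helpers, and the rule used here (NFKC normalization, case folding, dropping non-alphanumeric characters) is an assumption for illustration, not the library's actual processing tables.

```python
import unicodedata


def delete_normalize(text: str) -> str:
    """Hypothetical normalizer: NFKC-normalize, casefold, drop non-alphanumerics."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def contains_word(text: str, word: str) -> bool:
    """Substring check after applying the same normalization to both sides."""
    return delete_normalize(word) in delete_normalize(text)


print(delete_normalize("你,好"))         # 你好
print(contains_word("你,好", "你好"))    # True
print(contains_word("你真棒", "你好"))   # False
```

Normalizing both the word and the text with the same function keeps the comparison symmetric, which is the main point of the sketch.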
Simple Matcher
import msgspec
import numpy as np

from matcher_py import Matcher, SimpleMatcher  # type: ignore
from matcher_py.extension_types import MatchTableType, SimpleMatchType, MatchTable

msgpack_encoder = msgspec.msgpack.Encoder()

simple_matcher = SimpleMatcher(
    msgpack_encoder.encode(
        {
            SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize: {
                1: "无,法,无,天",
                2: "xxx",
                3: "你好",
                6: r"It's /\/\y duty",
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchFanjian: {
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchNone: {
                5: "xxxxx,xxxxyyyyxxxxx",
            },
        }
    )
)

simple_matcher.is_match("xxx")
simple_matcher.simple_process(r"It's /\/\y duty")
simple_matcher.batch_simple_process([r"It's /\/\y duty", "你好", "xxxxxxx"])

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
simple_matcher.numpy_simple_process(text_array)

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
simple_matcher.numpy_simple_process(text_array, inplace=True)
text_array
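Several entries in the Simple Matcher word map contain commas, such as `无,法,无,天` and `xxx,yyy`. This page does not state how those entries are interpreted. Assuming each comma-separated part is a sub-word that must occur in the text, and that a repeated part must occur that many times, the check could be sketched as below. `combo_matches` is a hypothetical helper, not part of `matcher_py`, and it counts non-overlapping occurrences only.

```python
from collections import Counter


def combo_matches(entry: str, text: str) -> bool:
    """Return True if every comma-separated part occurs often enough in text."""
    required = Counter(part for part in entry.split(",") if part)
    # str.count counts non-overlapping occurrences of each part.
    return all(text.count(part) >= times for part, times in required.items())


print(combo_matches("xxx,yyy", "yyy then xxx"))    # True
print(combo_matches("无,法,无,天", "无法无天"))    # True
print(combo_matches("无,法,无,天", "无法天"))      # False
```

Under this reading the order of the parts in the text does not matter, only how often each one appears.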
Download files
Source Distribution
Built Distributions
File details
Details for the file matcher_py-0.1.7.tar.gz.
File metadata
- Download URL: matcher_py-0.1.7.tar.gz
- Upload date:
- Size: 432.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1e56aa702d653c489bf2ef04a3533419ee58d57155143a612f41255af1542615
MD5 | 421855656bd0a797b9c6b7e57c805ae9
BLAKE2b-256 | bdf5149859cfe437e2de84066ee0cd0a951d4a55adc36d671377513fc06a60be
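The SHA256 digest listed for each file can be used to check a download before installing it. The following is a minimal sketch using only the standard library; `sha256_of` and `matches_published_digest` are hypothetical helper names introduced here, and the local file path is whatever location the file was saved to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `matches_published_digest(Path("matcher_py-0.1.7.tar.gz"), "1e56aa702d653c489bf2ef04a3533419ee58d57155143a612f41255af1542615")`, assuming the archive sits in the current directory.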
File details
Details for the file matcher_py-0.1.7-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: matcher_py-0.1.7-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2aa227009dbdaef2709f02f072c43ccf120de40e74115418560d4394e646a1f4
MD5 | f05f073da29daecf1c4a13a53bef2c5c
BLAKE2b-256 | 80d0b6fea9b7b168d7e8787c490e02c6d2316ff576cd003b2dd363c46b003dd6
File details
Details for the file matcher_py-0.1.7-cp38-abi3-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: matcher_py-0.1.7-cp38-abi3-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.8+, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 79f3a477ace6b40a6c306f724d852e509accd8cd79e3c53738e532a160d1a01f
MD5 | e0e86f2c95e108ae8766ada633f0cfbb
BLAKE2b-256 | 276f1a5796b195a957c13d62d1d362501b96b07c4819ac02433f5a9270d40121
File details
Details for the file matcher_py-0.1.7-cp38-abi3-musllinux_1_2_aarch64.whl.
File metadata
- Download URL: matcher_py-0.1.7-cp38-abi3-musllinux_1_2_aarch64.whl
- Upload date:
- Size: 1.6 MB
- Tags: CPython 3.8+, musllinux: musl 1.2+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | aa705161f3106dbf567d13e86fc0eaa135ca0e98e7034fbab7a46445cfa8205e
MD5 | 0a748d7976e479a744f84bbaa9226837
BLAKE2b-256 | 7ab11406e51f50038205a4d3caf6ad1e50163dcf5a271403c6f6ce20f3c10277
File details
Details for the file matcher_py-0.1.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: matcher_py-0.1.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5a70e41adf19839c715dd9c8afc80544f215cfc2d92d73f702064fcb4a6155d4
MD5 | d7a2a27d5c4075e4f02d2df540b90df9
BLAKE2b-256 | ac2cbee8406aaeedbac59c1ee6276052595e7bd594f0391716718ec56d166cc0
File details
Details for the file matcher_py-0.1.7-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: matcher_py-0.1.7-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4d61f6a533a7a7d651d9d78deb93b9c51c81d93e1ba76333cedf943db002051f
MD5 | 50493871346e4cfbba04e6ec0446068a
BLAKE2b-256 | e3bd1a71772e72d2b648ccb12d5fbb7a8b842cc244e4d94a10b3f189435050b8
File details
Details for the file matcher_py-0.1.7-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: matcher_py-0.1.7-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | fe7371364e5ad6bbe14b96ea0822de58d65210d146e0f5a3a281e14cc0800dce
MD5 | 2294f7c9687509a839c6ac89bbc3eccc
BLAKE2b-256 | 5fc22fe33c184bdd4eae88f727b19b3b38dd081573e84875e29c70f786f86d9a
File details
Details for the file matcher_py-0.1.7-cp38-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: matcher_py-0.1.7-cp38-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 003a70a38bb3a16b5bebb78baa01573ec2eda95b6ff28eb2b5b4b7bdc2eefe5b
MD5 | 7ce952f1b78daba1dd78a41809d31b2e
BLAKE2b-256 | 7b2afada37cefad43ef54659022d900a9dc32bf12b76dc2b5bd6c33fa1a7339a