A high-performance, multi-functional word matcher
Matcher Rust Implementation with PyO3 Binding
Installation
To install the matcher_py package, use pip:
pip install matcher_py
Usage
Python Usage
Refer to the test.ipynb file for Python usage examples.
The msgspec library is used to serialize the matcher configuration. You can also use ormsgpack or other msgpack serialization libraries, but for performance reasons we recommend msgspec. All types are defined in extension_types.py.
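The examples below combine options with `|`, for example `SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize`. Assuming `SimpleMatchType` behaves like Python's `enum.IntFlag` (an assumption; check extension_types.py for the real definition and values), such a combination is a single integer bit mask. The sketch below uses a hypothetical stand-in class, `SimpleMatchTypeSketch`, with made-up bit values to show how these masks combine and how an individual option can be tested.

```python
from enum import IntFlag


# Hypothetical stand-in for matcher_py.extension_types.SimpleMatchType.
# Member names are taken from the examples; the bit values are assumptions.
class SimpleMatchTypeSketch(IntFlag):
    MatchNone = 0
    MatchFanjian = 1 << 0
    MatchDeleteNormalize = 1 << 1


# Combining options with | produces one integer bit mask.
combined = SimpleMatchTypeSketch.MatchFanjian | SimpleMatchTypeSketch.MatchDeleteNormalize

# Individual options can be tested for membership in the mask.
assert SimpleMatchTypeSketch.MatchFanjian in combined
print(int(combined))  # prints 3 with the bit values assumed above
```

Because the combined value is a plain integer, it can also serve as a dictionary key, as the SimpleMatcher configuration does.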
Matcher
Here’s an example of how to use the Matcher:
import msgspec
import numpy as np

from matcher_py import Matcher  # type: ignore
from matcher_py.extension_types import MatchTableType, SimpleMatchType, MatchTable

msgpack_encoder = msgspec.msgpack.Encoder()

# Build the matcher from a msgpack-encoded configuration.
matcher = Matcher(
    msgpack_encoder.encode(
        {
            "test": [
                MatchTable(
                    table_id=1,
                    match_table_type=MatchTableType.Simple,
                    simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    word_list=["蔔", "你好"],
                    exemption_simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    exemption_word_list=[],
                )
            ]
        }
    )
)

# Perform matching
matcher.is_match(r"卜")
matcher.word_match(r"你,好")
matcher.word_match_as_string("你好")
matcher.batch_word_match_as_string(["你好", "你好", "你真棒"])

# NumPy integration for batch processing
text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
matcher.numpy_word_match_as_string(text_array)
matcher.numpy_word_match_as_string(text_array, inplace=True)
print(text_array)
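In this example, `is_match("卜")` presumably hits the configured word "蔔" (卜 is the simplified form of 蔔), and `word_match("你,好")` presumably hits "你好". The toy sketch below illustrates that idea under two assumptions: MatchFanjian maps traditional characters to simplified ones, and MatchDeleteNormalize strips punctuation before comparison. The tables and helpers (`FANJIAN`, `DELETABLE`, `normalize`, `toy_word_match`) are hypothetical. The library's real conversion tables are far larger, and its methods' return formats are not reproduced here.

```python
from __future__ import annotations

# Hypothetical, tiny tables; not matcher_py's real data.
FANJIAN = {"蔔": "卜"}  # traditional -> simplified (single assumed entry)
DELETABLE = set(",，。 !?")  # characters assumed to be stripped before matching


def normalize(text: str) -> str:
    """Map traditional characters to simplified, then drop deletable characters."""
    mapped = "".join(FANJIAN.get(ch, ch) for ch in text)
    return "".join(ch for ch in mapped if ch not in DELETABLE)


def toy_word_match(text: str, words: list[str]) -> list[str]:
    """Return the configured words whose normalized form occurs in the normalized text."""
    haystack = normalize(text)
    return [w for w in words if normalize(w) in haystack]


words = ["蔔", "你好"]
print(toy_word_match("卜", words))  # ['蔔']
print(toy_word_match("你,好", words))  # ['你好']
```

Normalizing both the configured words and the input text lets a single substring check cover the variant forms.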
Simple Matcher
Here’s an example of how to use the SimpleMatcher:
import msgspec
import numpy as np

from matcher_py import SimpleMatcher  # type: ignore
from matcher_py.extension_types import SimpleMatchType

msgpack_encoder = msgspec.msgpack.Encoder()

# Build the SimpleMatcher from a msgpack-encoded mapping of
# match-type flags -> {word_id: word}.
simple_matcher = SimpleMatcher(
    msgpack_encoder.encode(
        {
            SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize: {
                1: "无,法,无,天",
                2: "xxx",
                3: "你好",
                6: r"It's /\/\y duty",
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchFanjian: {
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchNone: {
                5: "xxxxx,xxxxyyyyxxxxx",
            },
        }
    )
)

# Perform matching
simple_matcher.is_match("xxx")
simple_matcher.simple_process(r"It's /\/\y duty")
simple_matcher.batch_simple_process([r"It's /\/\y duty", "你好", "xxxxxxx"])

# NumPy integration for batch processing
text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
simple_matcher.numpy_simple_process(text_array)
simple_matcher.numpy_simple_process(text_array, inplace=True)
print(text_array)
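Several entries above contain commas, such as "无,法,无,天" and "xxx,yyy". The sketch below assumes a comma-separated entry matches only when every part appears in the text, and that a part listed n times must appear at least n times. Under that assumption, "无" must occur twice for the first entry to match. This reading is not confirmed by this page. `toy_combo_match` and `toy_simple_process` are hypothetical helpers that ignore normalization and return word IDs only.

```python
from collections import Counter


def toy_combo_match(text: str, combo: str, sep: str = ",") -> bool:
    """Hypothetical AND-combination check: each part must occur in `text`,
    and a part listed n times must occur at least n times (non-overlapping)."""
    required = Counter(combo.split(sep))
    return all(text.count(part) >= n for part, n in required.items())


def toy_simple_process(text: str, table: dict) -> list:
    """Return the IDs of all table entries whose combination matches `text`."""
    return [word_id for word_id, combo in table.items() if toy_combo_match(text, combo)]


# A subset of the first table from the example above.
table = {1: "无,法,无,天", 2: "xxx", 3: "你好", 4: "xxx,yyy"}

print(toy_combo_match("无法无天", "无,法,无,天"))  # True
print(toy_combo_match("无法天", "无,法,无,天"))  # False: 无 appears only once
print(toy_simple_process("xxxxxxx", table))  # [2]
```

Counting parts with `Counter` is one simple way to express the "repeated part" rule. It is not necessarily how the Rust implementation works.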
Contributing
Contributions to matcher_py are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository. If you would like to contribute code, please fork the repository and submit a pull request.
License
matcher_py is licensed under the MIT OR Apache-2.0 license.
For more details, visit the GitHub repository.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
File details
Details for the file matcher_py-0.2.4.tar.gz.
File metadata
- Download URL: matcher_py-0.2.4.tar.gz
- Upload date:
- Size: 298.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 039e08866c69cfa4b6bf3ec065e6f558905477f29e6f4f643babf6985cce8467
MD5 | 39f7729fa7dd796b97f813250a2883d3
BLAKE2b-256 | 3d29d35a0c356f81f64e12d23b293baa6dd0ddec8377b610053cde4d5e5122cf
File details
Details for the file matcher_py-0.2.4-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: matcher_py-0.2.4-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11ea8f1d321d9faca20617a2602b88d25c2fd68cc216826ea139a4a78cee57b8
MD5 | 203a839ac88ddbaa9eb2c54b7aab3588
BLAKE2b-256 | e9dd2c5637c7a0636221d45d00297e88f702ec61eb8a3c48f365f21485d72667
File details
Details for the file matcher_py-0.2.4-cp38-abi3-musllinux_1_2_x86_64.whl.
File metadata
- Download URL: matcher_py-0.2.4-cp38-abi3-musllinux_1_2_x86_64.whl
- Upload date:
- Size: 1.5 MB
- Tags: CPython 3.8+, musllinux: musl 1.2+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 62ab573d8b24fecd6e16414a47e0b99b33d589645422086978acd2007660c589
MD5 | 6d6ed4b5e4993c5f8ab32a884968502e
BLAKE2b-256 | fcf051cbaf5941ed2daa2cbba46f647d832b6f746abd24742fbfcf6196482e5e
File details
Details for the file matcher_py-0.2.4-cp38-abi3-musllinux_1_2_aarch64.whl.
File metadata
- Download URL: matcher_py-0.2.4-cp38-abi3-musllinux_1_2_aarch64.whl
- Upload date:
- Size: 1.6 MB
- Tags: CPython 3.8+, musllinux: musl 1.2+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | ddc225a2ef81737a4f0099add853ba13886295004563d7b5df4add20d7f3703f
MD5 | 0b20a6f62d62501cc57c4d6d44481fee
BLAKE2b-256 | 359b9a00529cf74df6b212362893bec74fedb5598f0656fc7b5cccb800aa12ba
File details
Details for the file matcher_py-0.2.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: matcher_py-0.2.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 1.3 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8e2e971cc92d51010ca3cb695eae16698fe5ba90d049b8b699b6eecbaaa3c770
MD5 | 33e23ecdadd7fe815c32e16f66b4bc7b
BLAKE2b-256 | 901f65e536ca3c5ff0119ffbac66f97f4a0a921062659d46ae81f290faa8a921
File details
Details for the file matcher_py-0.2.4-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: matcher_py-0.2.4-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 893c4b81e7e261eb84b428e0fc21fd607d2b02bc72c99e4dac120c46d531f47a
MD5 | 656bb073c6fe99e726ea3a13fb7624b9
BLAKE2b-256 | 255802e964b4d63c217c8126b8b40e3b8e9e12ca9c79eb9b301a7f38298739fc
File details
Details for the file matcher_py-0.2.4-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: matcher_py-0.2.4-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | eb796f5cf853aa7d3bec4dc03e4864e855118234a775faaf0f21861212b8d158
MD5 | 0ba7c8f1af47238ed3858d531d5fed30
BLAKE2b-256 | 2d860b362743394c3cb05be2bb226ebc776398d069896c35a7187515d7a7a808
File details
Details for the file matcher_py-0.2.4-cp38-abi3-macosx_10_12_x86_64.whl.
File metadata
- Download URL: matcher_py-0.2.4-cp38-abi3-macosx_10_12_x86_64.whl
- Upload date:
- Size: 1.2 MB
- Tags: CPython 3.8+, macOS 10.12+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | fbee33c88d64f063403f67a39942a583a98a8fc37d8a8e44979a745b3ac868bd
MD5 | 329102247d530a3f48c677dee39de5a9
BLAKE2b-256 | 6acc6dd0593af8dab6e2f57aee76ae3942fb39693f7b599552eac815207bfb90
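You can use the SHA256 digests listed above to check a downloaded file before installing it. The following is a minimal sketch using only the standard library. The local file path in the commented example assumes the sdist was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a downloaded file against a published SHA256 digest."""
    return sha256_of(path) == expected_sha256.lower()


# Example (assumes the file was downloaded to the current directory):
# verify_download(
#     Path("matcher_py-0.2.4.tar.gz"),
#     "039e08866c69cfa4b6bf3ec065e6f558905477f29e6f4f643babf6985cce8467",
# )
```

pip can also enforce this itself. In hash-checking mode, a requirements line such as `matcher_py==0.2.4 --hash=sha256:<digest>` makes pip reject any downloaded file whose digest is not listed. Each wheel and the sdist has its own digest, so list every file you are willing to install.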