A high-performance, multi-functional word matcher

Project description

Matcher Rust Implementation with PyO3 Binding

Installation

To install the matcher_py package, use pip:

pip install matcher_py

Usage

Python Usage

Refer to the test.ipynb file for Python usage examples.

The matcher configuration is serialized to MessagePack with the msgspec library. Other MessagePack libraries such as ormsgpack also work, but msgspec is recommended for performance. All types are defined in extension_types.py.
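
Since more than one MessagePack library can produce the configuration bytes, a small helper can pick whichever one is installed. The sketch below is only an illustration: `pick_msgpack_encoder` is a hypothetical name, not part of matcher_py, and it assumes plain dict configurations like the SimpleMatcher tables in this README.

```python
from typing import Any, Callable


def pick_msgpack_encoder() -> Callable[[Any], bytes]:
    """Return an `obj -> bytes` MessagePack encoder from the first library found.

    Hypothetical helper for illustration; not part of matcher_py.
    """
    try:
        import msgspec  # recommended by the project for performance

        return msgspec.msgpack.Encoder().encode
    except ImportError:
        pass

    try:
        import ormsgpack

        # The word tables in this README use integer word IDs as dict keys,
        # so non-string keys have to be enabled explicitly for ormsgpack.
        return lambda obj: ormsgpack.packb(obj, option=ormsgpack.OPT_NON_STR_KEYS)
    except ImportError:
        raise ImportError(
            "install msgspec or ormsgpack to encode the configuration"
        ) from None
```

The returned callable can stand in for `msgpack_encoder.encode` when building a configuration from plain dicts. Whether typed objects such as `MatchTable` can be encoded by a library other than msgspec is not covered by this sketch.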

Matcher

Here’s an example of how to use the Matcher:

import msgspec
import numpy as np
from matcher_py import Matcher # type: ignore
from matcher_py.extension_types import MatchTableType, SimpleMatchType, MatchTable

msgpack_encoder = msgspec.msgpack.Encoder()

matcher = Matcher(
    msgpack_encoder.encode(
        {
            "test": [
                MatchTable(
                    table_id=1,
                    match_table_type=MatchTableType.Simple,
                    simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    word_list=["蔔", "你好"],
                    exemption_simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    exemption_word_list=[],
                )
            ]
        }
    )
)

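# Perform matching
matcher.is_match("卜")  # "蔔" in word_list is the traditional form of "卜"
matcher.word_match("你,好")  # input text with punctuation between the characters
matcher.word_match_as_string("你好")
matcher.batch_word_match_as_string(["你好", "你好", "你真棒"])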

# Numpy integration for batch processing
text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object")
)
matcher.numpy_word_match_as_string(text_array)
matcher.numpy_word_match_as_string(text_array, inplace=True)
print(text_array)
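
The `simple_match_type` values above are built by combining flags with `|`. The sketch below shows how that combination behaves, assuming `SimpleMatchType` works like a standard `enum.IntFlag`. `DemoMatchType` and its bit values are hypothetical stand-ins; the real definitions are in extension_types.py and may differ.

```python
from enum import IntFlag


class DemoMatchType(IntFlag):
    # Hypothetical flags and bit values, for illustration only.
    FANJIAN = 2
    DELETE = 4
    NORMALIZE = 8


# Combining two flags sets both bits in a single integer value.
combined = DemoMatchType.FANJIAN | DemoMatchType.DELETE
print(int(combined))  # 6

# Membership tests check whether a flag's bits are set in the combination.
print(DemoMatchType.FANJIAN in combined)    # True
print(DemoMatchType.NORMALIZE in combined)  # False

# An IntFlag value is an int, so it can be used directly as a dict key.
table = {combined: {1: "xxx"}}
print(6 in table)  # True
```

Because the combined value is an ordinary integer, it can be used as a dictionary key and serialized as a number.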

Simple Matcher

Here’s an example of how to use the SimpleMatcher:

import msgspec
import numpy as np
from matcher_py import SimpleMatcher # type: ignore
from matcher_py.extension_types import SimpleMatchType

msgpack_encoder = msgspec.msgpack.Encoder()

simple_matcher = SimpleMatcher(
    msgpack_encoder.encode(
        {
            SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize: {
                1: "无,法,无,天",
                2: "xxx",
                3: "你好",
                6: r"It's /\/\y duty",
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchFanjian: {
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchNone: {
                5: "xxxxx,xxxxyyyyxxxxx",
            },
        }
    )
)

# Perform matching
simple_matcher.is_match("xxx")
simple_matcher.simple_process(r"It's /\/\y duty")
simple_matcher.batch_simple_process([r"It's /\/\y duty", "你好", "xxxxxxx"])

# Numpy integration for batch processing
text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
simple_matcher.numpy_simple_process(text_array)
simple_matcher.numpy_simple_process(text_array, inplace=True)
print(text_array)
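
Several entries in the word table above contain commas, such as "xxx,yyy" and "无,法,无,天". The sketch below illustrates one plausible reading of such entries: every comma-separated part must occur in the text, and a part listed twice must occur at least twice. This is an assumption for illustration, not a description of the library's implementation. `combined_word_matches` is a hypothetical helper that ignores the text normalization selected by `SimpleMatchType`, so check the GitHub repository for the actual rules.

```python
from collections import Counter


def combined_word_matches(word: str, text: str) -> bool:
    """Assumed semantics: each comma-separated part of `word` must occur in
    `text` at least as many times as it is listed in `word`."""
    required = Counter(part for part in word.split(",") if part)
    # str.count counts non-overlapping occurrences of each part.
    return all(text.count(part) >= times for part, times in required.items())


print(combined_word_matches("xxx,yyy", "xxx and yyy"))    # True
print(combined_word_matches("xxx,yyy", "only xxx here"))  # False
print(combined_word_matches("无,法,无,天", "无无法天"))      # True
print(combined_word_matches("无,法,无,天", "无法天"))        # False
```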

Contributing

Contributions to matcher_py are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository. If you would like to contribute code, please fork the repository and submit a pull request.

License

matcher_py is dual-licensed under MIT OR Apache-2.0. See the LICENSE file for more information.

For more details, visit the GitHub repository.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • matcher_py-0.2.2.tar.gz (432.5 kB): Source

Built Distributions

  • matcher_py-0.2.2-cp38-abi3-win_amd64.whl (1.2 MB): CPython 3.8+, Windows x86-64
  • matcher_py-0.2.2-cp38-abi3-musllinux_1_2_x86_64.whl (1.5 MB): CPython 3.8+, musllinux (musl 1.2+), x86-64
  • matcher_py-0.2.2-cp38-abi3-musllinux_1_2_aarch64.whl (1.6 MB): CPython 3.8+, musllinux (musl 1.2+), ARM64
  • matcher_py-0.2.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • matcher_py-0.2.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.3 MB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • matcher_py-0.2.2-cp38-abi3-macosx_11_0_arm64.whl (1.2 MB): CPython 3.8+, macOS 11.0+, ARM64
  • matcher_py-0.2.2-cp38-abi3-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.8+, macOS 10.12+, x86-64
