A high-performance, multi-functional word matcher

Project description

PyO3 binding for the Rust implementation of Matcher

Installation

pip install matcher_py

Usage

  • Python usage is in the test.ipynb file.
  • msgspec is used to serialize the matcher config. You can also use ormsgpack or another msgpack serialization library; all the types are defined in extension_types.py. For performance reasons, I recommend msgspec.

Matcher

import msgspec
import numpy as np

from matcher_py import Matcher # type: ignore
from matcher_py.extension_types import MatchTableType, SimpleMatchType, MatchTable

msgpack_encoder = msgspec.msgpack.Encoder()

matcher = Matcher(
    msgpack_encoder.encode(
        {
            "test": [
                MatchTable(
                    table_id=1,
                    match_table_type=MatchTableType.Simple,
                    simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    word_list=["蔔", "你好"],
                    exemption_simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    exemption_word_list=[],
                )
            ]
        }
    )
)

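# Check whether the text matches any configured word ("卜" is the simplified form of "蔔").
matcher.is_match(r"卜")

# Get the match results for a single text.
matcher.word_match(r"你,好")

# Get the match results for a single text, serialized as a string.
matcher.word_match_as_string("你好")

# Process a list of texts in one call.
matcher.batch_word_match_as_string(["你好", "你好", "你真棒"])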

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object")
)
matcher.numpy_word_match_as_string(text_array)

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object")
)
# With inplace=True the results are written back into text_array.
matcher.numpy_word_match_as_string(text_array, inplace=True)
text_array
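
The example above matches "卜" against the word "蔔" and "你,好" against the word "你好". A minimal sketch of why that can work, assuming the match types normalize both the word list and the input text before comparing them: the one-entry conversion table, the punctuation rule, and the helper names below are hypothetical and only illustrate the idea. They are not the library's implementation.

```python
import unicodedata

# Hypothetical one-entry traditional-to-simplified table (assumption:
# a real conversion table would be far larger).
FANJIAN_TABLE = str.maketrans({"蔔": "卜"})


def delete_noise(text: str) -> str:
    """Drop punctuation and whitespace characters from the text."""
    return "".join(
        ch
        for ch in text
        if not (unicodedata.category(ch).startswith("P") or ch.isspace())
    )


def normalize(text: str) -> str:
    """Convert traditional characters, then delete punctuation and whitespace."""
    return delete_noise(text.translate(FANJIAN_TABLE))


def sketch_is_match(words: list[str], text: str) -> bool:
    """Return True if any normalized word occurs in the normalized text."""
    normalized_text = normalize(text)
    return any(normalize(word) in normalized_text for word in words)


words = ["蔔", "你好"]
print(sketch_is_match(words, "卜"))
print(sketch_is_match(words, "你,好"))
print(sketch_is_match(words, "你真棒"))
```

Normalizing both sides keeps the comparison symmetric, so a word list written in traditional characters still matches simplified input.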

Simple Matcher

import msgspec
import numpy as np

from matcher_py import SimpleMatcher # type: ignore
from matcher_py.extension_types import SimpleMatchType

msgpack_encoder = msgspec.msgpack.Encoder()

simple_matcher = SimpleMatcher(
    msgpack_encoder.encode(
        {
            SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize: {
                1: "无,法,无,天",
                2: "xxx",
                3: "你好",
                6: r"It's /\/\y duty",
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchFanjian: {
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchNone: {
                5: "xxxxx,xxxxyyyyxxxxx",
            },
        }
    )
)

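# Check whether the text matches any configured word.
simple_matcher.is_match("xxx")

# Get the match results for a single text.
simple_matcher.simple_process(r"It's /\/\y duty")

# Process a list of texts in one call.
simple_matcher.batch_simple_process([r"It's /\/\y duty", "你好", "xxxxxxx"])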

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
simple_matcher.numpy_simple_process(text_array)

text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
# With inplace=True the results are written back into text_array.
simple_matcher.numpy_simple_process(text_array, inplace=True)
text_array
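
Several entries in the config above, such as "无,法,无,天" and "xxx,yyy", contain commas. The text here does not say how they are interpreted. The sketch below assumes one plausible reading: every comma-separated part must appear in the text, and a part that is listed twice must appear at least twice. Both that reading and the helper name `matches_all_parts` are assumptions for illustration, not the library's documented behavior.

```python
from collections import Counter


def matches_all_parts(entry: str, text: str) -> bool:
    """Return True if every comma-separated part of entry occurs in text
    at least as many times as it is listed in entry (assumed semantics)."""
    required = Counter(entry.split(","))
    # str.count counts non-overlapping occurrences of each part.
    return all(text.count(part) >= times for part, times in required.items())


print(matches_all_parts("无,法,无,天", "无无法天"))
print(matches_all_parts("无,法,无,天", "无法天"))
print(matches_all_parts("xxx,yyy", "xxx and yyy"))
```

Under this reading, an entry without a comma reduces to a plain substring check.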

Download files

Source Distribution

  • matcher_py-0.1.7.tar.gz (432.6 kB): Source

Built Distributions

  • matcher_py-0.1.7-cp38-abi3-win_amd64.whl (1.2 MB): CPython 3.8+, Windows x86-64
  • matcher_py-0.1.7-cp38-abi3-musllinux_1_2_x86_64.whl (1.5 MB): CPython 3.8+, musllinux (musl 1.2+), x86-64
  • matcher_py-0.1.7-cp38-abi3-musllinux_1_2_aarch64.whl (1.6 MB): CPython 3.8+, musllinux (musl 1.2+), ARM64
  • matcher_py-0.1.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • matcher_py-0.1.7-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.3 MB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • matcher_py-0.1.7-cp38-abi3-macosx_11_0_arm64.whl (1.2 MB): CPython 3.8+, macOS 11.0+, ARM64
  • matcher_py-0.1.7-cp38-abi3-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.8+, macOS 10.12+, x86-64

