A high-performance, multi-functional word matcher

Matcher Rust Implementation with PyO3 Binding

Installation

To install the matcher_py package, use pip:

pip install matcher_py
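To confirm from Python that the package was installed, a minimal sketch using only the standard library might look like the following. The helper name `installed_version` is hypothetical and not part of matcher_py; it only queries the packaging metadata of the current environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("matcher_py")
    print(version if version is not None else "matcher_py is not installed")
```

If this prints a version string, the import statements in the usage examples should resolve in the same environment.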

Usage

Python Usage

Refer to the test.ipynb file for Python usage examples.

The matcher configuration is passed in as MessagePack bytes. The examples use the msgspec library to serialize it; ormsgpack or another MessagePack serialization library also works, but we recommend msgspec for performance. All types are defined in extension_types.py.

Matcher

Here’s an example of how to use the Matcher:

import msgspec
import numpy as np
from matcher_py import Matcher # type: ignore
from matcher_py.extension_types import MatchTableType, SimpleMatchType, MatchTable

msgpack_encoder = msgspec.msgpack.Encoder()

matcher = Matcher(
    msgpack_encoder.encode(
        {
            "test": [
                MatchTable(
                    table_id=1,
                    match_table_type=MatchTableType.Simple,
                    simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    word_list=["蔔", "你好"],
                    exemption_simple_match_type=SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize,
                    exemption_word_list=[],
                )
            ]
        }
    )
)

# Perform matching
matcher.is_match(r"卜")
matcher.word_match(r"你,好")
matcher.word_match_as_string("你好")
matcher.batch_word_match_as_string(["你好", "你好", "你真棒"])

# Numpy integration for batch processing
text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object")
)
matcher.numpy_word_match_as_string(text_array)
matcher.numpy_word_match_as_string(text_array, inplace=True)
print(text_array)
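The example above combines match types with `|`, which suggests that `SimpleMatchType` behaves like a bit flag. As an illustration of that pattern only, here is a sketch with a hypothetical stand-in enum. `DemoMatchType` and its bit values are assumptions made for this example; the real definitions live in extension_types.py and may differ.

```python
from enum import IntFlag


class DemoMatchType(IntFlag):
    # Hypothetical stand-in for illustration; these bit values are NOT
    # taken from matcher_py's SimpleMatchType.
    NONE = 1
    FANJIAN = 2
    DELETE_NORMALIZE = 4


# Combining two flags with | yields a single value carrying both bits.
combined = DemoMatchType.FANJIAN | DemoMatchType.DELETE_NORMALIZE

print(int(combined))                      # 6
print(DemoMatchType.FANJIAN in combined)  # True
print(DemoMatchType.NONE in combined)     # False
```

Because a combined flag is still a single integer-like value, it can be stored in one field such as `simple_match_type`.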

Simple Matcher

Here’s an example of how to use the SimpleMatcher:

import msgspec
import numpy as np
from matcher_py import SimpleMatcher # type: ignore
from matcher_py.extension_types import SimpleMatchType

msgpack_encoder = msgspec.msgpack.Encoder()

simple_matcher = SimpleMatcher(
    msgpack_encoder.encode(
        {
            SimpleMatchType.MatchFanjian | SimpleMatchType.MatchDeleteNormalize: {
                1: "无,法,无,天",
                2: "xxx",
                3: "你好",
                6: r"It's /\/\y duty",
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchFanjian: {
                4: "xxx,yyy",
            },
            SimpleMatchType.MatchNone: {
                5: "xxxxx,xxxxyyyyxxxxx",
            },
        }
    )
)

# Perform matching
simple_matcher.is_match("xxx")
simple_matcher.simple_process(r"It's /\/\y duty")
simple_matcher.batch_simple_process([r"It's /\/\y duty", "你好", "xxxxxxx"])

# Numpy integration for batch processing
text_array = np.array(
    [
        "Laborum eiusmod anim aliqua non veniam laboris officia dolor. Adipisicing sit est irure Lorem duis adipisicing exercitation. Cillum excepteur non anim ipsum eiusmod deserunt veniam. Nulla veniam sunt sint ad velit occaecat in deserunt nulla nisi excepteur. Cillum veniam Lorem aute eu. Nisi voluptate laboris quis sint pariatur ullamco minim pariatur officia non anim nisi nulla ipsum ad. Veniam pariatur ut occaecat ut veniam velit aliquip commodo culpa elit eu eiusmod."
    ]
    * 10000,
    dtype=np.dtype("object"),
)
simple_matcher.numpy_simple_process(text_array)
simple_matcher.numpy_simple_process(text_array, inplace=True)
print(text_array)
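Several words in the configuration above contain commas, for example "无,法,无,天" and "xxx,yyy". This page does not spell out what the comma means. The sketch below assumes that the comma-separated parts must all occur in the text, and that a part listed twice must occur at least twice. The function `combined_word_matches` is hypothetical, pure Python, and ignores the text normalization that the match types perform; it is not the library's implementation.

```python
from collections import Counter


def combined_word_matches(word: str, text: str, sep: str = ",") -> bool:
    """Assumed semantics: every part of `word` must appear in `text`,
    at least as many times as it is listed in `word`."""
    # Count how often each non-empty part is required.
    required = Counter(part for part in word.split(sep) if part)
    # str.count counts non-overlapping occurrences, which is enough here.
    return all(text.count(part) >= need for part, need in required.items())


print(combined_word_matches("无,法,无,天", "无无法天"))  # True
print(combined_word_matches("无,法,无,天", "无法天"))    # False
print(combined_word_matches("xxx,yyy", "yyy and xxx"))   # True
```

Under this assumption the order of the parts in the text does not matter, only their presence and count.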

Contributing

Contributions to matcher_py are welcome! If you find a bug or have a feature request, please open an issue on the GitHub repository. If you would like to contribute code, please fork the repository and submit a pull request.

License

matcher_py is dual-licensed under MIT OR Apache-2.0.

For more details, visit the GitHub repository.

