
rust-participle



Nazrin

A Chinese word segmentation tool: Python bindings for jieba-rs


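Compared with the pure-Python jieba it is faster (roughly 13 times faster than `jieba.cut` in the benchmark below), and it releases the GIL during segmentation, so it can be used for multithreaded processing.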
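Because segmentation releases the GIL, a thread pool can segment several documents in parallel. The helper below is a minimal standard-library sketch; `segment_parallel` is a hypothetical name, and it assumes a single `Nazrin` instance can be shared between threads, which the multithreading claim suggests but does not state outright.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def segment_parallel(
    cut: Callable[[str], list[str]],
    texts: Iterable[str],
    workers: int = 4,
) -> list[list[str]]:
    """Segment many texts concurrently with a thread pool.

    `cut` is any callable mapping a string to a list of tokens, e.g. the
    bound method `Nazrin().cut` (assumed thread-safe). Threads only run in
    parallel if `cut` releases the GIL while it works.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so result i belongs to text i
        return list(pool.map(cut, texts))
```

With nazrin installed, `segment_parallel(Nazrin().cut, documents)` would return one token list per document (`documents` being your own list of strings). For callables that hold the GIL, the threads simply take turns, so a speedup is only expected when `cut` itself releases it.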

Installation

pip install nazrin

Usage

from nazrin import Nazrin

nazrin = Nazrin()
print(nazrin.cut('能找到想找的东西程度的能力'))
# ['能', '找到', '想', '找', '的', '东西', '程度', '的', '能力']

print(nazrin.tag('能找到想找的东西程度的能力'))
# [('能', 'v'), ('找到', 'v'), ('想', 'v'), ('找', 'v'), ('的', 'uj'), ('东西', 'ns'), ('程度', 'n'), ('的', 'uj'), ('能力', 'n')]
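`tag` returns `(word, tag)` pairs, which makes it easy to keep only certain parts of speech. In jieba's tag set, tags starting with `n` denote nouns and noun subtypes (for example `ns`, place name) and `v` denotes verbs. A minimal sketch that reuses the output shown above instead of calling nazrin (`words_with_tag_prefix` is a hypothetical helper):

```python
from __future__ import annotations


def words_with_tag_prefix(tagged: list[tuple[str, str]], prefix: str) -> list[str]:
    """Keep the words whose part-of-speech tag starts with `prefix`."""
    return [word for word, tag in tagged if tag.startswith(prefix)]


# Output of nazrin.tag('能找到想找的东西程度的能力'), copied from the example above
tagged = [('能', 'v'), ('找到', 'v'), ('想', 'v'), ('找', 'v'), ('的', 'uj'),
          ('东西', 'ns'), ('程度', 'n'), ('的', 'uj'), ('能力', 'n')]
print(words_with_tag_prefix(tagged, "n"))  # ['东西', '程度', '能力']
```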
All methods
from typing import Literal


class Nazrin:
    def __init__(self) -> None: ...
    def add_word(
        self, word: str, freq: int | None = None, tag: str | None = None
    ) -> int:
        """
        说明:

            把一个词加进字典。

        参数:

            * ``freq``: 词频,默认为计算值
            * ``tag``: 词性,默认为 None

        """
        ...
    def load_userdict(self, path: str) -> None:
        """
        说明:

            加载用户字典

        参数:

            * ``path``: 字典路径

        """
        ...
    def suggest_freq(self, word: str) -> None:
        """
        说明:

            建议词频,以强制词语中的字符连接或分离。

        参数:

            * ``word``: 词语

        """
        ...
    def cut(self, text: str, hmm: bool = True) -> list[str]:
        """
        说明:

            将包含汉字的整个句子分割成独立的单词,精确模式

        参数:

            * ``text``: 文本
            * ``hmm``: 是否使用隐马尔可夫模型. 默认为 True.

        """
        ...
    def cut_all(self, text: str) -> list[str]:
        """
        说明:

            将包含汉字的整个句子分割成独立的单词,完整模式

        参数:

            * ``text``: 文本

        """
        ...
    def cut_for_search(self, text: str, hmm: bool = True) -> list[str]:
        """
        说明:

            将包含汉字的整个句子分割成独立的单词,搜索引擎模式

        参数:

            * ``text``: 文本
            * ``hmm``: 是否使用隐马尔可夫模型. 默认为 True.

        """
        ...
    def tag(self, text: str, hmm: bool = True) -> list[tuple[str, str]]:
        """
        说明:

            给文本打词性标签

        参数:

            * ``text``: 文本
            * ``hmm``: 是否使用隐马尔可夫模型. 默认为 True.

        """
        ...
    def tokenize(
        self,
        text: str,
        mode: Literal["search", "default"] = "default",
        hmm: bool = True,
    ) -> list[str]:
        """
        说明:

            Tokenize the text

        参数:

            * ``text``: 文本呢
            * ``mode``: 模式. 默认为 "default".
            * ``hmm``: 是否使用隐马尔可夫模型. 默认为 True.

        """

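`load_userdict` reads a dictionary file from disk, but the format is not described here. Assuming nazrin follows jieba's user-dictionary convention (one entry per line: the word, then an optional frequency, then an optional part-of-speech tag, separated by spaces, UTF-8 encoded), such a file could be generated as below. `write_userdict` is a hypothetical helper, and it conservatively rejects a tag without a frequency in case the parser reads the second field as a number.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional


def write_userdict(path: str, entries: list[tuple[str, Optional[int], Optional[str]]]) -> None:
    """Write entries as 'word [freq] [tag]' lines (assumed jieba format)."""
    lines = []
    for word, freq, tag in entries:
        if tag is not None and freq is None:
            # Fields are positional, so a tag is only written after a frequency
            raise ValueError(f"{word!r}: a tag requires a frequency")
        parts = [word] + [str(p) for p in (freq, tag) if p is not None]
        lines.append(" ".join(parts))
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

The file would then be loaded with `nazrin.load_userdict(path)`; for just a few words, `add_word` avoids the file entirely.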
Performance comparison

In [1]: import jieba

In [2]: jieba.initialize()
Building prefix dict from the default dictionary ...
Loading model from cache jieba.cache
Loading model cost 0.647 seconds.
Prefix dict has been built successfully.

In [3]: from nazrin import Nazrin

In [4]: nazrin = Nazrin()

In [5]: with open("./docs/performance-test.txt", "r", encoding="utf-8") as f:
   ...:     data = f.read()
   ...:

In [6]: %timeit list(jieba.cut(data))
3.77 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit nazrin.cut(data)
283 µs ± 14.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
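These numbers correspond to a speedup of roughly 3.77 ms / 0.283 ms ≈ 13× on this one input file. To repeat the measurement outside IPython, the standard `timeit` module can approximate `%timeit`. The `bench` helper below is a hypothetical sketch: it uses a fixed loop count rather than `%timeit`'s automatic calibration, so its figures may differ slightly.

```python
from __future__ import annotations

import statistics
import timeit
from typing import Callable


def bench(fn: Callable[[], object], number: int = 100, repeat: int = 7) -> tuple[float, float]:
    """Run `fn` `number` times per measurement, `repeat` measurements
    (repeat >= 2), and return (mean, stdev) of seconds per call."""
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_call = [t / number for t in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)
```

With `data` read as in the session above, `bench(lambda: list(jieba.cut(data)))` and `bench(lambda: nazrin.cut(data))` would give comparable per-call means.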

Acknowledgements

naidesu

