Learning Using Texts - Chinese Parser

Project description

lute3-mandarin

A Mandarin parser for Lute (lute3) using the jieba library, and pypinyin for readings.

Installation

See the Lute manual.

Usage

When this parser is installed, you can add "Mandarin Chinese" as a language to Lute, which comes with a simple story.

Parsing exceptions

Sometimes jieba groups too many characters together when parsing. For example, it returns "清华大学" as a single word of four characters, which might not be correct.

You can specify how Lute should correct these cases by adding some simple "rules" to the file plugins/lute_mandarin/parser_exceptions.txt found in your Lute data directory. This file is automatically created when Lute starts. Each rule contains the characters of the word as parsed by jieba, with regular commas added where the word should be split.

Some examples, showing the file content and the result of parsing "清华大学":

File content: (empty file)
Result: "清华大学"

File content:
清华,大学
Result: two tokens, "清华" and "大学" (the single token is split in two)

File content:
清,华,大,学
Result: four tokens, "清", "华", "大", "学"

File content:
清华,大学
大,学
Result: three tokens, "清华", "大", "学" (results are recursively broken down if rules are found)
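
The recursive splitting described above can be sketched in a few lines of Python. This is only an illustration of the rule semantics, not the plugin's actual code: the function names load_rules and apply_rules are hypothetical, and the sketch assumes one rule per line with regular (ASCII) commas marking the split points.

```python
def load_rules(text):
    """Parse rule lines into a mapping from the joined word to its parts.

    Hypothetical helper: "清华,大学" becomes {"清华大学": ["清华", "大学"]}.
    """
    rules = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",") if p.strip()]
        # A line without at least two parts defines no split point.
        if len(parts) > 1:
            rules["".join(parts)] = parts
    return rules


def apply_rules(token, rules):
    """Split a token according to the rules, recursing into each part."""
    parts = rules.get(token)
    if parts is None:
        return [token]  # no rule for this token: keep it as is
    result = []
    for part in parts:
        # Every part is shorter than the token, so the recursion terminates.
        result.extend(apply_rules(part, rules))
    return result


rules = load_rules("清华,大学\n大,学\n")
tokens = apply_rules("清华大学", rules)
```

With the two rules above, the rule for "清华大学" is applied first and the rule for "大学" is then applied to its second part, which is the recursive behavior shown in the last example.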

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lute3_mandarin-0.0.3.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

lute3_mandarin-0.0.3-py3-none-any.whl (4.0 kB)

Uploaded Python 3

File details

Details for the file lute3_mandarin-0.0.3.tar.gz.

File metadata

  • Download URL: lute3_mandarin-0.0.3.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.31.0

File hashes

Hashes for lute3_mandarin-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       ba7aa2688b4d97b989f879a98f5efc444deaefca38c8e8e3ccb52616c37a1bd5
MD5          64db9e0ef4d07262ce8d08c3953d5659
BLAKE2b-256  3bcb65c61e88a00a461bb5a94252956ddceba6d261438bd8c41e05d93f7edc9b

See more details on using hashes here.
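
One way to check a downloaded file against the SHA256 digest listed above is sketched below with Python's standard hashlib module. The helper names sha256_of and matches_published_hash are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest published above for lute3_mandarin-0.0.3.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "ba7aa2688b4d97b989f879a98f5efc444deaefca38c8e8e3ccb52616c37a1bd5"
)


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SDIST_SHA256):
    """True if the file at path has the expected SHA256 digest."""
    return sha256_of(path) == expected


# Example (assumes the file was saved in the current directory):
# matches_published_hash("lute3_mandarin-0.0.3.tar.gz")
```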

File details

Details for the file lute3_mandarin-0.0.3-py3-none-any.whl.

File hashes

Hashes for lute3_mandarin-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       5bfeaf7da14aa8cdab1a98f3a1ee4ba93ac6beb95a551d9098e157ebc51f050d
MD5          81b9066e9c00b3e0297dfb5fc4a51bb5
BLAKE2b-256  08c37c0abea77e46f66b9168e61672da0e7679eaf4a072248b17df37bcca3c79
