Learning Using Texts - Chinese Parser

lute3-mandarin

A Mandarin parser for Lute (lute3) that uses the jieba library for word segmentation and pypinyin for readings.

Installation

See the Lute manual.

Usage

When this parser is installed, you can add "Mandarin Chinese" as a language to Lute, which comes with a simple story.

Parsing exceptions

Sometimes jieba groups too many characters together when parsing. For example, it returns "清华大学" as a single four-character word, which might not be the split you want.

You can specify how Lute should correct these cases by adding some simple "rules" to the file plugins/lute_mandarin/parser_exceptions.txt found in your Lute data directory. This file is automatically created when Lute starts. Each rule contains the characters of the word as parsed by jieba, with regular commas added where the word should be split.
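As a rough illustration of the rule format described above, the following sketch reads rule lines into a lookup table keyed by the token as jieba would return it (the rule with its commas removed). This is not the plugin's actual code: the function name parse_rules is hypothetical, and skipping blank lines and lines without a comma is an assumption made for the example.

```python
def parse_rules(text):
    """Build a {jieba_token: [parts]} mapping from rule lines.

    Hypothetical helper for illustration only; the real plugin may
    read and store its rules differently.
    """
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on regular commas and drop empty pieces.
        parts = [p for p in line.split(",") if p]
        if len(parts) < 2:
            continue  # assumption: a line without a split point is not a rule
        # The key is the word exactly as jieba parsed it (no commas).
        rules["".join(parts)] = parts
    return rules


# Example file content: two rules, one per line.
rules = parse_rules("清华,大学\n大,学\n")
```

Keying the table by the unsplit word makes it cheap to check whether a token returned by jieba has a rule.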

Some examples:

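Each example shows the file content, followed by the result of parsing "清华大学".

File content: (empty file)
Result: one token, "清华大学"

File content:
清华,大学
Result: two tokens, "清华" and "大学" (the single token is split in two)

File content:
清,华,大,学
Result: four tokens, "清", "华", "大", "学"

File content (two rules, one per line):
清华,大学
大,学
Result: three tokens, "清华", "大", "学" (results are recursively broken down if rules are found)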
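The recursive behavior in the last example can be sketched as follows. This is a minimal illustration, not the plugin's implementation: the function split_token and the dictionary layout (unsplit word mapped to its parts) are assumptions, and each rule is assumed to contain at least two non-empty parts so that the recursion always ends.

```python
def split_token(token, rules):
    """Split a token using the rules, re-checking each resulting part.

    Hypothetical sketch: `rules` maps a word as parsed by jieba to the
    list of parts it should be split into.
    """
    parts = rules.get(token)
    if parts is None:
        return [token]  # no rule for this token, keep it as is
    result = []
    for part in parts:
        # A part may itself match another rule, so recurse.
        result.extend(split_token(part, rules))
    return result


no_rules = {}
one_rule = {"清华大学": ["清华", "大学"]}
two_rules = {"清华大学": ["清华", "大学"], "大学": ["大", "学"]}

unsplit = split_token("清华大学", no_rules)
two_tokens = split_token("清华大学", one_rule)
three_tokens = split_token("清华大学", two_rules)
```

With two_rules, "清华大学" is first split into "清华" and "大学", and "大学" is then split again because a second rule matches it.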
