Learning Using Texts - Chinese Parser
lute3-mandarin
A Mandarin parser for Lute (lute3) using the jieba library and pypinyin for readings.
Installation
See the Lute manual.
Usage
When this parser is installed, you can add "Mandarin Chinese" as a language to Lute, which comes with a simple story.
Parsing exceptions
Sometimes jieba groups too many characters together when parsing. For example, it returns "清华大学" as a single word of four characters, which might not be correct.
You can specify how Lute should correct these cases by adding some simple "rules" to the file plugins/lute_mandarin/parser_exceptions.txt found in your Lute data directory. This file is automatically created when Lute starts. Each rule contains the characters of the word as parsed by jieba, with regular commas added where the word should be split.
Some examples:
File content | Results when parsing "清华大学" |
---|---|
(empty file) | "清华大学" |
清华,大学 | Two tokens, "清华" and "大学" (the single token is split in two) |
清,华,大,学 | Four tokens, "清", "华", "大", "学" |
清华,大学 and 大,学 (two rules, one per line) | Three tokens, "清华", "大", "学" (results are recursively broken down if rules are found) |
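The recursive splitting described above can be sketched in a few lines of Python. This is only an illustration of the rule format, not the plugin's actual code: the function names `load_rules` and `apply_rules` are hypothetical, and the rule strings used in the example are inferred from the description that a rule is the parsed word with commas added at the split points.

```python
def load_rules(text):
    """Build a lookup from each unsplit word to the parts it should become.

    Assumes one rule per line, with regular (ASCII) commas marking the splits.
    """
    rules = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",") if p.strip()]
        # A line without at least one comma does not split anything.
        if len(parts) > 1:
            rules["".join(parts)] = parts
    return rules


def apply_rules(token, rules):
    """Split a token by its rule, then check each resulting part again."""
    parts = rules.get(token)
    if parts is None:
        return [token]
    result = []
    for part in parts:
        # Every part is shorter than the token, so the recursion terminates.
        result.extend(apply_rules(part, rules))
    return result


# Two rules: one splits the four-character word, one splits "大学" further.
rules = load_rules("清华,大学\n大,学")
print(apply_rules("清华大学", rules))  # ['清华', '大', '学']
```

With an empty rules text, `apply_rules` returns the token unchanged, matching the "(empty file)" row of the table.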
File details
Details for the file lute3_mandarin-0.0.3.tar.gz.
File metadata
- Download URL: lute3_mandarin-0.0.3.tar.gz
- Upload date:
- Size: 6.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.31.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | ba7aa2688b4d97b989f879a98f5efc444deaefca38c8e8e3ccb52616c37a1bd5 |
MD5 | 64db9e0ef4d07262ce8d08c3953d5659 |
BLAKE2b-256 | 3bcb65c61e88a00a461bb5a94252956ddceba6d261438bd8c41e05d93f7edc9b |
File details
Details for the file lute3_mandarin-0.0.3-py3-none-any.whl.
File metadata
- Download URL: lute3_mandarin-0.0.3-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.31.0
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 5bfeaf7da14aa8cdab1a98f3a1ee4ba93ac6beb95a551d9098e157ebc51f050d |
MD5 | 81b9066e9c00b3e0297dfb5fc4a51bb5 |
BLAKE2b-256 | 08c37c0abea77e46f66b9168e61672da0e7679eaf4a072248b17df37bcca3c79 |