Yitizi

Input a Chinese character. Output all the variant characters of it.
輸入一個漢字,輸出它的全部異體字。
输入一个汉字,输出它的全部异体字。

Usage

Python

pip install yitizi
>>> import yitizi
>>> yitizi.get('和')
['咊', '龢']
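
One way the lookup might be used is to match a word regardless of which variant characters a text spells it with. The sketch below is only an illustration: `variant_regex` and `stub_get` are hypothetical helpers, and `stub_get` is a stand-in that returns the result shown above so the example runs without the package installed. In real use, `yitizi.get` would be passed instead.

```python
import re


def variant_regex(text, get_variants):
    """Compile a pattern that matches `text` with any character replaced by a variant."""
    parts = []
    for ch in text:
        alternatives = [ch, *get_variants(ch)]
        if len(alternatives) == 1:
            parts.append(re.escape(ch))
        else:
            # A character class covering the character and all of its variants.
            parts.append("[" + "".join(re.escape(c) for c in alternatives) + "]")
    return re.compile("".join(parts))


# Hypothetical stand-in for yitizi.get, limited to the result shown above.
_STUB = {"和": ["咊", "龢"]}


def stub_get(ch):
    return _STUB.get(ch, [])


pattern = variant_regex("和平", stub_get)
print(pattern.search("世界龢平") is not None)  # True
```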

JavaScript (Node.js)

npm install yitizi
> const Yitizi = require('yitizi');
> Yitizi.get('和');
[ '咊', '龢' ]

JavaScript (browser)

<script src="https://cdn.jsdelivr.net/npm/yitizi@0.1.2"></script>
> Yitizi.get('和');
[ '咊', '龢' ]

Design

Connections between variant characters can be modeled as a graph with characters as vertices: two characters are variants of each other if and only if they are directly connected by an edge.
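
As a minimal sketch of this model (not the project's actual code), the graph can be held as an adjacency map, so that looking up the variants of a character means reading its direct neighbours. The names `build_graph` and `variants` are hypothetical, and the edge list contains only the two connections implied by the usage example above.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (char, char) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def variants(graph, char):
    """Return the direct neighbours of `char`, sorted by code point."""
    return sorted(graph.get(char, ()))


# Illustrative edges only; the real graph is much larger.
edges = [("和", "咊"), ("和", "龢")]
graph = build_graph(edges)
print(variants(graph, "和"))  # ['咊', '龢']
```

Only direct edges count here: in this toy graph 咊 and 龢 are not variants of each other unless an edge between them is added explicitly.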

To reduce data redundancy, only a few types of basic connections are stored in the data tables under data/. The full graph, yitizi.json, is computed from them by running build/main.py.
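
The point of precomputing the full graph is that a lookup at run time becomes a plain dictionary access. The sketch below shows one way such a graph could be serialized to JSON and read back. The layout used here (each character mapped to a sorted list of variants) is an assumption for illustration; the actual layout of yitizi.json may differ, and `dump_graph` and `load_graph` are hypothetical names.

```python
import json


def dump_graph(graph):
    """Serialize {char: set of variants} to a JSON string with stable ordering."""
    serializable = {char: sorted(neighbours) for char, neighbours in graph.items()}
    # ensure_ascii=False keeps the characters readable instead of \uXXXX escapes.
    return json.dumps(serializable, ensure_ascii=False, sort_keys=True)


def load_graph(text):
    """Parse the JSON string back into {char: set of variants}."""
    return {char: set(neighbours) for char, neighbours in json.loads(text).items()}


graph = {"和": {"咊", "龢"}, "咊": {"和"}, "龢": {"和"}}
text = dump_graph(graph)
print(load_graph(text) == graph)  # True
```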

Basic connections

A basic connection between two variant characters falls into one of three types: equivalent, intersecting, or simplification.

  • Equivalent "全等": Two characters are equivalent only if they are interchangeable in most texts without changing the meaning. When the full graph is computed, this relation is treated as both commutative (symmetric) and transitive, i.e.

    • If A is an equivalent variant of B, then B is an equivalent variant of A;
    • If A is an equivalent variant of B, and B is an equivalent variant of C, then A is an equivalent variant of C.
  • Intersecting "語義交疊": Two characters are intersecting variants if they are interchangeable only in certain cases. This relation is also commutative, but not necessarily transitive. Intersecting variants are arranged in groups (rows in the data files), and the characters listed in a group share specific meanings. A character can belong to multiple groups.

    Example: "閒" has two intersecting variants: "閑" and "間", listed in two groups:

    閒閑  # meaning "vacant"
    閒間  # meaning "in the middle"
    閑>闲  # simplified form (same below)
    間>间
    

    Then in the computed yitizi.json:

    • 閒 and 閑 (闲) are variants of each other;
    • 閒 and 間 (间) are variants of each other;
    • 閑 (闲) and 間 (间) are unrelated.

    [Figure: Example I-1]

    A more complex (though abstract) example:

    =AB  # "=" means equivalent variants
    ACD
    AEFG
    
    • A, B, C and D are variants of one another;
    • A, B, E, F and G are variants of one another;
    • No connections between C (or D) and E (or F/G).

    [Figure: Example I-2]
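
The results of the two intersecting examples can be reproduced with a small sketch. It is one possible reading of the rules above, not the code in build/main.py: equivalent pairs are merged into classes with a union-find, each intersecting group is widened with the equivalents of its members, and every resulting set is connected pairwise. The function names are hypothetical, and simplified forms are left out here.

```python
from collections import defaultdict
from itertools import combinations


def equivalence_classes(pairs):
    """Map each character to its equivalence class (symmetric, transitive closure)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    classes = defaultdict(set)
    for ch in list(parent):
        classes[find(ch)].add(ch)
    return {ch: classes[find(ch)] for ch in list(parent)}


def build_graph(equivalent_pairs, groups):
    """Connect characters that share an equivalence class or an intersecting group."""
    classes = equivalence_classes(equivalent_pairs)
    cliques = [set(members) for members in classes.values()]
    for group in groups:
        expanded = set()
        for ch in group:
            # A group member brings its equivalent variants along.
            expanded |= classes.get(ch, {ch})
        cliques.append(expanded)
    graph = defaultdict(set)
    for clique in cliques:
        for a, b in combinations(sorted(clique), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# First example: two groups sharing 閒, no equivalent pairs.
first = build_graph([], ["閒閑", "閒間"])
print(sorted(first["閒"]))   # ['閑', '間']
print("間" in first["閑"])   # False

# Second example: =AB, ACD, AEFG.
second = build_graph([("A", "B")], ["ACD", "AEFG"])
print(sorted(second["A"]))   # ['B', 'C', 'D', 'E', 'F', 'G']
print(sorted(second["C"]))   # ['A', 'B', 'D']
```

Groups are never merged with each other, which is what keeps the intersecting relation non-transitive: C and E both connect to A, but not to each other.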

  • Simplification "簡體": A non-transitive and asymmetric connection. A simplified character is associated only with its traditional form.

    Example 1: "么" is 1) a simplified form of "麼" and 2) an equivalent variant of "幺", and "麼" has an equivalent variant "麽". Then:

    • 麼, 麽 and 么 are variants of one another;
    • 幺 and 么 are variants of each other;
    • Neither 麼 nor 麽 is related to 幺.

    [Figure: Example S-1]

    Example 2: "苧" is 1) a simplified form of "薴" and 2) a traditional form of "苎". Then:

    • 苧 is a variant of 薴 and 苎;
    • 薴 and 苎 are unrelated.

    [Figure: Example S-2]

    Example 3: "芸" is a simplified form of both "藝" (Japanese Shinjitai) and "蕓" (Chinese), and "艺" is also a simplified form of "藝" (Chinese). Then:

    • 藝, 芸 and 艺 are variants of one another;
    • 蕓 and 芸 are variants of each other;
    • Neither 藝 nor 艺 is related to 蕓.

    [Figure: Example S-3]
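
The three simplification examples are consistent with a simple rule, sketched below: a simplified character joins the equivalence class of its traditional form, but the traditional form never joins the class of the simplified character. This rule is inferred from the examples above and is an assumption, not a description of build/main.py. `build_graph` is a hypothetical helper that takes equivalence classes already grouped, plus (traditional, simplified) pairs.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(equivalence_classes, simplifications):
    """Connect equivalents, and attach simplified forms to their traditional form's class."""
    # Map each character to the full set of its equivalent variants.
    class_of = {}
    for cls in equivalence_classes:
        members = set(cls)
        for ch in members:
            class_of[ch] = members

    # Traditional character -> its simplified forms (one direction only).
    simplified_of = defaultdict(set)
    for traditional, simplified in simplifications:
        simplified_of[traditional].add(simplified)

    cliques = [set(members) for members in class_of.values()]
    for traditional in simplified_of:
        members = class_of.get(traditional, {traditional})
        clique = set(members)
        for member in members:
            # Add simplified forms, but not the equivalents of those forms.
            clique |= simplified_of.get(member, set())
        cliques.append(clique)

    graph = defaultdict(set)
    for clique in cliques:
        for a, b in combinations(clique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Example 1: 么 = 幺, 麼 = 麽, and 么 is a simplified form of 麼.
s1 = build_graph(["么幺", "麼麽"], [("麼", "么")])
print(sorted(s1["么"]))  # ['幺', '麼', '麽']
print("幺" in s1["麼"])  # False

# Example 2: 薴 > 苧 > 苎.
s2 = build_graph([], [("薴", "苧"), ("苧", "苎")])
print(s2["苧"] == {"薴", "苎"}, "苎" in s2["薴"])  # True False

# Example 3: 藝 > 芸, 蕓 > 芸, 藝 > 艺.
s3 = build_graph([], [("藝", "芸"), ("蕓", "芸"), ("藝", "艺")])
print(s3["芸"] == {"藝", "艺", "蕓"}, "蕓" in s3["艺"])  # True False
```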

Data source

Note for developers

Before publishing a new release, update every occurrence of the version string.
