Yitizi

Input a Chinese character. Output all the variant characters of it.
輸入一個漢字,輸出它的全部異體字。
输入一个汉字,输出它的全部异体字。

Usage

Python

pip install yitizi
>>> import yitizi
>>> yitizi.get('和')
['咊', '龢']

JavaScript (Node.js)

npm install yitizi
> const Yitizi = require('yitizi');
> Yitizi.get('和');
[ '咊', '龢' ]

JavaScript (browser)

<script src="https://cdn.jsdelivr.net/npm/yitizi@0.1.3"></script>
> Yitizi.get('和');
[ '咊', '龢' ]

JavaScript (browser, ESM)

<script type="module">
  import Yitizi from 'https://esm.run/yitizi@0.1.3';

  Yitizi.get('和'); // => [ '咊', '龢' ]
</script>

Design

Connections between variant characters can be modeled as a graph with characters as vertices, where two characters are variants of each other if they are directly connected by an edge.

To reduce data redundancy, only a few types of basic connections are stored in the data tables under data/. The full graph, yitizi.json, is computed from them by running build/main.py.
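
As a rough illustration of this graph model (not the project's actual data layout, which is not described here), the sketch below stores the graph as an adjacency mapping and treats "is a variant of" as "is a direct neighbour of". The helper names `adjacency` and `are_variants` are made up for this example, and the two edges come from the 閒/閑/間 example in the next section.

```python
from collections import defaultdict


def adjacency(edges):
    """Build an undirected graph {char: set of neighbours} from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def are_variants(graph, a, b):
    """Two characters are variants only if an edge connects them directly."""
    return b in graph.get(a, set())


# Hypothetical sample data: 閒 is linked to 閑 and to 間, but 閑 and 間 are not linked.
graph = adjacency([("閒", "閑"), ("閒", "間")])
print(are_variants(graph, "閒", "閑"))  # True
print(are_variants(graph, "閑", "間"))  # False: a shared neighbour is not an edge
```

Sharing a neighbour does not make two characters variants, which is why the full graph has to be computed explicitly rather than derived by simple reachability.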

Basic connections

A basic connection between two variant characters falls into one of three types: equivalent, intersecting, or simplification.

  • Equivalent "全等": Two characters are equivalent only if they are interchangeable in most texts without changing the meaning. When the full graph is computed, this relation is treated as both commutative (symmetric) and transitive, i.e.

    • If A is an equivalent variant of B, then B is an equivalent variant of A;
    • If A is an equivalent variant of B, and B is an equivalent variant of C, then A is an equivalent variant of C.
  • Intersecting "語義交疊": Two characters are intersecting variants if they are interchangeable in certain cases. This relation is also commutative, but not necessarily transitive. Characters with intersecting variants are arranged in groups (rows in the data files), and each group carries specific meanings shared by the characters it lists. A character can belong to multiple groups.

    Example: "閒" has two intersecting variants: "閑" and "間", listed in two groups:

    閒閑  # meaning "vacant"
    閒間  # meaning "in the middle"
    閑>闲  # simplified form (same below)
    間>间
    

    Then in the computed yitizi.json:

    • 閒 and 閑 (闲) are variants of each other;
    • 閒 and 間 (间) are variants of each other;
    • 閑 (闲) and 間 (间) are unrelated.

    Example I-1

    A more complex (though abstract) example:

    =AB  # "=" means equivalent variants
    ACD
    AEFG
    
    • A, B, C and D are variants of one another;
    • A, B, E, F and G are variants of one another;
    • There is no connection between C (or D) and E (or F or G).

    Example I-2

  • Simplification "簡體": A non-transitive and asymmetric connection. A simplified character is associated only with its traditional form.

    Example 1: "么" is 1) a simplified form of "麼", 2) an equivalent variant of "幺"; "麼" has an equivalent variant "麽", then:

    • 麼, 麽 and 么 are variants of one another;
    • 幺 and 么 are variants of each other;
    • Neither 麼 nor 麽 is related to 幺.

    Example S-1

    Example 2: "苧" is 1) a simplified form of "薴", 2) a traditional form of "苎", then:

    • 苧 is a variant of 薴 and 苎;
    • 薴 and 苎 are unrelated.

    Example S-2

    Example 3: "芸" is a simplified form of "藝" (Japanese Shinjitai) and "蕓" (Chinese), and "艺" is also a simplified form of "藝" (Chinese), then:

    • 藝, 芸 and 艺 are variants of one another;
    • 蕓 and 芸 are variants of each other;
    • Neither 藝 nor 艺 is related to 蕓.

    Example S-3
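
The three rules above can be sketched in a few lines of Python. This is one possible reading that reproduces the examples in this section, not the code in build/main.py. The function name `build_graph` and its input shapes (rows as strings, simplifications as (traditional, simplified) pairs) are assumptions made for illustration. The sketch also assumes that every group is extended by the simplified forms of its members and that all characters in the extended group are linked to one another.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(equivalent, intersecting, simplified):
    """Return {char: set of variants} (hypothetical helper, not the real build).

    equivalent   -- rows of mutually equivalent characters, e.g. ["AB"]
    intersecting -- rows of characters sharing a meaning, e.g. ["ACD"]
    simplified   -- (traditional, simplified) pairs, e.g. [("閑", "闲")]
    """
    parent = {}

    def find(c):
        # Union-find lookup; merging rows yields the transitive closure.
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for row in equivalent:
        for c in row[1:]:
            parent[find(c)] = find(row[0])
    # Register every other character so it forms its own singleton class.
    for row in intersecting:
        for c in row:
            find(c)
    for trad, simp in simplified:
        find(trad)
        find(simp)

    classes = defaultdict(set)
    for c in list(parent):
        classes[find(c)].add(c)

    # Each equivalence class is a group; each intersecting row is a group
    # once its members are expanded to their equivalence classes.
    groups = list(classes.values())
    for row in intersecting:
        group = set()
        for c in row:
            group |= classes[find(c)]
        groups.append(group)

    simp_of = defaultdict(set)
    for trad, simp in simplified:
        simp_of[trad].add(simp)

    graph = defaultdict(set)
    for group in groups:
        # Add simplified forms one hop only: simplification is not transitive.
        extended = set(group)
        for c in group:
            extended |= simp_of.get(c, set())
        for a, b in combinations(extended, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Example S-1: 么 simplifies 麼 and is equivalent to 幺; 麼 is equivalent to 麽.
s1 = build_graph(["么幺", "麼麽"], [], [("麼", "么")])
print(sorted(s1["么"]))  # ['幺', '麼', '麽']
print(sorted(s1["幺"]))  # ['么']
```

Because simplified forms are attached to a group only after the groups are fixed, an equivalence on the simplified side (么 and 幺) never leaks back to the traditional side (麼 and 麽).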

Data source

Note for developers

Before publishing a new release, update every occurrence of the version string.
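
One way to avoid missing an occurrence is to list them before editing. The helper below is a minimal sketch, not part of the project: the name `find_version_occurrences` and the default list of file suffixes are assumptions, so adjust them to the files that actually carry the version.

```python
from pathlib import Path


def find_version_occurrences(root, version, suffixes=(".md", ".py", ".json", ".toml")):
    """Return (file name, line number, line) for each line containing `version`.

    Hypothetical helper: only files under `root` whose suffix is in
    `suffixes` are scanned, and they are read as UTF-8 text.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if version in line:
                hits.append((path.name, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # List every line in the current directory tree that mentions the old version.
    for name, lineno, line in find_version_occurrences(".", "0.1.3"):
        print(f"{name}:{lineno}: {line}")
```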
