
Chinese character to Wubi conversion module/tool


pywubi — Chinese Character to Wubi Encoding


Chinese documentation

A Python library for converting Chinese characters to Wubi (五笔) input method encoding. Currently supports the 86-version scheme with a built-in dictionary of ~21,004 characters.

Features

  • Single-character encoding — convert individual Chinese characters to Wubi codes
  • Phrase encoding — generate codes following Wubi phrase rules (2-char, 3-char, 4+ char)
  • Multi-code query — return all possible encodings for a character
  • Reverse lookup — find characters by Wubi code
  • Fuzzy reverse lookup — use z in place of unknown radicals to guess characters
  • Brief code query — get the shortest code and its level (1st / 2nd / 3rd / full)
  • Mixed text — automatically split Chinese and non-Chinese; punctuation is preserved as-is
  • Zero dependencies — no third-party packages required

Installation

pip install pywubi

Quick Start

from pywubi import wubi

# Character-by-character (default)
wubi('我爱你')
# ['trnt', 'epdc', 'wqiy']

# Return all possible codes
wubi('我爱你', multicode=True)
# [['trnt', 'trn', 'q'], ['epdc', 'epd', 'ep'], ['wqiy', 'wqi', 'wq']]

# Phrase mode
wubi('我爱你', single=False)
# ['tewq']

# Mixed text — punctuation preserved
wubi('天气不错,出去走走!')
# ['gdi', 'rnb', 'gii', 'qajg', ',', 'bmt', 'fcu', 'tfht', 'tfht', '!']
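
The mixed-text behavior can be pictured as splitting the input into alternating Chinese and non-Chinese runs before encoding. The sketch below is not pywubi's code: `split_runs` is a hypothetical helper, and treating "Chinese" as the range U+4E00..U+9FFF is an assumption.

```python
import re
from typing import List

# Assumption: a "Chinese" character is one in the CJK Unified Ideographs
# block (U+4E00..U+9FFF). pywubi's real test may be broader.
_RUNS = re.compile(r"[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+")


def split_runs(text: str) -> List[str]:
    """Split text into alternating runs of Chinese and non-Chinese characters."""
    return _RUNS.findall(text)


print(split_runs("天气不错,出去走走!"))
# ['天气不错', ',', '出去走走', '!']
```

Characters such as 〇 (U+3007) fall outside this range, so a production splitter would need a wider character test.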

API Reference

wubi(hans, multicode=False, single=True)

Convert a Chinese string to Wubi encodings.

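| Parameter | Type | Default    | Description                                  |
|-----------|------|------------|----------------------------------------------|
| hans      | str  | (required) | Chinese character string                     |
| multicode | bool | False      | Return all possible codes                    |
| single    | bool | True       | True for char-by-char, False for phrase mode |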

Returns: list of Wubi codes. With multicode=True, each character's entry is itself a list of codes, as in the Quick Start example.

single_wubi(han, multicode=False)

Convert a single Chinese character to Wubi encoding.

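| Parameter | Type | Default    | Description                |
|-----------|------|------------|----------------------------|
| han       | str  | (required) | A single Chinese character |
| multicode | bool | False      | Return all possible codes  |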

Returns: str (single code) or list[str] (multiple codes)

combine_wubi(hans)

Convert a phrase to Wubi encoding.

| Parameter | Type | Description    |
|-----------|------|----------------|
| hans      | str  | Chinese phrase |

Returns: str — Wubi code for the phrase

Encoding rules:

  • 2-char phrase: first 2 codes of each character (4 codes total)
  • 3-char phrase: 1st code of char 1 & 2 + first 2 codes of char 3 (4 codes total)
  • 4+ char phrase: 1st code of char 1, 2, 3, and last (4 codes total)
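
The three rules above can be sketched as a small function over each character's full code. `combine_codes` is a hypothetical helper written for illustration, not the library's `combine_wubi`, and it assumes the caller already has the full code of every character.

```python
from typing import List


def combine_codes(full_codes: List[str]) -> str:
    """Combine per-character full codes into one phrase code (illustrative only)."""
    n = len(full_codes)
    if n == 0:
        return ""
    if n == 1:
        # A single character is not a phrase; return its code unchanged.
        return full_codes[0]
    if n == 2:
        # First 2 keys of each character.
        return full_codes[0][:2] + full_codes[1][:2]
    if n == 3:
        # 1st key of chars 1 and 2, then the first 2 keys of char 3.
        return full_codes[0][:1] + full_codes[1][:1] + full_codes[2][:2]
    # 4+ characters: 1st key of chars 1, 2, 3 and of the last character.
    return "".join(code[:1] for code in (*full_codes[:3], full_codes[-1]))


# Full codes for 我, 爱, 你 as quoted in Quick Start.
print(combine_codes(["trnt", "epdc", "wqiy"]))  # tewq
```

The 3-character result matches the phrase-mode output shown in Quick Start for '我爱你'.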

lookup(char)

Look up all Wubi codes for a single character.

from pywubi import lookup

lookup('为')   # ['ylyi', 'yly', 'yl', 'o']
lookup('?')    # []

reverse_lookup(code)

Reverse-lookup characters by Wubi code.

from pywubi import reverse_lookup

reverse_lookup('trnt')  # ['我']
reverse_lookup('q')     # ['我']
reverse_lookup('ggll')  # ['一']
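
One plausible way to support reverse lookup is an inverted index from code to characters. The sketch below is an assumption about the general technique, not pywubi's internals; `TOY_CODES` and `build_reverse_index` are hypothetical names, and the table holds only codes quoted in this README.

```python
from collections import defaultdict
from typing import Dict, List

# Toy character -> codes table; the real dictionary is far larger.
TOY_CODES: Dict[str, List[str]] = {
    "我": ["trnt", "trn", "q"],
    "爱": ["epdc", "epd", "ep"],
    "你": ["wqiy", "wqi", "wq"],
}


def build_reverse_index(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a character -> codes table into a code -> characters index."""
    index: Dict[str, List[str]] = defaultdict(list)
    for char, codes in table.items():
        for code in codes:
            index[code].append(char)
    return dict(index)


REVERSE = build_reverse_index(TOY_CODES)
print(REVERSE.get("trnt", []))  # ['我']
print(REVERSE.get("q", []))     # ['我']
print(REVERSE.get("zzzz", []))  # []
```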

fuzzy_reverse_lookup(code, limit=10)

Fuzzy reverse-lookup characters by Wubi code; use z for unknown radical keys.

Wubi 86 uses only the keys a-y, so z is free to serve as a wildcard that matches any radical key. When the input contains no z, the call behaves like an exact reverse lookup. Only codes of the same length as the input are matched.

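| Parameter | Type | Default    | Description                              |
|-----------|------|------------|------------------------------------------|
| code      | str  | (required) | Wubi code; use z/Z for unknown positions |
| limit     | int  | 10         | Max results to return; 0 for unlimited   |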

Returns: list[tuple[str, str]], in the form [(character, matched_code), ...], sorted by code

from pywubi import fuzzy_reverse_lookup

fuzzy_reverse_lookup('vz')       # [('姑', 'vd'), ('灵', 'vo'), ...]
fuzzy_reverse_lookup('zzzg')     # 4-key codes ending in 'g' (at most 10 results with the default limit)
fuzzy_reverse_lookup('trnt')     # no z — degrades to exact reverse lookup
fuzzy_reverse_lookup('zz', limit=5)  # limit to 5 results
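
The wildcard matching described above can be sketched as a position-by-position comparison. This is a minimal illustration under stated assumptions, not the library's implementation: `fuzzy_match` and `TOY_CODES` are hypothetical, and the table contains only codes quoted in this README.

```python
from typing import Dict, List, Tuple

# Toy character -> codes table; the real dictionary is far larger.
TOY_CODES: Dict[str, List[str]] = {
    "我": ["trnt", "trn", "q"],
    "爱": ["epdc", "epd", "ep"],
    "你": ["wqiy", "wqi", "wq"],
    "为": ["ylyi", "yly", "yl", "o"],
}


def fuzzy_match(
    pattern: str, table: Dict[str, List[str]], limit: int = 10
) -> List[Tuple[str, str]]:
    """Return (character, code) pairs whose code matches pattern; 'z' matches any key."""
    pattern = pattern.lower()  # accept 'Z' as a wildcard too
    hits: List[Tuple[str, str]] = []
    for char, codes in table.items():
        for code in codes:
            # Only codes of the same length as the pattern can match.
            if len(code) == len(pattern) and all(
                p == "z" or p == c for p, c in zip(pattern, code)
            ):
                hits.append((char, code))
    hits.sort(key=lambda pair: pair[1])  # sort by matched code
    return hits if limit == 0 else hits[:limit]


print(fuzzy_match("zz", TOY_CODES))
# [('爱', 'ep'), ('你', 'wq'), ('为', 'yl')]
print(fuzzy_match("zzzt", TOY_CODES))
# [('我', 'trnt')]
```

A linear scan is enough for a table of this size; a larger dictionary might index codes by length first.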

brief_code(char)

Get the shortest (brief) code for a character.

from pywubi import brief_code

brief_code('我')  # 'q'
brief_code('一')  # 'g'
brief_code('?')   # None

brief_level(char)

Get the brief-code level (1 = 1st-level, 2 = 2nd-level, 3 = 3rd-level, 4 = full code).

from pywubi import brief_level

brief_level('我')  # 1
brief_level('一')  # 1
brief_level('〇')  # 4
brief_level('?')   # None
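
As a rough picture of how brief_code and brief_level relate, the sketch below picks the shortest code from a toy table and reports its length as the level. Both helpers are hypothetical, and equating the level with the length of the shortest code is an assumption this README does not confirm.

```python
from typing import Dict, List, Optional

# Toy character -> codes table built from codes quoted in this README.
TOY_CODES: Dict[str, List[str]] = {
    "我": ["trnt", "trn", "q"],
    "爱": ["epdc", "epd", "ep"],
    "为": ["ylyi", "yly", "yl", "o"],
}


def shortest_code(char: str, table: Dict[str, List[str]]) -> Optional[str]:
    """Return the shortest known code for char, or None if char is unknown."""
    codes = table.get(char)
    if not codes:
        return None
    return min(codes, key=len)


def code_level(char: str, table: Dict[str, List[str]]) -> Optional[int]:
    """Assumed level: the number of keys in the shortest code."""
    code = shortest_code(char, table)
    return None if code is None else len(code)


print(shortest_code("我", TOY_CODES))  # q
print(code_level("爱", TOY_CODES))     # 2
print(shortest_code("?", TOY_CODES))   # None
```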

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

Changelog

0.2.0

  • Dictionary storage changed from Python source to JSON — faster loading, smaller size
  • Added lazy-loading: import pywubi no longer loads the full dictionary immediately
  • Added lookup() to query all codes for a character
  • Added reverse_lookup() to find characters by code
  • Added brief_code() to get the shortest code
  • Added brief_level() to get the brief-code level
  • Added comprehensive unit tests

0.1.0

  • Fixed single_seg bug where trailing non-Chinese characters were lost
  • Fixed typos (utlis → utils, conbin_wubi → combine_wubi)
  • Switched to relative imports within the package
  • Added type hints
  • Added .gitignore, removed .idea/ from tracking
  • Fixed README typos

0.0.2

  • Initial release

PyPI Account Verification

I am the owner of the PyPI account "sfyc23" and the maintainer of this repository: https://github.com/sfyc23/python-wubi

I am currently requesting account recovery for the PyPI project/package "pywubi".

This note is added to help PyPI administrators verify that I still control the source repository associated with the package.

GitHub profile: https://github.com/sfyc23
PyPI project: https://pypi.org/project/pywubi/
Date: 2026-03-30

License

MIT License — see LICENSE for details.
