Unicode To CP932 Transcoder

UCP9 - Unicode To CP932 Transcoder

A small Python package that transcodes cp932-incompatible kanji characters to their cp932-compatible equivalents.

This module provides a transcoding service:

  • FROM: An arbitrary Unicode character.
  • TO: A cp932-compatible, semantically similar but differently encoded version of the same character.
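To illustrate what "semantically similar but differently encoded" means, here is a minimal sketch that uses only the Python standard library, not ucp9 itself. The helper name `is_cp932_encodable` is hypothetical. It compares a Kangxi radical with the ordinary ideograph that looks the same:

```python
import unicodedata


def is_cp932_encodable(text: str) -> bool:
    """Return True if every character in `text` can be encoded as cp932."""
    try:
        text.encode("cp932")
        return True
    except UnicodeEncodeError:
        return False


radical = "\u2f08"    # KANGXI RADICAL MAN, visually like the ideograph below
ideograph = "\u4eba"  # CJK UNIFIED IDEOGRAPH-4EBA, the ordinary character

for ch in (radical, ideograph):
    # Print the code point, its Unicode name, and whether cp932 can encode it.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"encodable={is_cp932_encodable(ch)}")
```

The two characters render almost identically, but only the second one survives `str.encode("cp932")`. This is the kind of pair the package is meant to map between.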

Usage:

    import ucp9

    # convert() returns the transcoded string.
    converted = ucp9.convert(string, option)

[string]: a string that contains cp932-incompatible characters.

[option]: how to handle cp932-inconvertible characters.

  • "keep": keep the cp932-inconvertible characters. NOTE: the returned string will NOT be cp932-compatible.
  • "remove": remove the cp932-inconvertible characters.
  • "replace": (default behaviour) replace each cp932-inconvertible character with "?".
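The three options can be pictured with a small standard-library sketch. This is not ucp9's implementation: the function name `handle_inconvertible` is hypothetical, and the sketch does no kanji mapping at all, so it treats every character that cp932 cannot encode as inconvertible.

```python
def handle_inconvertible(text: str, option: str = "replace") -> str:
    """Apply a keep/remove/replace policy to characters cp932 cannot encode."""
    if option not in ("keep", "remove", "replace"):
        raise ValueError(f"unknown option: {option!r}")

    out = []
    for ch in text:
        try:
            ch.encode("cp932")
            out.append(ch)          # already cp932-encodable, keep as is
        except UnicodeEncodeError:
            if option == "keep":
                out.append(ch)      # result is NOT cp932-compatible
            elif option == "replace":
                out.append("?")     # placeholder for the lost character
            # option == "remove": append nothing
    return "".join(out)


sample = "A\U0001F600B"  # an emoji has no cp932 representation
for opt in ("keep", "remove", "replace"):
    print(opt, repr(handle_inconvertible(sample, opt)))
```

Only the "remove" and "replace" results can be passed safely to `str.encode("cp932")`. With "keep", the caller still has to deal with the leftover characters.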

Currently supported Unicode character blocks:

  • Kangxi Radicals
  • Print Standard Character
  • Old type
  • CJK Radicals Supplement
  • Katakana Phonetic Extensions
  • CJK Unified Ideographs
  • CJK Compatibility Ideographs
  • CJK Compatibility Ideographs Supplement

Note:

  • cp932-incompatible: characters that cannot be encoded to cp932 with str.encode(), but that may have an equivalent cp932-encodable version.
  • cp932-inconvertible: characters that cannot be encoded to cp932 and do not have a cp932-encodable version.
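
The distinction between the two terms can be sketched as a three-way classification. This sketch makes a loud assumption: it uses Unicode NFKC normalization as a stand-in for "has an equivalent version". ucp9 may use its own mapping tables, and NFKC does not cover every block listed above. The names `classify` and `_encodable` are hypothetical.

```python
import unicodedata


def _encodable(text: str) -> bool:
    """Return True if `text` can be encoded as cp932."""
    try:
        text.encode("cp932")
        return True
    except UnicodeEncodeError:
        return False


def classify(ch: str) -> str:
    """Classify one character relative to cp932.

    ASSUMPTION: NFKC normalization approximates the search for an
    equivalent character; it is not ucp9's actual lookup.
    """
    if _encodable(ch):
        return "compatible"
    candidate = unicodedata.normalize("NFKC", ch)
    if candidate != ch and _encodable(candidate):
        return "incompatible"    # an encodable equivalent exists
    return "inconvertible"       # no encodable equivalent found


for ch in ("\u4e00", "\u2f00", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {classify(ch)}")
```

Here U+2F00 (KANGXI RADICAL ONE) normalizes to U+4E00, which cp932 can encode, so it counts as incompatible rather than inconvertible.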
