Convert Chinese numerals and Arabic numerals.

Project description

Chinese numerals and Arabic numerals conversion

There should be emojis here (●'◡'●)

Project description: English version
Project description: Chinese version

cnoan is a toolkit for quickly converting between Chinese numerals and Arabic numerals! In the name:
cn refers to Chinese numerals
an refers specifically to Arabic numerals
o stands for mutual (two-way) conversion

♥ The Chinese word for "mutual conversion" is hard to translate and abbreviate /(ㄒoㄒ)/~~ If the first letter of "mutual" were used, the project name would read like an insult 🐎
(spreading bad language 🔪, off to the penalty box ❎);
Roughly, it means converting in both directions. Linking the two through c would only convey a one-way, single-path conversion 👉;
Therefore o is used as the link in the middle, which conveys the ideas of ·mutual· and ·loop· to some extent;

♦ This project builds on the ideas and guidance of cn2an and updates it to fix problems encountered in use. Stars and follows are welcome; let's maintain and improve it together.

♣ Hey, it's great ★,°:.☆( ̄▽ ̄)/$:.°★ .


Directory Structure

   File Name                    Function and Description                             What's New
|--------------------------------------------------------------------------------------------------------
|---an2cn.py         Convert Arabic numerals to Chinese numerals               Newly defined class names
|--------------------------------------------------------------------------------------------------------
|---base.py          Base module of the project,                               None
                     which contains the ConvertBase base class
|--------------------------------------------------------------------------------------------------------
|---cn2an.py         Convert Chinese numerals to Arabic numerals               Newly defined class names
|--------------------------------------------------------------------------------------------------------
|---config.yaml      Project configuration,                                    Added the abnormal_words field
                     mainly the definitions of the matching rules
|--------------------------------------------------------------------------------------------------------
|---setup.py         Project packaging and publishing                          Added my information
|--------------------------------------------------------------------------------------------------------
|---translate.py     Convert the numbers within a sentence                     Modified the regular expression
                                                                               that determines the conversion,
                                                                               with parameters
|--------------------------------------------------------------------------------------------------------
|---utils.py         Definitions of basic utility functions                    None
|--------------------------------------------------------------------------------------------------------
|---requirement.txt  Packages required by the project                          None
|--------------------------------------------------------------------------------------------------------

Project Functions

Basic functions

1.1 Chinese numbers => Arabic numbers

  • Support Chinese numbers => Arabic numbers;
  • Support Uppercase Chinese numbers => Arabic numbers;
  • Support Chinese numbers and Arabic numbers => Arabic numbers;

1.2 Arabic numbers => Chinese numbers

  • Support Arabic numbers => Chinese numbers;
  • Support Arabic numbers => Uppercase Chinese numbers;
  • Support Arabic numerals => Uppercase RMB;

1.3 Sentence Transformation

  • Support Chinese numbers => Arabic numbers;

    • support dates;
    • support fractions;
    • support percentages;
    • support Celsius;
  • Support Arabic numbers => Chinese numbers;

    • support dates;
    • support fractions;
    • support percentages;
    • support Celsius;

1.4 Others

  • support decimal;
  • support negative numbers;
  • Support for HTTP API.

Function updates & Fixes

  • 🎈 Redefined which text gets converted (●'◡'●) The original project (transform + cn2an) produces conversions like the following:

    七上八下 --> 7上8下
    两人    --> 2人
    一旦    --> 1旦
    

    In practice, we do not want these to be converted. This project therefore redefines the precondition for conversion:

    'Original':
        self.cn_pattern = f"负?([零一二三四五六七八九十拾百佰千仟万亿]+点)?[零一二三四五六七八九十拾百佰千仟万亿]+"

    'Now':
        self.cn_pattern = f"负?-?正?\+?([零一二三四五六七八九十][\s\t]*[十拾百佰千仟万亿]+)(点[零一二三四五六七八九十]+)?"
    

    Of course, I can't guarantee that this rule fits your business needs. If it does not, you can redefine self.cn_pattern in translate.
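To see what the change means in practice, the two patterns quoted above can be tried directly with Python's `re` module. This is only a small experiment outside the library: the helper name `matched_spans` is hypothetical, and the patterns are copied here as raw strings.

```python
import re

# The two patterns quoted in this README, written as raw strings.
ORIGINAL = r"负?([零一二三四五六七八九十拾百佰千仟万亿]+点)?[零一二三四五六七八九十拾百佰千仟万亿]+"
NOW = r"负?-?正?\+?([零一二三四五六七八九十][\s\t]*[十拾百佰千仟万亿]+)(点[零一二三四五六七八九十]+)?"


def matched_spans(pattern, text):
    """Return every substring of `text` that `pattern` would pick up for conversion."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for sample in ["七上八下", "一旦", "一百块钱", "一五一十"]:
    print(sample, matched_spans(ORIGINAL, sample), matched_spans(NOW, sample))
# 七上八下 ['七', '八'] []
# 一旦 ['一'] []
# 一百块钱 ['一百'] ['一百']
# 一五一十 ['一五一十'] ['一十']
```

The new pattern only fires when a digit is followed by a unit character, so idioms such as 七上八下 are left alone. The trade-off is that a lone digit with no unit is no longer matched, and 一五一十 still yields a partial match on 一十.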

  • 🎈 Introduced masking and restoring of abnormal words o( ̄▽ ̄)ブ With the redefined rule above, cases such as "Wanning", "in case", and 七上八下 ("seven up, eight down") are no longer converted, but the breadth and depth of Chinese still leaves exceptions

'E.g':
    '一五一十'
...

If this word is passed in directly, the result is:

:return: '一五10'

That is not what we want. In this project, I classify content like this as abnormal words; see abnormal_words in config.

'Thinking':
# encoder: swap each abnormal word for a placeholder before conversion
masks = ['一五一十']  # list[str]: abnormal words, see abnormal_words in config
inputs = str('XXXxxx')
mask_contents = {}
for index, item in enumerate(masks):
    if item and item in inputs:  # skip empty entries
        mask = f'_MASK_{index}_'
        mask_contents[mask] = item
        inputs = inputs.replace(item, mask)
output = inputs  # placeholder: the actual conversion of inputs happens here
# decoder: restore the original words after conversion
for mask, item in mask_contents.items():
    output = output.replace(mask, item)

  • 🎈 Changed one detail of the original project. The original project has this problem: when characters such as 廿, 半, or 两 appear in the text but are not numbers to be converted, the following snippet still replaces them in advance:
  inputs = str('XXXXxxx')
  inputs = inputs.replace("廿", "二十").replace("半", "0.5").replace("两", "2")

This project therefore applies these replacements only after the regular-expression check has matched.
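The idea of replacing only after the regular-expression check can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the pattern `NUMBER` is a simplified, hypothetical one, and `replace_in_matches` is a made-up helper name.

```python
import re

# Hypothetical, simplified pattern: one digit character (including 两)
# followed by at least one unit character.
NUMBER = re.compile(r"[零一二两三四五六七八九][十拾百佰千仟万亿]+")


def replace_in_matches(text):
    """Apply the 两 -> 2 replacement only inside spans that look like numbers."""
    return NUMBER.sub(lambda m: m.group(0).replace("两", "2"), text)


sentence = "两人一共花了两千元"
print(sentence.replace("两", "2"))   # global replace: 2人一共花了2千元
print(replace_in_matches(sentence))  # match first:    两人一共花了2千元
```

Because the replacement runs inside the `re.sub` callback, 两人 is never touched: it does not match the number pattern in the first place.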

  • 🎈 One small detail: in daily work and in writing papers there is an unwritten rule that 10000 is written as 10,000, with thousands separators, so this project was also 'forced' to adopt this rule 😔
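For reference, Python's format specification has a `,` option that inserts thousands separators. The snippet below is a minimal illustration of that option, not necessarily how cnoan produces its output; `with_separators` is a hypothetical helper name.

```python
def with_separators(number):
    """Format a number with a comma between each group of three integer digits."""
    return f"{number:,}"


print(with_separators(10000))  # 10,000
print(with_separators(2000))   # 2,000
print(with_separators(123))    # 123
```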

After the above series of operations, the final effect is as follows:

from cnoan.translate import Translate
inputs = '这人坏滴很,王尼玛一五一十的收入为一万元, 而两人却告诉我是二千元'
mode = 'cn2an'
trans = Translate()
print(trans.convert(inputs, mode))
# 这人坏滴很,王尼玛一五一十的收入为10,000元, 而两人却告诉我是2,000元

Project Installation & Usage

Install

  • Method 1:
    pip install cnoan

  • Method 2:
    git clone https://github.com/zhuofalin/cnoan.git
    cd cnoan
    python setup.py install

  • Method 3:
    git clone https://github.com/zhuofalin/cnoan.git
    # then copy the cnoan directory into your project
    

Usage

# Import the package
import cnoan

# Check the current version number
print(cnoan.__version__)
# 0.5.16  # your version may differ

3.1 Chinese numbers => Arabic numerals

The maximum supported value is 10**16 (ten quadrillion),
and the minimum is 10**-16.

import cnoan

output = cnoan.cn2an("一百二十三")
# or
output = cnoan.cn2an("一百二十三", "strict")
# output:
# 123

output = cnoan.cn2an("一二三", "normal")
# output:
# 123

output = cnoan.cn2an("1百23", "smart")
# output:
# 123

# The above three modes all support negative numbers
output = cnoan.cn2an("负一百二十三", "strict")
# output:
# -123

# All three modes above support decimals
output = cnoan.cn2an("一点二三", "strict")
# output:
# 1.23
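For readers curious how a conversion like this can work, here is a minimal sketch of parsing a well-formed Chinese integer. It is not cnoan's implementation: the names `parse_cn_integer`, `DIGITS`, `SMALL_UNITS`, and `BIG_UNITS` are hypothetical, and the sketch covers only lowercase numerals, with no decimals, negatives, or validation of malformed input.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}


def parse_cn_integer(text):
    """Parse a well-formed lowercase Chinese integer such as 一百二十三."""
    total = 0    # value of the completed 亿 / 万 sections
    section = 0  # value of the current section (below 10**4)
    digit = 0    # last digit seen, still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十 (as in 十二) counts as 一十.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * 10**4
            section, digit = 0, 0
        elif ch == "亿":
            # 亿 scales everything read so far, which also covers 一万亿.
            total = (total + section + digit) * 10**8
            section, digit = 0, 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


print(parse_cn_integer("一百二十三"))  # 123
print(parse_cn_integer("一亿二千万"))  # 120000000
```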

3.2 Arabic numerals => Chinese numbers

The maximum supported value is 10**16 (ten quadrillion),
and the minimum is 10**-16.

import cnoan

output = cnoan.an2cn("123")
# or
output = cnoan.an2cn("123", "lower")
# output:
# 一百二十三

output = cnoan.an2cn("123", "upper")
# output:
# 壹佰贰拾叁

output = cnoan.an2cn("123", "rmb")
# output:
# 壹佰贰拾叁元整

output = cnoan.an2cn("-123", "lower")
# output:
# 负一百二十三

output = cnoan.an2cn("1.23", "lower")
# output:
# 一点二三
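The reverse direction can be sketched in the same spirit. The function below is a hypothetical illustration limited to integers from 0 to 9999, not the library's code; the main point it shows is how a run of zeros collapses into a single 零.

```python
DIGIT_CHARS = "零一二三四五六七八九"
UNIT_CHARS = ["", "十", "百", "千"]


def small_int_to_cn(n):
    """Convert an integer in 0..9999 to lowercase Chinese numerals."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only supports 0..9999")
    if n == 0:
        return "零"
    text = str(n)
    parts = []
    pending_zero = False  # True while we are inside a run of zeros
    for pos, ch in enumerate(text):
        d = int(ch)
        if d == 0:
            pending_zero = True
            continue
        if pending_zero:
            # A run of zeros between non-zero digits is written as one 零.
            parts.append("零")
            pending_zero = False
        parts.append(DIGIT_CHARS[d] + UNIT_CHARS[len(text) - 1 - pos])
    return "".join(parts)


print(small_int_to_cn(123))   # 一百二十三
print(small_int_to_cn(105))   # 一百零五
print(small_int_to_cn(1000))  # 一千
```

Trailing zeros are simply dropped because the pending 零 is only written when another non-zero digit follows. Note that this sketch renders 10 as 一十, whereas everyday writing often uses 十.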

3.3 Sentence transformation

⚠️:Experimental feature that may cause undesired conversions.

import cnoan

output = cnoan.translate("小王捡了一百块钱")
# or
output = cnoan.translate("小王捡了一百块钱", "cn2an")
# output:
# 小王捡了100块钱

output = cnoan.translate("小王捡了100块钱", "an2cn")
# output:
# 小王捡了一百块钱


## support dates
output = cnoan.translate("小王的生日是二零零一年三月四日", "cn2an")
# output:
# 小王的生日是2001年3月4日

output = cnoan.translate("小王的生日是2001年3月4日", "an2cn")
# output:
# 小王的生日是二零零一年三月四日

## support fractions
output = cnoan.translate("抛出去的硬币为正面的概率是二分之一", "cn2an")
# output:
# 抛出去的硬币为正面的概率是1/2

output = cnoan.translate("抛出去的硬币为正面的概率是1/2", "an2cn")
# output:
# 抛出去的硬币为正面的概率是二分之一

## support %
## support ℃

License


Contact

If you have any questions, you can contact me by email (1822643111@qq.com), and I will reply as soon as possible.


Thanks

  • Thunder Bouble: A lot of useful feedback, including some bugs and new features;
  • Damon Yu: Added support for full-width numbers and full-width symbols.
