Convert between Chinese numerals and Arabic numerals.
Chinese numerals and Arabic numerals conversion
There should be emojis here (●'◡'●)
Project description: English version
Project description: Chinese version
♠ cnoan is a toolkit for quickly converting between Chinese numerals and Arabic numerals. In the name, cn refers to Chinese numerals, an refers specifically to Arabic numerals, and o stands for mutual (two-way) conversion.
♥ The Chinese 互 ("mutual") in "interconversion" is difficult to translate and abbreviate /(ㄒoㄒ)/~~
If the first letter of mutual were used, the project name would read like an insult 🐎 (spreading bad language 🔪 gets you locked out ❎);
Roughly, the idea is mutual conversion. Connecting the two parts with c can only convey a one-way, single-path meaning 👉;
Therefore o is used as the connector in the middle, which reflects the concepts of ·mutual· and ·ring· to a certain extent;
♦ This project builds on the ideas and guidance of cn2an and updates its functions to address problems that were encountered in use or that already existed. Stars and follows are welcome; let's maintain and improve it together.
♣ Hey, it's great ★,°:.☆( ̄▽ ̄)/$:.°★ .
Directory Structure
    Name              Function and Description                            What's New
|--------------------------------------------------------------------------------------------------------
|---an2cn.py          Convert Arabic numerals to Chinese numerals         Newly defined class names
|--------------------------------------------------------------------------------------------------------
|---base.py           Base class of the project; contains the             None
|                     ConvertBase base class
|--------------------------------------------------------------------------------------------------------
|---cn2an.py          Convert Chinese numbers to Arabic numbers           Newly defined class names
|--------------------------------------------------------------------------------------------------------
|---config.yaml       Configuration of the project, mainly the            Add the abnormal field
|                     definition of the matching rules
|--------------------------------------------------------------------------------------------------------
|---setup.py          Project packaging and publishing                    Add my information
|--------------------------------------------------------------------------------------------------------
|---translate.py      Convert the content in the sentence,                Modified regular expression
|                     with parameters                                     that determines the conversion
|--------------------------------------------------------------------------------------------------------
|---utils.py          Definition of basic functions                       None
|--------------------------------------------------------------------------------------------------------
|---requirement.txt   The packages required by the project                None
|--------------------------------------------------------------------------------------------------------
Project Function
Basic functions
1.1 Chinese numbers => Arabic numbers
- Supports Chinese numbers => Arabic numbers;
- Supports uppercase Chinese numbers => Arabic numbers;
- Supports mixed Chinese and Arabic numbers => Arabic numbers;
1.2 Arabic numbers => Chinese numbers
- Supports Arabic numbers => Chinese numbers;
- Supports Arabic numbers => uppercase Chinese numbers;
- Supports Arabic numbers => uppercase RMB amounts;
1.3 Sentence transformation
- Supports Chinese numbers => Arabic numbers:
  - dates;
  - fractions;
  - percentages;
  - degrees Celsius;
- Supports Arabic numbers => Chinese numbers:
  - dates;
  - fractions;
  - percentages;
  - degrees Celsius;
1.4 Others
- Supports decimals;
- Supports negative numbers;
- Supports an HTTP API.
Function updates & Fixes
-
🎈 Redefined which text spans are eligible for conversion (●'◡'●) The original project (transform + cn2an) produces results like the following:
七上八下 --> 7上8下
两人 --> 2人
一旦 --> 1旦
In practice we do not want these idioms and everyday words to be converted, so this project redefines the precondition for conversion:
'Original': self.cn_pattern = f"负?([零一二三四五六七八九十拾百佰千仟万亿]+点)?[零一二三四五六七八九十拾百佰千仟万亿]+"
'Now': self.cn_pattern = f"负?-?正?\+?([零一二三四五六七八九十][\s\t]*[十拾百佰千仟万亿]+)(点[零一二三四五六七八九十]+)?"
Of course, I can't guarantee that this rule will fit your business needs, so you can redefine self.cn_pattern in translate.py.
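To see the difference between the two patterns, they can be tried directly with Python's re module. The sketch below copies both patterns from above as raw strings; the helper name matched_spans and the sample sentences are only illustrative and are not part of cnoan.

```python
import re

# Both patterns are copied from the README; raw strings avoid escape warnings.
ORIGINAL = re.compile(
    r"负?([零一二三四五六七八九十拾百佰千仟万亿]+点)?[零一二三四五六七八九十拾百佰千仟万亿]+"
)
CURRENT = re.compile(
    r"负?-?正?\+?([零一二三四五六七八九十][\s\t]*[十拾百佰千仟万亿]+)(点[零一二三四五六七八九十]+)?"
)


def matched_spans(pattern, text):
    """Return every substring of text that the pattern would hand to the converter."""
    return [m.group(0) for m in pattern.finditer(text)]


for sample in ["七上八下", "一旦", "收入为一万元"]:
    print(sample, matched_spans(ORIGINAL, sample), matched_spans(CURRENT, sample))
```

The original pattern picks out the lone digits in 七上八下 and 一旦, while the current pattern only matches a digit that is followed by a unit such as 十, 百, 千, 万 or 亿.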
-
🎈 Introduced isolated conversion and restoration of abnormal words o( ̄▽ ̄)ブ With the redefined pattern above, cases such as 万宁 (Wanning), 万一 ("in case"), and 七上八下 ("seven up, eight down") are no longer converted, but Chinese is vast and profound enough that exceptions remain.
'E.g':
'一五一十'
...
If this word is passed in directly, the following result is obtained:
:return: '一五10'
This is not acceptable. In this project I classify such content as abnormal words; see abnormal_words in config.yaml.
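'Thinking':
# encoder: swap each abnormal word for a placeholder before conversion
masks = ['一五一十']  # list[str]: the abnormal words to protect
inputs = str('XXXxxx')
mask_contents = {}
for index, item in enumerate(masks):
    # skip empty entries: '' is "in" every string and would corrupt the text
    if item and item in inputs:
        mask = f'_MASK_{index}_'
        mask_contents[mask] = item
        inputs = inputs.replace(item, mask)
# run the actual conversion on inputs here; this stand-in leaves the text unchanged
output = inputs
# decoder: put the original abnormal words back after conversion
for mask, item in mask_contents.items():
    if mask in output:
        output = output.replace(mask, item)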
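The masking idea can be exercised end to end with a stand-in converter. In this minimal sketch, mask_abnormal_words, unmask and toy_convert are hypothetical names; toy_convert only rewrites the character 十 and is not how cnoan converts numbers.

```python
def mask_abnormal_words(text, abnormal_words):
    """Replace each abnormal word with a placeholder and remember the mapping."""
    mapping = {}
    for index, word in enumerate(abnormal_words):
        if word and word in text:
            placeholder = f"_MASK_{index}_"
            mapping[placeholder] = word
            text = text.replace(word, placeholder)
    return text, mapping


def unmask(text, mapping):
    """Restore the abnormal words that were hidden behind placeholders."""
    for placeholder, word in mapping.items():
        text = text.replace(placeholder, word)
    return text


def toy_convert(text):
    """Stand-in converter (assumption): rewrites every 十 as 10, nothing more."""
    return text.replace("十", "10")


sentence = "他一五一十地说了十遍"
masked, mapping = mask_abnormal_words(sentence, ["一五一十"])
protected = unmask(toy_convert(masked), mapping)
unprotected = toy_convert(sentence)
print(protected)
print(unprotected)
```

With masking, the idiom survives and only the standalone 十 is rewritten; without it, the idiom is damaged as well.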
- 🎈 Changed one behavior of the original project
In the original project, when characters such as 两, 廿, or 幺 appear in the text but are not part of a number to be converted, the following snippet still rewrites them:
inputs = str('XXXXxxx')
inputs = inputs.replace("廿", "二十").replace("半", "0.5").replace("两", "2")
Because these replacements run before any matching, the characters are converted prematurely. This project therefore applies the replacements only after the regular expression has identified a span to convert.
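One way to picture "replace only after the regular expression has matched" is to do the substitution inside a re.sub callback. This is a sketch under assumptions: NUMBER is a simplified, hypothetical pattern that accepts 两 as a digit, and normalize_in_numbers is not a cnoan function.

```python
import re

# Hypothetical simplified pattern: one digit character (including 两) followed by units.
NUMBER = re.compile(r"[零一二两三四五六七八九][十百千万亿]+")


def normalize_in_numbers(text):
    """Rewrite 两 as 二, but only inside spans that look like numbers."""
    return NUMBER.sub(lambda match: match.group(0).replace("两", "二"), text)


sentence = "两人却告诉我是两千元"
eager = sentence.replace("两", "2")       # replaces everywhere, before any matching
deferred = normalize_in_numbers(sentence)  # replaces only inside the matched number
print(eager)
print(deferred)
```

The eager version turns 两人 into 2人, while the deferred version leaves 两人 alone and only touches 两千.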
- 🎈 A small detail: there is an unwritten rule in everyday work and in writing papers that 10000 should be written as 10,000, so this project is also 'forced' to follow this rule 😔
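In plain Python, thousands separators are available through the format specification mini-language. The helper below only illustrates the formatting rule; whether cnoan produces its output this way is an assumption, and with_thousands_separator is a hypothetical name.

```python
def with_thousands_separator(value: int) -> str:
    """Format an integer with a comma between each group of three digits."""
    return f"{value:,}"


print(with_thousands_separator(10000))
print(with_thousands_separator(2000))
```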
After the above series of operations, the final effect is as follows:
from cnoan.translate import Translate
inputs = '这人坏滴很,王尼玛一五一十的收入为一万元, 而两人却告诉我是二千元'
mode = 'cn2an'
translator = Translate()
print(translator.convert(inputs, mode))
# 这人坏滴很,王尼玛一五一十的收入为10,000元, 而两人却告诉我是2,000元
Project Installation & Usage
Install
- Method 1:
pip install cnoan
- Method 2:
git clone https://github.com/zhuofalin/cnoan.git
cd cnoan
python setup.py install
- Method 3:
git clone https://github.com/zhuofalin/cnoan.git
Then copy the cnoan directory into your project.
Usage
# import package which you need
import cnoan
# View the current version number
print(cnoan.__version__)
# 0.5.16 (your version may differ)
3.1 Chinese numbers => Arabic numerals
The largest supported value is 10**16 (千万亿) and the smallest is 10**-16.
import cnoan
output = cnoan.cn2an("一百二十三")
# or
output = cnoan.cn2an("一百二十三", "strict")
# output:
# 123
output = cnoan.cn2an("一二三", "normal")
# output:
# 123
output = cnoan.cn2an("1百23", "smart")
# output:
# 123
# The above three modes all support negative numbers
output = cnoan.cn2an("负一百二十三", "strict")
# output:
# -123
# All three modes above support decimals
output = cnoan.cn2an("一点二三", "strict")
# output:
# 1.23
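For readers curious how such a conversion can work in principle, here is a deliberately small sketch of a positional parser. It is not cnoan's implementation: simple_cn2an is a hypothetical name, and it handles only lowercase integers below 万亿, with no negative numbers, decimals, or validation of malformed input.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10**4, "亿": 10**8}


def simple_cn2an(text):
    """Convert a lowercase Chinese integer such as 一百二十三 to an int."""
    total = 0    # value of the sections already closed by 万 or 亿
    section = 0  # value accumulated inside the current four-digit section
    digit = 0    # most recent digit still waiting for its unit
    for char in text:
        if char in DIGITS:
            digit = DIGITS[char]
        elif char in SMALL_UNITS:
            # a bare 十 (as in 十二) counts as 一十
            section += (digit or 1) * SMALL_UNITS[char]
            digit = 0
        elif char in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[char]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {char!r}")
    return total + section + digit


print(simple_cn2an("一百二十三"))
print(simple_cn2an("一亿二千万"))
```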
3.2 Arabic numerals => Chinese numbers
The largest supported value is 10**16 (千万亿) and the smallest is 10**-16.
import cnoan
output = cnoan.an2cn("123")
# or
output = cnoan.an2cn("123", "lower")
# output:
# 一百二十三
output = cnoan.an2cn("123", "upper")
# output:
# 壹佰贰拾叁
output = cnoan.an2cn("123", "rmb")
# output:
# 壹佰贰拾叁元整
output = cnoan.an2cn("-123", "lower")
# output:
# 负一百二十三
output = cnoan.an2cn("1.23", "lower")
# output:
# 一点二三
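The relationship between the lowercase and uppercase (financial) forms is a character-for-character mapping, which can be sketched with str.maketrans. This illustrates the mapping only and is not how cnoan builds its "upper" output; to_upper_numerals is a hypothetical helper.

```python
# Lowercase numerals and their uppercase (financial) counterparts, position by position.
LOWER = "零一二三四五六七八九十百千"
UPPER = "零壹贰叁肆伍陆柒捌玖拾佰仟"
TO_UPPER = str.maketrans(LOWER, UPPER)


def to_upper_numerals(text):
    """Map each lowercase numeral character to its uppercase form."""
    return text.translate(TO_UPPER)


print(to_upper_numerals("一百二十三"))
```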
3.3 Sentence transformation
⚠️: Experimental feature that may cause undesired conversions.
import cnoan
output = cnoan.translate("小王捡了一百块钱")
# or
output = cnoan.translate("小王捡了一百块钱", "cn2an")
# output:
# 小王捡了100块钱
output = cnoan.translate("小王捡了100块钱", "an2cn")
# output:
# 小王捡了一百块钱
## support dates
output = cnoan.translate("小王的生日是二零零一年三月四日", "cn2an")
# output:
# 小王的生日是2001年3月4日
output = cnoan.translate("小王的生日是2001年3月4日", "an2cn")
# output:
# 小王的生日是二零零一年三月四日
## support fractions
output = cnoan.translate("抛出去的硬币为正面的概率是二分之一", "cn2an")
# output:
# 抛出去的硬币为正面的概率是1/2
output = cnoan.translate("抛出去的硬币为正面的概率是1/2", "an2cn")
# output:
# 抛出去的硬币为正面的概率是二分之一
## support %
## support ℃
License
Contact
If you have any questions, you can reach me by email (1822643111@qq.com), and I will reply as soon as possible.
Thanks
- Thunder Bouble: A lot of useful feedback, including some bugs and new features;
- Damon Yu: Added support for full-width numbers and full-width symbols.
Reference