Convert Chinese numerals and Arabic numerals.
Project description
Chinese numerals and Arabic numerals conversion
♠ cnoan is a toolkit for quickly converting between Chinese numerals and Arabic numerals! In the name, cn refers to Chinese numerals, an refers specifically to Arabic numerals, and o stands for "mutual" (conversion in both directions).
♥ The "mutual" (互) in "mutual conversion" is hard to translate /(ㄒoㄒ)/~~
If the first letter of mutual (m) were used, the project name would look like 🐎 (mà, "cursing") people, which means spreading offensive language 🔪 and getting locked in the "little black room" (banned) ❎;
Roughly, the name should mean "mutual conversion", but connecting the two parts with c would only convey a one-way meaning 👉;
Therefore, o is used as the connector in the middle. To some extent it conveys both "mutual" and "ring" (a round trip).
♦ This project builds on the ideas and guidance of cn2an. It updates that functionality to address problems that were encountered or already existed. Stars and follows are welcome; let's maintain and improve it together.
♣ Hey, it's great ★,°:.☆( ̄▽ ̄)/$:.°★ .
Directory Structure
File | Function and description | What's new
---|---|---
an2cn.py | Converts Arabic numerals to Chinese numerals | Newly defined class names
base.py | Base module of the project, containing the ConvertBase base class | None
cn2an.py | Converts Chinese numerals to Arabic numerals | Newly defined class names
config.yaml | Project configuration, mainly the definitions of the matching rules | Added the abnormal_words field
setup.py | Project packaging and publishing | Added my author information
translate.py | Converts numbers inside sentences and decides which content to convert | Uses the abnormal_words field via parameters; modified regular expression
utils.py | Auxiliary definitions of basic functions | None
requirement.txt | Packages required by the project | None
Project features
Basic functions
1.1 Chinese numerals => Arabic numerals
- Supports Chinese numerals => Arabic numerals;
- Supports uppercase Chinese numerals => Arabic numerals;
- Supports mixed Chinese and Arabic numerals => Arabic numerals;
1.2 Arabic numerals => Chinese numerals
- Supports Arabic numerals => Chinese numerals;
- Supports Arabic numerals => uppercase Chinese numerals;
- Supports Arabic numerals => uppercase RMB amounts;
1.3 Sentence conversion
- Supports Chinese numerals => Arabic numerals, including:
  - dates;
  - fractions;
  - percentages;
  - degrees Celsius;
- Supports Arabic numerals => Chinese numerals, including:
  - dates;
  - fractions;
  - percentages;
  - degrees Celsius;
1.4 Others
- Supports decimals;
- Supports negative numbers;
- Supports an HTTP API.
Feature updates and fixes
- 🎈 Redefined which spans get converted (translated) (●'◡'●). The original project (transform + cn2an) produces results like these:
'Seven up and eight down' (七上八下, an idiom for feeling anxious) --> '7 up and eight down'
'two people' --> '2 people'
In practice we do not want these converted, so this project redefines the precondition for conversion:
Original: self.cn_pattern = f" negative trillion]+"
Now: self.cn_pattern = f"negative?-?positive?\+?([01234567890][\s\t]*[1000000000 trillion]+)(dot [01234567890]+)?"
Of course, I can't guarantee that this rule meets your business needs. If it doesn't, redefine self.cn_pattern in translate.py.
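The "Now" pattern above survived extraction only in garbled form. Here is an approximation of its intent. It is a minimal sketch: the character classes CN_DIGITS and CN_UNITS and the helper find_candidates are assumptions for illustration, not the exact contents of cnoan's translate.py. The idea is that a number must start with a digit character and continue with at least one more digit or unit. Under that rule, isolated characters such as those in 七上八下 ("seven up and eight down") are no longer matched.

```python
import re

# Assumed character classes (hypothetical; cnoan's real ones may differ).
CN_DIGITS = "零一二三四五六七八九十"
CN_UNITS = "十百千万亿"

# Optional sign (负 / - / 正 / +), a digit, optional whitespace, then at least one
# more digit or unit, followed by an optional decimal part introduced by 点.
CN_PATTERN = re.compile(
    rf"负?-?正?\+?([{CN_DIGITS}]\s*[{CN_DIGITS}{CN_UNITS}]+)(点[{CN_DIGITS}]+)?"
)


def find_candidates(text: str) -> list:
    """Return every span of `text` that the sketch pattern would hand to the converter."""
    return [match.group(0) for match in CN_PATTERN.finditer(text)]


print(find_candidates("七上八下"))                      # no single-character matches
print(find_candidates("王尼玛一五一十的收入为一万元"))
```

Under this rule, 一五一十 still matches as a whole. That is the kind of case the abnormal-words handling is meant for.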
- 🎈 Introduced masking (isolation) of abnormal words during conversion, and restoring them afterwards o( ̄▽ ̄)ブ
With the redefined criteria above, cases such as Wanning (万宁, a place name), 'in case' (万一) and 'seven up and eight down' (七上八下) are avoided. Still, one has to marvel at how vast and profound the Chinese language is.
For example:
'一五一十' ('one five one ten', an idiom meaning 'in full detail')
...
If such a word is passed in directly, the result is:
:return: '一五10' ('one five 10')
This is clearly not what we want. In this project, I classify content like this as abnormal words; see abnormal_words in config.yaml.
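Approach:
# encoder: hide abnormal words behind placeholders before conversion
masks = ['一五一十']  # list[str]: abnormal words that must not be converted
inputs = 'XXXxxx'  # the text to convert
mask_contents = {}
for index, item in enumerate(masks):
    if item and item in inputs:  # skip empty entries: '' is "in" every string
        mask = f'_MASK_{index}_'
        mask_contents[mask] = item
        inputs = inputs.replace(item, mask)
output = inputs  # placeholder: run the number conversion on the masked text here
# decoder: put the original abnormal words back
for mask, item in mask_contents.items():
    if mask in output:
        output = output.replace(mask, item)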
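The encode/convert/decode steps above can be packaged as reusable helpers. This is a minimal sketch under assumptions: mask_words, unmask_words and convert_protected are hypothetical names. The convert argument stands in for whatever number converter you run on the masked text.

```python
from typing import Callable, Dict, List, Tuple


def mask_words(text: str, words: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each abnormal word with a placeholder; return the text and the mapping."""
    mapping: Dict[str, str] = {}
    for index, word in enumerate(words):
        if word and word in text:  # an empty word would match everywhere, so skip it
            # Letters instead of the index digit: no digit inside the placeholder.
            placeholder = f"_MASK_{chr(ord('A') + index)}_"
            mapping[placeholder] = word
            text = text.replace(word, placeholder)
    return text, mapping


def unmask_words(text: str, mapping: Dict[str, str]) -> str:
    """Restore the original abnormal words from their placeholders."""
    for placeholder, word in mapping.items():
        text = text.replace(placeholder, word)
    return text


def convert_protected(text: str, words: List[str], convert: Callable[[str], str]) -> str:
    """Mask abnormal words, run `convert` on the rest, then restore the words."""
    masked, mapping = mask_words(text, words)
    return unmask_words(convert(masked), mapping)
```

One design choice differs from the snippet above. These placeholders use letters (_MASK_A_, _MASK_B_, ...) instead of the index digit, so an Arabic-to-Chinese pass over the masked text finds no digit to rewrite inside a placeholder.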
- 🎈 Changed one behavior of the original project
In the original project, characters such as liang (两), gan, yi, etc. can appear in text that is not meant to be converted. Executing the following snippet
inputs = 'XXXXxxx'
inputs = inputs.replace("廿", "二十").replace("半", "0.5").replace("两", "2")
converts them prematurely, before any pattern matching takes place. This project therefore performs these replacements only after the regular-expression check has identified a number to convert.
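Replacing these characters up front behaves differently from replacing them only inside matched numbers. The sketch below shows both. The NUMBER pattern and both helper functions are hypothetical. This sketch also maps 两 to 二 (keeping the span all-Chinese) rather than to 2 as in the snippet above, and it leaves out 半 (half).

```python
import re

# Hypothetical number pattern: a digit (including 两 and 廿) followed by
# at least one more digit or unit character.
NUMBER = re.compile(r"[零一二三四五六七八九十两廿][零一二三四五六七八九十百千万亿]+")


def replace_everywhere(text: str) -> str:
    """Original-project style: rewrite special characters before any matching."""
    return text.replace("廿", "二十").replace("两", "2")


def replace_inside_matches(text: str) -> str:
    """Deferred style: rewrite only inside spans the pattern accepted as numbers."""
    return NUMBER.sub(
        lambda m: m.group(0).replace("廿", "二十").replace("两", "二"), text
    )
```

With the deferred version, "两人" ("two people") is left alone, while "两千" inside an amount is still normalized.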
- 🎈 Polished a small detail. In daily work and when writing papers, we often follow an unwritten rule that 10000 must be written as 10,000, so this project is also "forced" to adopt this rule 😔
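Python's format specification mini-language already provides this grouping through its "," option. Here is a minimal sketch. The helper name with_thousands_separator is hypothetical, and this is not necessarily how cnoan implements the rule.

```python
def with_thousands_separator(number) -> str:
    """Format an int or float with a comma between every three integer digits."""
    # "," is the grouping option of Python's format specification mini-language.
    return f"{number:,}"


print(with_thousands_separator(10000))
print(with_thousands_separator(2000))
```

Apply it only to values that really are quantities. Run blindly over every digit sequence in a sentence, it would also turn a year such as 2001 into 2,001.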
After the changes above, the final result looks like this:
from cnoan.translate import Translate
# "This guy is really bad: Wang Nima's income, told in full detail, is 10,000 yuan, yet the two of them told me it was 2,000 yuan"
inputs = '这人坏滴很,王尼玛一五一十的收入为一万元, 而两人却告诉我是二千元'
mode = 'cn2an'
trans = Translate()
print(trans.convert(inputs, mode))
# 这人坏滴很,王尼玛一五一十的收入为10,000元, 而两人却告诉我是2,000元
Project installation and usage
Install
- Method 1:
pip install cnoan
- Method 2:
git clone https://github.com/zhuofalin/cnoan.git
cd cnoan
python setup.py install
- Method 3:
git clone https://github.com/zhuofalin/cnoan.git
then copy the cnoan directory into your project
Usage
# import the package at the top of the file
import cnoan
# check the current version
print(cnoan.__version__)
# 0.5.16
3.1 Chinese numerals => Arabic numerals
Supports numbers up to 10**16 (i.e., with 千万亿 as the highest place) and down to 10**-16.
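The upper bound follows from the standard place values 千 = 10^3, 万 = 10^4 and 亿 = 10^8:

```latex
\begin{aligned}
\text{千万亿} &= \text{千} \times \text{万} \times \text{亿} = 10^{3} \times 10^{4} \times 10^{8} = 10^{15},\\
n_{\max} &= 9\,999\,999\,999\,999\,999 = 10^{16} - 1 < 10^{16}.
\end{aligned}
```

A number whose highest place is 千万亿 therefore has at most 16 integer digits and stays below 10^16.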
import cnoan
# In strict mode (the default), only strings that strictly follow standard numeral spelling are converted
output = cnoan.cn2an("一百二十三")
# or
output = cnoan.cn2an("一百二十三", "strict")
# output:
# 123
# In normal mode, digit-by-digit strings such as 一二三 can also be converted
output = cnoan.cn2an("一二三", "normal")
# output:
# 123
# In smart mode, mixed spellings such as 1百23 can also be converted
output = cnoan.cn2an("1百23", "smart")
# output:
# 123
# All three modes support negative numbers
output = cnoan.cn2an("负一百二十三", "strict")
# output:
# -123
# All three modes support decimals
output = cnoan.cn2an("一点二三", "strict")
# output:
# 1.23
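To make the strict-mode example more concrete, here is a toy parser for lowercase Chinese integers below 10**8. It is a hypothetical sketch of the usual place-value algorithm, not cnoan's implementation. It has no 亿, decimals, negatives, uppercase digits or spelling validation, and it does not accept digit-by-digit (normal-mode) strings.

```python
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}


def parse_cn_int(text: str) -> int:
    """Toy parser for lowercase Chinese integers below 10**8 (no 亿)."""
    total = 0    # value already closed off by 万
    section = 0  # value of the current group below 万
    digit = 0    # most recent digit not yet multiplied by a unit
    for char in text:
        if char in DIGITS:
            digit = DIGITS[char]
        elif char in SMALL_UNITS:
            # A bare 十 (as in 十五) means 一十.
            section += (digit or 1) * SMALL_UNITS[char]
            digit = 0
        elif char == "万":
            total += (section + digit) * 10_000
            section = digit = 0
        else:
            raise ValueError(f"unsupported character: {char!r}")
    return total + section + digit


print(parse_cn_int("一百二十三"))
```

A 零 simply resets the pending digit to 0, so 一千零五 reads as 1000 + 5.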
3.2 Arabic numerals => Chinese numerals
Supports numbers up to 10**16 (i.e., with 千万亿 as the highest place) and down to 10**-16.
import cnoan
# In low mode (the default), numbers are converted to lowercase Chinese numerals
output = cnoan.an2cn("123")
# or
output = cnoan.an2cn("123", "low")
# output:
# 一百二十三
# In up mode, numbers are converted to uppercase Chinese numerals
output = cnoan.an2cn("123", "up")
# output:
# 壹佰贰拾叁
# In rmb mode, numbers are converted to the format used for RMB amounts
output = cnoan.an2cn("123", "rmb")
# output:
# 壹佰贰拾叁元整
# All three modes support negative numbers
output = cnoan.an2cn("-123", "low")
# output:
# 负一百二十三
# All three modes support decimals
output = cnoan.an2cn("1.23", "low")
# output:
# 一点二三
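In the other direction, the low and up outputs above differ only in the digit and unit characters. A toy converter for 0..9999 makes that visible. The helper int_to_cn is hypothetical and is not cnoan's code. It also shows the rule that a run of internal zeros is read as a single 零.

```python
LOWER_DIGITS = "零一二三四五六七八九"
UPPER_DIGITS = "零壹贰叁肆伍陆柒捌玖"
LOWER_UNITS = ["", "十", "百", "千"]
UPPER_UNITS = ["", "拾", "佰", "仟"]


def int_to_cn(n: int, upper: bool = False) -> str:
    """Toy converter for integers 0..9999 (no 万/亿 groups, no negatives)."""
    if not 0 <= n < 10_000:
        raise ValueError("this sketch only handles 0..9999")
    digits = UPPER_DIGITS if upper else LOWER_DIGITS
    units = UPPER_UNITS if upper else LOWER_UNITS
    if n == 0:
        return digits[0]
    text = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(text):
        d = int(ch)
        place = len(text) - 1 - i  # 0 = ones, 1 = tens, 2 = hundreds, 3 = thousands
        if d == 0:
            pending_zero = True  # emit one 零 only if a nonzero digit follows
        else:
            if pending_zero:
                parts.append(digits[0])
                pending_zero = False
            parts.append(digits[d] + units[place])
    return "".join(parts)


print(int_to_cn(123), int_to_cn(123, upper=True))
```

Appending 元整 to the uppercase form reproduces the rmb example for whole amounts. This sketch renders 10 as 一十, whereas everyday Chinese usually says 十.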
3.3 Sentence conversion
⚠️: Experimental feature; it may cause undesired conversions.
import cnoan
# With the cn2an method (the default), Chinese numerals in a sentence are converted to Arabic numerals
output = cnoan.translate("小王捡了一百块钱")
# or
output = cnoan.translate("小王捡了一百块钱", "cn2an")
# output:
# 小王捡了100块钱
# With the an2cn method, Arabic numerals in a sentence are converted to Chinese numerals
output = cnoan.translate("小王捡了100块钱", "an2cn")
# output:
# 小王捡了一百块钱
## Dates are supported
output = cnoan.translate("小王的生日是二零零一年三月四日", "cn2an")
# output:
# 小王的生日是2001年3月4日
output = cnoan.translate("小王的生日是2001年3月4日", "an2cn")
# output:
# 小王的生日是二零零一年三月四日
## Fractions are supported
output = cnoan.translate("抛出去的硬币为正面的概率是二分之一", "cn2an")
# output:
# 抛出去的硬币为正面的概率是1/2
output = cnoan.translate("抛出去的硬币为正面的概率是1/2", "an2cn")
# output:
# 抛出去的硬币为正面的概率是二分之一
## Percentages are supported
## Degrees Celsius are supported
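The fraction example relies on the fact that 分之 puts the denominator first: 二分之一 is "one out of two", i.e. 1/2. Below is a toy regex sketch limited to single-digit numerators and denominators. fractions_cn2an and fractions_an2cn are hypothetical helpers, not cnoan's API.

```python
import re

CN_DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
             "六": 6, "七": 7, "八": 8, "九": 9}
AN_DIGITS = {value: char for char, value in CN_DIGITS.items()}

# "X分之Y": X is the denominator, Y the numerator.
CN_FRACTION = re.compile(r"([一二三四五六七八九])分之([一二三四五六七八九])")
# "Y/X" with single digits that are not part of a longer number or a
# slash-separated chain such as the date 2001/3/4.
AN_FRACTION = re.compile(r"(?<![\d/])([1-9])/([1-9])(?![\d/])")


def fractions_cn2an(sentence: str) -> str:
    """Rewrite 二分之一-style fractions as 1/2."""
    return CN_FRACTION.sub(
        lambda m: f"{CN_DIGITS[m.group(2)]}/{CN_DIGITS[m.group(1)]}", sentence
    )


def fractions_an2cn(sentence: str) -> str:
    """Rewrite 1/2-style fractions as 二分之一."""
    return AN_FRACTION.sub(
        lambda m: f"{AN_DIGITS[int(m.group(2))]}分之{AN_DIGITS[int(m.group(1))]}",
        sentence,
    )
```

Swapping the capture groups in each direction is the whole trick. Everything else is a plain digit lookup.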
License
Contact
If you have any questions, contact me by email (1822643111@qq.com); I will reply as soon as possible.
Thanks
- Thunder Bouble: provided a lot of useful feedback, including bug reports and feature suggestions;
- Damon Yu: Added support for full-width numbers and full-width symbols.
Reference
File details
Details for the file cnoan-1.1.2-py3.9.egg.
File metadata
- Download URL: cnoan-1.1.2-py3.9.egg
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4d98fe5b28190931d368d16baec925a36e390cf922ba0c847866e4f9e27df234
MD5 | b3712eb9fe9912a74620e72dee9d5652
BLAKE2b-256 | 3fd137348a402bf71346caa347837a2f92acaa499784bccd1eca33444d60854b