Converting number formats
numericnormalizer
A basic NLP library that converts numbers between numeric form (e.g. 5) and written form (e.g. "five" or "五").
Installation
pip install numericnormalizer
Usage
Importing the module
from numericnormalizer import normalizer
Convert a number to a word (e.g. 5 -> 'five')
normalizer.number_to_word(5, lang='en')
>> "five"
normalizer.number_to_word(5, lang='zh')
>> "五"
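Because only 0 to 10 are supported, this kind of conversion can be as simple as a per-language lookup table. The sketch below is a hypothetical illustration of that idea, not numericnormalizer's actual implementation; `WORDS` and `number_to_word_sketch` are invented names, and only English and Chinese entries are filled in.

```python
# Hypothetical lookup tables; the real library may store its data differently.
WORDS = {
    "en": ["zero", "one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"],
    "zh": ["零", "一", "二", "三", "四", "五", "六", "七", "八", "九", "十"],
}


def number_to_word_sketch(number: int, lang: str = "en") -> str:
    """Return the written form of an integer in 0..10 for the given language."""
    words = WORDS[lang]  # raises KeyError for a language missing from the table
    if not 0 <= number < len(words):
        raise ValueError(f"only 0-{len(words) - 1} are supported, got {number}")
    return words[number]


print(number_to_word_sketch(5, lang="en"))  # five
print(number_to_word_sketch(5, lang="zh"))  # 五
```

Indexing a list by the number keeps each language's table to a single line of eleven entries.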
Convert a word to a number (e.g. 'five' -> 5)
normalizer.word_to_number('five', lang='en')
>> 5
normalizer.word_to_number('五', lang='zh')
>> 5
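For a closed vocabulary like 0 to 10, a word-to-number conversion can be sketched as a reverse lookup: build each language's word list once, then invert it into a dictionary. The table and `word_to_number_sketch` below are hypothetical and only cover English and Chinese; they are not the library's code.

```python
# Hypothetical word lists for illustration (English and Chinese only).
WORDS = {
    "en": ["zero", "one", "two", "three", "four", "five",
           "six", "seven", "eight", "nine", "ten"],
    "zh": ["零", "一", "二", "三", "四", "五", "六", "七", "八", "九", "十"],
}

# Invert each list into a word -> number dictionary once, up front.
WORD_TO_NUMBER = {
    lang: {word: number for number, word in enumerate(words)}
    for lang, words in WORDS.items()
}


def word_to_number_sketch(word: str, lang: str = "en") -> int:
    """Return the integer for a number word, ignoring case and surrounding spaces."""
    key = word.strip().lower()
    try:
        return WORD_TO_NUMBER[lang][key]
    except KeyError:
        raise ValueError(f"{word!r} is not a supported number word for {lang!r}") from None


print(word_to_number_sketch("five", lang="en"))  # 5
print(word_to_number_sketch("五", lang="zh"))  # 5
```

Inverting the list once makes each lookup a single dictionary access instead of a linear search.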
Format numbers in a sentence
Example 1: Default formatting
normalizer.format_sentence(
sentence='What are the 6 principles of intercultural adaption?',
lang='en'
)
>> "What are the six (6) principles of intercultural adaption?"
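One way to get this behaviour is a single regex substitution that finds standalone digits or number words and rewrites each match through a template. The sketch below assumes an English-only table and a default template of `"{word} ({number})"`, inferred from the output above; `format_sentence_sketch` is a hypothetical name, not the library's API.

```python
import re

# Hypothetical English-only table; not the library's internal data.
EN_WORDS = ["zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"]

# A standalone run of digits or a standalone English number word, delimited by \b.
PATTERN = re.compile(r"\b(?:\d+|" + "|".join(EN_WORDS) + r")\b", re.IGNORECASE)


def format_sentence_sketch(sentence: str, formatting: str = "{word} ({number})") -> str:
    """Rewrite every number from 0 to 10 (digits or word) using `formatting`."""

    def render(match: re.Match) -> str:
        token = match.group(0)
        number = int(token) if token.isdigit() else EN_WORDS.index(token.lower())
        if number >= len(EN_WORDS):
            return token  # digits above 10 are left untouched
        return formatting.format(number=number, word=EN_WORDS[number])

    return PATTERN.sub(render, sentence)


print(format_sentence_sketch("What are the 6 principles of intercultural adaption?"))
# What are the six (6) principles of intercultural adaption?
```

The `\b` anchors stop partial matches, so "one" inside "someone" or "ten" inside "often" is not rewritten.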
Example 2: Custom formatting
normalizer.format_sentence(
sentence='I have 4 apples and five oranges.',
lang='en',
formatting='{number} [{word}]',  # template using the {number} and {word} placeholders
)
>> "I have 4 [four] apples and 5 [five] oranges."
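The `{number}` and `{word}` placeholders look like Python `str.format` named fields. Assuming the library fills the template that way (the README does not say so), the snippet below shows how a few templates would render; `render` is a hypothetical helper, not part of numericnormalizer.

```python
def render(formatting: str, number: int, word: str) -> str:
    """Fill a template's named fields; assumes str.format-style templates."""
    return formatting.format(number=number, word=word)


print(render("{number} [{word}]", 4, "four"))  # 4 [four]
print(render("{word} ({number})", 6, "six"))   # six (6)
print(render("{word}", 5, "five"))             # five  (a field may be left out)
# Doubled braces are literal braces; this only holds if str.format is used.
print(render("{{{number}}}", 5, "five"))       # {5}
```

If the library substitutes the placeholders some other way, the doubled-brace case in particular may behave differently.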
Example 3: Restricting the maximum number
normalizer.format_sentence(
sentence='I have 4 apples and five oranges.',
lang='en',
max_number=4  # only format numbers up to and including 4
)
>> "I have four (4) apples and five oranges."
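In the output above, 4 is formatted while "five" is left as written, which suggests `max_number` is an inclusive upper limit. The sketch below implements that reading with a regex substitution; the English-only table and the `format_with_limit` name are hypothetical, and the inclusive limit is inferred from the example rather than documented.

```python
import re

# Hypothetical English-only table; not taken from the library.
EN_WORDS = ["zero", "one", "two", "three", "four", "five",
            "six", "seven", "eight", "nine", "ten"]
PATTERN = re.compile(r"\b(?:\d+|" + "|".join(EN_WORDS) + r")\b", re.IGNORECASE)


def format_with_limit(sentence: str, max_number: int = 10,
                      formatting: str = "{word} ({number})") -> str:
    """Format numbers up to and including `max_number`; leave larger ones as written."""

    def render(match: re.Match) -> str:
        token = match.group(0)
        number = int(token) if token.isdigit() else EN_WORDS.index(token.lower())
        if number > max_number or number >= len(EN_WORDS):
            return token  # over the limit (or beyond the 0-10 table): unchanged
        return formatting.format(number=number, word=EN_WORDS[number])

    return PATTERN.sub(render, sentence)


print(format_with_limit("I have 4 apples and five oranges.", max_number=4))
# I have four (4) apples and five oranges.
```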
Language Support
The supported languages are from the Azure Language Detect List:
- Afrikaans (af)
- Albanian (sq)
- Amharic (am)
- Arabic (ar)
- Armenian (hy)
- Assamese (as)
- Azerbaijani (az)
- Bashkir (ba)
- Basque (eu)
- Belarusian (be)
- Bengali (bn)
- Bosnian (bs)
- Bulgarian (bg)
- Burmese (my)
- Catalan (ca)
- Central Khmer (km)
- Chinese (zh)
- Chinese Simplified (zh_chs)
- Chinese Traditional (zh_cht)
- Chuvash (cv)
- Corsican (co)
- Croatian (hr)
- Czech (cs)
- Danish (da)
- Dari (prs)
- Divehi (dv)
- Dutch (nl)
- English (en)
- Esperanto (eo)
- Estonian (et)
- Faroese (fo)
- Fijian (fj)
- Finnish (fi)
- French (fr)
- Galician (gl)
- Georgian (ka)
- German (de)
- Greek (el)
- Gujarati (gu)
- Haitian (ht)
- Hausa (ha)
- Hebrew (he)
- Hindi (hi)
- Hmong Daw (mww)
- Hungarian (hu)
- Icelandic (is)
- Igbo (ig)
- Indonesian (id)
- Inuktitut (iu)
- Irish (ga)
- Italian (it)
- Japanese (ja)
- Javanese (jv)
- Kannada (kn)
- Kazakh (kk)
- Kinyarwanda (rw)
- Kirghiz (ky)
- Korean (ko)
- Kurdish (ku)
- Lao (lo)
- Latin (la)
- Latvian (lv)
- Lithuanian (lt)
- Luxembourgish (lb)
- Macedonian (mk)
- Malagasy (mg)
- Malay (ms)
- Malayalam (ml)
- Maltese (mt)
- Maori (mi)
- Marathi (mr)
- Mongolian (mn)
- Nepali (ne)
- Norwegian (no)
- Norwegian Nynorsk (nn)
- Odia (or)
- Pashto (ps)
- Persian (fa)
- Polish (pl)
- Portuguese (pt)
- Punjabi (pa)
- Queretaro Otomi (otq)
- Romanian (ro)
- Russian (ru)
- Samoan (sm)
- Serbian (sr)
- Shona (sn)
- Sindhi (sd)
- Sinhala (si)
- Slovak (sk)
- Slovenian (sl)
- Somali (so)
- Spanish (es)
- Sundanese (su)
- Swahili (sw)
- Swedish (sv)
- Tagalog (tl)
- Tahitian (ty)
- Tajik (tg)
- Tamil (ta)
- Tatar (tt)
- Telugu (te)
- Thai (th)
- Tibetan (bo)
- Tigrinya (ti)
- Tongan (to)
- Turkish (tr)
- Turkmen (tk)
- Upper Sorbian (hsb)
- Uyghur (ug)
- Ukrainian (uk)
- Urdu (ur)
- Uzbek (uz)
- Vietnamese (vi)
- Welsh (cy)
- Xhosa (xh)
- Yiddish (yi)
- Yoruba (yo)
- Yucatec Maya (yua)
- Zulu (zu)
However, because this is an early release, the format_sentence feature has not been tested thoroughly for every language. It relies on regex word matching, so it is currently designed only for languages that separate words with spaces.
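To see why space-separated languages are the safe case, consider a word-boundary pattern such as `\b\d+\b`. This pattern is an assumption for illustration, since the library's actual regex is not documented. `\b` only matches where a word character meets a non-word character, and in Python 3 both digits and CJK characters count as word characters, so a digit embedded in unspaced Chinese text has no boundary around it.

```python
import re

# Assumed pattern: a run of digits with a word boundary on each side.
number_pattern = re.compile(r"\b\d+\b")

print(number_pattern.findall("I have 4 apples."))  # ['4']
print(number_pattern.findall("我有4个苹果。"))  # [] : 有, 4 and 个 are all word characters
print(number_pattern.findall("我有 4 个苹果。"))  # ['4'] : spaces create the boundaries
```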
Number support
Currently only the integers 0 to 10 are supported; negative numbers are not.
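Code that calls the library can check inputs before converting them. The guard below is a hypothetical helper, not part of numericnormalizer, and how the library itself reacts to out-of-range input is not documented here.

```python
def check_supported(value) -> int:
    """Hypothetical guard: accept only integers 0-10, rejecting negatives and larger values."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected an int, got {type(value).__name__}")
    if value < 0:
        raise ValueError(f"negative numbers are not supported: {value}")
    if value > 10:
        raise ValueError(f"only 0-10 are supported: {value}")
    return value
```

Returning the value lets the guard wrap a call inline, e.g. `normalizer.number_to_word(check_supported(n), lang='en')`.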
Download files
File details
Details for the file numericnormalizer-0.1.0.tar.gz.
File metadata
- Download URL: numericnormalizer-0.1.0.tar.gz
- Upload date:
- Size: 14.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 269df4cdd8b15493ac81a3e34f63ea8c8db4d359b598c9cbbe433e2e4a3bd373
MD5 | e20589ec00d1d6962c733ee0e6f05024
BLAKE2b-256 | 3f29e1ce5c21264ff5e7e811699502cae8fc7104eed62efe63329af4f4a37e63
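A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib`. The sketch below hashes the file in chunks and compares the result with the source-distribution digest listed above; the local file path is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for numericnormalizer-0.1.0.tar.gz.
EXPECTED_SHA256 = "269df4cdd8b15493ac81a3e34f63ea8c8db4d359b598c9cbbe433e2e4a3bd373"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("numericnormalizer-0.1.0.tar.gz")  # assumed local download path
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```

The same functions work for the wheel by passing its SHA256 digest as `expected`.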
File details
Details for the file numericnormalizer-0.1.0-py3-none-any.whl.
File metadata
- Download URL: numericnormalizer-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 712370a059bb3338a4f8f83590f759ba6b2367e8fbee8a68a414273666c61ea5
MD5 | bac6b269f9cf3543e7ff67ae4509d992
BLAKE2b-256 | aa5b2e4298c89cd73ae3c4a089a21fa855880e2b503a95a51870cf50314280e5