Bear Au Jus Text (btext) is a tool used for processing a text/string, optimized for data science and data analytics. btext can also implemented with pandas dataframe.
Project description
BTEXT
Bear Au Jus Text (btext) is a tool used for processing a text/string, optimized for data science and data analytics.
btext can also implemented with pandas dataframe.
- Latest Version Github Directory 1.0
- PyPI https://pypi.org/project/btext/
- Webpages https://haryobagas.github.io/btext/
Latest Changelog
Release Date : 12/23/2020 Version 1.0
- Initial Commit
Installation
Get latest version of bearaujus
pip install btext --upgrade
Documentation
A. Core Module
Core module contains base functions of base text processing from bearaujus.
import btext as bt
Core Module : Table of Contens
- A.1. Converting Text to Consecutive Letter
- A.2. Converting Text to Tokenized Consecutive Letter
- A.3. Converting Text to Consecutive Number
- A.4. Converting Text to Tokenized Consecutive Number
- A.5. Converting Text to Consecutive Punctuation
- A.6. Converting Text to Tokenized Consecutive Punctuation
- A.7. Converting Text to Lower Case
- A.8. Removing Spaces
- A.9. Removing Double Spaces
- A.10. Removing Char by User Option
- A.11. Removing Char by User Desired Length
- A.12. Get All Valid Alphabet
- A.13. Get Tokenized All Valid Alphabet
- A.14. Get All Valid Number
- A.15. Get Tokenized All Valid Number
- A.16. Get All Valid Punctuation
- A.17. Get Tokenized All Valid Punctuation
- A.18. Normalizing a Text or a Collections
- A.19. Converting Object to String
A.1. Converting Text to Consecutive Letter
def conslet(val, sep=' ')
- Example 1
text = '=,= im getting hungry~'
text = bt.conslet(text)
print(text)
->
im getting hungry
Return Type :String
- Example 2
text = '=,= im getting hungry~'
text = bt.conslet(text, sep = '~')
print(text)
->
im~getting~hungry
Return Type :String
A.2. Converting Text to Tokenized Consecutive Letter
def conslet_tokenized(val, sep=' ')
- Example 1
text = 'John Mayer, Honne, Minami, Lisa'
text = bt.conslet_tokenized(text)
print(text)
->
['John', 'Mayer', 'Honne', 'Minami', 'Lisa']
Return Type :List
- Example 2
text = 'John Mayer, Honne, Minami, Lisa'
text = bt.conslet_tokenized(text, sep = ',')
print(text)
->
['John Mayer', 'Honne', 'Minami', 'Lisa']
Return Type :List
A.3. Converting Text to Consecutive Number
def consnum(val, sep='')
- Example 1
text = '+62.81231.1231.123. This is random phone numbers ! -999-'
text = bt.consnum(text)
print(text)
->
62812311231123999
Return Type :String
- Example 2
text = '+62.81231.1231.123. This is random phone numbers ! -999-'
text = bt.consnum(text, sep = '-')
print(text)
->
62-81231-1231-123-999
Return Type :String
A.4. Converting Text to Tokenized Consecutive Number
def consnum_tokenized(val)
- Example 1
text = '+62.81231.1231.123. This is random phone numbers ! -999-'
text = bt.consnum_tokenized(text)
print(text)
->
['62', '81231', '1231', '123', '999']
Return Type :List
- Example 2
text = '+62-81231-1231-123. This is random phone numbers ! ~~'
text = bt.consnum(text, sep = '-')
print(text)
->
62-81231-1231-123
Return Type :String
A.5. Converting Text to Consecutive Punctuation
def conspunc(val, sep = '') :
- Example 1
text = 'Nyummy.... !!!! this is the best pancake ever :))))'
output = bt.conspunc(text)
print(output)
->
....!!!!:))))
Return Type :String
- Example 2
text = 'Nyummy.... !!!! this is the best pancake ever :))))'
output = bt.conspunc(text, sep = ' ')
print(output)
->
. . . . ! ! ! ! : ) ) ) )
Return Type :String
A.6. Converting Text to Tokenized Consecutive Punctuation
def conspunc_tokenized(val) :
- Example
text = 'Nyummy.... !!!! this is the best pancake ever :))))'
output = bt.conspunc_tokenized(text)
print(output)
->
['.', '.', '.', '.', '!', '!', '!', '!', ':', ')', ')', ')', ')']
Return Type :List
A.7. Converting Text to Lower Case
def lower(val) :
- Example
text = 'HeloOo WORLd !'
output = bt.lower(text)
print(output)
->
helooo world !
Return Type :String
A.8. Removing Spaces
def remove_spaces(val) :
- Example
text = 'Hel lo Wor ld'
output = bt.remove_spaces(text)
print(output)
->
HelloWorld
Return Type :String
A.9. Removing Double Spaces
def remove_double_spaces(val) :
- Example
text = 'Hello World from Universe !'
output = bt.remove_double_spaces(text)
print(output)
->
Hello World from Universe !
Return Type :String
A.10. Removing Char by User Option
def removeby_char(val, exclude, sep = '') :
- Example 1
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = 'dont')
print(output)
->
i like math, i like wasabi
Return Type :String
- Example 2
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = 'dont', sep = 'didnt')
print(output)
->
i didnt like math, i didnt like wasabi
Return Type :String
- Example 3
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = ['i dont', 'like'])
print(output)
->
math, wasabi
Return Type :String
- Example 4
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = ['i dont', 'like'], sep = '~')
print(output)
->
~ ~ math, ~ ~ wasabi
Return Type :String
A.11. Removing Char by User Desired Length
def removeby_length(val, exclude, sep = ' ') :
- Example 1
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = 2)
print(output)
->
welcome the jungle
Return Type :String
- Example 2
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = 2, sep = '~')
print(output)
->
welcome~the~jungle
Return Type :String
- Example 3
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = [2, 3])
print(output)
->
welcome jungle
Return Type :String
- Example 4
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = [2, 3], sep = ' HEI ')
print(output)
->
welcome HEI jungle
Return Type :String
A.12. Get All Valid Alphabet
def getall_alphabet(sep = '', include_upper = False) :
- Example 1
output = bt.getall_alphabet()
print(output)
->
abcdefghijklmnopqrstuvwxyz
Return Type :String
- Example 2
output = bt.getall_alphabet(sep = ' ')
print(output)
->
a b c d e f g h i j k l m n o p q r s t u v w x y z
Return Type :String
- Example 3
output = bt.getall_alphabet(include_upper = True)
print(output)
->
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Return Type :String
- Example 4
output = bt.getall_alphabet(sep = '-', include_upper = True)
print(output)
->
a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z-A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z
Return Type :String
A.13. Get Tokenized All Valid Alphabet
def getall_alphabet_tokenized(include_upper = False) :
- Example 1
output = bt.getall_alphabet_tokenized()
print(output)
->
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
Return Type :List
- Example 2
output = bt.getall_alphabet_tokenized(include_upper = True)
print(output)
->
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
Return Type :List
A.14. Get All Valid Number
def getall_number(sep = '') :
- Example 1
output = bt.getall_number()
print(output)
->
0123456789
Return Type :String
- Example 2
output = bt.getall_number(sep = ' ')
print(output)
->
0 1 2 3 4 5 6 7 8 9
Return Type :String
A.15. Get Tokenized All Valid Number
def getall_number_tokenized() :
- Example
output = bt.getall_number_tokenized()
print(output)
->
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
Return Type :List
A.16. Get All Valid Punctuation
def getall_punc(sep = '') :
- Example 1
output = bt.getall_punc()
print(output)
->
!"#$%&'()*+,-./:;<=>?@[\]^_{|}~
Return Type :String
- Example 2
output = bt.getall_punc(sep = ' ')
print(output)
->
! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ { | } ~
Return Type :String
A.17. Get Tokenized All Valid Punctuation
def getall_punc_tokenized() :
- Example
output = bt.getall_punc_tokenized()
print(output)
->
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '{', '|', '}', '~']
Return Type :List
A.18. Normalizing a Text or a Collections
- Converting Text to Consecutive Letter
- Converting Text to Lower Case
- Removing Double Spaces
def normalize(obj, show_process = False) :
- Example 1
text = 'Nyummy8888888888888888 3235.... !!!! this is the best pancake ever :))))'
output = bt.normalize(text)
print(output)
->
nyummy this is the best pancake ever
Return Type :String
- Example 2
my_list = ['UwU this is so good :3', 'LETS GOO MAN !', 'okay you fine ! :3']
output = bt.normalize(my_list)
print(output)
->
['uwu this is so good', 'lets goo man', 'okay you fine']
Return Type :List
- Example 3
my_list = ['UwU this is so good :3', 'LETS GOO MAN !', 'okay you fine ! :3']
output = bt.normalize(my_list, show_process = True)
print(output)
Normalizing Data: [####################] 100.0% | P: 3 / 3 [ Done ]
->['uwu this is so good', 'lets goo man', 'okay you fine']
Return Type :List
A.19. Converting Object to String
def to_string(obj) :
- Example 1
number = 125.12525215
output = bt.to_string(number)
print(output)
->
125.12525215
Return Type :String
- Example 2
my_list = ['UwU this is so good :3', 'LETS GOO MAN !', 'okay you fine ! :3']
output = bt.to_string(my_list)
print(output)
->
UwU this is so good :3 LETS GOO MAN ! okay you fine ! :3
Return Type :String
Credit
- Main Github Page : https://github.com/haryobagas/
- Linkedin : https://www.linkedin.com/in/haryobagas08/
Other documentation work in progress.
Bear Au Jus - ジュースとくま @2020
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file btext-1.0.tar.gz
.
File metadata
- Download URL: btext-1.0.tar.gz
- Upload date:
- Size: 9.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d360803a0c9cdfc67b719017135caf6465061476e9a96b6aad4b4a39d4517170 |
|
MD5 | 24bfdce5c50ae0101a832e16f0f79aa7 |
|
BLAKE2b-256 | ca44da2b4d3cca9474fd1300ec903b6605dd5c6d31054ce772e2ce381010f2ff |
File details
Details for the file btext-1.0-py3-none-any.whl
.
File metadata
- Download URL: btext-1.0-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | afb7fca262cc3ad524a55ac10d83101696e2aa62a8fb462f89e037915bad32c6 |
|
MD5 | 8ca8f562f9f066b91e97e13a99f47534 |
|
BLAKE2b-256 | 862c7f4bb8244ca70beded431b70ae428d078988f18b7478ef3dc1aa19751d6d |