Skip to main content

Bear Au Jus Text (btext) is a tool used for processing a text/string, optimized for data science and data analytics. btext can also implemented with pandas dataframe.

Project description

BTEXT

Bear Au Jus Text (btext) is a tool used for processing a text/string, optimized for data science and data analytics.
btext can also implemented with pandas dataframe.

Latest Changelog

Release Date : 12/23/2020 Version 1.0

  • Initial Commit

Installation

Get latest version of bearaujus

pip install btext --upgrade

Documentation

A. Core Module

Core module contains base functions of base text processing from bearaujus.

import btext as bt

Core Module : Table of Contens

  • A.1. Converting Text to Consecutive Letter
  • A.2. Converting Text to Tokenized Consecutive Letter
  • A.3. Converting Text to Consecutive Number
  • A.4. Converting Text to Tokenized Consecutive Number
  • A.5. Converting Text to Consecutive Punctuation
  • A.6. Converting Text to Tokenized Consecutive Punctuation
  • A.7. Converting Text to Lower Case
  • A.8. Removing Spaces
  • A.9. Removing Double Spaces
  • A.10. Removing Char by User Option
  • A.11. Removing Char by User Desired Length
  • A.12. Get All Valid Alphabet
  • A.13. Get Tokenized All Valid Alphabet
  • A.14. Get All Valid Number
  • A.15. Get Tokenized All Valid Number
  • A.16. Get All Valid Punctuation
  • A.17. Get Tokenized All Valid Punctuation
  • A.18. Normalizing a Text or a Collections
  • A.19. Converting Object to String

A.1. Converting Text to Consecutive Letter

def conslet(val, sep=' ')
  • Example 1
text = '=,= im getting hungry~'
text = bt.conslet(text)
print(text)

-> im getting hungry
Return Type : String

  • Example 2
text = '=,= im getting hungry~'
text = bt.conslet(text, sep = '~')
print(text)

-> im~getting~hungry
Return Type : String

A.2. Converting Text to Tokenized Consecutive Letter

def conslet_tokenized(val, sep=' ')
  • Example 1
text = 'John Mayer, Honne, Minami, Lisa'
text = bt.conslet_tokenized(text)
print(text)

-> ['John', 'Mayer', 'Honne', 'Minami', 'Lisa']
Return Type : List

  • Example 2
text = 'John Mayer, Honne, Minami, Lisa'
text = bt.conslet_tokenized(text, sep = ',')
print(text)

-> ['John Mayer', 'Honne', 'Minami', 'Lisa']
Return Type : List

A.3. Converting Text to Consecutive Number

def consnum(val, sep='')
  • Example 1
text = '+62.81231.1231.123. This is random phone numbers ! -999-'
text = bt.consnum(text)
print(text)

-> 62812311231123999
Return Type : String

  • Example 2
text = '+62.81231.1231.123. This is random phone numbers ! -999-'
text = bt.consnum(text, sep = '-')
print(text)

-> 62-81231-1231-123-999
Return Type : String

A.4. Converting Text to Tokenized Consecutive Number

def consnum_tokenized(val)
  • Example 1
text = '+62.81231.1231.123. This is random phone numbers ! -999-'
text = bt.consnum_tokenized(text)
print(text)

-> ['62', '81231', '1231', '123', '999']
Return Type : List

  • Example 2
text = '+62-81231-1231-123. This is random phone numbers ! ~~'
text = bt.consnum(text, sep = '-')
print(text)

-> 62-81231-1231-123
Return Type : String

A.5. Converting Text to Consecutive Punctuation

def conspunc(val, sep = '') :
  • Example 1
text = 'Nyummy.... !!!! this is the best pancake ever :))))'
output = bt.conspunc(text)
print(output)

-> ....!!!!:))))
Return Type : String

  • Example 2
text = 'Nyummy.... !!!! this is the best pancake ever :))))'
output = bt.conspunc(text, sep = ' ')
print(output)

-> . . . . ! ! ! ! : ) ) ) )
Return Type : String

A.6. Converting Text to Tokenized Consecutive Punctuation

def conspunc_tokenized(val) :
  • Example
text = 'Nyummy.... !!!! this is the best pancake ever :))))'
output = bt.conspunc_tokenized(text)
print(output)

-> ['.', '.', '.', '.', '!', '!', '!', '!', ':', ')', ')', ')', ')']
Return Type : List

A.7. Converting Text to Lower Case

def lower(val) :
  • Example
text = 'HeloOo WORLd !'
output = bt.lower(text)
print(output)

-> helooo world !
Return Type : String

A.8. Removing Spaces

def remove_spaces(val) :
  • Example
text = 'Hel     lo Wor     ld'
output = bt.remove_spaces(text)
print(output)

-> HelloWorld
Return Type : String

A.9. Removing Double Spaces

def remove_double_spaces(val) :
  • Example
text = 'Hello    World from        Universe !'
output = bt.remove_double_spaces(text)
print(output)

-> Hello World from Universe !
Return Type : String

A.10. Removing Char by User Option

def removeby_char(val, exclude, sep = '') :
  • Example 1
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = 'dont')
print(output)

-> i like math, i like wasabi
Return Type : String

  • Example 2
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = 'dont', sep = 'didnt')
print(output)

-> i didnt like math, i didnt like wasabi
Return Type : String

  • Example 3
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = ['i dont', 'like'])
print(output)

-> math, wasabi
Return Type : String

  • Example 4
text = 'i dont like math, i dont like wasabi'
output = bt.removeby_char(text, exclude = ['i dont', 'like'], sep = '~')
print(output)

-> ~ ~ math, ~ ~ wasabi
Return Type : String

A.11. Removing Char by User Desired Length

def removeby_length(val, exclude, sep = ' ') :
  • Example 1
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = 2)
print(output)

-> welcome the jungle
Return Type : String

  • Example 2
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = 2, sep = '~')
print(output)

-> welcome~the~jungle
Return Type : String

  • Example 3
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = [2, 3])
print(output)

-> welcome jungle
Return Type : String

  • Example 4
text = 'Hi hi hi welcome to the jungle'
output = bt.removeby_length(text, exclude = [2, 3], sep = ' HEI ')
print(output)

-> welcome HEI jungle
Return Type : String

A.12. Get All Valid Alphabet

def getall_alphabet(sep = '', include_upper = False) :
  • Example 1
output = bt.getall_alphabet()
print(output)

-> abcdefghijklmnopqrstuvwxyz
Return Type : String

  • Example 2
output = bt.getall_alphabet(sep = ' ')
print(output)

-> a b c d e f g h i j k l m n o p q r s t u v w x y z
Return Type : String

  • Example 3
output = bt.getall_alphabet(include_upper = True)
print(output)

-> abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Return Type : String

  • Example 4
output = bt.getall_alphabet(sep = '-', include_upper = True)
print(output)

-> a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z-A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z
Return Type : String

A.13. Get Tokenized All Valid Alphabet

def getall_alphabet_tokenized(include_upper = False) :
  • Example 1
output = bt.getall_alphabet_tokenized()
print(output)

-> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
Return Type : List

  • Example 2
output = bt.getall_alphabet_tokenized(include_upper = True)
print(output)

-> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
Return Type : List

A.14. Get All Valid Number

def getall_number(sep = '') :
  • Example 1
output = bt.getall_number()
print(output)

-> 0123456789
Return Type : String

  • Example 2
output = bt.getall_number(sep = ' ')
print(output)

-> 0 1 2 3 4 5 6 7 8 9
Return Type : String

A.15. Get Tokenized All Valid Number

def getall_number_tokenized() :
  • Example
output = bt.getall_number_tokenized()
print(output)

-> ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
Return Type : List

A.16. Get All Valid Punctuation

def getall_punc(sep = '') :
  • Example 1
output = bt.getall_punc()
print(output)

-> !"#$%&'()*+,-./:;<=>?@[\]^_{|}~
Return Type : String

  • Example 2
output = bt.getall_punc(sep = ' ')
print(output)

-> ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ { | } ~
Return Type : String

A.17. Get Tokenized All Valid Punctuation

def getall_punc_tokenized() :
  • Example
output = bt.getall_punc_tokenized()
print(output)

-> ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '{', '|', '}', '~']
Return Type : List

A.18. Normalizing a Text or a Collections

  • Converting Text to Consecutive Letter
  • Converting Text to Lower Case
  • Removing Double Spaces
def normalize(obj, show_process = False) :
  • Example 1
text = 'Nyummy8888888888888888 3235.... !!!! this is the best pancake ever :))))'
output = bt.normalize(text)
print(output)

-> nyummy this is the best pancake ever
Return Type : String

  • Example 2
my_list = ['UwU this is so good :3', 'LETS GOO MAN !', 'okay you fine ! :3']
output = bt.normalize(my_list)
print(output)

-> ['uwu this is so good', 'lets goo man', 'okay you fine']
Return Type : List

  • Example 3
my_list = ['UwU this is so good :3', 'LETS GOO MAN !', 'okay you fine ! :3']
output = bt.normalize(my_list, show_process = True)
print(output)

Normalizing Data: [####################] 100.0% | P: 3 / 3 [ Done ]
-> ['uwu this is so good', 'lets goo man', 'okay you fine']
Return Type : List

A.19. Converting Object to String

def to_string(obj) :
  • Example 1
number = 125.12525215
output = bt.to_string(number)
print(output)

-> 125.12525215
Return Type : String

  • Example 2
my_list = ['UwU this is so good :3', 'LETS GOO MAN !', 'okay you fine ! :3']
output = bt.to_string(my_list)
print(output)

-> UwU this is so good :3 LETS GOO MAN ! okay you fine ! :3
Return Type : String

Credit

Other documentation work in progress.

Bear Au Jus - ジュースとくま @2020

Project details


Release history Release notifications | RSS feed

This version

1.0

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

btext-1.0.tar.gz (9.9 kB view hashes)

Uploaded Source

Built Distribution

btext-1.0-py3-none-any.whl (7.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page