crazytext

An Easy To Use Text Cleaning Package For NLP

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

crazytext: An Easy To Use Text Cleaning Package For NLP Built In Python

Sometimes text gets so messy that the content you actually want becomes very hard to extract. crazytext is here to help you. It offers one-line code snippets to clean and analyze your text quickly.

Why do the hard work when there is an option for smart work? - Creator, crazytext

Dependencies

pip install pandas
pip install numpy
pip install textblob
pip install scikit-learn
pip install lxml
pip install nltk
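
Pip package names and import names can differ: scikit-learn, for instance, is imported as sklearn. The following is a minimal sketch, using only the standard library, of how one might check which of the modules listed above are already importable. The helper name missing_modules is made up for this illustration and is not part of crazytext.

```python
from importlib.util import find_spec

# Import names of the dependencies listed above.
# Note: the scikit-learn package is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "textblob", "sklearn", "lxml", "nltk"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Prints the modules that still need to be installed (empty list if none).
print(missing_modules(REQUIRED_MODULES))
```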

Installation

pip install crazytext

Text Analysis Using crazytext

sample_text = 'AI is the future of HUMAN KIND, & Trendiest Topic of Today. #ai #future @aiforfuture https://ai.com  (555) 555-1234  <p> Mobile Number </p> (555) 345-1234  <span>Pincode:</span> 224 '

Let's Import Our Library

import crazytext as ct

Quick Analysis

doc = ct.Counter(text=sample_text)
doc.info()
>>
Length of String: 153
Number of URLs: 1
Number of Emails: 0
Number of Words: 25
Average Word Count: 6.12
Number of Stopwords: 4
Total Hashtags: 2
Total Mentions: 1
Total Length of Numeric Data: 7
Special Characters: 154
White Spaces: 28
Number of Vowels: 38
Number of Consonants: 143
Total Uppercase Words 3
Number of Phone Number Inside Text: 2
Observed Sentiment: (0.15, 'Positive')

Step By Step Analysis

doc.count_words()
>> 25

doc.count_stopwords()
>> 4

doc.count_phone_numbers()
>> 2

doc.count_uppercase_words()
>> 3

You can try many more methods: type doc.count and press Tab to list all the available Counter methods.

Note: All The Methods Of The Counter Class Start With count_
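
To give a feel for what such counts involve, here is a rough sketch in plain Python. It is not crazytext's implementation: the function names and regular expressions are assumptions chosen for illustration, and the library may count differently in edge cases.

```python
import re


# Hypothetical helpers for illustration; they are not part of crazytext.
def count_hashtags(text: str) -> int:
    """Count tokens that look like #hashtag."""
    return len(re.findall(r"#\w+", text))


def count_mentions(text: str) -> int:
    """Count @mentions, skipping the '@' inside email addresses."""
    return len(re.findall(r"(?<!\w)@\w+", text))


def count_uppercase_words(text: str) -> int:
    """Count whitespace-separated tokens written entirely in upper case."""
    return sum(1 for token in text.split() if token.isupper())


sample = "AI is the future of HUMAN KIND, & Trendiest Topic of Today. #ai #future @aiforfuture"
print(count_hashtags(sample), count_mentions(sample), count_uppercase_words(sample))
# 2 1 3
```

Splitting on whitespace means a token such as 'KIND,' keeps its trailing comma and still counts as upper case.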

Text Extraction Using crazytext

sample_text = 'AI is the future of HUMAN KIND, & Trendiest Topic of Today. #ai #future @aiforfuture www.ai.com (555) 555-1234  xyz@gmail.com <p> Mobile Number </p> (555) 345-1234  <span>Pincode:</span> 224 '

Let's Import Our Library

import crazytext as ct
extractor = ct.Extractor(text=sample_text)

Extracting Emails

extractor.get_emails()
>>['xyz@gmail.com']

Extracting Phone Numbers

extractor.get_phone_numbers()
>>['(555) 555-1234', '(555) 345-1234']

Extracting UPPER CASE words

extractor.get_uppercase_words()
>>['AI', 'HUMAN', 'KIND,']

Extracting Hashtags

extractor.get_hashtags()
>>['#ai', '#future']

Extracting Mentions

extractor.get_mentions()
>>['@aiforfuture']

Extracting HTML Tags

extractor.get_html_tags()
>>['<p>', '</p>', '<span>', '</span>']

Try Other Interesting Methods By Installing The Library Using pip install crazytext.

Note: All The Methods Of The Extractor Class Start With get_
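
Extraction of this kind is usually pattern matching. The sketch below shows one way to pull emails, US-style phone numbers and HTML tags out of a string with the standard re module. The patterns are simplified assumptions for illustration and are not taken from crazytext's source.

```python
import re

# Simplified patterns, assumed for this example only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")   # matches (555) 555-1234
HTML_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")      # opening and closing tags

sample = (
    "(555) 555-1234  xyz@gmail.com <p> Mobile Number </p> "
    "(555) 345-1234  <span>Pincode:</span> 224"
)

emails = EMAIL_RE.findall(sample)
phones = PHONE_RE.findall(sample)
tags = HTML_TAG_RE.findall(sample)

print(emails)  # ['xyz@gmail.com']
print(phones)  # ['(555) 555-1234', '(555) 345-1234']
print(tags)    # ['<p>', '</p>', '<span>', '</span>']
```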

Text Cleaning Using crazytext

There Are Two Ways To Clean The Text

Remove The Matched Text Completely.
Replace The Matched Text With A Label Naming What It Was (For Example HtmlTag Or PhoneNumber).

1. Remove Text Completely.

sample_text = '<h1>The Dark ó Knight</h1> a batman ó movie @batman ó #batman https://batman.com (555) 555-1234 ó 21 22 óó ó'

Let's Import Our Library

import crazytext as ct
cleaner = ct.Cleaner(text=sample_text)

Removing HTML Tags

cleaner.remove_html_tags_c()
>>' The Dark ó Knight a batman ó movie @batman ó #batman https://batman.com (555) 555-1234 ó 21 22 óó ó'

Removing Phone Numbers

cleaner.remove_phone_numbers_c()
>> 'a batman ó movie @batman ó #batman https://batman.com  ó 21 22 óó ó'

2. Replace The Text With A Label

Replacing HTML Tags

cleaner.remove_html_tags()
>>'HtmlTag The Dark ó Knight a batman ó movie @batman ó #batman https://batman.com (555) 555-1234 ó 21 22 óó ó'

Replacing Phone Numbers

cleaner.remove_phone_numbers()
>> 'The Dark ó Knight</h1> a batman ó movie @batman ó #batman https://batman.com PhoneNumber ó 21 22 óó ó'
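
The difference between the two modes can be sketched with re.sub: the same pattern is either replaced by a space (removal) or by a label. This is only an illustration under assumed patterns; the function scrub is hypothetical, and its output will not match crazytext's output character for character.

```python
import re

# Simplified patterns, assumed for this example only.
PHONE_RE = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")
HTML_TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def scrub(text, pattern, label=None):
    """Remove every match when label is None, otherwise replace it with label."""
    replacement = " " if label is None else f" {label} "
    cleaned = pattern.sub(replacement, text)
    # Collapse the extra whitespace left behind by the substitution.
    return re.sub(r"\s+", " ", cleaned).strip()


sample = "<h1>The Dark Knight</h1> call (555) 555-1234 now"

print(scrub(sample, HTML_TAG_RE))
# The Dark Knight call (555) 555-1234 now
print(scrub(sample, PHONE_RE, label="PhoneNumber"))
# <h1>The Dark Knight</h1> call PhoneNumber now
```

Keeping a label instead of deleting the match preserves the information that something was there, which can be a useful signal for a downstream model.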

Quick Cleaning of A Document

To Clean A Document Quickly, You Can Use The quickclean() Method Of The Cleaner Class.

Quick Clean

import crazytext as ct
cleaner = ct.Cleaner(text=sample_text)   # keep the module alias ct intact
cleaner.quickclean(remove_complete=True,make_base=False)
>>'the dark knight batman movie batman batman'

You Can Further Remove Duplicates Using The remove_duplicate_words() method.
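
As an idea of what duplicate removal does to the cleaned string above, here is a small order-preserving sketch in plain Python. The function dedupe_words is a hypothetical stand-in written for this example; how crazytext implements remove_duplicate_words() internally is not shown here.

```python
def dedupe_words(text: str) -> str:
    """Keep the first occurrence of each word and drop later repeats."""
    seen = set()
    kept = []
    for word in text.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


print(dedupe_words("the dark knight batman movie batman batman"))
# the dark knight batman movie
```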

Working With Dataframes Using crazytext

Let's Load Hotel Reviews Dataframe From My Github.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Abhayparashar31/NLPP_sentiment-analsis-on-hotel-review/main/Restaurant_Reviews.tsv',delimiter = "\t",quoting=3)

Let's Import Our Library and Create An Object Of The Dataframe Class

import crazytext as ct
dc = ct.Dataframe(df=df,col='Review')

Let's Find Our Dataframe Column Word Frequency Count Using crazytext

dc.get_df_words_frequency_count()
>>
the             405
and             378
I               294
was             292
a               228
               ... 
Seat              1
dirty-            1
gross.            1
unbelievably      1
check.            1
Length: 2967, dtype: int64
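
The output above keeps case and punctuation ('Seat', 'gross.'), which is what a plain whitespace split produces. A comparable count can be sketched with collections.Counter; the function word_frequencies and the three example reviews are invented for illustration and are not part of crazytext or the dataset.

```python
from collections import Counter


def word_frequencies(documents):
    """Count whitespace-separated tokens across all documents, most common first."""
    counts = Counter()
    for doc in documents:
        counts.update(doc.split())
    return counts.most_common()


# Made-up reviews, used only to demonstrate the function.
reviews = ["the food was good", "the service was slow", "I loved the food"]
freq = word_frequencies(reviews)
print(freq[:3])
# [('the', 3), ('food', 2), ('was', 2)]
```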

Cleaning The Dataframe Using One Line of Code With The Help of crazytext

df['cleaned_reviews'] = dc.clean(remove_complete=True,make_base='lemmatization')
df['cleaned_reviews']
>>
0                                        wow loved place
1                                         crust not good
2                                not tasty texture nasty
3      stopped late may bank holiday rick steve recom...
4                         the selection menu great price
                    ....

Next, Let's Convert This Cleaned Text Into Vectors For Further Processing

vector = ct.Dataframe(df=df,col='cleaned_reviews')
vector.to_tfidf(max_features=3500)
>>
array([[0.        , 0.        , 0.        , 1.        , 0.        ],
       [0.        , 0.72888336, 0.6846379 , 0.        , 0.        ],
       [0.        , 0.        , 1.        , 0.        , 0.        ],
       ...,
       [0.        , 0.        , 1.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        , 0.        , 1.        ],
       [0.        , 0.        , 0.        , 0.        , 0.        ]])
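
For readers unfamiliar with TF-IDF, the sketch below computes it by hand in plain Python. It assumes the smoothed IDF formula and L2 row normalisation that scikit-learn's TfidfVectorizer uses by default; whether crazytext relies on that vectorizer internally is an assumption, and the function tfidf is written only for this example.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return (vocabulary, rows) with L2-normalised TF-IDF weights per document."""
    tokenized = [doc.split() for doc in documents]
    vocab = sorted({word for doc in tokenized for word in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in tokenized for word in set(doc))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1 for w in vocab}
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        row = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # An empty document stays an all-zero row.
        rows.append([v / norm if norm else 0.0 for v in row])
    return vocab, rows


vocab, rows = tfidf(["wow loved place", "crust not good", ""])
print(vocab)
for row in rows:
    print([round(v, 3) for v in row])
```

Each non-empty row has unit length, and a document with no known words becomes a row of zeros, as in the last row of the array above.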

Project : Sentiment Analysis On Hotel Reviews

Let's build a model that classifies reviews into two categories, positive and negative, using our library crazytext.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix,accuracy_score
from sklearn.naive_bayes import MultinomialNB
import crazytext as ct

# Load the reviews: a tab-separated file with 'Review' and 'Liked' columns
dataset = pd.read_csv('https://raw.githubusercontent.com/Abhayparashar31/NLPP_sentiment-analsis-on-hotel-review/main/Restaurant_Reviews.tsv',delimiter = "\t",quoting=3)
doc = ct.Dataframe(df=dataset,col='Review')
corpus = doc.clean(remove_complete=True,make_base='lemmatization')  ## Cleaning
X,cv = ct.to_cv(corpus,max_features=3500)                           ## Vectorization
y = dataset['Liked']
# Hold out 20% of the rows for testing
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=0)
# Train a Multinomial Naive Bayes classifier and evaluate it on the held-out rows
cls = MultinomialNB().fit(X_train, y_train)
y_pred = cls.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
score = accuracy_score(y_test,y_pred)
print(cm,score*100)
#print(np.concatenate((y_pred.reshape(len(y_pred),1), np.array(y_test).reshape(len(y_test),1)),1))

>>>[[78 19]
 [21 82]] 80.0
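
The accuracy printed with the confusion matrix can be recomputed by hand: correct predictions sit on the diagonal, and accuracy is their share of all predictions. A small check in plain Python, with the matrix values copied from the output above:

```python
# Confusion matrix from the output above: rows are true labels, columns are predictions.
cm = [[78, 19],
      [21, 82]]

correct = cm[0][0] + cm[1][1]           # true negatives + true positives
total = sum(sum(row) for row in cm)     # all test reviews
accuracy = correct / total

print(correct, total, accuracy * 100)
# 160 200 80.0
```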

We Received An Accuracy of 80% using our library. Let's use this model to predict some new reviews.

new_review = input("Enter new review...")
cleaner = ct.Cleaner(text=new_review)
cleaned_review = cleaner.quick_clean(remove_complete=True,make_base='lemmatization')
new_x = cv.transform([cleaned_review]).toarray()
predictions = cls.predict(new_x)
if predictions[0] == 1:
    print('Positive 😀')
else:
    print("Negative 😞")

>>> Enter new review...worst food and experience
Negative 😞

FUTURE WORK

More NLP Tasks To Be Added.
Inbuilt Model Support To Be Added.

Uninstall

We Are Sorry To See You Go. You Can Give Your Feedback By Leaving A Comment On The Repo.

pip uninstall crazytext

Contributor

Abhay Parashar.


Release history

1.0.4 (this version): Apr 2, 2022
1.0.3: Apr 2, 2022
1.0.2: Apr 2, 2022
1.0.1: Apr 2, 2022
1.0.0: Apr 2, 2022

Download files

Source Distribution

crazytext-1.0.4.tar.gz (17.6 kB)

Uploaded Apr 2, 2022 Source

Built Distribution

crazytext-1.0.4-py3-none-any.whl (14.8 kB)

Uploaded Apr 2, 2022 Python 3

File details

Details for the file crazytext-1.0.4.tar.gz.

File metadata

Download URL: crazytext-1.0.4.tar.gz
Upload date: Apr 2, 2022
Size: 17.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.24.0 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.62.3 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.7.9

File hashes

Hashes for crazytext-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`a9487b93d3c586b2a00433f07a9151bb629cb89cd77d8bd28d0eb79397b598f8`
MD5	`fa3a47391ebaeb009ad7284e25f5991d`
BLAKE2b-256	`386a8b1ff5653b298fd787d01ccb6f9bba43ef9825945e292f00eb1b02f3c02b`

File details

Details for the file crazytext-1.0.4-py3-none-any.whl.

File metadata

Download URL: crazytext-1.0.4-py3-none-any.whl
Upload date: Apr 2, 2022
Size: 14.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.24.0 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.62.3 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.3 CPython/3.7.9

File hashes

Hashes for crazytext-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`48e31d446aec42919ad16f71f6bbdc9094db69daf73e1e332192153ee4b9f3a7`
MD5	`e683e3dad91e8345f9323eda537870f3`
BLAKE2b-256	`b12bf90d922b9ecd3dc0d79ff75667da823072646524d544f627cad271220e72`
