This is a text preprocessing package

Project description

Text Preprocessing Python Package

Course Link: Introduction to NLP

This Python package, created by KGPTalkie, provides text preprocessing utilities for natural language processing (NLP) tasks.

Installation from PyPI

You can install this package using pip as follows:

pip install preprocess_kgptalkie

Installation from GitHub

You can install this package from GitHub as follows:

pip install git+https://github.com/laxmimerit/preprocess_kgptalkie.git --upgrade --force-reinstall

Uninstall the Package

To uninstall the package, use the following command:

pip uninstall preprocess_kgptalkie

Requirements

This package depends on the following Python packages. Install them first:

pip install spacy==3.7.6
python -m spacy download en_core_web_sm-3.7.1 --direct
pip install nltk==3.9.1
pip install beautifulsoup4
pip install textblob==0.18.0.post0
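
To confirm that the requirements above are present before importing the package, a small check along these lines may help. This is a sketch using only the standard library; the helper name installed_versions and the REQUIRED tuple are illustrative and not part of preprocess_kgptalkie.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
REQUIRED = ("spacy", "nltk", "beautifulsoup4", "textblob")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```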

Download NLTK Data

If you are using this package for the first time, download the required NLTK data as follows:

import preprocess_kgptalkie as ps
ps.download_nltk_data()

How to Use the Package

1. Basic Text Preprocessing

Lowercasing Text

import preprocess_kgptalkie as ps

text = "HELLO WORLD!"
processed_text = ps.to_lower_case(text)
print(processed_text)  # Output: hello world!

Expanding Contractions

import preprocess_kgptalkie as ps

text = "I'm learning NLP."
processed_text = ps.contraction_to_expansion(text)
print(processed_text)  # Output: I am learning NLP.
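
For intuition, contraction expansion can be thought of as a dictionary lookup driven by a regular expression. The sketch below is an assumption about the general technique, not the package's actual implementation; the CONTRACTIONS table is a deliberately tiny, hypothetical sample and expand_contractions is an illustrative name.

```python
import re

# Hypothetical, deliberately tiny mapping; a real table would be far larger.
CONTRACTIONS = {
    "i'm": "i am",
    "can't": "cannot",
    "don't": "do not",
    "it's": "it is",
}

# One alternation that matches any known contraction as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in CONTRACTIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_contractions(text: str) -> str:
    """Replace each known contraction with its expanded form."""

    def replace(match):
        original = match.group(0)
        expanded = CONTRACTIONS[original.lower()]
        # Preserve a leading capital, e.g. "I'm" becomes "I am".
        if original[0].isupper():
            expanded = expanded[0].upper() + expanded[1:]
        return expanded

    return _PATTERN.sub(replace, text)


print(expand_contractions("I'm learning NLP."))  # I am learning NLP.
```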

Removing Emails

import preprocess_kgptalkie as ps

text = "Contact me at example@example.com"
processed_text = ps.remove_emails(text)
print(processed_text)  # Output: Contact me at 

Removing URLs

import preprocess_kgptalkie as ps

text = "Check out https://example.com"
processed_text = ps.remove_urls(text)
print(processed_text)  # Output: Check out
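
Email and URL removal are typically regex substitutions. The following standard-library sketch shows the idea under simplified patterns; the patterns and the names strip_emails, strip_urls, and squeeze_spaces are assumptions for illustration and may differ from what the package uses internally.

```python
import re

# Simplified patterns: good enough for common cases, not RFC-complete.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")


def strip_emails(text: str) -> str:
    """Delete anything that looks like an email address."""
    return EMAIL_RE.sub("", text)


def strip_urls(text: str) -> str:
    """Delete anything that starts with http://, https://, or www."""
    return URL_RE.sub("", text)


def squeeze_spaces(text: str) -> str:
    """Collapse runs of whitespace left behind by the removals."""
    return " ".join(text.split())


print(squeeze_spaces(strip_emails("Contact me at example@example.com")))  # Contact me at
print(squeeze_spaces(strip_urls("Check out https://example.com")))  # Check out
```

Removal leaves the surrounding whitespace in place, so a whitespace cleanup step afterwards is usually worthwhile.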

Removing HTML Tags

import preprocess_kgptalkie as ps

text = "<p>Hello World!</p>"
processed_text = ps.remove_html_tags(text)
print(processed_text)  # Output: Hello World!
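
The requirements list beautifulsoup4, which suggests the package parses HTML rather than using a regex. As a rough illustration of parser-based tag stripping, here is a sketch that uses the standard library's html.parser instead; it is not the package's implementation, and strip_html is a hypothetical name.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of a document, ignoring all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup: str) -> str:
    """Return the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


print(strip_html("<p>Hello World!</p>"))  # Hello World!
```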

Removing Special Characters

import preprocess_kgptalkie as ps

text = "Hello @World! #NLP"
processed_text = ps.remove_special_chars(text)
print(processed_text)  # Output: Hello World NLP

2. Advanced Text Processing

Lemmatization

import preprocess_kgptalkie as ps

text = "running runs"
processed_text = ps.lemmatize(text)
print(processed_text)  # Output: run run

Sentiment Analysis

import preprocess_kgptalkie as ps

text = "I love programming!"
sentiment = ps.sentiment_analysis(text)
print(sentiment)  # Output: Sentiment(polarity=0.5, subjectivity=0.6)

Detecting and Translating Language

import preprocess_kgptalkie as ps
# googletrans is not in the Requirements list above; install it separately
from googletrans import Translator

translator = Translator()
text = "Bonjour tout le monde"
lang = ps.detect_language(text, translator)
translated_text = ps.translate(text, 'en', translator)
print(f"Language: {lang}, Translated: {translated_text}")
# Output: Language: fr, Translated: Hello everyone

3. Feature Extraction

Word Count

import preprocess_kgptalkie as ps

text = "I love NLP."
count = ps.word_count(text)
print(count)  # Output: 3

Character Count

import preprocess_kgptalkie as ps

text = "I love NLP."
count = ps.char_count(text)
print(count)  # Output: 9 (the two spaces are not counted)

N-Grams

import preprocess_kgptalkie as ps

text = "I love NLP"
ngrams = ps.n_grams(text, n=2)
print(ngrams)  # Output: [('I', 'love'), ('love', 'NLP')]
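
The three features above are simple to reproduce with plain Python, which can help when checking results by hand. The sketch below assumes whitespace tokenization and a character count that skips whitespace (consistent with the outputs shown above); the function names are illustrative and not part of the package.

```python
def count_words(text: str) -> int:
    """Number of whitespace-separated tokens."""
    return len(text.split())


def count_chars(text: str) -> int:
    """Number of characters, not counting whitespace."""
    return sum(len(token) for token in text.split())


def word_ngrams(text: str, n: int = 2) -> list:
    """Sliding windows of n consecutive tokens, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    # zip stops at the shortest slice, which yields exactly the full windows.
    return list(zip(*(tokens[i:] for i in range(n))))


print(count_words("I love NLP."))  # 3
print(count_chars("I love NLP."))  # 9
print(word_ngrams("I love NLP", n=2))  # [('I', 'love'), ('love', 'NLP')]
```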

4. Full Example: Cleaning Text

Here’s an example of how you might use several functions together to clean text data:

import preprocess_kgptalkie as ps

text = "I'm loving this NLP tutorial! Contact me at udemy@kgptalkie.com. Visit https://kgptalkie.com."
cleaned_text = ps.clean_text(text)
print(cleaned_text)
# Output: i am loving this nlp tutorial contact me at visit
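
To see why the order of cleaning steps matters, the sketch below rebuilds a comparable pipeline from standard-library pieces. It is an assumption about how such a pipeline could be arranged, not the actual body of clean_text; the STEPS list, the tiny CONTRACTIONS table, and the clean function are hypothetical.

```python
import re

CONTRACTIONS = {"i'm": "i am", "don't": "do not", "can't": "cannot"}  # tiny sample
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://\S+")
SPECIAL_RE = re.compile(r"[^\w\s]")


def expand(text: str) -> str:
    """Expand the sample contractions (expects lowercased input)."""
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return text


# Order matters: contractions need their apostrophes, and emails and URLs
# need their punctuation, so special characters are stripped last.
STEPS = [
    str.lower,
    expand,
    lambda t: EMAIL_RE.sub("", t),
    lambda t: URL_RE.sub("", t),
    lambda t: SPECIAL_RE.sub("", t),
    lambda t: " ".join(t.split()),  # collapse leftover whitespace
]


def clean(text: str, steps=STEPS) -> str:
    """Apply each step in sequence to the text."""
    for step in steps:
        text = step(text)
    return text


sample = "I'm loving this NLP tutorial! Contact me at udemy@kgptalkie.com. Visit https://kgptalkie.com."
print(clean(sample))  # i am loving this nlp tutorial contact me at visit
```

If special characters were removed first, "udemy@kgptalkie.com" would become "udemykgptalkiecom" and the email pattern would no longer match it.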

One-Shot Feature Extraction

import preprocess_kgptalkie as ps

features = ps.extract_features("I love NLP")
print(features)

Notes

  • Be cautious when using heavy operations like lemmatize and spelling_correction on very large datasets, as they can be time-consuming.
  • You can build custom cleaning and preprocessing pipelines by combining these modular functions.
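
One way to soften the cost of heavy operations on large datasets is to run them once per distinct input and reuse the result for duplicates. The sketch below shows that idea with a plain dictionary cache; apply_with_cache is a hypothetical helper, and slow_upper is a stand-in for an expensive step such as ps.lemmatize.

```python
def apply_with_cache(texts, transform):
    """Apply transform to every text, computing each distinct text only once."""
    cache = {}
    results = []
    for text in texts:
        if text not in cache:
            cache[text] = transform(text)
        results.append(cache[text])
    return results


calls = []


def slow_upper(text):
    """Stand-in for an expensive step; records each real invocation."""
    calls.append(text)
    return text.upper()


print(apply_with_cache(["a", "b", "a", "a"], slow_upper))  # ['A', 'B', 'A', 'A']
print(len(calls))  # 2
```

This helps most when the data contains many repeated values, such as short comments or product titles; for mostly unique texts the gain is small.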

Download files

Source Distribution

preprocess_kgptalkie-0.11.tar.gz (7.6 kB)

Uploaded Source

Built Distribution

preprocess_kgptalkie-0.11-py3-none-any.whl (8.2 kB)

Uploaded Python 3

File details

Details for the file preprocess_kgptalkie-0.11.tar.gz.

File metadata

  • Download URL: preprocess_kgptalkie-0.11.tar.gz
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for preprocess_kgptalkie-0.11.tar.gz

  • SHA256: 692a9953e61ca31b1beaccd0cd9863b36085203afa634e4436cff5a74994cd62
  • MD5: 89bb0beb09447fccf4620ed84881e599
  • BLAKE2b-256: b27ddee1b3dca27329f7f13fc338e749d56c3551a7dbd3a29164099402e69769

File details

Details for the file preprocess_kgptalkie-0.11-py3-none-any.whl.

File hashes

Hashes for preprocess_kgptalkie-0.11-py3-none-any.whl

  • SHA256: cfd36ebd16e05cd32d37f1b45ca283f36c47719a4c4d39aa90e6b9388717471d
  • MD5: 44f80cbe9282c43c68b5622251b29ace
  • BLAKE2b-256: 272523bf069caaa46df2acb4f4434b116446bbb20dca8ef97828fdc364dbc24a
