This is a text preprocessing package

Text Preprocessing Python Package

Course Link: Introduction to NLP

This Python package was created by KGPTalkie. It provides text preprocessing utilities for natural language processing (NLP) tasks.

Installation from PyPI

You can install this package using pip as follows:

pip install preprocess_kgptalkie

Installation from GitHub

You can install this package from GitHub as follows:

pip install git+https://github.com/laxmimerit/preprocess_kgptalkie.git --upgrade --force-reinstall

Uninstall the Package

To uninstall the package, use the following command:

pip uninstall preprocess_kgptalkie

Requirements

Install the following Python packages before using this package:

pip install spacy==3.7.6
python -m spacy download en_core_web_sm
pip install nltk==3.9.1
pip install beautifulsoup4
pip install textblob==0.18.0.post0
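
After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and is not part of preprocess_kgptalkie.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    Hypothetical helper for illustration; not part of preprocess_kgptalkie.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the version of each dependency listed above.
for name in ("spacy", "nltk", "beautifulsoup4", "textblob"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```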

Download NLTK Data

If you are using this package for the first time, download the required NLTK data as follows:

import preprocess_kgptalkie as ps
ps.download_nltk_data()

How to Use the Package

1. Basic Text Preprocessing

Lowercasing Text

import preprocess_kgptalkie as ps

text = "HELLO WORLD!"
processed_text = ps.to_lower_case(text)
print(processed_text)  # Output: hello world!

Expanding Contractions

import preprocess_kgptalkie as ps

text = "I'm learning NLP."
processed_text = ps.contraction_to_expansion(text)
print(processed_text)  # Output: I am learning NLP.

Removing Emails

import preprocess_kgptalkie as ps

text = "Contact me at example@example.com"
processed_text = ps.remove_emails(text)
print(processed_text)  # Output: Contact me at 

Removing URLs

import preprocess_kgptalkie as ps

text = "Check out https://example.com"
processed_text = ps.remove_urls(text)
print(processed_text)  # Output: Check out

Removing HTML Tags

import preprocess_kgptalkie as ps

text = "<p>Hello World!</p>"
processed_text = ps.remove_html_tags(text)
print(processed_text)  # Output: Hello World!

Removing Special Characters

import preprocess_kgptalkie as ps

text = "Hello @World! #NLP"
processed_text = ps.remove_special_chars(text)
print(processed_text)  # Output: Hello World NLP
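
To give a feel for what removal steps like these involve, here is a rough regex-based sketch using only the standard library. It is not the package's actual implementation: the function names and patterns below are assumptions chosen for illustration, and the patterns are deliberately simplified.

```python
import re

# Simplified patterns for illustration only; real emails and URLs are messier.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
SPECIAL_RE = re.compile(r"[^\w\s]")


def strip_emails(text: str) -> str:
    """Delete anything that looks like an email address."""
    return EMAIL_RE.sub("", text)


def strip_urls(text: str) -> str:
    """Delete http(s) and www-style URLs."""
    return URL_RE.sub("", text)


def strip_special_chars(text: str) -> str:
    """Drop non-word, non-space characters, then collapse repeated whitespace."""
    return " ".join(SPECIAL_RE.sub("", text).split())


print(strip_special_chars("Hello @World! #NLP"))
```

Note that removing an email or URL this way leaves the surrounding whitespace in place, which is why a final whitespace-collapsing step is often applied afterwards.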

2. Advanced Text Processing

Lemmatization

import preprocess_kgptalkie as ps

text = "running runs"
processed_text = ps.lemmatize(text)
print(processed_text)  # Output: run run

Sentiment Analysis

import preprocess_kgptalkie as ps

text = "I love programming!"
sentiment = ps.sentiment_analysis(text)
print(sentiment)  # Output: Sentiment(polarity=0.5, subjectivity=0.6)

Detecting and Translating Language

import preprocess_kgptalkie as ps
from googletrans import Translator  # googletrans is not listed under Requirements; install it separately

translator = Translator()
text = "Bonjour tout le monde"
lang = ps.detect_language(text, translator)
translated_text = ps.translate(text, 'en', translator)
print(f"Language: {lang}, Translated: {translated_text}")
# Output: Language: fr, Translated: Hello everyone

3. Feature Extraction

Word Count

import preprocess_kgptalkie as ps

text = "I love NLP."
count = ps.word_count(text)
print(count)  # Output: 3

Character Count

import preprocess_kgptalkie as ps

text = "I love NLP."
count = ps.char_count(text)
print(count)  # Output: 9
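
The character count of 9 above is consistent with counting characters after whitespace is removed (the full string, including its two spaces, has 11 characters). The sketch below reproduces both counts with plain Python; the function names are hypothetical and this is an assumption about the counting rule, not the package's source.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens."""
    return len(text.split())


def count_chars(text: str) -> int:
    """Count characters, excluding all whitespace."""
    return len("".join(text.split()))


text = "I love NLP."
print(count_words(text))  # 3
print(count_chars(text))  # 9
print(len(text))          # 11, for comparison (spaces included)
```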

N-Grams

import preprocess_kgptalkie as ps

text = "I love NLP"
ngrams = ps.n_grams(text, n=2)
print(ngrams)  # Output: [('I', 'love'), ('love', 'NLP')]
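
For readers curious how word n-grams can be built, one common approach is to zip the token list against shifted copies of itself. This is a minimal sketch under the assumption of whitespace tokenization; word_ngrams is a hypothetical name, not a function from the package.

```python
def word_ngrams(text: str, n: int = 2) -> list[tuple[str, ...]]:
    """Return all runs of n consecutive whitespace-separated tokens."""
    tokens = text.split()
    # zip stops at the shortest slice, so only complete n-grams are produced.
    return list(zip(*(tokens[i:] for i in range(n))))


print(word_ngrams("I love NLP", n=2))  # [('I', 'love'), ('love', 'NLP')]
```

If the text has fewer than n tokens, the result is an empty list.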

4. Full Example: Cleaning Text

The example below uses clean_text to apply several cleaning steps to raw text in a single call:

import preprocess_kgptalkie as ps

text = "I'm loving this NLP tutorial! Contact me at udemy@kgptalkie.com. Visit https://kgptalkie.com."
cleaned_text = ps.clean_text(text)
print(cleaned_text)
# Output: i am loving this nlp tutorial contact me at visit

One-Shot Feature Extraction

import preprocess_kgptalkie as ps

features = ps.extract_features("I love NLP")
print(features)

Notes

  • Be cautious when using heavy operations like lemmatize and spelling_correction on very large datasets, as they can be time-consuming.
  • The package supports custom cleaning and preprocessing pipelines by using these modular functions together.
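
As an illustration of the second note, a custom pipeline can be as simple as a list of text-to-text functions applied in order. The sketch below uses built-in string methods as stand-in steps so that it runs without the package; make_pipeline is a hypothetical helper, not part of preprocess_kgptalkie.

```python
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[str], str]


def make_pipeline(steps: Iterable[Step]) -> Step:
    """Combine text-to-text functions into one function that applies them in order."""
    ordered = list(steps)

    def run(text: str) -> str:
        return reduce(lambda acc, step: step(acc), ordered, text)

    return run


# Stand-in steps: lowercase, trim, and collapse repeated whitespace.
pipeline = make_pipeline([str.lower, str.strip, lambda s: " ".join(s.split())])
print(pipeline("  HELLO   World  "))  # hello world
```

With the package installed, functions shown earlier such as ps.to_lower_case, ps.remove_emails, and ps.remove_urls could presumably be passed as steps in the same way, since each takes a string and returns a string in the examples above.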
