MusaddiqueHussainLabs: Empowering text analytics with advanced tools for comprehensive Natural Language Processing (NLP) and Large Language Models (LLMs).
MusaddiqueHussainLabs NLP: State-of-the-Art Natural Language Processing & LLMs Library
MusaddiqueHussainLabs is a comprehensive Natural Language Processing (NLP) library for Python. It provides tools for core NLP tasks, text preprocessing, and document analysis.
Features
Currently the package is organized into three primary modules:
1. NLP Components
Component Type | Description |
---|---|
tokenize | Text tokenization |
pos | Part-of-Speech tagging |
lemma | Word lemmatization |
morphology | Morphological analysis (word forms) |
dep | Dependency parsing |
ner | Named Entity Recognition |
norm | Text normalization |
2. Text Preprocessing
This module equips users with an extensive set of text preprocessing tools:
Function | Description |
---|---|
to_lower | Convert text to lowercase |
to_upper | Convert text to uppercase |
remove_number | Remove numerical characters |
remove_itemized_bullet_and_numbering | Remove itemized bullets and numbering |
remove_url | Remove URLs from text |
remove_punctuation | Remove punctuation marks |
remove_special_character | Remove special characters |
keep_alpha_numeric | Keep only alphanumeric characters |
remove_whitespace | Remove excess whitespace |
normalize_unicode | Normalize Unicode characters |
remove_stopword | Eliminate common stopwords |
remove_freqwords | Remove frequently occurring words |
remove_rarewords | Remove rare words |
remove_email | Remove email addresses |
remove_phone_number | Remove phone numbers |
remove_ssn | Remove Social Security Numbers (SSN) |
remove_credit_card_number | Remove credit card numbers |
remove_emoji | Remove emojis |
remove_emoticons | Remove emoticons |
convert_emoticons_to_words | Convert emoticons to words |
convert_emojis_to_words | Convert emojis to words |
remove_html | Remove HTML tags |
chat_words_conversion | Convert chat language to standard English |
expand_contraction | Expand contractions (e.g., "can't" to "cannot") |
tokenize_word | Tokenize words |
tokenize_sentence | Tokenize sentences |
stem_word | Stem words |
lemmatize_word | Lemmatize words |
preprocess_text | Combine multiple preprocessing steps into one function |
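To make the idea of chaining preprocessing steps concrete, here is a minimal standard-library sketch. The function bodies and the run_pipeline helper are illustrative stand-ins written for this example; they are not the library's implementations, and the real functions may behave differently.

```python
import re
import string
from typing import Callable, Iterable

# Illustrative stand-ins only; these are NOT the library's implementations.
def to_lower(text: str) -> str:
    return text.lower()

def remove_url(text: str) -> str:
    # Drop http(s):// and www. style links.
    return re.sub(r"(https?://|www\.)\S+", "", text)

def remove_punctuation(text: str) -> str:
    # Delete every ASCII punctuation character.
    return text.translate(str.maketrans("", "", string.punctuation))

def remove_whitespace(text: str) -> str:
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())

def run_pipeline(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Hypothetical helper: feed each step the output of the previous one."""
    for step in steps:
        text = step(text)
    return text

sample = "Visit https://example.com NOW,   please!"
cleaned = run_pipeline(
    sample, [to_lower, remove_url, remove_punctuation, remove_whitespace]
)
print(cleaned)  # visit now please
```

Step order matters in a chain like this: removing punctuation before URLs would strip the "://" that the URL pattern relies on.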
3. Document Analysis
Functionality | Description |
---|---|
Language | Detect document language |
Linguistic Analysis | Resolve ambiguities |
Key phrases | Retrieve relevant information from documents |
NER | Named Entity Recognition |
Sentiment | Analyze sentiment of text |
PII Anonymization | Anonymize Personally Identifiable Information |
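As a rough illustration of what PII anonymization means in practice, the sketch below masks SSN-like and phone-like numbers with plain regular expressions. This README does not describe how the library performs anonymization, so the patterns, the placeholder tags, and the mask_pii name are all assumptions made for this example.

```python
import re

# Hypothetical patterns for US-style identifiers; real PII detection needs far more care.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def mask_pii(text: str) -> str:
    """Replace SSN-like and phone-like digit groups with placeholder tags."""
    text = SSN_PATTERN.sub("<SSN>", text)
    text = PHONE_PATTERN.sub("<PHONE_NUMBER>", text)
    return text

sample = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."
masked = mask_pii(sample)
print(masked)
# The employee's SSN is <SSN>. The employee's phone number is <PHONE_NUMBER>.
```

Fixed patterns like these only catch one formatting of each identifier, which is one reason to prefer a dedicated anonymization component over hand-written rules.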
Prerequisites
- Python >= 3.9
- GOOGLE_API_KEY from Google AI Studio
- Place the API key in a .env file in the project root directory.
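A quick way to confirm the key is in place is to parse the file yourself. The sketch below assumes the common dotenv layout of one KEY=VALUE pair per line; how the library actually reads the file is not stated here, and read_env_file and has_google_api_key are hypothetical helpers written for this example.

```python
import tempfile
from pathlib import Path

def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values

def has_google_api_key(path: Path) -> bool:
    """Return True if the file defines a non-empty GOOGLE_API_KEY."""
    return bool(read_env_file(path).get("GOOGLE_API_KEY"))

# Demonstration with a throwaway file and a placeholder value.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# Google AI Studio key\nGOOGLE_API_KEY=your-api-key-here\n",
        encoding="utf-8",
    )
    key_present = has_google_api_key(env_path)
    print(key_present)  # True
```

In a real project, point the check at Path(".env") in the project root instead of the temporary file.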
Installation
To install musaddiquehussainlabs, use pip:
pip install musaddiquehussainlabs
Usage
from musaddiquehussainlabs.nlp_components import nlp
from musaddiquehussainlabs.text_preprocessing import preprocess_text, to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word
from musaddiquehussainlabs.document_analysis import DocumentAnalysis
data_to_process = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."
# Using NLP component
result = nlp.predict(component_type="ner", input_text=data_to_process)
print(result)
# Text preprocessing
preprocessed_text = preprocess_text(data_to_process)
print(preprocessed_text)
# Custom Text preprocessing
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
preprocessed_text = preprocess_text(data_to_process, preprocess_functions)
print(preprocessed_text)
# Document analysis
document_analysis = DocumentAnalysis()
# Option 1: full analysis
result = document_analysis.full_analysis(data_to_process)
print(result)
# Option 2: individual document analysis (here, PII anonymization)
result = document_analysis.pii_anonymization(data_to_process)
print(result)
Feel free to explore more functionalities and customize the usage based on your requirements!
For detailed usage examples and API documentation, please refer to the documentation (docs link coming soon).
Upcoming Features
We're continuously working on expanding MusaddiqueHussainLabs to provide even more capabilities for NLP tasks. Please stay tuned for these exciting enhancements!
License
This project is licensed under the MIT License.
File details
Details for the file musaddiquehussainlabs-0.0.1.tar.gz.
File metadata
- Download URL: musaddiquehussainlabs-0.0.1.tar.gz
- Upload date:
- Size: 47.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d6ef7f22c90e633e94b25aed25e96eee89c6a8422e374ec4b09d120d43ed0a69 |
MD5 | 24aaf08f4f7c298d6ed505b7d652e97f |
BLAKE2b-256 | 44fc1b4e15d80c7d00a33c7eb5adfcd027ebb85c18eca332a96d513119278675 |
File details
Details for the file musaddiquehussainlabs-0.0.1-py3-none-any.whl.
File metadata
- Download URL: musaddiquehussainlabs-0.0.1-py3-none-any.whl
- Upload date:
- Size: 49.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 6aa9cf91a1845d46109c70b36148e06fdba781012919e3850de230f1a6e93c49 |
MD5 | 2c9bc1f248469f71c674f1bc80c1ad9f |
BLAKE2b-256 | 6d58305edddfff50e246e9448db420e7d280a4772bbef17be7ad91cca85a79fd |