MusaddiqueHussainLabs: Empowering text analytics with advanced tools for comprehensive Natural Language Processing (NLP) and Large Language Models (LLMs).

Project description

MusaddiqueHussainLabs NLP: State-of-the-Art Natural Language Processing & LLMs Library

MusaddiqueHussainLabs is a Python library for Natural Language Processing (NLP). It provides tools for core NLP tasks, text preprocessing, and document analysis.

Features

Currently the package is organized into three primary modules:

1. NLP Components

| Component Type | Description |
| --- | --- |
| tokenize | Text tokenization |
| pos | Part-of-Speech tagging |
| lemma | Word lemmatization |
| morphology | Study of word forms |
| dep | Dependency parsing |
| ner | Named Entity Recognition |
| norm | Text normalization |
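
To illustrate how these component types might be used together, the sketch below runs several of them over the same text. It assumes a callable with the same keyword arguments as the `nlp.predict(component_type=..., input_text=...)` call shown under Usage. The helper `run_components` and the constant `COMPONENT_TYPES` are hypothetical names introduced here and are not part of the package.

```python
from typing import Any, Callable, Dict, Iterable

# Component types copied from the table above.
COMPONENT_TYPES = ("tokenize", "pos", "lemma", "morphology", "dep", "ner", "norm")


def run_components(
    predict: Callable[..., Any],
    text: str,
    component_types: Iterable[str] = COMPONENT_TYPES,
) -> Dict[str, Any]:
    """Call `predict` once per component type and collect the results by name.

    `predict` is expected to accept `component_type` and `input_text`
    keyword arguments (an assumption based on the Usage section).
    """
    results: Dict[str, Any] = {}
    for component_type in component_types:
        if component_type not in COMPONENT_TYPES:
            raise ValueError(f"Unknown component type: {component_type!r}")
        results[component_type] = predict(
            component_type=component_type, input_text=text
        )
    return results
```

With the package installed, a call might look like `run_components(nlp.predict, "Some text", ["ner", "pos"])`. The shape of each returned value is not documented here, so the helper stores it unchanged.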

2. Text Preprocessing

This module equips users with an extensive set of text preprocessing tools:

| Function | Description |
| --- | --- |
| to_lower | Convert text to lowercase |
| to_upper | Convert text to uppercase |
| remove_number | Remove numerical characters |
| remove_itemized_bullet_and_numbering | Eliminate itemized/bullet-point numbering |
| remove_url | Remove URLs from text |
| remove_punctuation | Remove punctuation marks |
| remove_special_character | Remove special characters |
| keep_alpha_numeric | Keep only alphanumeric characters |
| remove_whitespace | Remove excess whitespace |
| normalize_unicode | Normalize Unicode characters |
| remove_stopword | Eliminate common stopwords |
| remove_freqwords | Remove frequently occurring words |
| remove_rarewords | Remove rare words |
| remove_email | Remove email addresses |
| remove_phone_number | Remove phone numbers |
| remove_ssn | Remove Social Security Numbers (SSN) |
| remove_credit_card_number | Remove credit card numbers |
| remove_emoji | Remove emojis |
| remove_emoticons | Remove emoticons |
| convert_emoticons_to_words | Convert emoticons to words |
| convert_emojis_to_words | Convert emojis to words |
| remove_html | Remove HTML tags |
| chat_words_conversion | Convert chat language to standard English |
| expand_contraction | Expand contractions (e.g., "can't" to "cannot") |
| tokenize_word | Tokenize words |
| tokenize_sentence | Tokenize sentences |
| stem_word | Stem words |
| lemmatize_word | Lemmatize words |
| preprocess_text | Combine multiple preprocessing steps into one function |
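
The idea of chaining several steps into one call can be sketched with the standard library alone. The functions below are simplified stand-ins that borrow three names from the table for readability. They are not the package's implementations, which may behave differently, and `apply_steps` is a hypothetical helper.

```python
import re
from typing import Callable, Iterable


def to_lower(text: str) -> str:
    """Stand-in: convert text to lowercase."""
    return text.lower()


def remove_url(text: str) -> str:
    """Stand-in: drop tokens starting with http://, https://, or www."""
    return re.sub(r"(?:https?://|www\.)\S+", "", text)


def remove_whitespace(text: str) -> str:
    """Stand-in: collapse runs of whitespace and trim both ends."""
    return " ".join(text.split())


def apply_steps(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each step in order, feeding every result into the next step."""
    for step in steps:
        text = step(text)
    return text


cleaned = apply_steps(
    "  Visit https://example.com  for MORE info  ",
    [to_lower, remove_url, remove_whitespace],
)
print(cleaned)  # visit for more info
```

Order matters in a chain like this: whitespace cleanup runs last so that gaps left by removed URLs are collapsed.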

3. Document Analysis

| Functionality | Description |
| --- | --- |
| Language | Detect document language |
| Linguistic Analysis | Resolve ambiguities |
| Key phrases | Retrieve relevant information from documents |
| NER | Named Entity Recognition |
| Sentiment | Analyze sentiment of text |
| PII Anonymization | Anonymize Personally Identifiable Information |
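
To make the PII Anonymization row concrete, here is a minimal regex-based sketch of masking US-style SSNs and phone numbers. This is a swapped-in technique for illustration only: the package's own approach is not described here and may rely on a language model instead. The names `SSN_PATTERN`, `PHONE_PATTERN`, and `mask_pii` are hypothetical.

```python
import re

# Patterns for the hyphenated formats only, e.g. 123-45-6789 and 555-555-5555.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace hyphenated SSNs and phone numbers with placeholder tags."""
    text = SSN_PATTERN.sub("<SSN>", text)
    text = PHONE_PATTERN.sub("<PHONE>", text)
    return text


print(mask_pii("SSN 859-98-0987, phone 555-555-5555."))
# SSN <SSN>, phone <PHONE>.
```

Fixed patterns like these miss other formats (spaces, parentheses, country codes), which is one reason a dedicated anonymization feature is useful.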

Prerequisites

  • Python >= 3.9
  • GOOGLE_API_KEY from Google AI Studio
  • Place the API key in a .env file in the project root directory.
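
Before running anything that needs the key, it can help to confirm that the `.env` file exists and contains a `GOOGLE_API_KEY=<your key>` line. The sketch below does this with the standard library only. How the package itself loads the file is not stated here, and `read_env_file` and `has_google_api_key` are hypothetical helpers.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_google_api_key(path: str = ".env") -> bool:
    """Return True if the file defines a non-empty GOOGLE_API_KEY."""
    return bool(read_env_file(path).get("GOOGLE_API_KEY"))
```

Calling `has_google_api_key()` from the project root returns `False` when the file is missing or the key is empty, which gives a quick check before the first API call.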

Installation

To install musaddiquehussainlabs, you can use pip:

pip install musaddiquehussainlabs
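
After installing, one way to confirm that the package is visible to the current interpreter and that the Python version meets the stated minimum is the standard-library check below. The helper names `installed_version` and `python_is_supported` are illustrative and not part of the package.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str = "musaddiquehussainlabs") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(minimum: Tuple[int, int] = (3, 9)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


print("Python OK:", python_is_supported())
print("Installed version:", installed_version())
```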

Usage

from musaddiquehussainlabs.nlp_components import nlp
from musaddiquehussainlabs.text_preprocessing import preprocess_text, preprocess_operations
from musaddiquehussainlabs.document_analysis import DocumentAnalysis

data_to_process = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."

# Using NLP component
result = nlp.predict(component_type="ner", input_text=data_to_process)
print(result)

# Text preprocessing
preprocessed_text = preprocess_text(data_to_process)
print(preprocessed_text)

# Custom Text preprocessing
preprocess_functions = [preprocess_operations.to_lower]
preprocessed_text = preprocess_text(data_to_process, preprocess_functions)
print(preprocessed_text)

# Document analysis
document_analysis = DocumentAnalysis()

# Option 1: full analysis
result = document_analysis.full_analysis(data_to_process)
print(result)

# Option 2: individual analysis (here, PII anonymization only)
result = document_analysis.pii_anonymization(data_to_process)
print(result)

Feel free to explore more functionalities and customize the usage based on your requirements!

Detailed usage examples and API documentation will be available in the documentation (docs link coming soon).

Upcoming Features

We're continuously working on expanding MusaddiqueHussainLabs to provide even more capabilities for NLP tasks. Please stay tuned for these exciting enhancements!

License

This project is licensed under the MIT License.
