MusaddiqueHussainLabs: Empowering text analytics with advanced tools for comprehensive Natural Language Processing (NLP) and Large Language Models (LLMs).

Project description

MusaddiqueHussainLabs NLP: State-of-the-Art Natural Language Processing & LLMs Library

MusaddiqueHussainLabs is a Python library for Natural Language Processing (NLP) that aims to offer state-of-the-art functionality. It provides tools for core NLP tasks, text preprocessing, and document analysis.

Features

Currently the package is organized into three primary modules:

1. NLP Components

| Component Type | Description             |
|----------------|-------------------------|
| tokenize       | Text tokenization       |
| pos            | Part-of-Speech tagging  |
| lemma          | Word lemmatization      |
| morphology     | Study of word forms     |
| dep            | Dependency parsing      |
| ner            | Named Entity Recognition|
| norm           | Text normalization      |

2. Text Preprocessing

This module equips users with an extensive set of text preprocessing tools:

| Function | Description |
|----------|-------------|
| to_lower | Convert text to lowercase |
| to_upper | Convert text to uppercase |
| remove_number | Remove numerical characters |
| remove_itemized_bullet_and_numbering | Eliminate itemized/bullet-point numbering |
| remove_url | Remove URLs from text |
| remove_punctuation | Remove punctuation marks |
| remove_special_character | Remove special characters |
| keep_alpha_numeric | Keep only alphanumeric characters |
| remove_whitespace | Remove excess whitespace |
| normalize_unicode | Normalize Unicode characters |
| remove_stopword | Eliminate common stopwords |
| remove_freqwords | Remove frequently occurring words |
| remove_rarewords | Remove rare words |
| remove_email | Remove email addresses |
| remove_phone_number | Remove phone numbers |
| remove_ssn | Remove Social Security Numbers (SSN) |
| remove_credit_card_number | Remove credit card numbers |
| remove_emoji | Remove emojis |
| remove_emoticons | Remove emoticons |
| convert_emoticons_to_words | Convert emoticons to words |
| convert_emojis_to_words | Convert emojis to words |
| remove_html | Remove HTML tags |
| chat_words_conversion | Convert chat language to standard English |
| expand_contraction | Expand contractions (e.g., "can't" to "cannot") |
| tokenize_word | Tokenize words |
| tokenize_sentence | Tokenize sentences |
| stem_word | Stem words |
| lemmatize_word | Lemmatize words |
| preprocess_text | Combine multiple preprocessing steps into one function |
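
The idea behind combining steps, applying a list of single-purpose functions in order, can be sketched in plain Python. The helpers below are illustrative stand-ins built on the standard library `re` module. They borrow a few names from the table above, but they are not the package's implementations; the regular expressions are simplified assumptions, and `run_pipeline` is a hypothetical helper.

```python
import re
from typing import Callable, Iterable

# Illustrative stand-ins only: simplified patterns, not the package's own code.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def to_lower(text: str) -> str:
    return text.lower()


def remove_url(text: str) -> str:
    return URL_RE.sub("", text)


def remove_email(text: str) -> str:
    return EMAIL_RE.sub("", text)


def remove_ssn(text: str) -> str:
    return SSN_RE.sub("", text)


def remove_whitespace(text: str) -> str:
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def run_pipeline(text: str, steps: Iterable[Callable[[str], str]]) -> str:
    """Apply each step to the output of the previous one (hypothetical helper)."""
    for step in steps:
        text = step(text)
    return text


sample = "Mail Bob@Example.com or visit https://example.com  today. SSN: 859-98-0987"
steps = [to_lower, remove_url, remove_email, remove_ssn, remove_whitespace]
cleaned = run_pipeline(sample, steps)
print(cleaned)
```

Order matters in a pipeline like this: whitespace cleanup runs last so that gaps left by earlier removals are collapsed.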

3. Document Analysis

| Functionality | Description |
|---------------|-------------|
| Language | Detect document language |
| Linguistic Analysis | Resolve ambiguities |
| Key phrases | Retrieve relevant information from documents |
| NER | Named Entity Recognition |
| Sentiment | Analyze sentiment of text |
| PII Anonymization | Anonymize Personally Identifiable Information |
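
To make the PII anonymization idea concrete, here is a minimal regex-based sketch. It is a swapped-in technique for illustration, not how the package performs anonymization (which this description does not specify). The `anonymize_pii` helper, the bracketed labels, and the US-style patterns are all assumptions.

```python
import re

# Assumed placeholder labels and simplified US-style patterns
# (not the package's rules).
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def anonymize_pii(text: str) -> str:
    """Replace each match with a bracketed label such as [SSN] (hypothetical helper)."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


text = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."
print(anonymize_pii(text))
```

Pattern matching of this kind only catches well-formatted identifiers; free-form PII such as names or addresses needs a model-based approach.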

Prerequisites

  • Python >= 3.9
  • GOOGLE_API_KEY from Google AI Studio
  • Place the API key in a .env file in the project root directory.
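
A .env file typically holds one `KEY=VALUE` pair per line, for example `GOOGLE_API_KEY=your-key-here`. How the package itself reads this file is not stated here, so the following is only a standard-library sketch for checking that the key is in place before running anything. `read_env_file` and `has_google_api_key` are hypothetical helpers, and the parsing rules are a simplified assumption.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_google_api_key(path: str = ".env") -> bool:
    """True if GOOGLE_API_KEY is set in the environment or in the .env file."""
    return bool(
        os.environ.get("GOOGLE_API_KEY")
        or read_env_file(path).get("GOOGLE_API_KEY")
    )


print("GOOGLE_API_KEY found" if has_google_api_key() else "GOOGLE_API_KEY missing")
```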

Installation

To install musaddiquehussainlabs, you can use pip:

pip install musaddiquehussainlabs

Usage

from musaddiquehussainlabs.nlp_components import nlp
from musaddiquehussainlabs.text_preprocessing import preprocess_text, to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word
from musaddiquehussainlabs.document_analysis import DocumentAnalysis

data_to_process = "The employee's SSN is 859-98-0987. The employee's phone number is 555-555-5555."

# Using NLP component
result = nlp.predict(component_type="ner", input_text=data_to_process)
print(result)

# Text preprocessing
preprocessed_text = preprocess_text(data_to_process)
print(preprocessed_text)

# Custom Text preprocessing
preprocess_functions = [to_lower, remove_email, remove_url, remove_punctuation, lemmatize_word]
preprocessed_text = preprocess_text(data_to_process, preprocess_functions)
print(preprocessed_text)

# Document analysis
document_analysis = DocumentAnalysis()

# Option 1: full analysis
result = document_analysis.full_analysis(data_to_process)
print(result)

# Option 2: individual document analysis
result = document_analysis.pii_anonymization(data_to_process)
print(result)

Feel free to explore more functionalities and customize the usage based on your requirements!

For detailed usage examples and API documentation, please refer to the documentation (link coming soon).

Upcoming Features

We're continuously working on expanding MusaddiqueHussainLabs to provide even more capabilities for NLP tasks. Please stay tuned for these exciting enhancements!

License

This project is licensed under the MIT License.
