Lychee Language Core: A lightweight, high-performance library for slang translation and NLP text cleaning (pre-processing).
Project description
Lychee Language Core (him-lychee) Version 0.2.0 - Developed by Himpadma "Him"
Lychee is a lightweight Python package for quickly processing user-generated text. It provides single-pass slang replacement and a suite of text-cleaning tools for Natural Language Processing (NLP) tasks such as sentiment analysis.
Installation
pip install him-lychee
Post-Installation Setup (Required for Full NLP Features)
To use the advanced features (Stopwords, Stemming, Lemmatization, SpaCy), you must download the required models once:
python -m nltk.downloader stopwords punkt wordnet
python -m textblob.download_corpora
python -m spacy download en_core_web_sm
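Before running the download commands, it can help to confirm that the optional packages themselves are importable. The following is a minimal sketch using only the standard library; the helper name missing_optional_modules is hypothetical and not part of Lychee, and it checks only for the packages, not for the downloaded models or corpora.

```python
import importlib.util

# Packages implied by the download commands above (an assumption about
# which optional dependencies the advanced features rely on).
OPTIONAL_MODULES = ("nltk", "textblob", "spacy")


def missing_optional_modules(names=OPTIONAL_MODULES):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional_modules()
    if missing:
        print("Missing optional packages:", ", ".join(missing))
    else:
        print("All optional packages are importable.")
```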
Lychee Core Usage (SlangDictionary Class)
The core SlangDictionary class provides robust, optimized slang replacement.

| Method | Description | Example Usage |
|---|---|---|
| replace_slang_in_text(text) | Crucial for data cleaning. Replaces all recognized slang terms in a single string with their full meanings. Highly optimized using a single regex pass. | slang_core.replace_slang_in_text(text) |
| get_meaning(slang_term) | Finds the meaning of a given slang term (case-insensitive). | slang_core.get_meaning('BRB') |
| reverse_lookup(meaning) | Finds all slang terms that map to a specific meaning. | slang_core.reverse_lookup('Laugh out loud') |
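To illustrate what a single regex pass over a slang dictionary can look like, here is a minimal standalone sketch. It is not Lychee's actual implementation: the toy SLANG mapping and the function names replace_slang, get_meaning, and reverse_lookup are assumptions made for illustration, and the real package ships its own dictionary and internals.

```python
import re

# Hypothetical toy dictionary, keyed by lowercase slang term.
SLANG = {
    "brb": "be right back",
    "idk": "I don't know",
    "lol": "laugh out loud",
    "lolz": "laugh out loud",
}

# One alternation pattern for all terms, compiled once. Longer terms are
# listed first so they are tried before any shorter term they start with.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(term) for term in sorted(SLANG, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def replace_slang(text):
    """Replace every known slang term in one pass over the text."""
    return _PATTERN.sub(lambda match: SLANG[match.group(1).lower()], text)


def get_meaning(term):
    """Case-insensitive lookup; returns None for unknown terms."""
    return SLANG.get(term.lower())


def reverse_lookup(meaning):
    """Return all slang terms whose meaning matches, ignoring case."""
    wanted = meaning.lower()
    return sorted(term for term, value in SLANG.items() if value.lower() == wanted)
```

Compiling one alternation pattern means the text is scanned once regardless of how many terms the dictionary holds, instead of once per term.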
Pandas Example (Recommended Usage)
import pandas as pd
import lychee

slang_core = lychee.SlangDictionary()
df = pd.DataFrame({'review': ['OMG, that pic is GOAT!', 'IDK why BRB took so long.']})

# Apply the function across the entire DataFrame column
df['cleaned_review'] = df['review'].apply(slang_core.replace_slang_in_text)
NLP Cleaning Pipeline (TextCleaner Class)
The TextCleaner class provides functions to prepare text for machine learning models.
cleaner = lychee.TextCleaner()
text = "The GOAT said: https://example.com/ LOL! 😃"
| Function | Description | Example Usage |
|---|---|---|
| remove_html_tags(text) | Strips HTML markup from the text. | cleaner.remove_html_tags(text) |
| remove_urls(text) | Removes all web URLs (http, https, www). | cleaner.remove_urls(text) |
| remove_punctuation(text) | Removes standard punctuation marks. | cleaner.remove_punctuation(text) |
| clean_emojis(text, mode='replace') | Replaces emojis with text codes (e.g., 😃 -> :smiling_face:), or removes them if mode='remove'. | cleaner.clean_emojis(text, 'replace') |
| remove_stopwords(text) | Removes common stop words (e.g., 'a', 'the', 'is'). | cleaner.remove_stopwords(text) |
| spelling_correction(text) | Corrects common misspellings (using TextBlob, can be slow). | cleaner.spelling_correction(text) |
| stem_words(text) | Reduces words to their root form (e.g., 'running' -> 'run'). | cleaner.stem_words(text) |
| lemmatize_text(text) | Reduces words to their dictionary form (e.g., 'better' -> 'good'). | cleaner.lemmatize_text(text) |
| tokenize(text, library='nltk') | Splits text into word tokens using either NLTK or SpaCy. | cleaner.tokenize(text, 'spacy') |
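The first three cleaning steps in the table can be approximated with the standard library alone. The sketch below is not how TextCleaner is implemented; the function names strip_html, strip_urls, strip_punctuation, and clean are hypothetical, and the regular expressions are deliberately simple assumptions that will not cover every edge case of real HTML or URLs.

```python
import re
import string

_TAG_RE = re.compile(r"<[^>]+>")                 # anything that looks like <tag ...>
_URL_RE = re.compile(r"(?:https?://|www\.)\S+")  # http, https, or www-prefixed URLs
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_html(text):
    """Remove simple HTML tags, keeping the text between them."""
    return _TAG_RE.sub("", text)


def strip_urls(text):
    """Remove URLs that start with http://, https://, or www."""
    return _URL_RE.sub("", text)


def strip_punctuation(text):
    """Remove ASCII punctuation characters."""
    return text.translate(_PUNCT_TABLE)


def clean(text):
    """Chain the steps; tags and URLs go first because they contain punctuation."""
    text = strip_html(text)
    text = strip_urls(text)
    text = strip_punctuation(text)
    return " ".join(text.split())  # collapse leftover runs of whitespace
```

The order of the steps matters: stripping punctuation first would break the "://" and angle brackets that the URL and tag patterns rely on.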
Download files
File details
Details for the file him_lychee-0.2.3.tar.gz.
File metadata
- Download URL: him_lychee-0.2.3.tar.gz
- Upload date:
- Size: 11.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3623a51ee0cfbe7802470fa80bc7ba5cddb97807ffe5cd7064a260447584e9db |
| MD5 | dcff0b5ba11e04cd3da2f6063fc7453d |
| BLAKE2b-256 | b24be6e950e37abfe215195246d22fb0c64d556a8f04a5c798b2842abf4f6ff5 |
File details
Details for the file him_lychee-0.2.3-py3-none-any.whl.
File metadata
- Download URL: him_lychee-0.2.3-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 257cb0240361edea807a5d1ed5a5932e465bea82a6e2fd9827880434c1da6219 |
| MD5 | d2656d65e5ecca1aa509806811a6f021 |
| BLAKE2b-256 | 23bf5a4d65124ffb749d71e9d53b82ca094b8025884aa54ca51f77bfc03b7288 |