Lychee Language Core: Optimized Slang Replacement and NLP Toolkit.

Lychee Language Core (him-lychee) Version 0.2.0 - Developed by Himpadma "Him"

Lychee is a lightweight Python package for processing user-generated text quickly. It provides single-pass slang replacement and a suite of text-cleaning tools for natural language processing (NLP) tasks such as sentiment analysis.

Installation

pip install him-lychee

Post-Installation Setup (Required for Full NLP Features)

To use the advanced features (stopwords, stemming, lemmatization, spaCy), you must download the required models once:

python -m nltk.downloader stopwords punkt wordnet
python -m textblob.download_corpora
python -m spacy download en_core_web_sm
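Before running the downloads, it can help to confirm that the three libraries named in the commands above are importable at all. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper introduced here for illustration and is not part of Lychee.

```python
from importlib.util import find_spec


def missing_packages(names=("nltk", "textblob", "spacy")):
    """Return the top-level packages from `names` that cannot be imported."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install these before downloading models:", ", ".join(missing))
    else:
        print("All optional NLP dependencies are importable.")
```

This only checks that the packages are installed; it does not verify that the corpora or the spaCy model have been downloaded.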

Lychee Core Usage (SlangDictionary Class)

The core SlangDictionary class provides robust, optimized slang replacement.

| Method | Description | Example Usage |
| --- | --- | --- |
| replace_slang_in_text(text) | Crucial for data cleaning. Replaces all recognized slang terms in a single string with their full meanings. Highly optimized using a single regex pass. | slang_core.replace_slang_in_text(text) |
| get_meaning(slang_term) | Finds the meaning of a given slang term (case-insensitive). | slang_core.get_meaning('BRB') |
| reverse_lookup(meaning) | Finds all slang terms that map to a specific meaning. | slang_core.reverse_lookup('Laugh out loud') |
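To illustrate what a single regex pass can look like, here is a minimal sketch built on the standard `re` module. It is not Lychee's implementation: the `SLANG` entries, the function names, and the word-boundary matching rule are all assumptions made for this example.

```python
import re

# Hypothetical sample entries; the dictionary shipped with Lychee is not shown here.
SLANG = {
    "brb": "Be right back",
    "idk": "I don't know",
    "lol": "Laugh out loud",
    "lmao": "Laugh out loud",
}

# One alternation of all terms, so the text is scanned only once.
# Terms are sorted longest first so a longer term is tried before any
# shorter term that is its prefix.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(SLANG, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def replace_slang(text):
    """Replace every known slang term in `text` using a single regex pass."""
    return _PATTERN.sub(lambda m: SLANG[m.group(1).lower()], text)


def get_meaning(term):
    """Case-insensitive lookup; returns None for unknown terms."""
    return SLANG.get(term.lower())


def reverse_lookup(meaning):
    """Return all slang terms that map to `meaning` (case-insensitive)."""
    target = meaning.lower()
    return sorted(t for t, m in SLANG.items() if m.lower() == target)
```

Compiling one combined pattern up front means the cost per string does not grow with one `str.replace` call per dictionary entry, and the `\b` anchors keep words such as "lollipop" from being rewritten.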

Pandas Example (Recommended Usage)

import pandas as pd
import lychee

slang_core = lychee.SlangDictionary()
df = pd.DataFrame({'review': ['OMG, that pic is GOAT!', 'IDK why BRB took so long.']})

# Apply the function across the entire DataFrame column for high speed
df['cleaned_review'] = df['review'].apply(slang_core.replace_slang_in_text)

NLP Cleaning Pipeline (TextCleaner Class)

The TextCleaner class provides functions to prepare text for machine learning models.

cleaner = lychee.TextCleaner()
text = "The <br> GOAT said: https://example.com/ LOL! 😃"

| Function | Description | Example Usage |
| --- | --- | --- |
| remove_html_tags(text) | Strips HTML markup from the text. | cleaner.remove_html_tags(text) |
| remove_urls(text) | Removes all web URLs (http, https, www). | cleaner.remove_urls(text) |
| remove_punctuation(text) | Removes standard punctuation marks. | cleaner.remove_punctuation(text) |
| clean_emojis(text, mode='replace') | Replaces emojis with text codes (e.g., 😃 -> :smiling_face:), or removes them if mode='remove'. | cleaner.clean_emojis(text, 'replace') |
| remove_stopwords(text) | Removes common stop words (e.g., 'a', 'the', 'is'). | cleaner.remove_stopwords(text) |
| spelling_correction(text) | Corrects common misspellings (using TextBlob, can be slow). | cleaner.spelling_correction(text) |
| stem_words(text) | Reduces words to their root form (e.g., 'running' -> 'run'). | cleaner.stem_words(text) |
| lemmatize_text(text) | Reduces words to their dictionary form (e.g., 'better' -> 'good'). | cleaner.lemmatize_text(text) |
| tokenize(text, library='nltk') | Splits text into word tokens using either NLTK or SpaCy. | cleaner.tokenize(text, 'spacy') |
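The order in which cleaning steps run matters. The sketch below approximates three of the steps (HTML tags, URLs, punctuation) with the standard `re` and `string` modules to show why. These functions and patterns are assumptions made for illustration, not the TextCleaner implementation.

```python
import re
import string

_TAG_RE = re.compile(r"<[^>]+>")                       # anything shaped like <...>
_URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_tags(text):
    """Replace HTML-like tags with a space so adjacent words stay separated."""
    return _TAG_RE.sub(" ", text)


def strip_urls(text):
    """Remove http(s):// and www. URLs up to the next whitespace."""
    return _URL_RE.sub("", text)


def strip_punctuation(text):
    """Delete ASCII punctuation characters; emojis are left untouched."""
    return text.translate(_PUNCT_TABLE)


def basic_clean(text):
    """Run the three steps in a safe order and collapse leftover whitespace."""
    text = strip_tags(text)
    # URLs must go before punctuation: once ":", "/" and "." are removed,
    # "https://example.com/" no longer matches the URL pattern.
    text = strip_urls(text)
    text = strip_punctuation(text)
    return " ".join(text.split())
```

With the real class, the same ordering would presumably apply: call `cleaner.remove_html_tags` and `cleaner.remove_urls` before `cleaner.remove_punctuation`, so that tags and URLs are still recognizable when their step runs.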
