
LexiDecay is a semi-supervised lexical weighting model for unstructured text. It classifies content by adaptive word-frequency decay and soft lexical scoring. Fast (O(n·m)), language-flexible, and training-free — ideal for topic classification, semantic filtering, and intent detection.

Project description

⚡️ LexiDecay — The Adaptive Lexical Decay Classifier

By Mohammad Taha Gorji

A blazing-fast, semi-supervised text classification algorithm based on adaptive lexical weighting, frequency decay, and probabilistic scoring, with no training and no labeled dataset required. It is well suited to topic classification, semantic filtering, and intent detection.


🌌 Algorithm Philosophy & Core Idea

LexiDecay is inspired by the way human cognition evaluates language — not by rigid statistical training, but by dynamically weighting words according to their contextual importance and rarity.
Instead of “learning” through countless iterations, LexiDecay understands by measuring the gravitational pull of words within conceptual clusters.

The algorithm analyzes each category’s text content, counts and weights its tokens, and applies a decay function that reduces the influence of overly common words (like “the”, “of”, “and”).
During classification, it computes soft lexical similarities using adaptive decay, inverse document frequency, and a softmax-based probability normalization.

🧠 Philosophically, LexiDecay reflects a cognitive model of understanding — flexible, intuitive, and progressively self-balancing.
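
To make the idea concrete, here is a minimal sketch of decay-weighted scoring followed by softmax normalization. It is illustrative only and does not reproduce the package's internal implementation; the toy categories, the specific decay curve, and the helper names are assumptions.

import math
from collections import Counter

def decayed_weight(count, decay=0.5):
    # Hypothetical decay curve: repeated words contribute sub-linearly.
    return count ** (1.0 - decay)

def softmax(scores):
    # Standard softmax turns raw lexical scores into probabilities.
    mx = max(scores.values())
    exps = {label: math.exp(s - mx) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Toy category corpora and input (not the package's real data structures).
categories = {
    "science": "quantum physics experiment theory energy particle",
    "art": "painting music colour canvas melody brush",
}
text = "quantum energy experiment"

scores = {}
for label, corpus in categories.items():
    counts = Counter(corpus.split())
    scores[label] = sum(decayed_weight(counts[w]) for w in text.split() if w in counts)

print(softmax(scores))  # "science" should get the highest probability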


🧩 Scientific Position

  • Learning Type: Semi-supervised lexical weighting
  • Data Type: Unstructured free text
  • Complexity: O(n × m), where n = number of words in the input and m = number of categories
  • Core Mechanism: Adaptive word-frequency decay + soft lexical scoring
  • Primary Fields: NLP, cognitive AI, text understanding, knowledge extraction

🚀 Real-World Applications

LexiDecay is suitable for a wide variety of language-intelligent systems:

  • 🗂 Topic classification — Distinguish content across domains (e.g. science, art, politics).
  • 🎯 Intent detection — Recognize user intentions from text queries or chatbot messages.
  • 🧭 Semantic filtering — Filter or route information based on conceptual meaning.
  • 🪶 Keyword-based reasoning — Identify thematic or conceptual similarity.
  • 🧠 Cognitive AI prototypes — For lightweight, reasoning-like models without deep networks.

⚖️ Advantages Over Classical Models

How LexiDecay compares with classical models (Naive Bayes, TF-IDF, etc.):

  • Training required: ❌ none, works instantly; classical models ✅ need training
  • Computation speed: ⚡ extremely fast (O(n·m)); classical models are 🐢 often slower (training + inference)
  • Flexibility: 🧩 add or remove categories freely; classical models are 🔒 fixed to the trained dataset
  • Data requirements: 🌱 works with few samples; classical models 📊 need many labeled samples
  • Common word handling: 🪶 automatic frequency decay and adaptive weighting; classical models use ⚙️ manual stopword removal
  • Language support: 🌍 fully language-independent; classical models are ⚠️ usually language-specific
  • Explainability: 🔍 transparent lexical logic; classical models are 🕳 often black-box statistics

💡 LexiDecay combines the interpretability of lexical systems with the adaptability of probabilistic models — no training, no fine-tuning, no waiting.


⚙️ Installation

pip install LexiDecay

That’s it! 🪄


🧱 Getting Started

Below is a full example of how to use LexiDecay from scratch.

from LexiDecay import LexiDecayModel

# 1️⃣ Create a model
m = LexiDecayModel()

# 2️⃣ Add categories (each category can be a string or a list of texts)
with open("science.txt", encoding="utf-8") as f:
    m.add_category("science", f.read())
with open("philosophy.txt", encoding="utf-8") as f:
    m.add_category("philosophy", f.read())

# 3️⃣ Classify new input
text = "Quantum theories explore the probabilistic structure of the universe."
result = m.classify(text)

print(result["top"])        # ('science', score, probability)
print(result["probs"])      # Probabilities for all categories

🧠 Function Reference & API Details

🔹 add_category(label, content)

Adds or replaces a category.

  • label: str → name of the category
  • content: str or List[str] → text data belonging to that category

Automatically rebuilds the internal vocabulary and frequency statistics.
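
For example, a category can also be seeded from a list of short texts, using the model m from the Getting Started example (the label and sentences below are invented for illustration):

# The content may be a single string or a list of texts.
m.add_category("history", [
    "The Roman Empire shaped European law and language.",
    "Medieval trade routes connected distant civilizations.",
])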


🔹 classify(input_text, decay=0.5, use_idf=False, auto_common_reduce=True, common_decay=0.7, min_common_mult=0.05, ignore_input_repetitions=False)

Performs text classification and returns a dictionary with:

{
  "scores": {label: float, ...},
  "probs": {label: float, ...},
  "matches": {label: {matched words, stats...}},
  "top": (best_label, score, probability)
}

Parameters:

  • decay (float, default 0.5): controls how strongly frequent words lose influence (0 = linear, 1 = no decay).
  • use_idf (bool, default False): applies inverse-document-frequency weighting.
  • auto_common_reduce (bool, default True): automatically detects common words and lowers their impact.
  • common_decay (float, default 0.7): strength of the reduction applied to common words.
  • min_common_mult (float, default 0.05): minimum multiplier applied to frequent words.
  • ignore_input_repetitions (bool, default False): if True, counts each unique input word only once.
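
As a rough illustration, a call that uses several of these documented options on the model built above (the input sentence is invented):

result = m.classify(
    "Renaissance painters studied light, colour, and anatomy.",
    decay=0.3,                      # stronger suppression of frequent words
    use_idf=True,                   # weight rarer vocabulary more heavily
    auto_common_reduce=True,        # keep automatic common-word reduction on
    ignore_input_repetitions=True,  # count each unique input word once
)
print(result["top"])    # (best_label, score, probability)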

🔹 save_model(path)

Saves the entire model (categories + data) into a .pkl file.

m.save_model("lexidecay.pkl")

🔹 load_model(path)

Loads a model from a .pkl file.

m2 = LexiDecayModel.load_model("lexidecay.pkl")
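
Taken together, a minimal save-and-reload round trip (assuming the model m built in the Getting Started example) might look like:

m.save_model("lexidecay.pkl")                    # persist categories + statistics
m2 = LexiDecayModel.load_model("lexidecay.pkl")  # restore into a fresh instance
print(m2.classify("Quantum experiments and energy")["top"])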

🌟 Why LexiDecay Feels Different

  • Human-like text perception: adaptive decay mimics cognitive salience.
  • Instant deployability: no model training — just plug and classify.
  • Infinite extendability: add categories anytime, instantly rebuilt.
  • Compact and dependency-light: only requires NumPy.
  • Transparent math: pure lexical weighting, fully explainable results.

🧬 Example: Multi-category Classification

from LexiDecay import LexiDecayModel
m = LexiDecayModel()
m.add_category("tech", ["AI","Model","AI algorithms", "neural networks", "deep learning"])
m.add_category("art", ["painting", "music", "creativity", "aesthetic beauty"])
m.add_category("sports", ["football", "strength", "competition"])

res = m.classify("New AI model beats humans at creative painting tasks.")
print(res)
# Output → ('art', score, probability)

🧩 Citation

If you use LexiDecay in academic work, please cite:

Mohammad Taha Gorji, LexiDecay: Semi-supervised Lexical Decay Model for Adaptive Text Classification (2025)


🔹 Examples

See LexiDecay Examples for additional usage examples.


🪄 Author

Mohammad Taha Gorji
Creator of LexiDecay
AI Researcher & Cognitive Systems Developer


🖤 License

Apache 2.0 License © 2025 Mohammad Taha Gorji. Open for research, education, and innovation.


“LexiDecay doesn’t learn — it understands.” 🧠✨



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • lexidecay-1.0.0.tar.gz (7.6 kB, Source)

Built Distribution

  • lexidecay-1.0.0-py3-none-any.whl (7.9 kB, Python 3 wheel)

File details

Details for the file lexidecay-1.0.0.tar.gz.

File metadata

  • Download URL: lexidecay-1.0.0.tar.gz
  • Upload date:
  • Size: 7.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for lexidecay-1.0.0.tar.gz:

  • SHA256: 3d55b6ae5a3f8c9139babb3756e81632f2d43a91657225e62e00952f3b994c8a
  • MD5: 0085626ac1d6fd9f5e718437ba35238b
  • BLAKE2b-256: d23852b79feabf02316fb19e7417ea48aa2907c031387b954bd145207d1c4c97


File details

Details for the file lexidecay-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: lexidecay-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.9

File hashes

Hashes for lexidecay-1.0.0-py3-none-any.whl:

  • SHA256: 04bc2995ef8a4a49e76bb084382e04f8ba022501694bbf5f962b0685c778cc8f
  • MD5: 1e720e4d3048d5685c5f3dc47248a8f2
  • BLAKE2b-256: 97938042886365f5c9691cbf4804d01ba6d2e918d475d7e9886941e78f6dc365

