Tamil word lemmatizer. Converts inflected Tamil words into their base lemmas.
Tamil Lemmatizer
Tamil Lemmatizer is a character-level lemmatization library for Tamil text.
It normalizes inflected Tamil word forms and maps them to their base lemma using a deep learning model (PyTorch).
✨ Features
- ✅ Lemmatizes Tamil words to their base form
- ✅ Handles unseen words using a character-level sequence model
- ✅ Simple Python API
- ✅ Supports batch inference
- ✅ Open-source and extensible
📦 Installation
pip install tamil-lemmatizer
🚀 Quick Start
from tamil_lemmatizer import TamilLemmatizer
lemmatizer = TamilLemmatizer()
word = "சென்றார்கள்"
lemma = lemmatizer.lemmatize(word)
print(lemma) # Output: செல்
Batch input
words = ["பாடுகிறது", "வந்தார்கள்", "சென்றேன்"]
print(lemmatizer.lemmatize_batch(words))
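For running text, one option is to split on whitespace, drop duplicate tokens, and send the unique tokens through the batch API in a single call. The sketch below is not part of the library: `lemmatize_text` is a hypothetical helper, whitespace splitting is a simplification, and it assumes `lemmatize_batch` returns one lemma per input word in the same order, which this README does not state.

```python
from typing import Callable, List, Sequence


def lemmatize_text(
    text: str,
    batch_fn: Callable[[Sequence[str]], List[str]],
) -> List[str]:
    """Lemmatize whitespace-separated tokens with one batch call.

    `batch_fn` is assumed to return one lemma per input word, in input
    order (an assumption about `lemmatize_batch`, not documented behaviour).
    """
    tokens = text.split()
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(tokens))
    if not unique:
        return []
    lemmas = batch_fn(unique)
    if len(lemmas) != len(unique):
        raise ValueError("batch_fn must return one lemma per input word")
    lookup = dict(zip(unique, lemmas))
    # Map every original token (including repeats) back to its lemma.
    return [lookup[token] for token in tokens]
```

With the package installed, this would be called as `lemmatize_text("வந்தார்கள் சென்றேன்", lemmatizer.lemmatize_batch)`. Deduplicating first keeps the batch small when the same word form occurs many times.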
📚 Description
Tamil is morphologically rich: a single lemma can have hundreds of inflected forms. To handle this, the library:
- Uses a character-level encoder-decoder architecture
- Is trained with PyTorch on a curated Tamil lemma dataset
- Supports lemmatization of verbs and nouns
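To make "character-level" concrete, the sketch below builds a vocabulary over Unicode code points and encodes a word as a sequence of integer IDs. In Tamil a visible syllable such as வா is two code points (a consonant plus a vowel sign), so the model sees longer sequences than a reader does. The special symbols and the function names here are illustrative assumptions; the README does not describe the library's actual vocabulary.

```python
from typing import Dict, Iterable, List

# Special symbols are assumptions for illustration, not the library's own.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"
SPECIALS = (PAD, SOS, EOS, UNK)


def build_vocab(words: Iterable[str]) -> Dict[str, int]:
    """Map each Unicode code point seen in `words` to an integer ID."""
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for ch in sorted({c for w in words for c in w}):
        vocab[ch] = len(vocab)
    return vocab


def encode(word: str, vocab: Dict[str, int]) -> List[int]:
    """Wrap the code points of `word` in start/end markers."""
    unk = vocab[UNK]
    return [vocab[SOS]] + [vocab.get(c, unk) for c in word] + [vocab[EOS]]


def decode(ids: Iterable[int], vocab: Dict[str, int]) -> str:
    """Invert `encode`, dropping the special symbols."""
    inverse = {i: tok for tok, i in vocab.items()}
    return "".join(inverse[i] for i in ids if inverse[i] not in SPECIALS)


vocab = build_vocab(["வந்தார்கள்", "வா"])
ids = encode("வா", vocab)
print(ids)                 # integer IDs, including start/end markers
print(decode(ids, vocab))  # round-trips back to the original word
```

Code points that were never seen when the vocabulary was built fall back to the `<unk>` ID, so `decode` cannot restore them.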
🛠️ Model Architecture
- Encoder: BiLSTM or Transformer (depending on version)
- Decoder: Attention-based sequence generator
- Loss: Cross entropy over Tamil character vocabulary
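As a sketch of the loss named above, the function below computes mean cross entropy for one decoded sequence: at each step the decoder emits one score (logit) per vocabulary character, and the loss is the negative log-probability of the correct character. It is written in plain Python for readability and is not the library's training code; a PyTorch model would more typically use `torch.nn.CrossEntropyLoss`. The function name and argument layout are assumptions.

```python
import math
from typing import Sequence


def cross_entropy(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
) -> float:
    """Mean negative log-likelihood of `targets` under softmax(logits).

    `logits[t][k]` is the score for vocabulary entry k at decoding step t;
    `targets[t]` is the ID of the correct character at that step.
    """
    if len(logits) != len(targets) or not targets:
        raise ValueError("need one non-empty logit row per target")
    total = 0.0
    for step_logits, target in zip(logits, targets):
        # log-sum-exp with the max subtracted for numerical stability.
        peak = max(step_logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in step_logits))
        total += log_z - step_logits[target]
    return total / len(targets)
```

A model that spreads probability uniformly over a vocabulary of size V scores log V per character, which is a useful baseline when reading training curves.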
🔧 CLI Usage
tamil-lemmatizer "வந்தார்கள்"
📄 License
This project is released under the MIT License.
🤝 Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
✉️ Contact
Maintainer: Hemanth Kumar
GitHub: Hemanth Thunder
Download files
File details
Details for the file tamil_lemmatizer-0.0.2.tar.gz.
File metadata
- Download URL: tamil_lemmatizer-0.0.2.tar.gz
- Upload date:
- Size: 5.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb2d7a0afd72f22e8a5472513178fddfc488c1ce1847e95d0517ced8431e70b7 |
| MD5 | 71da6275e3d93e2cd6f6d38b870b9c14 |
| BLAKE2b-256 | 686bdca2beaa877abaf942fa41be4175b77fdfde1c20247bd1919a19a803f4c4 |
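A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using only the Python standard library; the helper names and the local file path in the usage note are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_digest(Path("tamil_lemmatizer-0.0.2.tar.gz"), "cb2d7a0a...")` with the full digest from the table should return `True` for an intact download in the current directory.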
File details
Details for the file tamil_lemmatizer-0.0.2-py3-none-any.whl.
File metadata
- Download URL: tamil_lemmatizer-0.0.2-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a5f3623aaa53fbfa2784518754b10624fbadb6d13a3189dda8acfdd12d57e64 |
| MD5 | 584b0ab7b3d6b5a164585bf83938ca85 |
| BLAKE2b-256 | 365764d2d97aa2ab69ba70f19f4ba592239257ced7d759964e0174745c07f61c |