Tamil word lemmatizer. Converts inflected Tamil words into their base lemmas.

Tamil Lemmatizer

Tamil Lemmatizer is a character-level lemmatization library for Tamil text.
It normalizes inflected Tamil word forms and maps them to their base lemma using a deep learning model (PyTorch).


✨ Features

  • ✅ Lemmatizes Tamil words to their base form
  • ✅ Handles unseen words using a character-level sequence model
  • ✅ Simple Python API
  • ✅ Supports batch inference
  • ✅ Open-source and extensible

📦 Installation

pip install tamil-lemmatizer

🚀 Quick Start

from tamil_lemmatizer import TamilLemmatizer

lemmatizer = TamilLemmatizer()

word = "சென்றார்கள்"
lemma = lemmatizer.lemmatize(word)

print(lemma)   # Output: செல்

Batch input

words = ["பாடுகிறது", "வந்தார்கள்", "சென்றேன்"]
print(lemmatizer.lemmatize_batch(words))
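
To lemmatize running text rather than a prepared word list, one option is to split the text on whitespace and pass the tokens to `lemmatize_batch`. The sketch below assumes that `lemmatize_batch` returns one lemma per input word, in the same order. The helper `lemmatize_sentence` and its punctuation handling are illustrative and not part of the library.

```python
import string

# Punctuation stripped from token edges; ASCII set only (an assumption).
_EDGE_PUNCT = string.punctuation


def lemmatize_sentence(text, lemmatizer):
    """Split text on whitespace and lemmatize all tokens in one batch call.

    `lemmatizer` is any object exposing `lemmatize_batch(list_of_words)`.
    Assumption: the result holds one lemma per input word, in order.
    """
    tokens = [tok.strip(_EDGE_PUNCT) for tok in text.split()]
    tokens = [tok for tok in tokens if tok]  # drop tokens that were only punctuation
    if not tokens:
        return []
    lemmas = lemmatizer.lemmatize_batch(tokens)
    return list(zip(tokens, lemmas))
```

With the package installed, calling `lemmatize_sentence("அவர்கள் வந்தார்கள்.", TamilLemmatizer())` would yield a list of (word, lemma) pairs. Sending all tokens in a single batch call avoids invoking the model once per word.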

📚 Description

Tamil is morphologically rich: a single lemma can have hundreds of inflected forms. To handle this, the library:

  • Uses a character-level encoder-decoder architecture
  • Is trained with PyTorch on a curated Tamil lemma dataset
  • Supports lemmatization of verbs and nouns
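
As a rough illustration of what "character-level" means here, the sketch below turns a word into a sequence of integer ids that an encoder-decoder could consume. The function names and the special symbols (`<pad>`, `<sos>`, `<eos>`, `<unk>`) are assumptions made for the example. The library's actual vocabulary and preprocessing are not documented on this page.

```python
# Hypothetical special symbols; the library's real vocabulary may differ.
PAD, SOS, EOS, UNK = "<pad>", "<sos>", "<eos>", "<unk>"


def build_vocab(words):
    """Map every character seen in `words` to an integer id."""
    chars = sorted({ch for word in words for ch in word})
    itos = [PAD, SOS, EOS, UNK] + chars
    stoi = {sym: idx for idx, sym in enumerate(itos)}
    return stoi, itos


def encode(word, stoi):
    """Convert a word to ids, wrapped in start and end markers."""
    unk = stoi[UNK]
    return [stoi[SOS]] + [stoi.get(ch, unk) for ch in word] + [stoi[EOS]]


def decode(ids, itos):
    """Convert ids back to a string, dropping the special symbols."""
    specials = {PAD, SOS, EOS, UNK}
    return "".join(itos[i] for i in ids if itos[i] not in specials)


stoi, itos = build_vocab(["வந்தார்கள்", "வா"])
ids = encode("வந்தார்கள்", stoi)
print(decode(ids, itos))  # round-trips back to the original word
```

Python iterates over Unicode code points, so Tamil vowel signs and the pulli (்) become separate symbols in this scheme. A character that was never seen when the vocabulary was built maps to `<unk>`.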

🛠️ Model Architecture

  • Encoder: BiLSTM or Transformer (depending on version)
  • Decoder: Attention-based sequence generator
  • Loss: Cross entropy over Tamil character vocabulary
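
The loss in the last bullet can be read as follows: at each decoder step the model scores every character in the vocabulary, and cross entropy is the negative log-probability assigned to the correct character, averaged over the steps. The pure-Python sketch below shows that computation on toy numbers. Skipping padded positions is an assumption, and in PyTorch the same quantity is normally obtained with `torch.nn.CrossEntropyLoss` rather than hand-written code.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vector of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_cross_entropy(step_logits, target_ids, pad_id=0):
    """Mean negative log-likelihood of the target characters.

    step_logits: one list of vocabulary scores per decoder step.
    target_ids:  the gold character id for each step.
    Steps whose target equals `pad_id` are skipped (an assumption).
    """
    total, count = 0.0, 0
    for logits, target in zip(step_logits, target_ids):
        if target == pad_id:
            continue
        total -= log_softmax(logits)[target]
        count += 1
    return total / count if count else 0.0


# Toy vocabulary of 4 symbols: two real decoder steps and one padded step.
logits = [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
targets = [1, 2, 0]
loss = sequence_cross_entropy(logits, targets)
print(round(loss, 4))
```

A step with uniform scores over a vocabulary of size V contributes log V to the loss, which is a useful sanity check for an untrained model.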


🔧 CLI Usage

tamil-lemmatizer "வந்தார்கள்"

📄 License

This project is released under the MIT License.


🤝 Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.


✉️ Contact

Maintainer: Hemanth Kumar
GitHub: Hemanth Thunder


Download files

Source Distribution

tamil_lemmatizer-0.0.2.tar.gz (5.6 MB)

Uploaded Source

Built Distribution

tamil_lemmatizer-0.0.2-py3-none-any.whl (6.4 kB)

Uploaded Python 3

File details

Details for the file tamil_lemmatizer-0.0.2.tar.gz.

File metadata

  • Download URL: tamil_lemmatizer-0.0.2.tar.gz
  • Upload date:
  • Size: 5.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for tamil_lemmatizer-0.0.2.tar.gz
Algorithm Hash digest
SHA256 cb2d7a0afd72f22e8a5472513178fddfc488c1ce1847e95d0517ced8431e70b7
MD5 71da6275e3d93e2cd6f6d38b870b9c14
BLAKE2b-256 686bdca2beaa877abaf942fa41be4175b77fdfde1c20247bd1919a19a803f4c4
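
A downloaded archive can be checked against the SHA256 value listed above using Python's standard `hashlib` module. The sketch below is one way to do that. The helper names and the local file path are assumptions for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published("tamil_lemmatizer-0.0.2.tar.gz", expected)` returns `True` only when `expected` holds the SHA256 digest of the file at that path. Reading in chunks keeps memory use low for multi-megabyte archives.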

File details

Details for the file tamil_lemmatizer-0.0.2-py3-none-any.whl.

File hashes

Hashes for tamil_lemmatizer-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8a5f3623aaa53fbfa2784518754b10624fbadb6d13a3189dda8acfdd12d57e64
MD5 584b0ab7b3d6b5a164585bf83938ca85
BLAKE2b-256 365764d2d97aa2ab69ba70f19f4ba592239257ced7d759964e0174745c07f61c
