A POS tagger for the Wolaita language using deep learning

Wolaita_POST

Overview

Wolaita_POST is a Python package for Part-of-Speech (POS) tagging of Wolaita text. It uses deep learning models, including Bi-GRU and Bi-LSTM, together with FastText embeddings. The package loads pretrained models, so a tagger can be deployed without training one from scratch. It is intended for researchers and developers working on Natural Language Processing (NLP) for lesser-resourced languages who need POS tags for Wolaita text analysis.

Features

  • Accurate POS Tagging: Utilizes deep learning models (Bi-GRU, Bi-LSTM, etc.) to achieve precise Part-of-Speech tagging for Wolaita language text.
  • Pretrained Models: Ready-to-use pretrained models for quick deployment and high accuracy.
  • FastText Embeddings: Incorporates FastText word embeddings to capture subword information and improve performance on low-resource languages.
  • Easy Integration: Simple API that allows researchers and developers to integrate POS tagging into their NLP pipelines.
  • Supports Wolaita Language: Specifically designed for the Wolaita language, addressing the challenges of processing lesser-resourced languages.
  • Customizable: Flexible configuration to accommodate different models, tokenizers, and word vectors based on project requirements.
  • Efficient Deployment: Enables easy deployment for various NLP applications, such as machine translation and named entity recognition (NER).
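
The FastText bullet above mentions subword information. As a rough illustration of that idea (not code from this package), the sketch below extracts character n-grams the way FastText describes them: the word is wrapped in `<` and `>` boundary markers and sliced into n-grams of length 3 to 6 by default. The function name `char_ngrams` is hypothetical, and the sketch leaves out the extra whole-word token and the hashing that FastText applies.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, FastText-style.

    The word is wrapped in '<' and '>' boundary markers before slicing,
    so prefixes and suffixes stay distinguishable from inner substrings.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n over the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


if __name__ == "__main__":
    # Trigrams only; an English word is used purely for illustration.
    print(char_ngrams("where", min_n=3, max_n=3))
```

Because an unseen word still shares n-grams with known words, a vector can be composed for it from those pieces, which is why subword embeddings tend to help when training data is scarce.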

Installation

To install Wolaita_POST, you can use pip:

  • pip install Wolaita_POST
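
To confirm that the installation succeeded, one option is to query the installed distribution metadata from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of Wolaita_POST.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("Wolaita_POST")
    if found is None:
        print("Wolaita_POST is not installed in this environment")
    else:
        print(f"Wolaita_POST {found} is installed")
```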

Usage

After installation, you can use Wolaita_POST as follows:

  1. Import the package:
       from Wolaita_POST import pos_tagger
  2. Set file paths for your pretrained model, word vectors, and tokenizers:
       model_path = "/content/drive/MyDrive/POS/Bi_GRU_model.h5"  # Adjust if your model file has a different extension
       word_vector_path = "/content/drive/MyDrive/POS/fasttext_compatible.bin"
       word_tokenizer_path = "/content/drive/MyDrive/POS/wolaita_tokenizerX.pkl"
       tag_tokenizer_path = "/content/drive/MyDrive/POS/wolaita_tag_tokenizerY.pkl"
  3. Initialize the POS tagger (this assumes the WolaitaPOSTagger class is defined in the imported pos_tagger module):
       tagger = pos_tagger.WolaitaPOSTagger(
           model_path=model_path,
           word_vector_path=word_vector_path,
           word_tokenizer_path=word_tokenizer_path,
           tag_tokenizer_path=tag_tokenizer_path,
       )
  4. Use the POS tagger to tag Wolaita text:
       text = ['Insert your sample text here']
       tagged_text = tagger.tag(text)
       print(tagged_text)
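
Since the tagger depends on four external files, it can help to check that they exist before initializing it. The sketch below is one way to do that with the standard library; `missing_files` is a hypothetical helper, not part of the package, and the paths are the ones from step 2.

```python
from pathlib import Path


def missing_files(paths):
    """Return the keys of `paths` (a name -> path mapping) whose file does not exist."""
    return [name for name, path in paths.items() if not Path(path).is_file()]


if __name__ == "__main__":
    # Same paths as in step 2; adjust them to your own environment.
    paths = {
        "model_path": "/content/drive/MyDrive/POS/Bi_GRU_model.h5",
        "word_vector_path": "/content/drive/MyDrive/POS/fasttext_compatible.bin",
        "word_tokenizer_path": "/content/drive/MyDrive/POS/wolaita_tokenizerX.pkl",
        "tag_tokenizer_path": "/content/drive/MyDrive/POS/wolaita_tag_tokenizerY.pkl",
    }
    missing = missing_files(paths)
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files found; the tagger can be initialized.")
```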

The tagged_text will contain the part-of-speech tags for the given Wolaita text.
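
The return format of `tag` is not documented here. Purely as an assumption, the sketch below treats the result as a list of sentences, each a list of `(word, tag)` pairs, and counts how often each tag occurs. The helper `tag_frequencies` and the sample tokens and tags are placeholders, not real Wolaita data or real tagger output; adapt the loop if the actual structure differs.

```python
from collections import Counter


def tag_frequencies(tagged_sentences):
    """Count how often each tag occurs across all tagged sentences.

    Assumes each sentence is an iterable of (word, tag) pairs.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        for _word, tag in sentence:
            counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Placeholder tokens and tags, not real tagger output.
    example = [[("w1", "TAG_A"), ("w2", "TAG_B")], [("w3", "TAG_A")]]
    for tag, count in tag_frequencies(example).most_common():
        print(tag, count)
```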

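Running Tests

To verify functionality, run pytest against the package's tests directory and redirect the output to a report file. In a Colab or Jupyter notebook cell, the leading ! passes the command to the shell; omit it when running from a regular terminal:

  • !pytest /content/drive/MyDrive/Wolaita_POST/tests > test_report.txt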
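
Outside a notebook, the same run can be scripted from Python. The following is a minimal sketch, assuming pytest is installed in the current interpreter; `run_and_save` is a hypothetical helper. Unlike the shell redirect above, which captures only standard output, it writes both standard output and standard error to the report.

```python
import subprocess
import sys


def run_and_save(command, report_path):
    """Run `command`, write its combined stdout/stderr to `report_path`,
    and return the process exit code."""
    with open(report_path, "w", encoding="utf-8") as report:
        result = subprocess.run(command, stdout=report, stderr=subprocess.STDOUT)
    return result.returncode


if __name__ == "__main__":
    # Path taken from the command above; adjust it to your own checkout.
    tests_dir = "/content/drive/MyDrive/Wolaita_POST/tests"
    exit_code = run_and_save(
        [sys.executable, "-m", "pytest", tests_dir], "test_report.txt"
    )
    print("pytest exit code:", exit_code)
```

A non-zero exit code indicates that tests failed or that pytest could not run, so the script can be used as a simple gate in automation.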

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Contributing

Contributions are welcome! If you have suggestions for improving the package or find any issues, feel free to open a pull request or submit an issue on GitHub.

Acknowledgements

Special thanks to the developers and researchers who contributed to this project, making it possible to expand NLP resources for the Wolaita language.
