A POS tagger for the Wolaita language using deep learning
Project description
Wolaita_POST
## Overview
Wolaita_POST is a Python package for Part-of-Speech (POS) tagging of Wolaita text. It uses deep learning models such as Bi-GRU and Bi-LSTM together with FastText embeddings, and it loads pretrained models so that tagging can be deployed without training from scratch. The package is aimed at researchers and developers working on Natural Language Processing (NLP) for low-resource languages who need to analyze Wolaita text.
## Features
- Accurate POS Tagging: Uses deep learning models (Bi-GRU, Bi-LSTM, etc.) to tag Wolaita text.
- Pretrained Models: Ready-to-use pretrained models for quick deployment.
- FastText Embeddings: Uses FastText word embeddings, which capture subword information and so help with rare and unseen word forms in low-resource languages.
- Easy Integration: A simple API lets researchers and developers add POS tagging to their NLP pipelines.
- Wolaita Support: Designed specifically for Wolaita, a low-resource language.
- Customizable: The model, tokenizers, and word vectors are supplied as file paths, so they can be swapped to suit project requirements.
- Efficient Deployment: Tagger output can feed downstream NLP applications such as machine translation and named entity recognition (NER).
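To illustrate what "subword information" means for FastText embeddings, here is a minimal sketch of how FastText-style character n-grams are extracted from a word. It is not part of Wolaita_POST; the function name `char_ngrams` is hypothetical, and the defaults of 3 to 6 characters follow FastText's usual settings. FastText additionally keeps the whole wrapped word as its own unit, which this sketch omits.

```python
from __future__ import annotations


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> list[str]:
    """Return the character n-grams of `word`, FastText-style.

    The word is wrapped in '<' and '>' boundary markers first, so that
    prefixes and suffixes produce n-grams distinct from word-internal ones.
    """
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[start:start + n])
    return ngrams


if __name__ == "__main__":
    # "where" is a placeholder token, not taken from a Wolaita corpus.
    print(char_ngrams("where", min_n=3, max_n=3))
```

Because a word vector is built from the vectors of its n-grams, an inflected form that never appeared in training can still receive a meaningful embedding from the pieces it shares with known words.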
## Installation
Install Wolaita_POST from PyPI with pip:
- `pip install Wolaita_POST`
## Usage
After installation, you can use Wolaita_POST as follows:
- Import the package:

      from Wolaita_POST import pos_tagger

- Set file paths for your pretrained model, word vectors, and tokenizers:

      model_path = "/content/drive/MyDrive/POS/Bi_GRU_model.h5"  # Adjust if your model file has a different extension
      word_vector_path = "/content/drive/MyDrive/POS/fasttext_compatible.bin"
      word_tokenizer_path = "/content/drive/MyDrive/POS/wolaita_tokenizerX.pkl"
      tag_tokenizer_path = "/content/drive/MyDrive/POS/wolaita_tag_tokenizerY.pkl"

- Initialize the POS tagger:

      # Assumption: the WolaitaPOSTagger class is exposed by the imported pos_tagger module.
      tagger = pos_tagger.WolaitaPOSTagger(
          model_path=model_path,
          word_vector_path=word_vector_path,
          word_tokenizer_path=word_tokenizer_path,
          tag_tokenizer_path=tag_tokenizer_path,
      )

- Use the POS tagger to tag Wolaita text:

      text = ['Insert your sample text here']
      tagged_text = tagger.tag(text)
      print(tagged_text)

The tagged_text will contain the part-of-speech tags for the given Wolaita text.
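Since the tagger depends on four separate files, it can help to confirm they all exist before initializing it. The following is a minimal sketch using only the standard library; `find_missing_files` is a hypothetical helper, not part of the Wolaita_POST API, and the paths are the example paths from the steps above.

```python
from __future__ import annotations

from pathlib import Path


def find_missing_files(paths: dict[str, str]) -> list[str]:
    """Return the names of the entries whose path is not an existing file."""
    return [name for name, p in paths.items() if not Path(p).is_file()]


if __name__ == "__main__":
    # Example paths from the usage steps; adjust them to your own storage layout.
    artifact_paths = {
        "model_path": "/content/drive/MyDrive/POS/Bi_GRU_model.h5",
        "word_vector_path": "/content/drive/MyDrive/POS/fasttext_compatible.bin",
        "word_tokenizer_path": "/content/drive/MyDrive/POS/wolaita_tokenizerX.pkl",
        "tag_tokenizer_path": "/content/drive/MyDrive/POS/wolaita_tag_tokenizerY.pkl",
    }
    missing = find_missing_files(artifact_paths)
    if missing:
        raise SystemExit(f"Missing artifact files: {', '.join(missing)}")
    print("All artifact files found.")
```

Checking up front tends to give a clearer error than a failure deep inside model or tokenizer loading.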
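## Running Tests
To verify functionality, run the test suite with pytest. Run this command in your project directory; the leading `!` is notebook (Colab/Jupyter) shell syntax and should be omitted in a regular terminal. The output is redirected to test_report.txt:
- !pytest /content/drive/MyDrive/Wolaita_POST/tests > test_report.txt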
## License
This project is licensed under the MIT License. See the LICENSE file for more details.
## Contributing
Contributions are welcome! If you have suggestions for improving the package or find any issues, feel free to open a pull request or submit an issue on GitHub.
## Acknowledgements
Special thanks to the developers and researchers who contributed to this project, making it possible to expand NLP resources for the Wolaita language.
Download files
Source Distribution
Built Distribution
File details
Details for the file wolaita_post-0.1.0.tar.gz.
File metadata
- Download URL: wolaita_post-0.1.0.tar.gz
- Upload date:
- Size: 5.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 755c48b3ecdea28b86c9a4e206af6a81f1b62997a477ec08c43e34005592b529
MD5 | 840e629bfb36ad53e95b252029d29ded
BLAKE2b-256 | bc0b7c284a88126d41dcd9fe1ac9661036ddd9ea8f934ca7e8c122836086d8be
File details
Details for the file Wolaita_POST-0.1.0-py3-none-any.whl.
File metadata
- Download URL: Wolaita_POST-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8d4086d1499c1dcf90443fecd6e61711d01cc342f6ac99cffa656f8c8df5d495
MD5 | 235206866656df3d04427bd365cf661d
BLAKE2b-256 | b8c0e4a25cc3cf7d4c927d77ecae9ab0a1d478a96d9b5455da03ce9d2742648c
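The published digests can be used to check that a downloaded file is intact. Here is a minimal sketch with Python's standard `hashlib`; the helper names and the local file name are assumptions for illustration, and the expected value is the SHA256 digest listed for wolaita_post-0.1.0.tar.gz.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumption: the source distribution was downloaded to the current directory.
    expected = "755c48b3ecdea28b86c9a4e206af6a81f1b62997a477ec08c43e34005592b529"
    print(matches_published_digest("wolaita_post-0.1.0.tar.gz", expected))
```

The same pattern works for the wheel by substituting its file name and its SHA256 digest.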