A Python package to load Shona FastText embeddings, train FastText embeddings, and clean Shona text data
blessmore
blessmore is a Python package designed to load pre-trained Shona FastText embeddings, train FastText embeddings, and clean text data.
Installation
Install the package using pip:
pip install blessmore
Usage
Loading Pre-trained FastText Models
The package can load pre-trained FastText models in four vector sizes: 50, 100, 300, and 500 dimensions.
from blessmore import load_fasttext_model
# Load a 50-dimensional FastText model
model_50 = load_fasttext_model(50)
# Load a 100-dimensional FastText model
model_100 = load_fasttext_model(100)
# Load a 300-dimensional FastText model
model_300 = load_fasttext_model(300)
# Load a 500-dimensional FastText model
model_500 = load_fasttext_model(500)
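The README does not say what happens when an unlisted dimension is requested. As a minimal sketch, a small guard can reject unsupported sizes before the loader is called. The names `SUPPORTED_DIMENSIONS` and `load_checked` are hypothetical helpers and not part of blessmore; the loader is passed in as an argument so the guard can be tried without downloading a model.

```python
# Hypothetical helper, not part of blessmore: validate the requested
# dimension against the sizes listed in this README before loading.
SUPPORTED_DIMENSIONS = (50, 100, 300, 500)


def load_checked(dimension, loader):
    """Call loader(dimension) only if dimension is one of the documented sizes."""
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"Unsupported dimension {dimension}; "
            f"expected one of {SUPPORTED_DIMENSIONS}"
        )
    return loader(dimension)


# Usage with the real loader (requires blessmore to be installed):
# from blessmore import load_fasttext_model
# model_100 = load_checked(100, load_fasttext_model)
```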
Training FastText Models
You can also train new FastText embeddings on your own text data in any language. The train_fasttext_model function cleans the text data and trains a FastText model with the specified dimension.
from blessmore import train_fasttext_model
corpus_file_path = 'shona_corpus.txt'
vector_size = 50 # Specify the dimension you want to train
# Train a FastText model
model = train_fasttext_model(corpus_file_path, vector_size)
Cleaning Shona Text Data
The package can clean text data by tokenizing it, removing non-letter symbols, and lowercasing it.
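To make the three cleaning steps concrete, here is a minimal sketch using only the standard library. It illustrates the described behaviour and is not blessmore's actual implementation; the function name `clean_text` and the exact definition of "letter" (any Unicode letter) are assumptions.

```python
import re

# One or more Unicode letters: \w with digits and underscore excluded.
# Assumption: "non-letter symbols" means everything outside this class.
_LETTERS = re.compile(r"[^\W\d_]+")


def clean_text(text):
    """Lowercase the text and return its runs of letters as a token list."""
    return _LETTERS.findall(text.lower())


tokens = clean_text("Mhoro, Nyika! Ndine makore 25.")
print(tokens)  # ['mhoro', 'nyika', 'ndine', 'makore']
```

Note that under this definition an apostrophe also counts as a non-letter symbol, so a word such as n'anga would be split into two tokens.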
Cleaning Text Data from a File
Clean text data from a file and save the cleaned text to a new file.
from blessmore import clean_data
input_file = 'shona_corpus.txt'
output_file = 'cleaned_shona_corpus.txt'
# Clean text data from the input file and save it to the output file
clean_data(input_file, output_file)
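For readers who want to see what file-to-file cleaning might look like, the sketch below streams a corpus line by line and writes a cleaned copy. It is an illustration under stated assumptions, not the code behind clean_data: the name `clean_file`, the UTF-8 default, the one-line-in, one-line-out layout, and the dropping of lines with no letters are all choices made for this example.

```python
import re

# One or more Unicode letters: \w with digits and underscore excluded.
_LETTERS = re.compile(r"[^\W\d_]+")


def clean_file(input_path, output_path, encoding="utf-8"):
    """Write a lowercased, letters-only copy of input_path to output_path.

    Each input line becomes one output line of space-separated tokens.
    Lines that contain no letters are skipped. Returns the number of
    lines written.
    """
    written = 0
    with open(input_path, encoding=encoding) as src, \
            open(output_path, "w", encoding=encoding) as dst:
        for line in src:
            tokens = _LETTERS.findall(line.lower())
            if tokens:
                dst.write(" ".join(tokens) + "\n")
                written += 1
    return written


# Example call (file names are placeholders):
# clean_file("shona_corpus.txt", "cleaned_shona_corpus.txt")
```

Processing one line at a time keeps memory use flat, which matters for large corpora.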
Modules
shonaembeddings.py
This module contains the function to load pre-trained FastText models.
train_embedding.py
This module contains functions to clean text data and train FastText models.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Author
Blessmore Majongwe - blessmoremajongwe@gmail.com
Acknowledgments
- Hugging Face
- Gensim