A Python package to load Shona FastText embeddings, train FastText embeddings, and clean Shona text data.

blessmore

blessmore is a Python package designed to load pre-trained Shona FastText embeddings, train new FastText embeddings, and clean text data.

Installation

Install the package using pip:

pip install blessmore

Usage

Loading Pre-trained FastText Models

The package can load pre-trained Shona FastText models in four vector sizes: 50, 100, 300, and 500 dimensions.

from blessmore import load_fasttext_model

# Load a 50-dimensional FastText model
model_50 = load_fasttext_model(50)

# Load a 100-dimensional FastText model
model_100 = load_fasttext_model(100)

# Load a 300-dimensional FastText model
model_300 = load_fasttext_model(300)

# Load a 500-dimensional FastText model
model_500 = load_fasttext_model(500)
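
This README does not say how load_fasttext_model behaves when it is given a size other than the four listed above. As a minimal sketch, a caller could validate the size first. The names SUPPORTED_DIMENSIONS, check_dimension, and load_checked are hypothetical helpers for illustration and are not part of blessmore.

```python
# Sizes documented in this README; this tuple is an assumption of the sketch,
# not a constant exported by blessmore.
SUPPORTED_DIMENSIONS = (50, 100, 300, 500)


def check_dimension(dimension):
    """Return the dimension unchanged if it is one of the documented sizes."""
    if dimension not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"Unsupported dimension {dimension}; expected one of {SUPPORTED_DIMENSIONS}"
        )
    return dimension


def load_checked(dimension):
    """Hypothetical wrapper: validate first, then defer to blessmore."""
    check_dimension(dimension)
    # Imported lazily so that the validation helper works without the package installed.
    from blessmore import load_fasttext_model
    return load_fasttext_model(dimension)
```

With this wrapper, a call such as load_checked(200) would fail with a ValueError before any model loading is attempted.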

Training FastText Models

You can also train new FastText embeddings on your own text data in any language. The train_fasttext_model function cleans the text data and then trains a FastText model with the specified vector size.

from blessmore import train_fasttext_model

corpus_file_path = 'shona_corpus.txt'
vector_size = 50  # Specify the dimension you want to train

# Train a FastText model
model = train_fasttext_model(corpus_file_path, vector_size)
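
The training call above takes a path to a corpus file, but this README does not describe the expected file layout. The sketch below assumes a plain UTF-8 text file with one sentence per line, which is an assumption rather than a documented requirement. The helper write_corpus and the sample sentences are hypothetical and only illustrate how such a file could be prepared.

```python
import os
import tempfile


def write_corpus(sentences, path):
    """Write one non-empty sentence per line as UTF-8; return the number of lines written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            text = sentence.strip()
            if text:  # skip empty or whitespace-only entries
                handle.write(text + "\n")
                count += 1
    return count


# Illustrative sample text only; a real corpus needs far more data.
sample_sentences = ["Mhoro, makadii?", "   ", "Ndiri kufara kukuonai."]
corpus_path = os.path.join(tempfile.mkdtemp(), "shona_corpus.txt")
lines_written = write_corpus(sample_sentences, corpus_path)
```

The resulting corpus_path could then be passed as the first argument to train_fasttext_model, although a corpus this small is unlikely to produce useful embeddings.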

Cleaning Shona Text Data

The package can clean text data by tokenizing the text, removing non-letter symbols, and lowercasing the result.

Cleaning Text Data from a File

Clean text data from a file and save the cleaned text to a new file.

from blessmore import clean_data

input_file = 'shona_corpus.txt'
output_file = 'cleaned_shona_corpus.txt'

# Clean text data from the input file and save it to the output file
clean_data(input_file, output_file)
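
The three cleaning steps named in this section (tokenizing, removing non-letter symbols, lowercasing) can be sketched in plain Python. This is an illustration of the idea under stated assumptions, not the implementation used by clean_data. The functions clean_line and clean_file_sketch are hypothetical, and the choice to treat every non-alphabetic character as a separator is an assumption.

```python
def clean_line(line):
    """Lowercase a line, replace non-letter characters with spaces, and return its tokens."""
    lowered = line.lower()
    # Keep alphabetic characters; everything else becomes a token separator.
    letters_only = "".join(ch if ch.isalpha() else " " for ch in lowered)
    return letters_only.split()


def clean_file_sketch(input_path, output_path):
    """Clean a UTF-8 text file line by line and write the space-joined tokens."""
    with open(input_path, encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            tokens = clean_line(line)
            if tokens:  # drop lines that contain no letters at all
                dst.write(" ".join(tokens) + "\n")
```

One consequence of this simple rule is that a word written with an apostrophe, such as n'anga, is split into two tokens. Whether blessmore handles such words differently is not stated here.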

Modules

shonaembeddings.py

This module contains the function to load pre-trained FastText models.

train_embedding.py

This module contains functions to clean text data and train FastText models.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Author

Blessmore Majongwe - blessmoremajongwe@gmail.com

Acknowledgments

  • Hugging Face
  • Gensim
