
nmatheg

Nmatheg نماذج is an easy strategy for training Arabic NLP models on Hugging Face datasets. Just specify the dataset name, preprocessing, tokenization, and training procedure in a config file to train an NLP model for that task.

Configuration

Set up a config file for the training strategy.

[dataset]
dataset_name = ajgt_twitter_ar
task = classification 

[preprocessing]
segment = False
remove_special_chars = False
remove_english = False
normalize = False
remove_diacritics = False
excluded_chars = []
remove_tatweel = False
remove_html_elements = False
remove_links = False 
remove_twitter_meta = False
remove_long_words = False
remove_repeated_chars = False
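The preprocessing flags correspond to common Arabic text-cleaning steps. As an illustration of two of them, `remove_diacritics` and `remove_tatweel` (a hypothetical sketch using regular expressions, not nmatheg's actual implementation):

```python
import re

# Arabic diacritics (tashkeel): fathatan through sukun, U+064B-U+0652
DIACRITICS = re.compile(r"[\u064B-\u0652]")
# Tatweel (kashida), the elongation character U+0640
TATWEEL = re.compile(r"\u0640")

def remove_diacritics(text: str) -> str:
    """Strip Arabic short-vowel marks, shadda, and sukun."""
    return DIACRITICS.sub("", text)

def remove_tatweel(text: str) -> str:
    """Strip the tatweel character used to elongate words."""
    return TATWEEL.sub("", text)

print(remove_diacritics("مُحَمَّد"))  # محمد
print(remove_tatweel("العـــرب"))    # العرب
```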

[tokenization]
tokenizer_name = WordTokenizer
vocab_size = 10000
max_tokens = 128
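To give a sense of what a word-level tokenizer with these settings does, here is a hypothetical sketch (the `WordTokenizer` name, `vocab_size`, and `max_tokens` come from the config; the class below is an illustration, not nmatheg's code): build a top-N vocabulary, map words to ids, then truncate or pad to a fixed length.

```python
from collections import Counter

class SimpleWordTokenizer:
    """Illustrative word-level tokenizer: top-N vocabulary,
    word-to-id mapping, truncation/padding to max_tokens."""
    PAD, UNK = 0, 1  # reserved ids for padding and unknown words

    def __init__(self, vocab_size=10000, max_tokens=128):
        self.vocab_size = vocab_size
        self.max_tokens = max_tokens
        self.vocab = {}

    def train(self, texts):
        # Keep the most frequent words; ids 0/1 are reserved.
        counts = Counter(w for t in texts for w in t.split())
        words = [w for w, _ in counts.most_common(self.vocab_size - 2)]
        self.vocab = {w: i + 2 for i, w in enumerate(words)}

    def encode(self, text):
        ids = [self.vocab.get(w, self.UNK) for w in text.split()]
        ids = ids[: self.max_tokens]                      # truncate
        ids += [self.PAD] * (self.max_tokens - len(ids))  # pad
        return ids

tok = SimpleWordTokenizer(vocab_size=10, max_tokens=4)
tok.train(["هذا مثال", "هذا نص"])
print(tok.encode("هذا مثال جديد"))  # unseen word maps to UNK
```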

[train]
dir = .
epochs = 10
batch_size = 256
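The config file uses standard INI syntax, so it can be inspected with Python's `configparser`. A minimal sketch (section and key names taken from the example above; note that values are strings unless read with a typed getter):

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[dataset]
dataset_name = ajgt_twitter_ar
task = classification

[preprocessing]
remove_diacritics = False

[train]
epochs = 10
batch_size = 256
""")

# Plain indexing returns strings; use getint/getboolean for typed access.
print(config["dataset"]["dataset_name"])                        # ajgt_twitter_ar
print(config.getint("train", "epochs"))                         # 10
print(config.getboolean("preprocessing", "remove_diacritics"))  # False
```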

Usage

import nmatheg as nm
strategy = nm.TrainStrategy('config.ini')
strategy.start()

Datasets

We support Hugging Face datasets for Arabic. You can find the supported datasets here.

Models

  • Classification Models

Demo

Check this Colab notebook for a quick demo.

Download files

Source distribution: nmatheg-0.0.1.tar.gz (5.9 kB)

Built distribution: nmatheg-0.0.1-py3-none-any.whl (6.5 kB)
