
Project description

nmatheg

Nmatheg (نماذج, Arabic for "models") is an easy strategy for training Arabic NLP models on Hugging Face datasets. Specify the dataset name, preprocessing, tokenization, and training procedure in a config file to train an NLP model for that task.

Configuration

Set up a config file for the training strategy.

[dataset]
dataset_name = ajgt_twitter_ar
task = classification 

[preprocessing]
segment = False
remove_special_chars = False
remove_english = False
normalize = False
remove_diacritics = False
excluded_chars = []
remove_tatweel = False
remove_html_elements = False
remove_links = False 
remove_twitter_meta = False
remove_long_words = False
remove_repeated_chars = False

[tokenization]
tokenizer_name = WordTokenizer
vocab_size = 10000
max_tokens = 128

[train]
dir = .
epochs = 10
batch_size = 256
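
The file above follows INI syntax, so it can also be generated from Python rather than written by hand. The following is a minimal sketch using the standard-library `configparser`. The helper name `write_config` and its parameters are hypothetical and not part of nmatheg; the section and key names are copied from the sample config above.

```python
import configparser


def write_config(path, dataset_name="ajgt_twitter_ar", epochs=10, batch_size=256):
    """Write an nmatheg-style config file (hypothetical helper, not nmatheg API)."""
    config = configparser.ConfigParser()
    config["dataset"] = {"dataset_name": dataset_name, "task": "classification"}
    # All preprocessing switches from the sample config, left at their defaults.
    config["preprocessing"] = {
        "segment": "False",
        "remove_special_chars": "False",
        "remove_english": "False",
        "normalize": "False",
        "remove_diacritics": "False",
        "excluded_chars": "[]",
        "remove_tatweel": "False",
        "remove_html_elements": "False",
        "remove_links": "False",
        "remove_twitter_meta": "False",
        "remove_long_words": "False",
        "remove_repeated_chars": "False",
    }
    config["tokenization"] = {
        "tokenizer_name": "WordTokenizer",
        "vocab_size": "10000",
        "max_tokens": "128",
    }
    # configparser stores strings only, so numbers are converted explicitly.
    config["train"] = {"dir": ".", "epochs": str(epochs), "batch_size": str(batch_size)}
    with open(path, "w", encoding="utf-8") as f:
        config.write(f)
    return path
```

Calling `write_config("config.ini", epochs=3)` would produce a file with the same four sections as the sample, with only the number of epochs changed.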

Usage

import nmatheg as nm
strategy = nm.TrainStrategy('config.ini')
strategy.start()
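
Before launching a long training run, it can help to check that the config file contains the four sections shown in the sample config. The sketch below is one possible workflow, not part of nmatheg: `missing_sections` and `train` are hypothetical helpers built on the standard-library `configparser`, and whether nmatheg strictly requires every section is an assumption. Only `nm.TrainStrategy(...)` and `strategy.start()` are taken from the usage above.

```python
import configparser

# Sections that appear in the sample config; treating all of them as
# required is an assumption, not documented nmatheg behavior.
EXPECTED_SECTIONS = ("dataset", "preprocessing", "tokenization", "train")


def missing_sections(path):
    """Return the expected sections that are absent from the config at `path`."""
    config = configparser.ConfigParser()
    # A path that does not exist is skipped silently, so every section is reported.
    config.read(path, encoding="utf-8")
    return [name for name in EXPECTED_SECTIONS if not config.has_section(name)]


def train(path="config.ini"):
    """Check the config first, then start training as in the usage above."""
    missing = missing_sections(path)
    if missing:
        raise ValueError(f"{path} is missing sections: {', '.join(missing)}")
    import nmatheg as nm  # imported late so the check works without nmatheg installed

    strategy = nm.TrainStrategy(path)
    strategy.start()
```

With this in place, `train("config.ini")` fails fast with a readable message when a section is missing, before any dataset is downloaded.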

Datasets

Nmatheg supports Hugging Face datasets for Arabic. You can find the supported datasets here.

Models

Classification Models

Demo

Check this Colab notebook for a quick demo.

Project details

Release history

0.0.4 (Jul 19, 2021)
0.0.2 (Mar 30, 2021)
0.0.1 (Mar 17, 2021), this version
0.0.0 (Mar 17, 2021)

Download files


Source Distribution

nmatheg-0.0.1.tar.gz (5.9 kB)

Uploaded Mar 17, 2021 Source

Built Distribution

nmatheg-0.0.1-py3-none-any.whl (6.5 kB)

Uploaded Mar 17, 2021 Python 3

Hashes for nmatheg-0.0.1.tar.gz

Algorithm	Hash digest
SHA256	`e83b082bb8a0b049aa2657c9b12b4454167cea6f90f89e131841bd672ef5710c`
MD5	`feb29bb17f388832cab7ceeb582796b3`
BLAKE2b-256	`5334582df2f40341971a3281356db8bea37f9f625a368ab37869a90bd406dff1`

Hashes for nmatheg-0.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`40dfad4eded4c85424917abff41e152375910f4381b36d5dda14b46eac549bee`
MD5	`440bd3c9a74154dde2873ce1db14045c`
BLAKE2b-256	`251f9d5e8fb2a18ee45b44cbba9f5cf8cdfffdc35b5ea1cc6feb519b05c10884`