14 projects
pymarian
Pymarian
pigz
pigz - Python bindings to pigz (parallel gzip)
mtdata
mtdata is a tool to download datasets for machine translation
nllb-serve
NLLB Serve
boteval
Chat Bot Evaluation
sotastream
Sotastream is a command-line tool that augments a batch of text and produces an infinite stream of records.
infinibatch
Infinibatch is a library of checkpointable iterators for randomized data loading of massive data sets in deep neural network training.
rtg
Reader Translator Generator (RTG), a neural machine translation (NMT) toolkit based on PyTorch
nlcodec
nlcodec is a collection of encoding schemes for natural language sequences. nlcodec.db is an efficient storage and retrieval layer for integer sequences of varying lengths.
sacrebleu-macrof
Hassle-free computation of shareable, comparable, and reproducible BLEU, chrF, and TER scores
sacremoses-xt
SacreMoses (Extended)
unmass
UNMASS - Unsupervised NMT with Masked Sequence-to-Sequence training
junkdetect
Junk Not-Junk Detector
awkg
awkg is an awk-like text-processing tool powered by the Python language
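The infinibatch entry above mentions checkpointable iterators for randomized data loading. The sketch below illustrates that general idea only; it is not infinibatch's API, and the class and method names (`CheckpointableShuffle`, `getstate`, `setstate`) are hypothetical. It assumes the data fits in memory (infinibatch itself targets massive data sets) and derives each epoch's shuffle order solely from `(seed, epoch)`, so the whole iterator state is two integers that can be saved with a training checkpoint and restored to replay exactly the same item sequence.

```python
import random
from typing import Dict, Iterator, List, Sequence


class CheckpointableShuffle:
    """Hypothetical sketch: endlessly yields items of `data`, reshuffled every epoch."""

    def __init__(self, data: Sequence, seed: int = 0):
        self._data = list(data)
        self._seed = seed
        self._epoch = 0
        self._index = 0  # position inside the current epoch's permutation
        self._order = self._permutation(self._epoch)

    def _permutation(self, epoch: int) -> List[int]:
        # A string seed is hashed deterministically by random.Random, so the
        # same (seed, epoch) always reproduces the same permutation.
        rng = random.Random(f"{self._seed}:{epoch}")
        order = list(range(len(self._data)))
        rng.shuffle(order)
        return order

    def __iter__(self) -> Iterator:
        return self

    def __next__(self):
        if not self._data:
            raise StopIteration
        if self._index == len(self._order):
            # Epoch exhausted: start a freshly shuffled epoch.
            self._epoch += 1
            self._index = 0
            self._order = self._permutation(self._epoch)
        item = self._data[self._order[self._index]]
        self._index += 1
        return item

    def getstate(self) -> Dict[str, int]:
        # Everything needed to resume: which epoch, and how far into it.
        return {"epoch": self._epoch, "index": self._index}

    def setstate(self, state: Dict[str, int]) -> None:
        self._epoch = state["epoch"]
        self._index = state["index"]
        self._order = self._permutation(self._epoch)


it = CheckpointableShuffle(range(10), seed=42)
_ = [next(it) for _ in range(7)]
ckpt = it.getstate()                       # save mid-epoch
expected = [next(it) for _ in range(8)]    # crosses into the next epoch

resumed = CheckpointableShuffle(range(10), seed=42)
resumed.setstate(ckpt)
assert [next(resumed) for _ in range(8)] == expected
```

Keeping the state this small is the design point: nothing about the RNG's internal state or already-yielded items has to be serialized, because the order is recomputed on restore.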
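The nlcodec entry describes nlcodec.db as a storage and retrieval layer for integer sequences of varying lengths. One common way to store such data, shown here as a hypothetical sketch and not as nlcodec.db's actual format, is a "ragged array": every sequence is concatenated into a single flat integer buffer, and a second offsets buffer records where each sequence starts and ends. The name `RaggedIntStore` is illustrative only.

```python
from array import array
from typing import Iterable, List, Sequence


class RaggedIntStore:
    """Hypothetical sketch: variable-length int sequences packed into one flat buffer."""

    def __init__(self, sequences: Iterable[Sequence[int]] = ()):
        self._values = array("q")        # all sequences concatenated (64-bit signed ints)
        self._offsets = array("q", [0])  # sequence i is values[offsets[i]:offsets[i + 1]]
        for seq in sequences:
            self.append(seq)

    def append(self, seq: Sequence[int]) -> None:
        self._values.extend(seq)
        self._offsets.append(len(self._values))

    def __len__(self) -> int:
        return len(self._offsets) - 1

    def length_of(self, i: int) -> int:
        # Sequence lengths come from the offsets alone; no values are read.
        return self._offsets[i + 1] - self._offsets[i]

    def __getitem__(self, i: int) -> List[int]:
        if not 0 <= i < len(self):
            raise IndexError(i)
        start, end = self._offsets[i], self._offsets[i + 1]
        return self._values[start:end].tolist()


store = RaggedIntStore([[5, 3, 9], [], [7]])
print(len(store), store[0], store[1], store.length_of(2))
```

Compared with a list of Python lists, this layout keeps only two contiguous buffers regardless of how many sequences are stored, and any sequence can be fetched in constant time from its two offsets.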