Sotastream is a command-line tool that augments a batch of text and produces an infinite stream of records.
Infinibatch is a library of checkpointable iterators for randomized data loading of massive data sets in deep neural network training.
mtdata is a tool to download datasets for machine translation.
Reader Translator Generator (RTG), a Neural Machine Translation (NMT) toolkit based on PyTorch
A test project using pybind11 and CMake
nlcodec is a collection of encoding schemes for natural language sequences. nlcodec.db is an efficient storage and retrieval layer for integer sequences of varying lengths.
Hassle-free computation of shareable, comparable, and reproducible BLEU, chrF, and TER scores
UNMASS - Unsupervised NMT with Masked Sequence-to-Sequence training
Junk Not-Junk Detector
awkg is an awk-like text-processing tool powered by Python