11 projects
bicleaner-ai
Parallel corpus classifier, indicating the likelihood of a pair of sentences being mutual translations or not (neural version)
bicleaner-hardrules
Pre-filtering step for obvious noise based on rules, poor language based on general language modelling, and vulgar language based on specific language modelling
escape-unk
Escape unknown symbols in SentencePiece vocabularies
bicleaner
Parallel corpus classifier, indicating the likelihood of a pair of sentences being mutual translations or not
sacremoses
SacreMoses
fastspell
Targeted language identifier, based on FastText and Hunspell.
fastspell-dictionaries
Hunspell dictionaries for FastSpell
monocleaner
Monolingual corpus fluency filter
bifixer
bicleaner-ai-glove
glove-python fork for bicleaner-ai
doommoses
DoomMoses