12 projects
bicleaner-ai
Parallel corpus classifier, indicating the likelihood of a pair of sentences being mutual translations or not (neural version)
bicleaner-hardrules
Pre-filtering step that removes obvious noise using rules, poor language using general language modelling, and vulgar language using specific language modelling
monocleaner
Monolingual corpus fluency filter
heliport
Fast and accurate language identifier
escape-unk
Escape unknown symbols in SentencePiece vocabularies
fastspell
Targeted language identifier, based on FastText and Hunspell
fastspell-dictionaries
Hunspell dictionaries for FastSpell
bifixer
None
bicleaner
Parallel corpus classifier, indicating the likelihood of a pair of sentences being mutual translations or not
sacremoses
SacreMoses
bicleaner-ai-glove
glove-python fork for bicleaner-ai
doommoses
DoomMoses