12 projects
heliport
Fast and accurate language identifier
bicleaner-ai
Parallel corpus classifier, indicating the likelihood of a pair of sentences being mutual translations or not (neural version)
bicleaner-hardrules
Pre-filtering step that removes obvious noise using rules, poor language using general language modelling, and vulgar language using specific language modelling
monocleaner
Monolingual corpus fluency filter
escape-unk
Escape unknown symbols in SentencePiece vocabularies
fastspell
Targeted language identifier, based on FastText and Hunspell.
fastspell-dictionaries
Hunspell dictionaries for FastSpell
bifixer
None
bicleaner
Parallel corpus classifier, indicating the likelihood of a pair of sentences being mutual translations or not
sacremoses
SacreMoses
bicleaner-ai-glove
glove-python fork for bicleaner-ai
doommoses
DoomMoses