Afrikaans |
af |
OpenSubtitles |
top 1M vectors all vectors model binary |
323K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
17M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Arabic |
ar |
OpenSubtitles |
top 1M vectors all vectors model binary |
188M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
119M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Bulgarian |
bg |
OpenSubtitles |
top 1M vectors all vectors model binary |
246M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
53M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Bengali |
bn |
OpenSubtitles |
top 1M vectors all vectors model binary |
2227K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
18M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Breton |
br |
OpenSubtitles |
top 1M vectors all vectors model binary |
110K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
7644K |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Bosnian |
bs |
OpenSubtitles |
top 1M vectors all vectors model binary |
91M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
13M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Catalan |
ca |
OpenSubtitles |
top 1M vectors all vectors model binary |
3098K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
175M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Czech |
cs |
OpenSubtitles |
top 1M vectors all vectors model binary |
249M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
100M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Danish |
da |
OpenSubtitles |
top 1M vectors all vectors model binary |
87M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
56M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
German |
de |
OpenSubtitles |
top 1M vectors all vectors model binary |
139M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
976M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Greek |
el |
OpenSubtitles |
top 1M vectors all vectors model binary |
271M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
58M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
English |
en |
OpenSubtitles |
top 1M vectors all vectors model binary |
750M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
2477M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Esperanto |
eo |
OpenSubtitles |
top 1M vectors all vectors model binary |
381K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
37M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Spanish |
es |
OpenSubtitles |
top 1M vectors all vectors model binary |
514M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
585M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Estonian |
et |
OpenSubtitles |
top 1M vectors all vectors model binary |
60M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
29M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Basque |
eu |
OpenSubtitles |
top 1M vectors all vectors model binary |
3400K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
20M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Farsi |
fa |
OpenSubtitles |
top 1M vectors all vectors model binary |
45M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
86M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Finnish |
fi |
OpenSubtitles |
top 1M vectors all vectors model binary |
116M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
73M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
French |
fr |
OpenSubtitles |
top 1M vectors all vectors model binary |
335M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
724M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Galician |
gl |
OpenSubtitles |
top 1M vectors all vectors model binary |
1666K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
40M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Hebrew |
he |
OpenSubtitles |
top 1M vectors all vectors model binary |
169M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
132M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Hindi |
hi |
OpenSubtitles |
top 1M vectors all vectors model binary |
695K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
31M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Croatian |
hr |
OpenSubtitles |
top 1M vectors all vectors model binary |
241M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
42M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Hungarian |
hu |
OpenSubtitles |
top 1M vectors all vectors model binary |
227M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
120M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Armenian |
hy |
OpenSubtitles |
top 1M vectors all vectors model binary |
23K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
38M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Indonesian |
id |
OpenSubtitles |
top 1M vectors all vectors model binary |
65M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
69M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Icelandic |
is |
OpenSubtitles |
top 1M vectors all vectors model binary |
7474K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
7196K |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Italian |
it |
OpenSubtitles |
top 1M vectors all vectors model binary |
277M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
476M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Georgian |
ka |
OpenSubtitles |
top 1M vectors all vectors model binary |
1108K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
15M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Kazakh |
kk |
OpenSubtitles |
top 1M vectors all vectors model binary |
13K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
18M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Korean |
ko |
OpenSubtitles |
top 1M vectors all vectors model binary |
6834K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
62M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Lithuanian |
lt |
OpenSubtitles |
top 1M vectors all vectors model binary |
6252K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
23M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Latvian |
lv |
OpenSubtitles |
top 1M vectors all vectors model binary |
2167K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
13M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Macedonian |
mk |
OpenSubtitles |
top 1M vectors all vectors model binary |
20M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
26M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Malayalam |
ml |
OpenSubtitles |
top 1M vectors all vectors model binary |
1520K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
10M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Malay |
ms |
OpenSubtitles |
top 1M vectors all vectors model binary |
12M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
28M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Dutch |
nl |
OpenSubtitles |
top 1M vectors all vectors model binary |
264M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
248M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Norwegian |
no |
OpenSubtitles |
top 1M vectors all vectors model binary |
45M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
90M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Polish |
pl |
OpenSubtitles |
top 1M vectors all vectors model binary |
250M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
232M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Portuguese |
pt |
OpenSubtitles |
top 1M vectors all vectors model binary |
257M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
238M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Romanian |
ro |
OpenSubtitles |
top 1M vectors all vectors model binary |
434M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
65M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Russian |
ru |
OpenSubtitles |
top 1M vectors all vectors model binary |
152M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
390M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Sinhala |
si |
OpenSubtitles |
top 1M vectors all vectors model binary |
3493K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
5980K |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Slovak |
sk |
OpenSubtitles |
top 1M vectors all vectors model binary |
47M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
28M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Slovene |
sl |
OpenSubtitles |
top 1M vectors all vectors model binary |
106M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
31M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Albanian |
sq |
OpenSubtitles |
top 1M vectors all vectors model binary |
11M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
17M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Serbian |
sr |
OpenSubtitles |
top 1M vectors all vectors model binary |
343M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
69M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Swedish |
sv |
OpenSubtitles |
top 1M vectors all vectors model binary |
101M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
143M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Tamil |
ta |
OpenSubtitles |
top 1M vectors all vectors model binary |
123K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
17M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Telugu |
te |
OpenSubtitles |
top 1M vectors all vectors model binary |
103K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
15M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Tagalog |
tl |
OpenSubtitles |
top 1M vectors all vectors model binary |
87K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
6515K |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Turkish |
tr |
OpenSubtitles |
top 1M vectors all vectors model binary |
239M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
54M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Ukrainian |
uk |
OpenSubtitles |
top 1M vectors all vectors model binary |
4945K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
162M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Urdu |
ur |
OpenSubtitles |
top 1M vectors all vectors model binary |
195K |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
15M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|
Vietnamese |
vi |
OpenSubtitles |
top 1M vectors all vectors model binary |
27M |
word counts bigram counts trigram counts |
|
|
Wikipedia |
top 1M vectors all vectors model binary |
115M |
word counts bigram counts trigram counts |
|
|
Wikipedia + OpenSubtitles |
top 1M vectors all vectors model binary |
|
|