Project description
Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate the components of the Natasha project: Razdel, Navec, and Slovnet.
Tokenization
See the Razdel evaluation section for more info.

| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| re.findall(\w+\|\d+\|\p+) | 4161 | 0.5 | 2660 | 0.5 | 2277 | 0.4 | 7606 | 0.4 |
| spacy | 4388 | 6.2 | 2103 | 5.8 | 1740 | 4.1 | 4057 | 3.9 |
| nltk.word_tokenize | 14245 | 3.4 | 60893 | 3.3 | 13496 | 2.7 | 41485 | 2.9 |
| mystem | 4514 | 5.0 | 3153 | 4.7 | 2497 | 3.7 | 2028 | 3.9 |
| mosestokenizer | 1886 | 2.1 | 1330 | 1.9 | 1796 | 1.6 | 2123 | 1.7 |
| segtok.word_tokenize | 2772 | 2.3 | 1288 | 2.3 | 1759 | 1.8 | 1229 | 1.8 |
| aatimofeev/spacy_russian_tokenizer | 2930 | 48.7 | 719 | 51.1 | 678 | 39.5 | 2681 | 52.2 |
| koziev/rutokenizer | 2627 | 1.1 | 1386 | 1.0 | 2893 | 0.8 | 9411 | 0.9 |
| razdel.tokenize | 1510 | 2.9 | 1483 | 2.8 | 322 | 2.0 | 2124 | 2.2 |

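The regex row in the tokenization table is a plain regular-expression baseline. The following is a minimal sketch of what such a baseline might look like, not the code Naeval actually runs. Python's `re` module has no `\p` escape, so the sketch assumes `\p+` in the label stands for a run of punctuation and substitutes `[^\w\s]+`. The name `regex_tokenize` is hypothetical.

```python
import re

# Assumption: "\p+" in the table label means "a run of punctuation".
# Python's re does not support \p, so [^\w\s]+ (characters that are
# neither word characters nor whitespace) is used in its place.
TOKEN = re.compile(r'\w+|\d+|[^\w\s]+')


def regex_tokenize(text):
    """Return word, digit and punctuation tokens found in text."""
    return TOKEN.findall(text)


print(regex_tokenize('Привет, мир! 123'))
```

A pattern like this splits abbreviations such as "т.е." into four tokens, which is the kind of case a rule-based tokenizer is meant to handle better.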
Sentence segmentation

| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| re.split([.?!…]) | 20456 | 0.9 | 6576 | 0.6 | 10084 | 0.7 | 23356 | 1.0 |
| segtok.split_single | 19008 | 17.8 | 4422 | 13.4 | 159738 | 1.1 | 164218 | 2.8 |
| mosestokenizer | 41666 | 8.9 | 22082 | 5.7 | 12663 | 6.4 | 50560 | 7.4 |
| nltk.sent_tokenize | 16420 | 10.1 | 4350 | 5.3 | 7074 | 5.6 | 32534 | 8.9 |
| deeppavlov/rusenttokenize | 10192 | 10.9 | 1210 | 7.9 | 8910 | 6.8 | 21410 | 7.0 |
| razdel.sentenize | 9274 | 6.1 | 824 | 3.9 | 11414 | 4.5 | 10594 | 7.5 |

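The `re.split([.?!…])` row is the simplest sentence segmentation baseline in the table. A minimal sketch of such a splitter is shown below; the function name `regex_sentenize` and the stripping of whitespace and empty fragments are assumptions made for illustration, not details taken from Naeval.

```python
import re

# Split on any single sentence-final character from the table label.
SENT_END = re.compile(r'[.?!…]')


def regex_sentenize(text):
    """Split text on . ? ! … and drop empty fragments.

    The delimiters themselves are discarded by re.split.
    """
    parts = (part.strip() for part in SENT_END.split(text))
    return [part for part in parts if part]


print(regex_sentenize('Привет. Как дела? Хорошо!'))
```

Because every period is treated as a boundary, abbreviations and initials are split as well, which is presumably one source of this baseline's errors.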
Pretrained embeddings
See the Navec evaluation section for more info.

| | type | init, s | get, µs | disk, mb | ram, mb | vocab |
|---|---|---|---|---|---|---|
| ruscorpora_upos_cbow_300_20_2019 | w2v | 12.1 | 1.6 | 220.6 | 236.1 | 189K |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 15.7 | 1.7 | 290.0 | 309.4 | 248K |
| tayga_upos_skipgram_300_2_2019 | w2v | 15.7 | 1.2 | 290.7 | 310.9 | 249K |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 11.3 | 14.3 | 2741.9 | 2746.9 | 192K |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 7.8 | 15.4 | 2752.1 | 2754.7 | 195K |
| hudlit_12B_500K_300d_100q | navec | 1.0 | 19.9 | 50.6 | 95.3 | 500K |
| news_1B_250K_300d_100q | navec | 0.5 | 20.3 | 25.4 | 47.7 | 250K |

| | type | simlex | hj | rt | ae | ae2 | lrwc |
|---|---|---|---|---|---|---|---|
| ruscorpora_upos_cbow_300_20_2019 | w2v | 0.359 | 0.685 | 0.852 | 0.758 | 0.896 | 0.602 |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 0.321 | 0.723 | 0.817 | 0.801 | 0.860 | 0.629 |
| tayga_upos_skipgram_300_2_2019 | w2v | 0.429 | 0.749 | 0.871 | 0.771 | 0.899 | 0.639 |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 0.369 | 0.639 | 0.793 | 0.682 | 0.813 | 0.536 |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 0.349 | 0.671 | 0.801 | 0.706 | 0.793 | 0.579 |
| hudlit_12B_500K_300d_100q | navec | 0.310 | 0.707 | 0.842 | 0.931 | 0.923 | 0.604 |
| news_1B_250K_300d_100q | navec | 0.230 | 0.590 | 0.784 | 0.866 | 0.861 | 0.589 |

Morphology taggers

| | news | wiki | fiction | social | poetry |
|---|---|---|---|---|---|
| rupostagger | 0.673 | 0.645 | 0.661 | 0.641 | 0.636 |
| rnnmorph | 0.896 | 0.812 | 0.890 | 0.860 | 0.838 |
| maru | 0.894 | 0.808 | 0.887 | 0.861 | 0.840 |
| udpipe | 0.918 | 0.811 | 0.957 | 0.870 | 0.776 |
| spacy | 0.919 | 0.812 | 0.938 | 0.836 | 0.729 |
| deeppavlov | 0.940 | 0.841 | 0.944 | 0.870 | 0.857 |
| deeppavlov_bert | 0.951 | 0.868 | 0.964 | 0.892 | 0.865 |

| | init, s | disk, mb | ram, mb | speed, it/s |
|---|---|---|---|---|
| rupostagger | 4.8 | 3 | 118 | 48.0 |
| rnnmorph | 8.7 | 10 | 289 | 16.6 |
| maru | 15.8 | 44 | 370 | 36.4 |
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 10.9 | 89 | 579 | 30.6 |
| deeppavlov | 4.0 | 32 | 10240 | 90.0 (gpu) |
| deeppavlov_bert | 20.0 | 1393 | 8704 | 85.0 (gpu) |

Syntax parser

| | news uas | news las | wiki uas | wiki las | fiction uas | fiction las | social uas | social las | poetry uas | poetry las |
|---|---|---|---|---|---|---|---|---|---|---|
| udpipe | 0.873 | 0.823 | 0.622 | 0.531 | 0.910 | 0.876 | 0.700 | 0.624 | 0.625 | 0.534 |
| spacy | 0.876 | 0.818 | 0.770 | 0.665 | 0.880 | 0.833 | 0.757 | 0.666 | 0.657 | 0.544 |
| deeppavlov_bert | 0.962 | 0.910 | 0.882 | 0.786 | 0.963 | 0.929 | 0.844 | 0.761 | 0.784 | 0.691 |

| | init, s | disk, mb | ram, mb | speed, it/s |
|---|---|---|---|---|
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 10.9 | 89 | 579 | 31.6 |
| deeppavlov_bert | 34.0 | 1427 | 8704 | 75.0 (gpu) |

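The syntax scores are reported as uas and las. In the standard definitions, the unlabeled attachment score (UAS) is the share of tokens whose predicted head is correct, and the labeled attachment score (LAS) additionally requires the dependency relation to match. The sketch below illustrates those definitions only; the function name `attachment_scores` and the `(head, relation)` input format are assumptions and do not describe Naeval's internals.

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) for one or more parsed sentences.

    gold, pred: equal-length lists of (head_index, relation) pairs,
    one pair per token, in the same token order.
    """
    if len(gold) != len(pred):
        raise ValueError('gold and pred must cover the same tokens')
    if not gold:
        raise ValueError('no tokens to score')

    total = len(gold)
    # UAS: only the head has to match.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the relation label have to match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / total, las_hits / total


gold = [(2, 'nsubj'), (0, 'root'), (2, 'obj')]
pred = [(2, 'nsubj'), (0, 'root'), (2, 'obl')]
uas, las = attachment_scores(gold, pred)
print(f'uas={uas:.3f} las={las:.3f}')
```

LAS can never exceed UAS, which matches every uas/las pair in the table above.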
NER
See the Slovnet evaluation section for more info.

| f1 | factru PER | factru LOC | factru ORG | gareev PER | gareev ORG | ne5 PER | ne5 LOC | ne5 ORG | bsnlp PER | bsnlp LOC | bsnlp ORG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| deeppavlov | 0.910 | 0.886 | 0.742 | 0.944 | 0.798 | 0.942 | 0.919 | 0.881 | 0.866 | 0.767 | 0.624 |
| deeppavlov_bert | 0.971 | 0.928 | 0.825 | 0.980 | 0.916 | 0.997 | 0.990 | 0.976 | 0.954 | 0.840 | 0.741 |
| pullenti | 0.905 | 0.814 | 0.686 | 0.939 | 0.639 | 0.952 | 0.862 | 0.683 | 0.900 | 0.769 | 0.566 |
| texterra | 0.900 | 0.800 | 0.597 | 0.888 | 0.561 | 0.901 | 0.777 | 0.594 | 0.858 | 0.783 | 0.548 |
| tomita | 0.929 | | | 0.921 | | 0.945 | | | 0.881 | | |
| natasha | 0.867 | 0.753 | 0.297 | 0.873 | 0.347 | 0.852 | 0.709 | 0.394 | 0.836 | 0.755 | 0.350 |
| mitie | 0.888 | 0.861 | 0.532 | 0.849 | 0.452 | 0.753 | 0.642 | 0.432 | 0.736 | 0.801 | 0.524 |

| | init, s | disk, mb | ram, mb | speed, articles/s |
|---|---|---|---|---|
| deeppavlov | 5.9 | 1024 | 3072 | 24.3 (gpu) |
| deeppavlov_bert | 34.5 | 2048 | 6144 | 13.1 (gpu) |
| pullenti | 2.9 | 16 | 253 | 6.0 |
| texterra | 47.6 | 193 | 3379 | 4.0 |
| tomita | 2.0 | 64 | 63 | 29.8 |
| natasha | 2.0 | 1 | 160 | 8.8 |
| mitie | 28.3 | 327 | 261 | 32.8 |

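The NER table reports f1 per entity type (PER, LOC, ORG). As an illustration of how a per-type F1 can be computed, the sketch below scores predicted spans against gold spans using exact matching of start, stop and type. Exact matching is an assumption here, as are the function name `span_f1` and the `(start, stop, type)` tuple format; Naeval's own scoring may differ.

```python
def span_f1(gold, pred, type_):
    """F1 for one entity type over (start, stop, type) spans.

    A predicted span counts as correct only if an identical gold
    span exists (exact-match assumption).
    """
    gold = {span for span in gold if span[2] == type_}
    pred = {span for span in pred if span[2] == type_}
    correct = len(gold & pred)

    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


gold = [(0, 5, 'PER'), (10, 16, 'LOC'), (20, 25, 'PER')]
pred = [(0, 5, 'PER'), (10, 16, 'ORG'), (21, 25, 'PER')]
for type_ in ('PER', 'LOC', 'ORG'):
    print(type_, round(span_f1(gold, pred, type_), 3))
```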