Project description
Naeval compares the quality and performance of NLP systems for the Russian language. It is used to evaluate the components of the Natasha project: Razdel, Navec, and Slovnet.
Tokenization
See the Razdel evaluation section for more info.
| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| re.findall(\w+\|\d+\|\p+) | 4161 | 0.5 | 2660 | 0.5 | 2277 | 0.4 | 7606 | 0.4 |
| spacy | 4388 | 6.2 | 2103 | 5.8 | 1740 | 4.1 | 4057 | 3.9 |
| nltk.word_tokenize | 14245 | 3.4 | 60893 | 3.3 | 13496 | 2.7 | 41485 | 2.9 |
| mystem | 4514 | 5.0 | 3153 | 4.7 | 2497 | 3.7 | 2028 | 3.9 |
| mosestokenizer | 1886 | 2.1 | 1330 | 1.9 | 1796 | 1.6 | 2123 | 1.7 |
| segtok.word_tokenize | 2772 | 2.3 | 1288 | 2.3 | 1759 | 1.8 | 1229 | 1.8 |
| aatimofeev/spacy_russian_tokenizer | 2930 | 48.7 | 719 | 51.1 | 678 | 39.5 | 2681 | 52.2 |
| koziev/rutokenizer | 2627 | 1.1 | 1386 | 1.0 | 2893 | 0.8 | 9411 | 0.9 |
| razdel.tokenize | 1510 | 2.9 | 1483 | 2.8 | 322 | 2.0 | 2124 | 2.2 |
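To make the regex baseline in the first row concrete, here is a minimal sketch. It rests on two assumptions that are not stated in the table: that `\p+` stands for "a run of punctuation" (Python's `re` module has no `\p` escape, so `[^\w\s]+` is used as a stand-in), and that an error is a token span present in only one of the two segmentations. The `count_errors` helper is hypothetical and is not necessarily how Naeval or Razdel count errors.

```python
import re

# Stand-in for the baseline pattern: `\w+` already covers digits,
# and `[^\w\s]+` approximates the `\p+` (punctuation) alternative.
TOKEN = re.compile(r'\w+|[^\w\s]+')


def tokenize(text):
    """Return (start, stop, text) triples for every regex match."""
    return [(m.start(), m.end(), m.group()) for m in TOKEN.finditer(text)]


def count_errors(predicted, gold):
    """Hypothetical metric: spans found in exactly one of the two splits."""
    predicted_spans = {(start, stop) for start, stop, _ in predicted}
    gold_spans = {(start, stop) for start, stop, _ in gold}
    return len(predicted_spans ^ gold_spans)


text = 'Кружка-термос на 0.5л'
tokens = tokenize(text)

# Hand-written reference segmentation, for illustration only.
gold = [(0, 13, 'Кружка-термос'), (14, 16, 'на'), (17, 20, '0.5'), (20, 21, 'л')]
errors = count_errors(tokens, gold)
```

The baseline splits the hyphenated word and the decimal number into pieces, which is the kind of mismatch a rule-based tokenizer is meant to avoid.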
Sentence segmentation
| | corpora errors | corpora time | syntag errors | syntag time | gicrya errors | gicrya time | rnc errors | rnc time |
|---|---|---|---|---|---|---|---|---|
| re.split([.?!…]) | 20456 | 0.9 | 6576 | 0.6 | 10084 | 0.7 | 23356 | 1.0 |
| segtok.split_single | 19008 | 17.8 | 4422 | 13.4 | 159738 | 1.1 | 164218 | 2.8 |
| mosestokenizer | 41666 | 8.9 | 22082 | 5.7 | 12663 | 6.4 | 50560 | 7.4 |
| nltk.sent_tokenize | 16420 | 10.1 | 4350 | 5.3 | 7074 | 5.6 | 32534 | 8.9 |
| deeppavlov/rusenttokenize | 10192 | 10.9 | 1210 | 7.9 | 8910 | 6.8 | 21410 | 7.0 |
| razdel.sentenize | 9274 | 6.1 | 824 | 3.9 | 11414 | 4.5 | 10594 | 7.5 |
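The `re.split([.?!…])` baseline can be sketched in a few lines. This is an illustration, not the benchmark code: stripping whitespace and dropping empty pieces are assumptions made here for readability.

```python
import re

# Split on every sentence-final character, as in the baseline row.
DELIMITER = re.compile(r'[.?!…]')


def split_sentences(text):
    """Naive splitter: cut at each delimiter and drop empty pieces."""
    parts = DELIMITER.split(text)
    return [part.strip() for part in parts if part.strip()]


simple = split_sentences('Привет. Как дела? Хорошо!')
abbreviation = split_sentences('Он ушёл, т. е. исчез.')
```

The second call shows why such a baseline makes many errors: the abbreviation "т. е." contains periods, so one sentence is cut into three fragments.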
Pretrained embeddings
See the Navec evaluation section for more info.
| | type | init, s | get, µs | disk, mb | ram, mb | vocab |
|---|---|---|---|---|---|---|
| ruscorpora_upos_cbow_300_20_2019 | w2v | 12.1 | 1.6 | 220.6 | 236.1 | 189K |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 15.7 | 1.7 | 290.0 | 309.4 | 248K |
| tayga_upos_skipgram_300_2_2019 | w2v | 15.7 | 1.2 | 290.7 | 310.9 | 249K |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 11.3 | 14.3 | 2741.9 | 2746.9 | 192K |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 7.8 | 15.4 | 2752.1 | 2754.7 | 195K |
| hudlit_12B_500K_300d_100q | navec | 1.0 | 19.9 | 50.6 | 95.3 | 500K |
| news_1B_250K_300d_100q | navec | 0.5 | 20.3 | 25.4 | 47.7 | 250K |
| | type | simlex | hj | rt | ae | ae2 | lrwc |
|---|---|---|---|---|---|---|---|
| ruscorpora_upos_cbow_300_20_2019 | w2v | 0.359 | 0.685 | 0.852 | 0.758 | 0.896 | 0.602 |
| ruwikiruscorpora_upos_skipgram_300_2_2019 | w2v | 0.321 | 0.723 | 0.817 | 0.801 | 0.860 | 0.629 |
| tayga_upos_skipgram_300_2_2019 | w2v | 0.429 | 0.749 | 0.871 | 0.771 | 0.899 | 0.639 |
| tayga_none_fasttextcbow_300_10_2019 | fasttext | 0.369 | 0.639 | 0.793 | 0.682 | 0.813 | 0.536 |
| araneum_none_fasttextcbow_300_5_2018 | fasttext | 0.349 | 0.671 | 0.801 | 0.706 | 0.793 | 0.579 |
| hudlit_12B_500K_300d_100q | navec | 0.310 | 0.707 | 0.842 | 0.931 | 0.923 | 0.604 |
| news_1B_250K_300d_100q | navec | 0.230 | 0.590 | 0.784 | 0.866 | 0.861 | 0.589 |
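The tables do not say how the per-dataset scores are computed. A common way to score embeddings on word-similarity datasets is the Spearman rank correlation between the model's cosine similarities and human judgments. The sketch below illustrates that approach under the assumption that it applies to columns such as simlex and hj. The vectors and human scores are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy 2-d vectors and invented human similarity scores.
vectors = {
    'кот': [1.0, 0.1],
    'кошка': [0.9, 0.2],
    'машина': [0.1, 1.0],
    'авто': [0.3, 0.9],
}
pairs = [('кот', 'кошка', 9.0), ('машина', 'авто', 8.5), ('кот', 'машина', 1.0)]

predicted = [cosine(vectors[a], vectors[b]) for a, b, _ in pairs]
human = [score for _, _, score in pairs]
score = spearman(predicted, human)
```

With a real model, `vectors` would be replaced by lookups into the embedding table, and word pairs missing from the vocabulary would need a policy (skip or count as errors), which this sketch does not cover.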
Morphology taggers
| | news | wiki | fiction | social | poetry |
|---|---|---|---|---|---|
| rupostagger | 0.673 | 0.645 | 0.661 | 0.641 | 0.636 |
| rnnmorph | 0.896 | 0.812 | 0.890 | 0.860 | 0.838 |
| maru | 0.894 | 0.808 | 0.887 | 0.861 | 0.840 |
| udpipe | 0.918 | 0.811 | 0.957 | 0.870 | 0.776 |
| spacy | 0.919 | 0.812 | 0.938 | 0.836 | 0.729 |
| deeppavlov | 0.940 | 0.841 | 0.944 | 0.870 | 0.857 |
| deeppavlov_bert | 0.951 | 0.868 | 0.964 | 0.892 | 0.865 |
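| | init, s | disk, mb | ram, mb | speed, it/s |
|---|---|---|---|---|
| rupostagger | 4.8 | 3 | 118 | 48.0 |
| rnnmorph | 8.7 | 10 | 289 | 16.6 |
| maru | 15.8 | 44 | 370 | 36.4 |
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 10.9 | 89 | 579 | 30.6 |
| deeppavlov | 4.0 | 32 | 10240 | 90.0 (gpu) |
| deeppavlov_bert | 20.0 | 1393 | 8704 | 85.0 (gpu) |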
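The "init, s" and "speed, it/s" columns can be collected with simple wall-clock measurements. The sketch below is one possible approach, not Naeval's own harness: `load` and `items` are hypothetical placeholders, and "it/s" is assumed to mean items processed per second. Disk and RAM usage are not measured here.

```python
import time


def measure(load, items):
    """Return (init_seconds, items_per_second) for a model factory.

    `load` is a zero-argument callable that builds the model;
    the model itself is called once per item.
    """
    start = time.perf_counter()
    model = load()
    init_seconds = time.perf_counter() - start

    start = time.perf_counter()
    for item in items:
        model(item)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse clocks.
    speed = len(items) / elapsed if elapsed > 0 else float('inf')
    return init_seconds, speed


# Stand-in "model": upper-casing a string instead of tagging a sentence.
init_seconds, speed = measure(lambda: str.upper, ['мама', 'мыла', 'раму'])
```

For a meaningful comparison, the same items and the same hardware would be used for every system. The "(gpu)" marks in the table indicate that this was not the case for all rows.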
Syntax parser
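| | news uas | news las | wiki uas | wiki las | fiction uas | fiction las | social uas | social las | poetry uas | poetry las |
|---|---|---|---|---|---|---|---|---|---|---|
| udpipe | 0.873 | 0.823 | 0.622 | 0.531 | 0.910 | 0.876 | 0.700 | 0.624 | 0.625 | 0.534 |
| spacy | 0.876 | 0.818 | 0.770 | 0.665 | 0.880 | 0.833 | 0.757 | 0.666 | 0.657 | 0.544 |
| deeppavlov_bert | 0.962 | 0.910 | 0.882 | 0.786 | 0.963 | 0.929 | 0.844 | 0.761 | 0.784 | 0.691 |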
| | init, s | disk, mb | ram, mb | speed, it/s |
|---|---|---|---|---|
| udpipe | 6.9 | 45 | 242 | 56.2 |
| spacy | 10.9 | 89 | 579 | 31.6 |
| deeppavlov_bert | 34.0 | 1427 | 8704 | 75.0 (gpu) |
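UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct. LAS (labeled attachment score) additionally requires the dependency relation to match. A minimal sketch follows, assuming predicted and gold parses are already aligned token by token. The `(head, relation)` representation and the example parse are illustrative choices, not Naeval's data format.

```python
def uas_las(predicted, gold):
    """Compute (UAS, LAS) for two aligned lists of (head, relation) pairs."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError('parses must be non-empty and of equal length')
    total = len(gold)
    # UAS: only the head index has to match.
    heads_ok = sum(p[0] == g[0] for p, g in zip(predicted, gold))
    # LAS: both the head index and the relation label have to match.
    both_ok = sum(p == g for p, g in zip(predicted, gold))
    return heads_ok / total, both_ok / total


# "Мама мыла новую раму": heads are 1-based token indices, 0 is the root.
gold = [(2, 'nsubj'), (0, 'root'), (4, 'amod'), (2, 'obj')]
predicted = [(2, 'nsubj'), (0, 'root'), (2, 'amod'), (2, 'nsubj')]

uas, las = uas_las(predicted, gold)
```

Here the third token has the wrong head and the fourth has the wrong label, so three of four heads are correct and two of four tokens are fully correct.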
NER
See the Slovnet evaluation section for more info.
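| f1 | factru PER | factru LOC | factru ORG | gareev PER | gareev ORG | ne5 PER | ne5 LOC | ne5 ORG | bsnlp PER | bsnlp LOC | bsnlp ORG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| deeppavlov | 0.910 | 0.886 | 0.742 | 0.944 | 0.798 | 0.942 | 0.919 | 0.881 | 0.866 | 0.767 | 0.624 |
| deeppavlov_bert | 0.971 | 0.928 | 0.825 | 0.980 | 0.916 | 0.997 | 0.990 | 0.976 | 0.954 | 0.840 | 0.741 |
| pullenti | 0.905 | 0.814 | 0.686 | 0.939 | 0.639 | 0.952 | 0.862 | 0.683 | 0.900 | 0.769 | 0.566 |
| texterra | 0.900 | 0.800 | 0.597 | 0.888 | 0.561 | 0.901 | 0.777 | 0.594 | 0.858 | 0.783 | 0.548 |
| tomita | 0.929 | | | 0.921 | | 0.945 | | | 0.881 | | |
| natasha | 0.867 | 0.753 | 0.297 | 0.873 | 0.347 | 0.852 | 0.709 | 0.394 | 0.836 | 0.755 | 0.350 |
| mitie | 0.888 | 0.861 | 0.532 | 0.849 | 0.452 | 0.753 | 0.642 | 0.432 | 0.736 | 0.801 | 0.524 |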
| | init, s | disk, mb | ram, mb | speed, articles/s |
|---|---|---|---|---|
| deeppavlov | 5.9 | 1024 | 3072 | 24.3 (gpu) |
| deeppavlov_bert | 34.5 | 2048 | 6144 | 13.1 (gpu) |
| pullenti | 2.9 | 16 | 253 | 6.0 |
| texterra | 47.6 | 193 | 3379 | 4.0 |
| tomita | 2.0 | 64 | 63 | 29.8 |
| natasha | 2.0 | 1 | 160 | 8.8 |
| mitie | 28.3 | 327 | 261 | 32.8 |
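The NER scores are reported as F1 per entity type. One way to compute such a score is sketched below, assuming exact span matching: a predicted entity counts only if its start, stop, and type all equal a gold entity. Whether Naeval uses exact or partial matching is not stated here, and the `(start, stop, type)` tuples and offsets are illustrative.

```python
def f1_by_type(predicted, gold):
    """Return {entity_type: F1} using exact (start, stop, type) matches."""
    predicted, gold = set(predicted), set(gold)
    scores = {}
    for entity_type in sorted({t for _, _, t in predicted | gold}):
        pred = {span for span in predicted if span[2] == entity_type}
        ref = {span for span in gold if span[2] == entity_type}
        true_positives = len(pred & ref)
        precision = true_positives / len(pred) if pred else 0.0
        recall = true_positives / len(ref) if ref else 0.0
        total = precision + recall
        scores[entity_type] = 2 * precision * recall / total if total else 0.0
    return scores


# Made-up character offsets for a single document.
gold = [(0, 5, 'PER'), (10, 16, 'LOC'), (20, 28, 'ORG'), (30, 36, 'PER')]
predicted = [(0, 5, 'PER'), (10, 16, 'LOC'), (20, 25, 'ORG')]

scores = f1_by_type(predicted, gold)
```

In this example one of two PER entities is found, the LOC entity matches exactly, and the ORG span has the wrong boundary, so it counts as both a false positive and a false negative.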