UniDic2UD + COMBO-pytorch wrapper for spaCy

UniDic-COMBO

Basic Usage

>>> import unidic_combo
>>> nlp=unidic_combo.load("kindai")
>>> doc=nlp("澤山居つた兄弟が一疋も見えぬ")
>>> print(unidic_combo.to_conllu(doc))
# text = 澤山居つた兄弟が一疋も見えぬ
1	澤山	沢山	ADV	副詞	_	2	advmod	_	SpaceAfter=No|Translit=タクサン
2	居つ	居る	VERB	動詞-非自立可能	_	4	acl	_	SpaceAfter=No|Translit=オッ
3	た	た	AUX	助動詞	_	2	aux	_	SpaceAfter=No|Translit=タ
4	兄弟	兄弟	NOUN	名詞-普通名詞-一般	_	9	nsubj	_	SpaceAfter=No|Translit=キョウダイ
5	が	が	ADP	助詞-格助詞	_	4	case	_	SpaceAfter=No|Translit=ガ
6	一	一	NUM	名詞-数詞	_	7	nummod	_	SpaceAfter=No|Translit=イチ
7	疋	匹	NOUN	接尾辞-名詞的-助数詞	_	9	obl	_	SpaceAfter=No|Translit=ピキ
8	も	も	ADP	助詞-係助詞	_	7	case	_	SpaceAfter=No|Translit=モ
9	見え	見える	VERB	動詞-一般	_	0	root	_	SpaceAfter=No|Translit=ミエ
10	ぬ	ず	AUX	助動詞	_	9	aux	_	SpaceAfter=No|Translit=ヌ

>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
澤山 ADV  <══╗     advmod(連用修飾語)
居つ VERB ═╗═╝<╗   acl(連体修飾節)
た   AUX  <╝   ║   aux(動詞補助成分)
兄弟 NOUN ═╗═══╝<╗ nsubj(主語)
が   ADP  <╝     ║ case(格表示)
一   NUM  <╗     ║ nummod(数量による修飾語)
疋   NOUN ═╝═╗<╗ ║ obl(斜格補語)
も   ADP  <══╝ ║ ║ case(格表示)
見え VERB ═╗═══╝═╝ ROOT(親)
ぬ   AUX  <╝       aux(動詞補助成分)
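
The CoNLL-U text printed by unidic_combo.to_conllu(doc) in Basic Usage is plain tab-separated text, so it can be post-processed without any extra library. The following is a minimal sketch, assuming the standard ten-column CoNLL-U layout; parse_conllu and CONLLU_FIELDS are hypothetical names introduced here for illustration and are not part of unidic_combo. The sample string reuses two rows from the output above.

```python
# Hypothetical helper, not part of unidic_combo: turn CoNLL-U text into token dicts.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

def parse_conllu(text):
    tokens = []
    for line in text.splitlines():
        # Skip blank lines and comment lines such as "# text = ..."
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(CONLLU_FIELDS):
            continue  # not a well-formed token line
        tok = dict(zip(CONLLU_FIELDS, cols))
        # MISC holds |-separated Key=Value pairs (or "_" when empty)
        tok["misc"] = dict(kv.split("=", 1)
                           for kv in tok["misc"].split("|") if "=" in kv)
        tokens.append(tok)
    return tokens

# Two rows copied from the Basic Usage output; in practice pass
# unidic_combo.to_conllu(doc) instead of this literal.
sample = "\n".join([
    "# text = 澤山居つた兄弟が一疋も見えぬ",
    "1\t澤山\t沢山\tADV\t副詞\t_\t2\tadvmod\t_\tSpaceAfter=No|Translit=タクサン",
    "9\t見え\t見える\tVERB\t動詞-一般\t_\t0\troot\t_\tSpaceAfter=No|Translit=ミエ",
])

for tok in parse_conllu(sample):
    print(tok["id"], tok["form"], tok["lemma"], tok["upos"],
          tok["head"], tok["deprel"], tok["misc"].get("Translit"))
```

All values are kept as strings, so convert id and head with int() before using them as indices.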

unidic_combo.load(UniDic) loads a spaCy Language pipeline for UniDic2UD + COMBO-pytorch. The UniDic options that appear in the example and benchmarks here are "kindai", "qkana", "kinsei", "gendai", and "spoken".

Installation for Linux

pip3 install git+https://github.com/KoichiYasuoka/UniDic-COMBO

Installation for Cygwin64

Make sure the following packages are installed: python37-devel python37-pip python37-cython python37-numpy python37-cffi gcc-g++ mingw64-x86_64-gcc-g++ gcc-fortran git curl make cmake libopenblas liblapack-devel libhdf5-devel libfreetype-devel libuv-devel. Then run:

curl -L https://raw.githubusercontent.com/KoichiYasuoka/UniDic-COMBO/main/cygwin64.sh | sh

Benchmarks

Results of 舞姬/雪國/荒野より-Benchmarks

舞姬             LAS    MLAS   BLEX
UniDic="kindai"  83.02  74.07  81.48
UniDic="qkana"   81.13  74.07  81.48
UniDic="kinsei"  75.93  69.09  72.73

雪國             LAS    MLAS   BLEX
UniDic="qkana"   87.50  82.35  78.43
UniDic="kinsei"  85.71  78.43  74.51
UniDic="kindai"  83.19  78.43  74.51

荒野より         LAS    MLAS   BLEX
UniDic="gendai"  81.48  54.05  64.86
UniDic="spoken"  80.42  54.05  64.86
UniDic="kindai"  78.31  53.33  61.33
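
For readers unfamiliar with the LAS column, the sketch below shows the basic idea of a labeled attachment score: the share of tokens whose head and dependency label both match the gold annotation. It is a simplified illustration that assumes gold and predicted analyses share the same tokenization. The function name las and the toy data are hypothetical, and the benchmark figures above may come from an evaluation script that handles differing tokenizations, which this sketch does not.

```python
def las(gold, pred):
    """Percentage of tokens whose (head, deprel) pair equals the gold pair."""
    if len(gold) != len(pred):
        raise ValueError("this sketch assumes identical tokenization")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)

# Toy (head, deprel) pairs for illustration only; not taken from any benchmark file.
gold = [(2, "advmod"), (4, "acl"), (2, "aux"), (0, "root")]
pred = [(2, "advmod"), (4, "acl"), (4, "aux"), (0, "root")]  # third head is wrong

print(f"LAS = {las(gold, pred):.2f}")  # 3 of 4 tokens match
```

MLAS and BLEX additionally take part-of-speech tags, features, and lemmas into account, so they are usually lower than LAS for the same parse.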
