UniDic2UD + COMBO-pytorch wrapper for spaCy

UniDic-COMBO

Basic Usage

>>> import unidic_combo
>>> nlp=unidic_combo.load("kindai")
>>> doc=nlp("澤山居つた兄弟が一疋も見えぬ")
>>> print(unidic_combo.to_conllu(doc))
# text = 澤山居つた兄弟が一疋も見えぬ
1	澤山	沢山	ADV	副詞	_	2	advmod	_	SpaceAfter=No|Translit=タクサン
2	居つ	居る	VERB	動詞-非自立可能	_	4	acl	_	SpaceAfter=No|Translit=オッ
3	た	た	AUX	助動詞	_	2	aux	_	SpaceAfter=No|Translit=タ
4	兄弟	兄弟	NOUN	名詞-普通名詞-一般	_	9	nsubj	_	SpaceAfter=No|Translit=キョウダイ
5	が	が	ADP	助詞-格助詞	_	4	case	_	SpaceAfter=No|Translit=ガ
6	一	一	NUM	名詞-数詞	_	7	nummod	_	SpaceAfter=No|Translit=イチ
7	疋	匹	NOUN	接尾辞-名詞的-助数詞	_	9	obl	_	SpaceAfter=No|Translit=ピキ
8	も	も	ADP	助詞-係助詞	_	7	case	_	SpaceAfter=No|Translit=モ
9	見え	見える	VERB	動詞-一般	_	0	root	_	SpaceAfter=No|Translit=ミエ
10	ぬ	ず	AUX	助動詞	_	9	aux	_	SpaceAfter=No|Translit=ヌ
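The string returned by unidic_combo.to_conllu(doc) is tab-separated CoNLL-U text, so it can be post-processed without spaCy. The following is a minimal sketch, not part of UniDic-COMBO: parse_conllu and FIELDS are hypothetical names, and the sample contains only two rows copied from the output above.

```python
# Column names of the ten CoNLL-U fields, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

def parse_conllu(text):
    """Parse CoNLL-U text into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and comment lines such as "# text = ...".
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        # MISC is a "|"-separated list of Key=Value pairs, or "_" when empty.
        misc = {}
        if row["misc"] != "_":
            for item in row["misc"].split("|"):
                key, _, value = item.partition("=")
                misc[key] = value
        row["misc"] = misc
        # HEAD is a token number; 0 marks the root of the sentence.
        row["head"] = int(row["head"]) if row["head"].isdigit() else None
        rows.append(row)
    return rows

# Two rows taken from the output above, joined with tabs.
sample = "\n".join([
    "# text = 澤山居つた兄弟が一疋も見えぬ",
    "\t".join(["1", "澤山", "沢山", "ADV", "副詞", "_", "2", "advmod", "_",
               "SpaceAfter=No|Translit=タクサン"]),
    "\t".join(["9", "見え", "見える", "VERB", "動詞-一般", "_", "0", "root", "_",
               "SpaceAfter=No|Translit=ミエ"]),
])

for row in parse_conllu(sample):
    print(row["id"], row["form"], row["upos"], row["head"],
          row["deprel"], row["misc"].get("Translit"))
```

The id column is kept as a string because CoNLL-U also allows range ids for multiword tokens.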

>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
澤山 ADV  <══╗     advmod(連用修飾語)
居つ VERB ═╗═╝<╗   acl(連体修飾節)
た   AUX  <╝   ║   aux(動詞補助成分)
兄弟 NOUN ═╗═══╝<╗ nsubj(主語)
が   ADP  <╝     ║ case(格表示)
一   NUM  <╗     ║ nummod(数量による修飾語)
疋   NOUN ═╝═╗<╗ ║ obl(斜格補語)
も   ADP  <══╝ ║ ║ case(格表示)
見え VERB ═╗═══╝═╝ ROOT(親)
ぬ   AUX  <╝       aux(動詞補助成分)

>>> from deplacy.deprelja import deprelja
>>> for b in unidic_combo.bunsetu_spans(doc):
...   for t in b.lefts:
...     print(unidic_combo.bunsetu_span(t),"->",b,"("+deprelja[t.dep_]+")")
...
澤山 -> 居つた (連用修飾語)
居つた -> 兄弟が (連体修飾節)
兄弟が -> 見えぬ (主語)
一疋も -> 見えぬ (斜格補語)

unidic_combo.load(UniDic,BERT=True) loads the spaCy Language pipeline for UniDic2UD + COMBO-pytorch. The UniDic argument selects the UniDic variant; the values used on this page are "kindai", "qkana", and "kinsei".

The BERT=True/BERT=False option enables or disables the use of bert-base-japanese-whole-word-masking.
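As a minimal sketch of how the two arguments combine, the wrapper below forwards a UniDic name and the BERT flag to unidic_combo.load. The name load_pipeline and its parameter names are hypothetical; only unidic_combo.load(UniDic,BERT=...) comes from the description above. The import is done inside the function so the definition can be read and reused even where the package is not installed.

```python
def load_pipeline(unidic="kindai", use_bert=True):
    """Return a spaCy Language pipeline from unidic_combo (hypothetical wrapper)."""
    # Imported lazily: loading the package and its models can be slow.
    import unidic_combo
    # Positional UniDic name plus the BERT keyword, as in the signature above.
    return unidic_combo.load(unidic, BERT=use_bert)
```

For example, load_pipeline("qkana", use_bert=False) would request the "qkana" variant without bert-base-japanese-whole-word-masking.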

Installation for Linux

pip3 install unidic_combo

Installation for Cygwin64

Install the python37-devel python37-pip python37-cython python37-numpy python37-cffi gcc-g++ mingw64-x86_64-gcc-g++ gcc-fortran git curl make cmake libopenblas liblapack-devel libhdf5-devel libfreetype-devel libuv-devel packages first, and then run:

curl -L https://raw.githubusercontent.com/KoichiYasuoka/UniDic-COMBO/master/cygwin64.sh | sh

Installation for macOS

g++ --version
pip3 install unidic_combo --user
python3 -m spacy download en_core_web_sm --user

If Jsonnet fails to install, try the following before installing UniDic-COMBO:

( echo '#! /bin/sh' ; echo 'exec gcc `echo $* | sed "s/-arch [^ ]*//g"`' ) > /tmp/clang
chmod 755 /tmp/clang
env PATH="/tmp:$PATH" pip3 install jsonnet --user

If fugashi fails to install, try installing MeCab before installing UniDic-COMBO:

cd /tmp
git clone --depth=1 https://github.com/taku910/mecab
cd mecab/mecab
./configure --with-charset=UTF8
make && sudo make install
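Whichever installation route above was used, a quick way to see whether the pieces are importable is to ask Python for each module without actually loading it. This is a sketch only: check_install is a hypothetical helper, and the default module list is an assumption based on the packages named on this page.

```python
import importlib.util

def check_install(names=("unidic_combo", "spacy", "fugashi")):
    """Map each top-level module name to True if Python can locate it."""
    # find_spec returns None for a top-level module that is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_install().items():
    print(name, "OK" if found else "missing")
```

A "missing" entry only says that the module cannot be found on the current Python path; it does not explain why the installation failed.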

Benchmarks

Results of 舞姬/雪國/荒野より-Benchmarks

舞姬             LAS    MLAS   BLEX
UniDic="kindai"  84.91  77.78  85.19
UniDic="qkana"   83.02  77.78  85.19
UniDic="kinsei"  75.93  67.86  71.43

雪國             LAS    MLAS   BLEX
UniDic="qkana"   87.50  82.35  78.43
UniDic="kindai"  83.19  78.43  74.51
UniDic="kinsei"  78.57  73.08  69.23

荒野より         LAS    MLAS   BLEX
UniDic="kindai"  78.53  59.46  59.46
UniDic="qkana"   77.49  59.46  59.46
UniDic="kinsei"  76.04  59.46  59.46
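The tables above can be read as a guide to which UniDic option suits which text. As a small sketch, the scores are copied into a dictionary and the option with the highest LAS is picked for each text; RESULTS and best_by_las are hypothetical names, and the numbers are exactly those listed above.

```python
# (LAS, MLAS, BLEX) per UniDic option, copied from the tables above.
RESULTS = {
    "舞姬": {
        "kindai": (84.91, 77.78, 85.19),
        "qkana": (83.02, 77.78, 85.19),
        "kinsei": (75.93, 67.86, 71.43),
    },
    "雪國": {
        "qkana": (87.50, 82.35, 78.43),
        "kindai": (83.19, 78.43, 74.51),
        "kinsei": (78.57, 73.08, 69.23),
    },
    "荒野より": {
        "kindai": (78.53, 59.46, 59.46),
        "qkana": (77.49, 59.46, 59.46),
        "kinsei": (76.04, 59.46, 59.46),
    },
}

def best_by_las(results):
    """Return, for each text, the UniDic option with the highest LAS."""
    return {
        text: max(scores, key=lambda option: scores[option][0])
        for text, scores in results.items()
    }

for text, option in best_by_las(RESULTS).items():
    print(text, "->", option)
```

LAS is used as the ranking key because MLAS and BLEX tie between "kindai" and "qkana" for 舞姬 and across all three options for 荒野より.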
