
ChaPAS-CaboCha-MeCab wrapper for spaCy

spaCy-ChaPAS

Basic Usage

>>> import spacy_chapas
>>> nlp=spacy_chapas.load()
>>> doc=nlp("太郎は花子が読んでいる本を次郎に渡した")
>>> for t in doc:
...   print(t.i,t.orth_,t.lemma_,t.pos_,t.tag_,t.head.i,t.dep_,t.norm_,t.ent_iob_,t.ent_type_)
...
0 太郎 太郎 PROPN 名詞-固有名詞-人名-名 12 nsubj タロウ B PERSON
1 は は ADP 助詞-係助詞 0 case ハ O
2 花子 花子 PROPN 名詞-固有名詞-人名-名 4 nsubj ハナコ B PERSON
3 が が ADP 助詞-格助詞-一般 2 case ガ O
4 読ん 読む VERB 動詞-自立 7 acl ヨン O
5 で で CCONJ 助詞-接続助詞 4 mark デ O
6 いる いる AUX 動詞-非自立 4 aux イル O
7 本 本 NOUN 名詞-一般 12 obj ホン O
8 を を ADP 助詞-格助詞-一般 7 case ヲ O
9 次 次 NOUN 名詞-一般 10 compound ツギ O
10 郎 郎 NOUN 名詞-一般 12 obl ロウ O
11 に に ADP 助詞-格助詞-一般 10 case ニ O
12 渡し 渡す VERB 動詞-自立 12 ROOT ワタシ O
13 た た AUX 助動詞 12 aux タ O
>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
太郎 PROPN ═╗<══════════╗ nsubj(主語)
は   ADP   <╝           ║ case(格表示)
花子 PROPN ═╗<╗         ║ nsubj(主語)
が   ADP   <╝ ║         ║ case(格表示)
読ん VERB  ═══╝═╗═╗<╗   ║ acl(連体修飾節)
で   CCONJ <════╝ ║ ║   ║ mark(標識)
いる AUX   <══════╝ ║   ║ aux(動詞補助成分)
本   NOUN  ═╗═══════╝<╗ ║ obj(目的語)
を   ADP   <╝         ║ ║ case(格表示)
次   NOUN  <╗         ║ ║ compound(複合)
郎   NOUN  ═╝═╗<╗     ║ ║ obl(斜格補語)
に   ADP   <══╝ ║     ║ ║ case(格表示)
渡し VERB  ═╗═══╝═════╝═╝ ROOT(親)
た   AUX   <╝             aux(動詞補助成分)
>>> from deplacy.deprelja import deprelja
>>> for b in spacy_chapas.bunsetu_spans(doc):
...   for t in b.lefts:
...     print(spacy_chapas.bunsetu_span(t),"->",b,"("+deprelja[t.dep_]+")")
...
花子が -> 読んでいる (主語)
読んでいる -> 本を (連体修飾節)
太郎は -> 渡した (主語)
本を -> 渡した (目的語)
次郎に -> 渡した (斜格補語)
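To make the bunsetu output above concrete, here is a minimal pure-Python sketch that reproduces it from the token table printed earlier, without spacy_chapas. The `Tok` tuple, `FUNCTION_POS`, and the chunking rule (attach function words, and nouns headed by a preceding `compound`, to the current chunk) are hypothetical stand-ins, not how `spacy_chapas.bunsetu_spans` is actually implemented. The link step mirrors spaCy's `Span.lefts`, which yields tokens to the left of a span whose heads lie inside it.

```python
from typing import List, NamedTuple, Tuple


class Tok(NamedTuple):
    """One row of the token table above (hypothetical stand-in for a spaCy Token)."""
    i: int      # token index, like Token.i
    orth: str   # surface form, like Token.orth_
    upos: str   # UPOS tag, like Token.pos_
    head: int   # head index; the root points to itself, as in spaCy
    dep: str    # dependency label, like Token.dep_


# Assumption: tokens with these UPOS tags never start a new bunsetu.
FUNCTION_POS = {"ADP", "AUX", "CCONJ", "SCONJ", "PART", "PUNCT"}


def bunsetu_groups(toks: List[Tok]) -> List[List[Tok]]:
    """Chunk tokens into bunsetu with a simple heuristic (not spacy_chapas's method)."""
    groups: List[List[Tok]] = []
    for t in toks:
        prev = toks[t.i - 1] if t.i > 0 else None
        # Continue the current chunk for function words, or when the previous
        # token is a compound modifier whose head is this token (次 -> 郎).
        continues = t.upos in FUNCTION_POS or (
            prev is not None and prev.dep == "compound" and prev.head == t.i
        )
        if groups and continues:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def surface(group: List[Tok]) -> str:
    return "".join(t.orth for t in group)


def bunsetu_links(toks: List[Tok]) -> List[Tuple[str, str, str]]:
    """Return (dependent chunk, head chunk, label) triples for left dependents."""
    groups = bunsetu_groups(toks)
    owner = {t.i: g for g in groups for t in g}  # token index -> its chunk
    links = []
    for g in groups:
        inside = {t.i for t in g}
        # Like spaCy's Span.lefts: tokens before the chunk whose head is inside it.
        for t in toks[: g[0].i]:
            if t.head in inside:
                links.append((surface(owner[t.i]), surface(g), t.dep))
    return links


SENT = [
    Tok(0, "太郎", "PROPN", 12, "nsubj"), Tok(1, "は", "ADP", 0, "case"),
    Tok(2, "花子", "PROPN", 4, "nsubj"), Tok(3, "が", "ADP", 2, "case"),
    Tok(4, "読ん", "VERB", 7, "acl"), Tok(5, "で", "CCONJ", 4, "mark"),
    Tok(6, "いる", "AUX", 4, "aux"), Tok(7, "本", "NOUN", 12, "obj"),
    Tok(8, "を", "ADP", 7, "case"), Tok(9, "次", "NOUN", 10, "compound"),
    Tok(10, "郎", "NOUN", 12, "obl"), Tok(11, "に", "ADP", 10, "case"),
    Tok(12, "渡し", "VERB", 12, "ROOT"), Tok(13, "た", "AUX", 12, "aux"),
]

for dependent, head, label in bunsetu_links(SENT):
    print(dependent, "->", head, f"({label})")
```

Run as-is, it prints the same five bunsetu links in the same order as the session above, with raw UD labels (nsubj, acl, obj, obl) in place of the deprelja glosses.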

spacy_chapas.load(UniDic) loads a spaCy Language pipeline for ChaPAS-CaboCha-MeCab. Available UniDic options are:

You can simply use chapas2ud on the command line to get Universal Dependencies:

echo 太郎は花子が読んでいる本を次郎に渡した | chapas2ud -I RAW
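To consume chapas2ud output from Python, a minimal sketch follows. It assumes chapas2ud prints CoNLL-U (ten tab-separated columns, 1-based HEAD with 0 for the root) and reads raw text from stdin as in the command above; `parse_conllu`, `run_chapas2ud`, and `spacy_head_to_conllu` are hypothetical helpers, not part of spacy_chapas. The last helper shows how spaCy's 0-based `t.head.i`, where the root points to itself (token 12 in the example above), maps onto the CoNLL-U convention.

```python
import subprocess
from typing import Dict, List

# The ten CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats",
          "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():              # a blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):        # comments such as "# text = ..."
            continue
        else:
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue                  # skip multiword ranges and empty nodes
            current.append(dict(zip(FIELDS, cols)))
    if current:                           # tolerate a missing final blank line
        sentences.append(current)
    return sentences


def run_chapas2ud(text: str) -> str:
    """Pipe raw text through chapas2ud, as in the shell example (needs ChaPAS installed)."""
    result = subprocess.run(["chapas2ud", "-I", "RAW"], input=text + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout


def spacy_head_to_conllu(i: int, head_i: int) -> int:
    """Map spaCy's 0-based head (root points to itself) to CoNLL-U HEAD (1-based, 0 = root)."""
    return 0 if head_i == i else head_i + 1
```

With ChaPAS installed, `parse_conllu(run_chapas2ud("太郎は花子が読んでいる本を次郎に渡した"))` would return one sentence whose token dicts carry `form`, `head`, `deprel`, and the remaining columns.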

Installation for Linux (Debian)

First, install MeCab and the necessary packages (including openjdk-8-jre-headless from oldstable), then build CRF++:

sudo apt update
sudo apt install mecab libmecab-dev mecab-ipadic-utf8 python3-pip python3-dev g++ make curl openjdk-8-jre-headless
pip3 install gdown --user
cd /tmp
curl -L 'https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ' | tar xzf -
cd CRF++-0.58
./configure --prefix=/usr --libdir=`mecab-config --libs-only-L`
make && sudo make install

Second, install CaboCha:

cd /tmp
gdown 'https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7SDd1Q1dUQkZQaUU'
tar xjf cabocha-0.69.tar.bz2
cd cabocha-0.69
./configure --prefix=/usr --libdir=`mecab-config --libs-only-L` --with-charset=UTF8
make && sudo make install

Third, install ChaPAS:

cd /tmp
gdown 'https://drive.google.com/uc?export=download&id=0BwG_CvJHq43fNDlqSkVSREkzaEk'
tar xzf chapas-0.742.tar.gz
sudo mkdir -p /usr/local/bin
sudo mv chapas-0.742 /usr/local/chapas
( echo '#! /bin/sh' ; echo exec `ls -1 /usr/lib/jvm/j*-1.8.*/bin/java | tail -1` -Xmx1g -jar /usr/local/chapas/chapas.jar '"$@"' ) > chapas
sudo install chapas /usr/local/bin
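The one-liner above writes a two-line shell wrapper that runs chapas.jar with the last matching Java 8 binary. The Python sketch below generates the same text, which may help when adapting it (for example a different heap size or JVM path). `pick_java` and `wrapper_script` are hypothetical names; note that `ls` sorts by locale collation while `sorted` compares code points, so the chosen path can differ in unusual setups.

```python
import glob
from typing import Iterable, Optional


def pick_java(candidates: Iterable[str]) -> Optional[str]:
    """Roughly mimic `ls -1 ... | tail -1`: the last path in sorted order."""
    paths = sorted(candidates)
    return paths[-1] if paths else None


def wrapper_script(java: str, jar: str = "/usr/local/chapas/chapas.jar",
                   heap: str = "1g") -> str:
    """Text of the /bin/sh wrapper; "$@" forwards all arguments to ChaPAS."""
    return f'#! /bin/sh\nexec {java} -Xmx{heap} -jar {jar} "$@"\n'


if __name__ == "__main__":
    java = pick_java(glob.glob("/usr/lib/jvm/j*-1.8.*/bin/java"))
    if java is None:
        print("no Java 1.8 runtime found under /usr/lib/jvm")
    else:
        print(wrapper_script(java), end="")
```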

Finally, install spaCy-ChaPAS:

pip3 install spacy_chapas --user
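Before importing spacy_chapas, it can help to confirm that the tools installed above are reachable. This minimal sketch checks PATH with `shutil.which`. The command list is an assumption based on the steps above (CRF++ installs `crf_test`, `mecab-config` comes from libmecab-dev), and `chapas2ud` is assumed to be installed by the pip step; with `--user` it typically lands in `~/.local/bin`, which is not always on PATH.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumed command names produced by the installation steps above.
REQUIRED = ["mecab", "mecab-config", "crf_test", "cabocha", "chapas", "chapas2ud"]


def locate(commands: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each command to its resolved path, or None if it is not on PATH."""
    return {cmd: shutil.which(cmd) for cmd in commands}


def missing(commands: Iterable[str]) -> List[str]:
    """Commands that shutil.which could not resolve."""
    return [cmd for cmd, path in locate(commands).items() if path is None]


if __name__ == "__main__":
    absent = missing(REQUIRED)
    print("not found on PATH: " + ", ".join(absent) if absent else "all tools found")
```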

Installation for Linux (Ubuntu)

Same as Debian.

Installation for Linux (Kali)

Same as Debian.

Installation for Linux (CentOS)

First, install MeCab and the necessary packages, then build CRF++:

sudo yum update
sudo yum install python3-pip python3-devel gcc-c++ make curl bzip2 java-1.8.0-openjdk-headless epel-release
sudo rpm -ivh https://packages.groonga.org/centos/latest/groonga-release-latest.noarch.rpm
sudo yum install mecab mecab-devel mecab-ipadic
pip3 install gdown --user
cd /tmp
curl -L 'https://drive.google.com/uc?export=download&id=0B4y35FiV1wh7QVR6VXJ5dWExSTQ' | tar xzf -
cd CRF++-0.58
./configure --prefix=/usr --libdir=`mecab-config --libs-only-L`
make && sudo make install

The second, third, and last steps are the same as for Debian.

Installation for Cygwin64

Make sure the Cygwin packages python37-devel, python37-pip, python37-cython, python37-numpy, git, and gcc-g++ are installed, and then:

pip3.7 install git+https://github.com/KoichiYasuoka/chapas-cygwin64
pip3.7 install spacy_chapas

Installation for Google Colaboratory

Try the notebook.

Benchmarks

Results of 舞姬/雪國/荒野より-Benchmarks

舞姬             LAS    MLAS   BLEX
UniDic="kindai"  79.25  59.26  62.96
UniDic="qkana"   77.36  59.26  62.96
UniDic="kinsei"  70.37  53.57  53.57

雪國             LAS    MLAS   BLEX
UniDic="qkana"   87.50  81.63  77.55
UniDic="kinsei"  85.71  77.55  69.39
UniDic="kindai"  83.19  77.55  73.47

荒野より         LAS    MLAS   BLEX
UniDic="kindai"  68.06  35.14  45.95
UniDic="qkana"   64.92  35.14  45.95
UniDic="kinsei"  64.58  32.43  43.24
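LAS (labeled attachment score) is the share of words whose head and dependency label are both correct; MLAS and BLEX are the CoNLL 2018 shared task variants that score content-word relations and additionally check morphological features and function words (MLAS) or lemmas (BLEX). The sketch below only re-aggregates the numbers in the tables above: it picks the best UniDic option per text and takes an unweighted mean over the three texts. The averaging is an illustrative assumption, not part of the published benchmark, and it ignores that the texts differ in length.

```python
from statistics import mean
from typing import Dict, Tuple

METRICS = ("LAS", "MLAS", "BLEX")

# Scores copied from the three benchmark tables above.
SCORES: Dict[str, Dict[str, Tuple[float, float, float]]] = {
    "舞姬": {"kindai": (79.25, 59.26, 62.96), "qkana": (77.36, 59.26, 62.96),
             "kinsei": (70.37, 53.57, 53.57)},
    "雪國": {"qkana": (87.50, 81.63, 77.55), "kinsei": (85.71, 77.55, 69.39),
             "kindai": (83.19, 77.55, 73.47)},
    "荒野より": {"kindai": (68.06, 35.14, 45.95), "qkana": (64.92, 35.14, 45.95),
                 "kinsei": (64.58, 32.43, 43.24)},
}


def best_per_text(metric: str = "LAS") -> Dict[str, str]:
    """UniDic option with the highest score on each text (ties go to the first listed)."""
    k = METRICS.index(metric)
    return {text: max(rows, key=lambda opt: rows[opt][k])
            for text, rows in SCORES.items()}


def mean_by_option(metric: str = "LAS") -> Dict[str, float]:
    """Unweighted mean over the three texts, rounded to two decimals."""
    k = METRICS.index(metric)
    options = ("kindai", "qkana", "kinsei")
    return {opt: round(mean(SCORES[text][opt][k] for text in SCORES), 2)
            for opt in options}


for metric in METRICS:
    print(metric, best_per_text(metric), mean_by_option(metric))
```

By this unweighted average, UniDic="kindai" has the highest mean LAS (76.83), while UniDic="qkana" leads on mean MLAS and BLEX; with only three texts, such small gaps may not generalize.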

Download files

Built distribution: spacy_chapas-0.9.3-py3-none-any.whl (13.8 kB, Python 3). No source distribution files are available for this release.

Hashes for spacy_chapas-0.9.3-py3-none-any.whl:

SHA256 23a96c9e40fd224b96d1719936be83efbdcfbac9c7a0e5454c474b6ed5a9523d
MD5 c1c0a93f701550f1cf2a42cbf358367d
BLAKE2b-256 9d243bd70641f2a44d9ce449eb6ba60d1551196a93eb8a44b9b3296f4c2d7f43
