
SynCha-CaboCha-MeCab wrapper for spaCy


Basic Usage

>>> import spacy_syncha
>>> nlp=spacy_syncha.load()
>>> doc=nlp("太郎は花子が読んでいる本を次郎に渡した")
>>> for t in doc:
...   print(t.i,t.orth_,t.lemma_,t.pos_,t.tag_,t.head.i,t.dep_,t.norm_,t.ent_iob_,t.ent_type_)
0 太郎 太郎 PROPN 名詞-固有名詞-人名-名 12 nsubj タロウ B PERSON
1 は は ADP 助詞-係助詞 0 case ハ O
2 花子 花子 PROPN 名詞-固有名詞-人名-名 4 nsubj ハナコ B PERSON
3 が が ADP 助詞-格助詞-一般 2 case ガ O
4 読ん 読む VERB 動詞-自立 7 acl ヨン O
5 で で CCONJ 助詞-接続助詞 4 mark デ O
6 いる いる AUX 動詞-非自立 4 aux イル O
7 本 本 NOUN 名詞-一般 12 obj ホン O
8 を を ADP 助詞-格助詞-一般 7 case ヲ O
9 次 次 NOUN 名詞-一般 10 compound ツギ O
10 郎 郎 NOUN 名詞-一般 12 iobj ロウ O
11 に に ADP 助詞-格助詞-一般 10 case ニ O
12 渡し 渡す VERB 動詞-自立 12 ROOT ワタシ O
13 た た AUX 助動詞 12 aux タ O
>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
太郎 PROPN ═╗<══════════╗ nsubj(主語)
は   ADP   <╝           ║ case(格表示)
花子 PROPN ═╗<╗         ║ nsubj(主語)
が   ADP   <╝ ║         ║ case(格表示)
読ん VERB  ═══╝═╗═╗<╗   ║ acl(連体修飾節)
で   CCONJ <════╝ ║ ║   ║ mark(標識)
いる AUX   <══════╝ ║   ║ aux(動詞補助成分)
本   NOUN  ═╗═══════╝<╗ ║ obj(目的語)
を   ADP   <╝         ║ ║ case(格表示)
次   NOUN  <╗         ║ ║ compound(複合)
郎   NOUN  ═╝═╗<╗     ║ ║ iobj(間接目的語)
に   ADP   <══╝ ║     ║ ║ case(格表示)
渡し VERB  ═╗═══╝═════╝═╝ ROOT(親)
た   AUX   <╝             aux(動詞補助成分)
>>> for b in spacy_syncha.bunsetu_spans(doc):
...   for t in b.lefts:
...     print(spacy_syncha.bunsetu_span(t),"->",b)
花子が -> 読んでいる
読んでいる -> 本を
太郎は -> 渡した
本を -> 渡した
次郎に -> 渡した
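The bunsetu-level arrows above can be derived from the token-level heads alone. The following is a minimal, self-contained sketch that does not call spacy_syncha: it hard-codes the head indices from the token table and assumes the bunsetu boundaries shown in the printed output (太郎は, 花子が, 読んでいる, 本を, 次郎に, 渡した). The names WORDS, HEADS, BUNSETU and bunsetu_arcs are hypothetical. Like spaCy's Span.lefts, it reports the tokens to the left of a span whose heads lie inside that span.

```python
from typing import List, Tuple

# Token texts and head indices (t.head.i) copied from the token table above.
WORDS = ["太郎", "は", "花子", "が", "読ん", "で", "いる",
         "本", "を", "次", "郎", "に", "渡し", "た"]
HEADS = [12, 0, 4, 2, 7, 4, 4, 12, 7, 10, 12, 10, 12, 12]

# Bunsetu boundaries as half-open token ranges, read off the printed output.
# Assumption: spacy_syncha.bunsetu_spans computes these itself.
BUNSETU = [(0, 2), (2, 4), (4, 7), (7, 9), (9, 12), (12, 14)]

Range = Tuple[int, int]


def span_text(span: Range) -> str:
    start, end = span
    return "".join(WORDS[start:end])


def span_of(i: int) -> Range:
    # Find the bunsetu that contains token i.
    for start, end in BUNSETU:
        if start <= i < end:
            return (start, end)
    raise ValueError(f"token {i} is outside every bunsetu")


def bunsetu_arcs() -> List[Tuple[str, str]]:
    arcs = []
    for start, end in BUNSETU:
        # Like Span.lefts: for each token h in the span, collect the tokens
        # before the span start whose head is h.
        for h in range(start, end):
            for t in range(start):
                if HEADS[t] == h:
                    arcs.append((span_text(span_of(t)), span_text((start, end))))
    return arcs


for dep, head in bunsetu_arcs():
    print(dep, "->", head)
```

Run on this sentence, the sketch prints the same five lines as the spacy_syncha example above, in the same order.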

spacy_syncha.load(UniDic) loads a spaCy Language pipeline for SynCha-CaboCha-MeCab. The available UniDic options are:

You can also use syncha2ud on the command line to get Universal Dependencies output:

echo 太郎は花子が読んでいる本を次郎に渡した | syncha2ud
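To illustrate how the token attributes printed earlier relate to the Universal Dependencies format, here is a minimal sketch that writes them as the ten tab-separated CoNLL-U columns (1-based IDs, HEAD 0 for the root). It assumes that syncha2ud emits CoNLL-U; its actual output may differ, for example by adding comment lines or filling FEATS and MISC. The function name to_conllu is hypothetical.

```python
from typing import List, Tuple

# (t.i, t.orth_, t.lemma_, t.pos_, t.tag_, t.head.i, t.dep_) for the example sentence.
Row = Tuple[int, str, str, str, str, int, str]

ROWS: List[Row] = [
    (0, "太郎", "太郎", "PROPN", "名詞-固有名詞-人名-名", 12, "nsubj"),
    (1, "は", "は", "ADP", "助詞-係助詞", 0, "case"),
    (2, "花子", "花子", "PROPN", "名詞-固有名詞-人名-名", 4, "nsubj"),
    (3, "が", "が", "ADP", "助詞-格助詞-一般", 2, "case"),
    (4, "読ん", "読む", "VERB", "動詞-自立", 7, "acl"),
    (5, "で", "で", "CCONJ", "助詞-接続助詞", 4, "mark"),
    (6, "いる", "いる", "AUX", "動詞-非自立", 4, "aux"),
    (7, "本", "本", "NOUN", "名詞-一般", 12, "obj"),
    (8, "を", "を", "ADP", "助詞-格助詞-一般", 7, "case"),
    (9, "次", "次", "NOUN", "名詞-一般", 10, "compound"),
    (10, "郎", "郎", "NOUN", "名詞-一般", 12, "iobj"),
    (11, "に", "に", "ADP", "助詞-格助詞-一般", 10, "case"),
    (12, "渡し", "渡す", "VERB", "動詞-自立", 12, "ROOT"),
    (13, "た", "た", "AUX", "助動詞", 12, "aux"),
]


def to_conllu(rows: List[Row]) -> str:
    lines = []
    for i, form, lemma, upos, xpos, head, deprel in rows:
        # spaCy marks the root by making it its own head; CoNLL-U uses HEAD 0.
        is_root = head == i
        cols = [
            str(i + 1),                          # ID is 1-based
            form, lemma, upos, xpos,
            "_",                                 # FEATS (not filled here)
            "0" if is_root else str(head + 1),   # HEAD, also 1-based
            "root" if is_root else deprel,       # DEPREL
            "_", "_",                            # DEPS, MISC
        ]
        lines.append("\t".join(cols))
    return "\n".join(lines) + "\n\n"  # a blank line terminates the sentence


print(to_conllu(ROWS), end="")
```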

Installation for Linux (Debian)

First, install MeCab, CRF++, and the other necessary packages:

sudo apt update
sudo apt install mecab libmecab-dev mecab-ipadic-utf8 python3-pip python3-dev g++ make curl lp-solve
cd /tmp
curl -L '' | tar xzf -
cd CRF++-0.58
./configure --prefix=/usr --libdir=`mecab-config --libs-only-L`
make && sudo make install

Second, install CaboCha:

cd /tmp
curl -sc cabocha.cookie '' > /dev/null
curl -Lb cabocha.cookie ''`tr -d '\015' < cabocha.cookie | awk '/_warning_/{print $NF}'` | tar xjf -
cd cabocha-0.69
./configure --prefix=/usr --libdir=`mecab-config --libs-only-L` --with-charset=UTF8
make && sudo make install
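The CaboCha download uses two requests: the first curl call only stores cookies in cabocha.cookie, and the second removes carriage returns with tr -d '\015', has awk print the last field (the cookie value) of each line containing _warning_, and appends that value to the download URL. The sketch below mimics that extraction in Python on a Netscape-format cookie file, the tab-separated layout curl writes. It is an illustration only: the function name, the sample domain, and the cookie values are hypothetical, and unlike awk it returns just the first match.

```python
from typing import Optional


def warning_cookie_value(cookie_text: str) -> Optional[str]:
    # Equivalent of: tr -d '\015' < cabocha.cookie | awk '/_warning_/{print $NF}'
    # except that awk prints every matching line, while this returns the first.
    for line in cookie_text.replace("\r", "").splitlines():  # tr -d '\015'
        if "_warning_" in line:           # awk pattern /_warning_/
            fields = line.split()         # awk's default field splitting
            return fields[-1]             # $NF: last field, the cookie value
    return None


# Hypothetical cookie-jar contents (Netscape format: domain, subdomain flag,
# path, secure flag, expiry, name, value).
SAMPLE = (
    "# Netscape HTTP Cookie File\r\n"
    "example.com\tFALSE\t/\tFALSE\t0\tdownload_warning_abc\tXyZ123\r\n"
    "example.com\tFALSE\t/\tFALSE\t0\tNID\tother\r\n"
)

print(warning_cookie_value(SAMPLE))
```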

Third, install SynCha:

cd /tmp
curl -L '' | tar xzf -
sudo mkdir -p /usr/local/bin
sudo mv syncha- /usr/local/syncha
( echo '#! /bin/sh' ; echo 'exec /usr/local/syncha/syncha "$@"' ) > syncha
sudo install syncha /usr/local/bin

And last, install spaCy-SynCha:

pip3 install spacy_syncha --user
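After these steps, a quick way to confirm that the external programs are reachable is to look them up on PATH. This sketch uses only the standard library; the command names (mecab, mecab-config, crf_learn, cabocha, syncha, syncha2ud) are assumptions based on the packages installed above, and with pip3 --user the syncha2ud script usually lands in ~/.local/bin, which must then be on PATH.

```python
import shutil
from typing import Dict, Iterable, Optional

# Assumed command names provided by MeCab, CRF++, CaboCha, SynCha and spacy_syncha.
REQUIRED = ("mecab", "mecab-config", "crf_learn", "cabocha", "syncha", "syncha2ud")


def locate_commands(names: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    # shutil.which returns the full path of an executable on PATH, or None.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_commands().items():
        print(f"{name:13} {path or 'NOT FOUND'}")
```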

Installation for Linux (Ubuntu)

Same as Debian.

Installation for Linux (Kali)

Same as Debian.

Installation for Linux (CentOS)

First, install MeCab, CRF++, and the other necessary packages:

sudo yum update
sudo yum install python3-pip python3-devel gcc-c++ make curl bzip2 lpsolve epel-release
sudo rpm -ivh
sudo yum install mecab mecab-devel mecab-ipadic
cd /tmp
curl -L '' | tar xzf -
cd CRF++-0.58
./configure --prefix=/usr --libdir=`mecab-config --libs-only-L`
make && sudo make install

The second, third, and last steps are the same as for Debian.

Installation for Cygwin

Make sure the Cygwin packages python37-devel, python37-pip, python37-cython, python37-numpy, git, gcc-g++, and perl are installed, and then:

pip3.7 install git+
pip3.7 install spacy_syncha --no-build-isolation

Installation for Google Colaboratory

Try the notebook.



Files for spacy-syncha, version 0.6.8
Filename: spacy_syncha-0.6.8-py3-none-any.whl (14.4 kB), File type: Wheel, Python version: py3
