Tokenizer, POS-tagger, and dependency-parser with BERT/RoBERTa/DeBERTa models for Japanese and other languages
esupar
Tokenizer, POS-tagger, and dependency-parser with Transformers and SuPar.
Basic usage
>>> import esupar
>>> nlp=esupar.load("ja")
>>> doc=nlp("太郎は花子が読んでいる本を次郎に渡した")
>>> print(doc)
1 太郎 _ PROPN _ _ 12 nsubj _ SpaceAfter=No
2 は _ ADP _ _ 1 case _ SpaceAfter=No
3 花子 _ PROPN _ _ 5 nsubj _ SpaceAfter=No
4 が _ ADP _ _ 3 case _ SpaceAfter=No
5 読ん _ VERB _ _ 8 acl _ SpaceAfter=No
6 で _ SCONJ _ _ 5 mark _ SpaceAfter=No
7 いる _ AUX _ _ 5 aux _ SpaceAfter=No
8 本 _ NOUN _ _ 12 obj _ SpaceAfter=No
9 を _ ADP _ _ 8 case _ SpaceAfter=No
10 次郎 _ PROPN _ _ 12 obl _ SpaceAfter=No
11 に _ ADP _ _ 10 case _ SpaceAfter=No
12 渡し _ VERB _ _ 0 root _ SpaceAfter=No
13 た _ AUX _ _ 12 aux _ _
>>> import deplacy
>>> deplacy.render(doc,Japanese=True)
太郎 PROPN ═╗<════════╗ nsubj(主語)
は   ADP   <╝         ║ case(格表示)
花子 PROPN ═╗<══╗     ║ nsubj(主語)
が   ADP   <╝   ║     ║ case(格表示)
読ん VERB  ═╗═╗═╝<╗   ║ acl(連体修飾節)
で   SCONJ <╝ ║   ║   ║ mark(標識)
いる AUX   <══╝   ║   ║ aux(動詞補助成分)
本   NOUN  ═╗═════╝<╗ ║ obj(目的語)
を   ADP   <╝       ║ ║ case(格表示)
次郎 PROPN ═╗<╗     ║ ║ obl(斜格補語)
に   ADP   <╝ ║     ║ ║ case(格表示)
渡し VERB  ═╗═╝═════╝═╝ root(親)
た   AUX   <╝           aux(動詞補助成分)
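Since print(doc) emits CoNLL-U, the same text can be post-processed with plain string handling. The following is a minimal sketch, not part of esupar: parse_conllu and dependents are hypothetical helper names, and the sample is the output shown above with its spaces turned back into tabs (an assumption that the real output is tab-separated, as CoNLL-U requires).

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[Dict[str, str]]:
    """Hypothetical helper: turn CoNLL-U text into one dict per token line."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines (sentence breaks) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        # Keep only well-formed token lines with all ten columns.
        if len(values) == len(CONLLU_FIELDS):
            rows.append(dict(zip(CONLLU_FIELDS, values)))
    return rows


def dependents(rows: List[Dict[str, str]], head_id: str) -> List[str]:
    """Hypothetical helper: forms of all tokens whose HEAD is head_id."""
    return [r["form"] for r in rows if r["head"] == head_id]


# Sample copied from the output above; spaces are converted back to tabs.
SAMPLE = """\
1 太郎 _ PROPN _ _ 12 nsubj _ SpaceAfter=No
2 は _ ADP _ _ 1 case _ SpaceAfter=No
3 花子 _ PROPN _ _ 5 nsubj _ SpaceAfter=No
4 が _ ADP _ _ 3 case _ SpaceAfter=No
5 読ん _ VERB _ _ 8 acl _ SpaceAfter=No
6 で _ SCONJ _ _ 5 mark _ SpaceAfter=No
7 いる _ AUX _ _ 5 aux _ SpaceAfter=No
8 本 _ NOUN _ _ 12 obj _ SpaceAfter=No
9 を _ ADP _ _ 8 case _ SpaceAfter=No
10 次郎 _ PROPN _ _ 12 obl _ SpaceAfter=No
11 に _ ADP _ _ 10 case _ SpaceAfter=No
12 渡し _ VERB _ _ 0 root _ SpaceAfter=No
13 た _ AUX _ _ 12 aux _ _
""".replace(" ", "\t")

rows = parse_conllu(SAMPLE)
# The root is the token whose HEAD column is 0.
root = next(r for r in rows if r["head"] == "0")
print(root["form"], dependents(rows, root["id"]))
```

With a live pipeline, the sample string would be replaced by str(doc), which is the text that print(doc) displays.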
esupar.load(model)
loads a natural language processing pipeline that works on Universal Dependencies. Available model options are:
model="ja": Japanese model bert-base-japanese-upos (default)
model="ja_large": Japanese model bert-large-japanese-upos
model="ja_luw_small": Japanese long-unit-word model roberta-small-japanese-char-luw-upos
model="ja_luw_base": Japanese long-unit-word model bert-base-japanese-luw-upos
model="ja_luw_large": Japanese long-unit-word model bert-large-japanese-luw-upos
model="ko": Korean model roberta-base-korean-upos
model="ko_large": Korean model roberta-large-korean-upos
model="ko_morph_base": Korean morpheme model roberta-base-korean-morph-upos
model="ko_morph_large": Korean morpheme model roberta-large-korean-morph-upos
model="zh": Chinese model chinese-bert-wwm-ext-upos
model="zh_base": Chinese model chinese-roberta-base-upos
model="zh_large": Chinese model chinese-roberta-large-upos
model="lzh": Classical Chinese model roberta-classical-chinese-base-upos
model="lzh_large": Classical Chinese model roberta-classical-chinese-large-upos
model="th": Thai model roberta-base-thai-spm-upos
model="vi": Vietnamese model bert-base-vietnamese-upos
model="en": English model roberta-base-english-upos
model="en_large": English model roberta-large-english-upos
model="de": German model bert-base-german-upos
model="de_large": German model bert-large-german-upos
model="sr": Serbian (Cyrillic and Latin) model gpt2-small-serbian-upos
model="sr_large": Serbian (Cyrillic and Latin) model gpt2-large-serbian-upos
model="cop": Coptic model roberta-base-coptic-upos
model="ain": Ainu model roberta-base-ainu-upos
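To pick a model programmatically, the options listed above can be kept in a small lookup table. This is only a sketch: MODELS, keys_for, and load_pipeline are hypothetical names, not part of esupar, and restricting the argument to the listed keys is an assumption of this example (esupar.load itself may accept other values).

```python
from typing import Dict, List

# Keys and model names copied from the list above.
MODELS: Dict[str, str] = {
    "ja": "bert-base-japanese-upos",
    "ja_large": "bert-large-japanese-upos",
    "ja_luw_small": "roberta-small-japanese-char-luw-upos",
    "ja_luw_base": "bert-base-japanese-luw-upos",
    "ja_luw_large": "bert-large-japanese-luw-upos",
    "ko": "roberta-base-korean-upos",
    "ko_large": "roberta-large-korean-upos",
    "ko_morph_base": "roberta-base-korean-morph-upos",
    "ko_morph_large": "roberta-large-korean-morph-upos",
    "zh": "chinese-bert-wwm-ext-upos",
    "zh_base": "chinese-roberta-base-upos",
    "zh_large": "chinese-roberta-large-upos",
    "lzh": "roberta-classical-chinese-base-upos",
    "lzh_large": "roberta-classical-chinese-large-upos",
    "th": "roberta-base-thai-spm-upos",
    "vi": "bert-base-vietnamese-upos",
    "en": "roberta-base-english-upos",
    "en_large": "roberta-large-english-upos",
    "de": "bert-base-german-upos",
    "de_large": "bert-large-german-upos",
    "sr": "gpt2-small-serbian-upos",
    "sr_large": "gpt2-large-serbian-upos",
    "cop": "roberta-base-coptic-upos",
    "ain": "roberta-base-ainu-upos",
}


def keys_for(language: str) -> List[str]:
    """Hypothetical helper: all listed keys for one language prefix."""
    return [k for k in MODELS if k == language or k.startswith(language + "_")]


def load_pipeline(key: str = "ja"):
    """Hypothetical helper: check the key against the table, then load it."""
    if key not in MODELS:
        raise ValueError(f"unknown model key {key!r}; choose from {sorted(MODELS)}")
    import esupar  # imported lazily so the table works without the package
    return esupar.load(key)


print(keys_for("ja"))
```

Calling load_pipeline("en_large") would then behave like esupar.load("en_large"), with an early error for a mistyped key.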
Installation for Linux
pip3 install esupar --user
Installation for Cygwin64
Make sure to install the following Cygwin packages: python37-devel, python37-pip, python37-cython, python37-numpy, python37-wheel, gcc-g++, mingw64-x86_64-gcc-g++, git, curl, make, and cmake. Then run:
curl -L https://raw.githubusercontent.com/KoichiYasuoka/CygTorch/master/installer/supar.sh | sh
pip3.7 install esupar
Installation for Google Colaboratory
!pip install esupar
Try notebook.
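After any of the installation routes above, a quick check can confirm that the package is visible to the interpreter. The sketch below uses the standard-library importlib.metadata module, which requires Python 3.8 or later; installed_version is a hypothetical helper name. On the Cygwin64 setup with Python 3.7, "pip3.7 show esupar" is an alternative.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "esupar") -> Optional[str]:
    """Hypothetical helper: return the installed version, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "esupar is not installed")
```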
Author
Koichi Yasuoka (安岡孝一)