Multilingual POS-tagger and Dependency-parser
MultiCOMBO
Multilingual POS-Tagger and Dependency-Parser with COMBO-pytorch and spaCy
Basic usage
>>> import multicombo
>>> nlp=multicombo.load()
>>> doc=nlp('Who plays "La vie en rose"?')
>>> print(multicombo.to_conllu(doc))
# text = Who plays "La vie en rose"?
1 Who _ PRON _ PronType=Int 2 nsubj _ Translit=who
2 plays _ VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _
3 " _ PUNCT _ _ 5 punct _ SpaceAfter=No
4 La _ DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 5 det _ Translit=la
5 vie _ NOUN _ Gender=Fem|Number=Sing 2 obj _ _
6 en _ ADP _ _ 7 case _ _
7 rose _ NOUN _ Number=Sing 5 nmod _ SpaceAfter=No
8 " _ PUNCT _ _ 5 punct _ SpaceAfter=No
9 ? _ PUNCT _ _ 2 punct _ SpaceAfter=No
>>> import deplacy
>>> deplacy.render(doc)
Who   PRON  <════════════╗   nsubj
plays VERB  ═══════════╗═╝═╗ ROOT
"     PUNCT <══════╗   ║   ║ punct
La    DET   <════╗ ║   ║   ║ det
vie   NOUN  ═══╗═╝═╝═╗<╝   ║ obj
en    ADP   <╗ ║     ║     ║ case
rose  NOUN  ═╝<╝     ║     ║ nmod
"     PUNCT <════════╝     ║ punct
?     PUNCT <══════════════╝ punct
>>> deplacy.serve(doc)
http://127.0.0.1:5000
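The text printed by multicombo.to_conllu(doc) is in CoNLL-U format: ten columns per token (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following is a minimal sketch, not part of multicombo, that reads back the columns deplacy draws from, using the output shown above as sample data. The names Token and parse_conllu are hypothetical. Real CoNLL-U is tab-separated; the sample here is the space-separated copy from this page, so the parser falls back to whitespace splitting.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int  # 0 means the token is the root
    deprel: str


def parse_conllu(text: str) -> List[Token]:
    """Keep ID, FORM, UPOS, HEAD and DEPREL from one CoNLL-U sentence."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            cols = line.split()  # fallback for space-separated copies like the one above
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


# Space-separated copy of the multicombo.to_conllu output shown above
SAMPLE = """# text = Who plays "La vie en rose"?
1 Who _ PRON _ PronType=Int 2 nsubj _ Translit=who
2 plays _ VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ _
3 " _ PUNCT _ _ 5 punct _ SpaceAfter=No
4 La _ DET _ Definite=Def|Gender=Fem|Number=Sing|PronType=Art 5 det _ Translit=la
5 vie _ NOUN _ Gender=Fem|Number=Sing 2 obj _ _
6 en _ ADP _ _ 7 case _ _
7 rose _ NOUN _ Number=Sing 5 nmod _ SpaceAfter=No
8 " _ PUNCT _ _ 5 punct _ SpaceAfter=No
9 ? _ PUNCT _ _ 2 punct _ SpaceAfter=No
"""

tokens = parse_conllu(SAMPLE)
forms = {t.id: t.form for t in tokens}
for t in tokens:
    # each line mirrors one arc in the deplacy drawing: dependent, label, head
    print(f"{t.form:6}{t.upos:6}{t.deprel:6}-> {forms.get(t.head, 'ROOT')}")
```

Only the root (plays, HEAD 0) maps to ROOT; every other token prints the form of its head, for example vie with obj pointing to plays.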
multicombo.load(lang="xx") loads a spaCy Language pipeline with bert-base-multilingual-cased and the spacy.lang.xx.MultiLanguage tokenizer. Other language-specific tokenizers can be loaded with the lang option, while several languages require additional packages:
lang="ja"
Japanese requires SudachiPy and SudachiDict-core.
lang="th"
Thai requires PyThaiNLP.
lang="vi"
Vietnamese requires pyvi.
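Before calling multicombo.load with one of these lang values, a quick check can report which of the extra packages are not importable yet. This is a sketch only. The helper missing_packages and the EXTRA_PACKAGES table are hypothetical. The import names (sudachipy, sudachidict_core, pythainlp, pyvi) are the usual module names of those PyPI packages, not something multicombo documents.

```python
import importlib.util
from typing import Dict, List, Tuple

# Hypothetical table: lang value -> (PyPI package, import name) pairs.
# The PyPI names come from the list above; the import names are assumptions
# based on how these packages are usually imported.
EXTRA_PACKAGES: Dict[str, List[Tuple[str, str]]] = {
    "ja": [("SudachiPy", "sudachipy"), ("SudachiDict-core", "sudachidict_core")],
    "th": [("PyThaiNLP", "pythainlp")],
    "vi": [("pyvi", "pyvi")],
}


def missing_packages(lang: str) -> List[str]:
    """Return the PyPI names of extra packages for lang that are not importable."""
    return [
        pypi_name
        for pypi_name, module in EXTRA_PACKAGES.get(lang, [])
        if importlib.util.find_spec(module) is None
    ]


for lang in ("xx", "ja", "th", "vi"):
    missing = missing_packages(lang)
    print(lang, ("missing: " + ", ".join(missing)) if missing else "no extra packages missing")
```

On a machine without PyThaiNLP, missing_packages("th") returns ["PyThaiNLP"]. Once it returns an empty list, multicombo.load(lang="th") no longer lacks the Thai tokenizer dependency. Languages not in the table, such as "xx", return an empty list.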
Installation for Linux
pip3 install multicombo --user
Installation for Cygwin64
Make sure to get the python37-devel, python37-pip, python37-cython, python37-numpy, python37-cffi, gcc-g++, mingw64-x86_64-gcc-g++, gcc-fortran, git, curl, make, cmake, libopenblas, liblapack-devel, libhdf5-devel, libfreetype-devel and libuv-devel packages, and then:
curl -L https://raw.githubusercontent.com/KoichiYasuoka/UniDic-COMBO/master/cygwin64.sh | sh
pip3.7 install multicombo
Installation for Jupyter Notebook (Google Colaboratory)
!pip install multicombo
Try the notebook for Google Colaboratory.
Download files
Built Distribution
Hashes for multicombo-0.7.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5a9ccc6312dfc16aa7b6861635b3359131109165b3b42933f596b682bcf3ac54
MD5 | fb8df7a9f8d4ac75430ec5e2b027a6ce
BLAKE2b-256 | a812b33a4537b8ad3fbf666c55c029e1e5e012a942a1acc978a1a12f5df827f2
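The SHA256 digest listed for multicombo-0.7.0-py3-none-any.whl can be used to check a downloaded copy of the wheel before installing it. The following is a minimal sketch using only hashlib. The helper names sha256_of and matches are hypothetical, and the wheel is assumed to be in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for multicombo-0.7.0-py3-none-any.whl
EXPECTED_SHA256 = "5a9ccc6312dfc16aa7b6861635b3359131109165b3b42933f596b682bcf3ac54"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so a large wheel need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's SHA256 with the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Assumed location of the downloaded wheel
wheel = Path("multicombo-0.7.0-py3-none-any.whl")
if wheel.exists():
    print("SHA256 OK" if matches(wheel) else "SHA256 MISMATCH")
else:
    print(f"{wheel} not found")
```

For example, after pip download multicombo==0.7.0 --no-deps fetches that wheel, the check should print SHA256 OK if the file is the one listed above.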