Tokenizer, POS-tagger and Dependency-parser for Classical Chinese

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Text Processing :: Linguistic

Project description

SuPar-Kanbun

Tokenizer, POS-Tagger and Dependency-Parser for Classical Chinese Texts (漢文/文言文) with spaCy, Transformers and SuPar.

Basic usage

>>> import suparkanbun
>>> nlp=suparkanbun.load()
>>> doc=nlp("不入虎穴不得虎子")
>>> print(type(doc))
<class 'spacy.tokens.doc.Doc'>
>>> print(suparkanbun.to_conllu(doc))
# text = 不入虎穴不得虎子
1	不	不	ADV	v,副詞,否定,無界	Polarity=Neg	2	advmod	_	Gloss=not|SpaceAfter=No
2	入	入	VERB	v,動詞,行為,移動	_	0	root	_	Gloss=enter|SpaceAfter=No
3	虎	虎	NOUN	n,名詞,主体,動物	_	4	nmod	_	Gloss=tiger|SpaceAfter=No
4	穴	穴	NOUN	n,名詞,固定物,地形	Case=Loc	2	obj	_	Gloss=cave|SpaceAfter=No
5	不	不	ADV	v,副詞,否定,無界	Polarity=Neg	6	advmod	_	Gloss=not|SpaceAfter=No
6	得	得	VERB	v,動詞,行為,得失	_	2	parataxis	_	Gloss=get|SpaceAfter=No
7	虎	虎	NOUN	n,名詞,主体,動物	_	8	nmod	_	Gloss=tiger|SpaceAfter=No
8	子	子	NOUN	n,名詞,人,関係	_	6	obj	_	Gloss=child|SpaceAfter=No

>>> import deplacy
>>> deplacy.render(doc)
不 ADV  <════╗   advmod
入 VERB ═══╗═╝═╗ ROOT
虎 NOUN <╗ ║   ║ nmod
穴 NOUN ═╝<╝   ║ obj
不 ADV  <════╗ ║ advmod
得 VERB ═══╗═╝<╝ parataxis
虎 NOUN <╗ ║     nmod
子 NOUN ═╝<╝     obj
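
The CoNLL-U text returned by suparkanbun.to_conllu(doc) is plain tab-separated data, so it can be post-processed without spaCy. The following is a minimal sketch, not part of SuPar-Kanbun: the names Token, parse_conllu and dependents are hypothetical helpers introduced here, and SAMPLE is copied from the output shown above. It assumes every token line has the ten standard CoNLL-U columns.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One CoNLL-U token line, reduced to the fields used below (hypothetical helper)."""
    id: int
    form: str
    upos: str
    head: int
    deprel: str
    gloss: str


# Copied from the to_conllu() output above; columns are tab-separated.
SAMPLE = "\n".join([
    "# text = 不入虎穴不得虎子",
    "1\t不\t不\tADV\tv,副詞,否定,無界\tPolarity=Neg\t2\tadvmod\t_\tGloss=not|SpaceAfter=No",
    "2\t入\t入\tVERB\tv,動詞,行為,移動\t_\t0\troot\t_\tGloss=enter|SpaceAfter=No",
    "3\t虎\t虎\tNOUN\tn,名詞,主体,動物\t_\t4\tnmod\t_\tGloss=tiger|SpaceAfter=No",
    "4\t穴\t穴\tNOUN\tn,名詞,固定物,地形\tCase=Loc\t2\tobj\t_\tGloss=cave|SpaceAfter=No",
    "5\t不\t不\tADV\tv,副詞,否定,無界\tPolarity=Neg\t6\tadvmod\t_\tGloss=not|SpaceAfter=No",
    "6\t得\t得\tVERB\tv,動詞,行為,得失\t_\t2\tparataxis\t_\tGloss=get|SpaceAfter=No",
    "7\t虎\t虎\tNOUN\tn,名詞,主体,動物\t_\t8\tnmod\t_\tGloss=tiger|SpaceAfter=No",
    "8\t子\t子\tNOUN\tn,名詞,人,関係\t_\t6\tobj\t_\tGloss=child|SpaceAfter=No",
])


def parse_conllu(text: str) -> List[Token]:
    """Parse CoNLL-U text into Token records, skipping comments and blank lines."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip anything that is not a plain token line with a numeric ID and HEAD.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        # MISC column: "Key=Value" pairs separated by "|".
        misc = dict(item.split("=", 1) for item in cols[9].split("|") if "=" in item)
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]),
                            cols[7], misc.get("Gloss", "")))
    return tokens


def dependents(tokens: List[Token], head_id: int) -> List[Token]:
    """Return the tokens whose HEAD column points at head_id."""
    return [t for t in tokens if t.head == head_id]


if __name__ == "__main__":
    for tok in parse_conllu(SAMPLE):
        print(tok.id, tok.form, tok.upos, tok.deprel, "->", tok.head, tok.gloss)
```

In real use, SAMPLE would be replaced by the string returned from suparkanbun.to_conllu(doc).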

suparkanbun.load() takes two options, shown here with their default values: suparkanbun.load(BERT="roberta-classical-chinese-base-char",Danku=False). With Danku=True, the pipeline tries to segment sentences automatically. The available BERT options are:

- BERT="roberta-classical-chinese-base-char" utilizes roberta-classical-chinese-base-char (default)
- BERT="roberta-classical-chinese-large-char" utilizes roberta-classical-chinese-large-char
- BERT="guwenbert-base" utilizes GuwenBERT-base
- BERT="guwenbert-large" utilizes GuwenBERT-large
- BERT="sikubert" utilizes SikuBERT
- BERT="sikuroberta" utilizes SikuRoBERTa
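
The options above could be wrapped in a small helper that rejects misspelled model names before anything is loaded. This is a sketch under stated assumptions: SUPPORTED_BERT and load_pipeline are hypothetical names, not part of SuPar-Kanbun, and the tuple only mirrors the list above; the library itself may treat other values differently. The only SuPar-Kanbun call used is suparkanbun.load(BERT=..., Danku=...), as documented above.

```python
# BERT option values as listed in this document (assumed to be the complete set).
SUPPORTED_BERT = (
    "roberta-classical-chinese-base-char",
    "roberta-classical-chinese-large-char",
    "guwenbert-base",
    "guwenbert-large",
    "sikubert",
    "sikuroberta",
)


def load_pipeline(bert="roberta-classical-chinese-base-char", danku=False):
    """Validate the BERT option name, then call suparkanbun.load() (hypothetical wrapper)."""
    if bert not in SUPPORTED_BERT:
        raise ValueError(
            "unknown BERT option %r; expected one of: %s"
            % (bert, ", ".join(SUPPORTED_BERT))
        )
    # Imported lazily so the name check above works even where
    # suparkanbun is not installed.
    import suparkanbun
    return suparkanbun.load(BERT=bert, Danku=danku)
```

With suparkanbun installed, load_pipeline("guwenbert-base", danku=True) would request the GuwenBERT-base model with automatic sentence segmentation enabled.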

Installation for Linux

pip3 install suparkanbun --user

Installation for Cygwin64

Make sure the python37-devel, python37-pip, python37-cython, python37-numpy, python37-wheel, gcc-g++, mingw64-x86_64-gcc-g++, git, curl, make and cmake packages are installed, and then run:

curl -L https://raw.githubusercontent.com/KoichiYasuoka/CygTorch/master/installer/supar.sh | sh
pip3.7 install suparkanbun

Installation for Jupyter Notebook (Google Colaboratory)

!pip install suparkanbun

Try the notebook for Google Colaboratory.
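
After any of the installation routes above, one way to confirm that pip registered the package is to query its installed version. The sketch below uses only the Python standard library; installed_version is a hypothetical helper name, and importlib.metadata requires Python 3.8 or later, so it would not run on the Python 3.7 used in the Cygwin64 instructions.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("suparkanbun")
    if version is None:
        print("suparkanbun is not installed")
    else:
        print("suparkanbun", version)
```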

Author

Koichi Yasuoka (安岡孝一)

Reference

Koichi Yasuoka, Christian Wittern, Tomohiko Morioka, Takumi Ikeda, Naoki Yamazaki, Yoshihiro Nikaido, Shingo Suzuki, Shigeki Moro, Kazunori Fujita: Designing Universal Dependencies for Classical Chinese and Its Application, Journal of Information Processing Society of Japan, Vol.63, No.2 (February 2022), pp.355-363.

Release history

Version	Release date
1.7.5	Apr 13, 2026 (this version)
1.7.4	Feb 28, 2026
1.7.3	Feb 23, 2026
1.7.2	Feb 23, 2026
1.7.1	Feb 19, 2026
1.7.0	Feb 9, 2026
1.6.9	Feb 1, 2026
1.6.8	Jan 25, 2026
1.6.7	Jan 18, 2026
1.6.6	Jan 11, 2026
1.6.5	Oct 26, 2025
1.6.4	Oct 13, 2025
1.6.3	Oct 13, 2025
1.6.2	Oct 6, 2025
1.6.1	Sep 18, 2025
1.6.0	Sep 1, 2025
1.5.9	Sep 1, 2025
1.5.8	Aug 31, 2025
1.5.7	Aug 30, 2025
1.5.6	Mar 27, 2025
1.5.5	Mar 27, 2025
1.5.4	Nov 20, 2024
1.5.3	Jun 2, 2024
1.5.2	Feb 29, 2024
1.5.1	Feb 29, 2024
1.5.0	Jul 17, 2023
1.4.9	Jul 17, 2023
1.4.8	Feb 23, 2023
1.4.7	Jan 14, 2023
1.4.6	Sep 16, 2022
1.4.5	Aug 5, 2022
1.4.4	Aug 1, 2022
1.4.3	Mar 14, 2022
1.4.2	Jan 16, 2022
1.4.1	Dec 25, 2021
1.4.0	Dec 24, 2021
1.3.9	Dec 24, 2021
1.3.8	Dec 17, 2021
1.3.7	Dec 17, 2021
1.3.6	Dec 16, 2021
1.3.5	Dec 16, 2021
1.3.4	Dec 16, 2021
1.3.3	Nov 8, 2021
1.3.2	Oct 26, 2021
1.3.1	Oct 16, 2021
1.3.0	Oct 4, 2021
1.2.9	Sep 22, 2021
1.2.8	Sep 21, 2021
1.2.7	Aug 17, 2021
1.2.6	Jul 21, 2021
1.2.5	Jul 9, 2021
1.2.4	Jun 19, 2021
1.2.3	Jun 15, 2021
1.2.2	Jun 15, 2021
1.2.1	May 30, 2021
1.2.0	May 28, 2021
1.1.9	May 24, 2021
1.1.8	May 24, 2021
1.1.7	May 20, 2021
1.1.6	May 13, 2021
1.1.5	May 3, 2021
1.1.4	Apr 29, 2021
1.1.3	Apr 29, 2021
1.1.2	Apr 29, 2021
1.1.1	Apr 27, 2021
1.1.0	Apr 16, 2021
1.0.9	Apr 15, 2021
1.0.8	Apr 15, 2021
1.0.7	Apr 15, 2021
1.0.6	Apr 15, 2021
1.0.5	Apr 8, 2021
1.0.4	Apr 8, 2021
1.0.3	Apr 7, 2021
1.0.2	Apr 7, 2021
1.0.1	Apr 7, 2021
1.0.0	Apr 2, 2021
0.9.9	Mar 26, 2021
0.9.8	Mar 22, 2021
0.9.7	Mar 17, 2021
0.9.6	Mar 13, 2021
0.9.5	Mar 13, 2021
0.9.4	Mar 13, 2021
0.9.3	Mar 12, 2021
0.9.2	Mar 11, 2021
0.9.1	Mar 11, 2021
0.9.0	Mar 11, 2021
0.8.2	Mar 11, 2021
0.8.1	Mar 11, 2021
0.8.0	Mar 11, 2021
0.7.0	Mar 10, 2021
0.6.4	Mar 10, 2021
0.6.3	Mar 10, 2021
0.6.2	Mar 10, 2021
0.6.1	Mar 10, 2021
0.6.0	Mar 10, 2021
0.4.0	Mar 10, 2021
0.3.0	Mar 9, 2021

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

suparkanbun-1.7.5-py3-none-any.whl (933.5 kB)

Uploaded Apr 13, 2026 Python 3

File details

Details for the file suparkanbun-1.7.5-py3-none-any.whl.

File metadata

Download URL: suparkanbun-1.7.5-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 933.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.2

File hashes

Hashes for suparkanbun-1.7.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`27cfcd5888ba9158b7bff1c0eb519019060cc752504c5dae48a6435153c00b11`
MD5	`552fa7f88ddaa7f27e45d0afa7167ef8`
BLAKE2b-256	`6eec186dca79dd148e2f4734a9fb8cf0aaab50e1895dad7adab15d85c13de2c5`
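
The SHA256 digest above can be compared against a locally downloaded wheel. The following is a minimal sketch using only the standard library; sha256_of_file and matches_expected are hypothetical helper names, and the file path in the usage lines is an assumption about where the wheel was saved.

```python
import hashlib

# SHA256 digest listed above for suparkanbun-1.7.5-py3-none-any.whl.
EXPECTED_SHA256 = "27cfcd5888ba9158b7bff1c0eb519019060cc752504c5dae48a6435153c00b11"


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    import sys
    # Usage (path is an assumption): python verify.py suparkanbun-1.7.5-py3-none-any.whl
    if len(sys.argv) == 2:
        print("OK" if matches_expected(sys.argv[1]) else "MISMATCH")
```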
