rhoknp: Yet another Python binding for Juman++/KNP
rhoknp is a Python binding for Juman++ and KNP.
import rhoknp

# Perform language analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply("電気抵抗率は電気の通しにくさを表す物性値である。")

# Save language analysis by Juman++
with open("result.jumanpp", "wt") as f:
    f.write(sentence.to_jumanpp())

# Load language analysis by Juman++
with open("result.jumanpp", "rt") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())

# Perform language analysis by KNP
knp = rhoknp.KNP()
sentence = knp.apply(sentence)  # or knp.apply("電気抵抗率は...")

# Save language analysis by KNP
with open("result.knp", "wt") as f:
    f.write(sentence.to_knp())

# Load language analysis by KNP
with open("result.knp", "rt") as f:
    sentence = rhoknp.Sentence.from_knp(f.read())
Requirements
Installation
pip install rhoknp
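Because rhoknp is a binding, the Juman++ and KNP analyzers themselves presumably have to be installed separately from the Python package. The following is a minimal sketch of a pre-flight check using only the standard library. The helper name `find_missing_executables` is hypothetical, and the executable names `jumanpp` and `knp` are assumptions about how the analyzers are installed on your system.

```python
import shutil


def find_missing_executables(names=("jumanpp", "knp")):
    """Return the names in `names` that cannot be found on PATH.

    The default names are assumptions; adjust them to your installation.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Juman++ and KNP executables found.")
```

Running the check before constructing `rhoknp.Jumanpp()` or `rhoknp.KNP()` may make a missing analyzer easier to diagnose.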
Documentation
https://rhoknp.readthedocs.io/en/latest/
Quick tour
rhoknp provides APIs to perform language analysis by Juman++ and KNP.
# Perform language analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply("電気抵抗率は電気の通しにくさを表す物性値である。")
# Perform language analysis by KNP
knp = rhoknp.KNP()
sentence = knp.apply(sentence) # or knp.apply("電気抵抗率は...")
Sentence objects can be saved in the Juman/KNP format
# Save language analysis by Juman++
with open("result.jumanpp", "wt") as f:
    f.write(sentence.to_jumanpp())

# Save language analysis by KNP
with open("result.knp", "wt") as f:
    f.write(sentence.to_knp())
and recovered from Juman/KNP-format text.
# Load language analysis by Juman++
with open("result.jumanpp", "rt") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())

# Load language analysis by KNP
with open("result.knp", "rt") as f:
    sentence = rhoknp.Sentence.from_knp(f.read())
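To give a rough idea of what such a saved file contains, here is a standard-library sketch that reads Juman++-style text: one morpheme per line with space-separated fields, and `EOS` closing each sentence. The sample text is hand-written for illustration rather than real analyzer output, and the function `parse_jumanpp_like` and the field names in `FIELDS` are hypothetical. In practice `rhoknp.Sentence.from_jumanpp` does this work for you.

```python
from typing import Dict, List

# Hand-written sample in the style of Juman++ output (an assumption made
# for illustration, not actual analyzer output).
SAMPLE = """\
電気 でんき 電気 名詞 6 普通名詞 1 * 0 * 0 "代表表記:電気/でんき"
は は は 助詞 9 副助詞 2 * 0 * 0 NIL
EOS
"""

# Hypothetical names for the twelve space-separated fields of a morpheme line.
FIELDS = [
    "surface", "reading", "lemma",
    "pos", "pos_id", "subpos", "subpos_id",
    "conjtype", "conjtype_id", "conjform", "conjform_id",
    "semantics",
]


def parse_jumanpp_like(text: str) -> List[List[Dict[str, str]]]:
    """Group morpheme lines into sentences, splitting each line into fields.

    This sketch skips comment lines ("# ...") and alternative-analysis
    lines ("@ ..."), and ignores edge cases such as a morpheme whose
    surface form is itself "#" or a space.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if line == "EOS":
            # EOS closes the current sentence.
            sentences.append(current)
            current = []
        elif line and not line.startswith(("# ", "@ ")):
            # Split at most 11 times so the last field keeps any spaces.
            values = line.split(" ", len(FIELDS) - 1)
            current.append(dict(zip(FIELDS, values)))
    return sentences


for morpheme in parse_jumanpp_like(SAMPLE)[0]:
    print(morpheme["surface"], morpheme["pos"])
```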
It is easy to access the linguistic units that make up a sentence.
for clause in sentence.clauses:
    ...
for phrase in sentence.phrases:  # a.k.a. bunsetsu
    ...
for base_phrase in sentence.base_phrases:  # a.k.a. kihon-ku
    ...
for morpheme in sentence.morphemes:
    ...
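The units above nest inside one another: a phrase (bunsetsu) consists of base phrases (kihon-ku), which in turn consist of morphemes. The toy model below sketches that containment with plain dataclasses. The `Toy*` classes are hypothetical and are not rhoknp's implementation, clauses are omitted for brevity, and the segmentation is hand-made, so real Juman++/KNP output may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyBasePhrase:
    morphemes: List[str]  # surface forms only, for simplicity


@dataclass
class ToyPhrase:
    base_phrases: List[ToyBasePhrase]


@dataclass
class ToySentence:
    phrases: List[ToyPhrase]

    @property
    def base_phrases(self) -> List[ToyBasePhrase]:
        # Flatten phrases into their base phrases, preserving order.
        return [bp for p in self.phrases for bp in p.base_phrases]

    @property
    def morphemes(self) -> List[str]:
        # Flatten base phrases into their morphemes, preserving order.
        return [m for bp in self.base_phrases for m in bp.morphemes]


# Hand-made segmentation of "単に抵抗率とも呼ばれる。" (illustrative only).
toy = ToySentence(
    phrases=[
        ToyPhrase([ToyBasePhrase(["単に"])]),
        ToyPhrase([ToyBasePhrase(["抵抗"]), ToyBasePhrase(["率", "と", "も"])]),
        ToyPhrase([ToyBasePhrase(["呼ば", "れる", "。"])]),
    ]
)

print(len(toy.phrases), len(toy.base_phrases), len(toy.morphemes))
```

Exposing every level as a flat list on the sentence, as the loops above suggest rhoknp does, lets callers pick the granularity they need without walking the hierarchy themselves.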
rhoknp also provides APIs for document-level language analysis.
document = rhoknp.Document.from_raw_text(
    "電気抵抗率は電気の通しにくさを表す物性値である。単に抵抗率とも呼ばれる。"
)
# If you know sentence boundaries, you can use `Document.from_sentences` instead.
document = rhoknp.Document.from_sentences(
    [
        "電気抵抗率は電気の通しにくさを表す物性値である。",
        "単に抵抗率とوも呼ばれる。",
    ]
)
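If your text comes without sentence boundaries and you want to control the splitting yourself before calling `Document.from_sentences`, a naive splitter could look like the sketch below. The function `split_sentences` is hypothetical, and the rule it applies (cut after each 。, ! or ?) is an assumption for illustration. It is not necessarily the rule `Document.from_raw_text` uses, and it mishandles cases such as sentence-final punctuation inside quotations.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naively split Japanese text after each 。, ! or ?.

    Trailing text without a sentence-final mark is kept as its own item.
    """
    # Each match is a run of non-terminators plus an optional terminator;
    # the empty match at the end of the string is filtered out.
    return [s for s in re.findall(r"[^。!?]*[。!?]?", text) if s]


raw_text = "電気抵抗率は電気の通しにくさを表す物性値である。単に抵抗率とも呼ばれる。"
for sentence_text in split_sentences(raw_text):
    print(sentence_text)
```

The resulting list of strings has the same shape as the argument passed to `rhoknp.Document.from_sentences` above.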
Document objects can be handled in almost the same way as Sentence objects.
# Perform language analysis by Juman++/KNP
document = jumanpp.apply_to_document(document)
document = knp.apply_to_document(document)

# Save language analysis by Juman++/KNP
with open("result.jumanpp", "wt") as f:
    f.write(document.to_jumanpp())
with open("result.knp", "wt") as f:
    f.write(document.to_knp())

# Load language analysis by Juman++/KNP
with open("result.jumanpp", "rt") as f:
    document = rhoknp.Document.from_jumanpp(f.read())
with open("result.knp", "rt") as f:
    document = rhoknp.Document.from_knp(f.read())

# Access language units in the document
for sentence in document.sentences:
    ...
for clause in document.clauses:
    ...
for phrase in document.phrases:
    ...
for base_phrase in document.base_phrases:
    ...
for morpheme in document.morphemes:
    ...
For more information, explore the examples and documentation.
Main differences from pyknp
- Support document-level language analysis: rhoknp can load and instantiate the results of document-level language analysis, i.e., cohesion analysis and discourse relation analysis.
- Strictly type-aware: rhoknp is thoroughly annotated with type annotations. Efficient development is possible with the help of an IDE.
- Extensive test suite: rhoknp is tested with an extensive test suite. See the code coverage at Codecov.
- Support Python 3.8+ only
Reference