rhoknp: Yet another Python binding for Juman++/KNP

rhoknp is a Python binding for Juman++ and KNP.

import rhoknp

# Perform language analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply("電気抵抗率は電気の通しにくさを表す物性値である。")

# Save language analysis by Juman++
with open("result.jumanpp", "wt") as f:
    f.write(sentence.to_jumanpp())

# Load language analysis by Juman++
with open("result.jumanpp", "rt") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())

# Perform language analysis by KNP
knp = rhoknp.KNP()
sentence = knp.apply(sentence)  # or knp.apply("電気抵抗率は...")

# Save language analysis by KNP
with open("result.knp", "wt") as f:
    f.write(sentence.to_knp())

# Load language analysis by KNP
with open("result.knp", "rt") as f:
    sentence = rhoknp.Sentence.from_knp(f.read())

Requirements

Installation

pip install rhoknp
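
rhoknp is a binding for the external Juman++ and KNP analyzers, so `pip install` alone may not be enough to run the examples. The following is a minimal sketch for checking the environment. It assumes the analyzers are available on `PATH` under the executable names `jumanpp` and `knp` (an assumption; adjust the names if your installation differs). The helper `check_environment` is hypothetical and not part of rhoknp.

```python
import shutil
from importlib import metadata


def check_environment(executables=("jumanpp", "knp")):
    """Report the installed rhoknp version and the location of each analyzer.

    Hypothetical helper, not part of rhoknp. The executable names are assumptions.
    """
    try:
        version = metadata.version("rhoknp")
    except metadata.PackageNotFoundError:
        version = None  # rhoknp is not installed in this environment
    report = {"rhoknp": version}
    for name in executables:
        # shutil.which returns the full path, or None if the binary is not on PATH
        report[name] = shutil.which(name)
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value or 'not found'}")
```

Any entry reported as "not found" points to a component that probably still needs to be installed.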

Documentation

https://rhoknp.readthedocs.io/en/latest/

Quick tour

rhoknp provides APIs to perform language analysis by Juman++ and KNP.

# Perform language analysis by Juman++
jumanpp = rhoknp.Jumanpp()
sentence = jumanpp.apply("電気抵抗率は電気の通しにくさを表す物性値である。")

# Perform language analysis by KNP
knp = rhoknp.KNP()
sentence = knp.apply(sentence)  # or knp.apply("電気抵抗率は...")

Sentence objects can be saved in the Juman/KNP format

# Save language analysis by Juman++
with open("result.jumanpp", "wt") as f:
    f.write(sentence.to_jumanpp())

# Save language analysis by KNP
with open("result.knp", "wt") as f:
    f.write(sentence.to_knp())

and recovered from Juman/KNP-format text.

# Load language analysis by Juman++
with open("result.jumanpp", "rt") as f:
    sentence = rhoknp.Sentence.from_jumanpp(f.read())

# Load language analysis by KNP
with open("result.knp", "rt") as f:
    sentence = rhoknp.Sentence.from_knp(f.read())
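
The snippets above open files in text mode without an `encoding` argument, so Python falls back to the platform's default encoding, which is not always UTF-8. Since the analysis text contains Japanese characters, passing the encoding explicitly may be safer. A minimal sketch, using the hypothetical helper names `save_analysis` and `load_analysis` (not part of rhoknp):

```python
from pathlib import Path


def save_analysis(path, text, encoding="utf-8"):
    """Write analysis text, e.g. the string returned by sentence.to_knp()."""
    Path(path).write_text(text, encoding=encoding)


def load_analysis(path, encoding="utf-8"):
    """Read analysis text back, e.g. to pass to rhoknp.Sentence.from_knp()."""
    return Path(path).read_text(encoding=encoding)
```

With these helpers, saving would look like `save_analysis("result.knp", sentence.to_knp())` and loading like `rhoknp.Sentence.from_knp(load_analysis("result.knp"))`.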

It is easy to access the linguistic units that make up a sentence.

for clause in sentence.clauses:
    ...
for phrase in sentence.phrases:  # a.k.a. bunsetsu
    ...
for base_phrase in sentence.base_phrases:  # a.k.a. kihon-ku
    ...
for morpheme in sentence.morphemes:
    ...

rhoknp also provides APIs for document-level language analysis.

document = rhoknp.Document.from_raw_text(
    "電気抵抗率は電気の通しにくさを表す物性値である。単に抵抗率とも呼ばれる。"
)
# If you know sentence boundaries, you can use `Document.from_sentences` instead.
document = rhoknp.Document.from_sentences(
    [
        "電気抵抗率は電気の通しにくさを表す物性値である。",
        "単に抵抗率とも呼ばれる。",
    ]
)
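
As one illustration of known sentence boundaries, the input may already contain one sentence per line. A minimal sketch of building the list for `Document.from_sentences` in that case; `sentences_from_lines` is a hypothetical helper, not part of rhoknp, and the one-sentence-per-line layout is an assumption about the input:

```python
def sentences_from_lines(text):
    """Return the non-empty lines of `text`, stripped of surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


raw = """
電気抵抗率は電気の通しにくさを表す物性値である。
単に抵抗率とも呼ばれる。
"""
sentences = sentences_from_lines(raw)
# The resulting list could then be passed on:
# document = rhoknp.Document.from_sentences(sentences)
```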

Document objects can be handled in almost the same way as Sentence objects.

# Perform language analysis by Juman++/KNP
document = jumanpp.apply_to_document(document)
document = knp.apply_to_document(document)

# Save language analysis by Juman++/KNP
with open("result.jumanpp", "wt") as f:
    f.write(document.to_jumanpp())
with open("result.knp", "wt") as f:
    f.write(document.to_knp())

# Load language analysis by Juman++/KNP
with open("result.jumanpp", "rt") as f:
    document = rhoknp.Document.from_jumanpp(f.read())
with open("result.knp", "rt") as f:
    document = rhoknp.Document.from_knp(f.read())

# Access language units in the document
for sentence in document.sentences:
    ...
for clause in document.clauses:
    ...
for phrase in document.phrases:
    ...
for base_phrase in document.base_phrases:
    ...
for morpheme in document.morphemes:
    ...
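
The loop bodies above are left open. As one possible use, the sketch below counts each kind of unit by duck typing on the attribute names shown in the loops. `count_units` and `UNIT_NAMES` are hypothetical names, not part of rhoknp, and a stand-in object is used so the example runs without Juman++/KNP installed.

```python
from types import SimpleNamespace

UNIT_NAMES = ("sentences", "clauses", "phrases", "base_phrases", "morphemes")


def count_units(obj):
    """Count each kind of linguistic unit exposed by `obj`.

    Attributes that `obj` does not provide are skipped, so the same helper
    can be tried on both sentence-like and document-like objects.
    """
    counts = {}
    for name in UNIT_NAMES:
        units = getattr(obj, name, None)
        if units is not None:
            counts[name] = len(list(units))
    return counts


# Stand-in object; a real call would pass an analyzed Document or Sentence.
fake_document = SimpleNamespace(sentences=["s1", "s2"], morphemes=["m1", "m2", "m3"])
print(count_units(fake_document))  # {'sentences': 2, 'morphemes': 3}
```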

For more information, explore the examples and documentation.

Main differences from pyknp

  • Supports document-level language analysis: rhoknp can load and instantiate the results of document-level analyses, i.e., cohesion analysis and discourse relation analysis.
  • Strictly type-aware: rhoknp is thoroughly type-annotated, so an IDE can assist with completion and type checking during development.
  • Extensive test suite: rhoknp is covered by an extensive test suite. See the code coverage at Codecov.
  • Supports Python 3.8+ only.
