Jadoc: Tokenizes Japanese Documents to Enable CRUD Operations
Installation
Install MeCab
Jadoc requires MeCab. If MeCab is not already installed, install it first.
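For illustration only, MeCab can typically be installed through a system package manager; the exact package names below are assumptions and vary by platform, so consult the MeCab documentation for authoritative instructions:

```shell
# Debian/Ubuntu: MeCab, development headers, and a UTF-8 IPA dictionary
$ sudo apt install mecab libmecab-dev mecab-ipadic-utf8

# macOS (Homebrew)
$ brew install mecab mecab-ipadic
```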
Install Jadoc
$ pip install jadoc
Examples
from youcab import youcab

from jadoc.conj import Conjugation
from jadoc.doc import Doc

# Generate a MeCab-based tokenizer and the conjugation helper.
tokenize = youcab.generate_tokenizer()
conjugation = Conjugation(tokenize)

# Parse a sentence ("I wrote a book.") into a Doc.
doc = Doc("本を書きました。", conjugation)
# Print the surface forms of the tokens.
surfaces = [word.surface for word in doc.words]
print("/".join(surfaces)) # 本/を/書き/まし/た/。
# Print the plain text.
print(doc.text()) # 本を書きました。
# Delete a word.
doc.delete(3)  # The remaining words are conjugated as needed.
print(doc.text()) # 本を書いた。
# Update a word.
word = tokenize("読む")
doc.update(2, word)  # In addition to conjugating, neighboring words are transformed as needed.
print(doc.text()) # 本を読んだ。
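The index passed to delete and update refers to a position in doc.words. As a plain-Python sketch (no jadoc required; the token list is copied from the surfaces output above), this is how indices line up with tokens:

```python
# Tokens of "本を書きました。" as printed by the surfaces example.
surfaces = ["本", "を", "書き", "まし", "た", "。"]

# Pair each token with its index: doc.delete(3) removes "まし",
# and doc.update(2, word) replaces "書き".
indexed = [f"{i}: {s}" for i, s in enumerate(surfaces)]
print("\n".join(indexed))
```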
Download files
Download the file for your platform.

Source distribution: jadoc-0.1.1.tar.gz (13.6 kB)
Built distribution: jadoc-0.1.1-py3-none-any.whl (14.8 kB)