Jadoc: Tokenizes Japanese Documents to Enable CRUD Operations
Installation
Install MeCab
MeCab is required for Jadoc to work. If it is not already installed, install MeCab first.
Install Jadoc
$ pip install jadoc
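Before running the examples, it can help to confirm that both prerequisites are visible from Python. The following is a minimal sketch that uses only the standard library; `check_prerequisites` is a hypothetical helper, not part of Jadoc, and it assumes MeCab was installed in a way that puts a `mecab` command on your PATH (some installations may not).

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether the `mecab` command and the `jadoc` package can be found."""
    return {
        # Assumption: the MeCab install exposes a `mecab` executable on PATH.
        "mecab_on_path": shutil.which("mecab") is not None,
        # True once `pip install jadoc` has succeeded in this environment.
        "jadoc_installed": importlib.util.find_spec("jadoc") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A "missing" result for `mecab_on_path` does not necessarily mean MeCab is absent; it only means no `mecab` executable was found on PATH.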
Examples
from jadoc.doc import Doc
doc = Doc("本を書きました。")
# print surface forms of the tokens.
surfaces = [word.surface for word in doc.words]
print("/".join(surfaces)) # 本/を/書き/まし/た/。
# print plain text
print(doc.get_text()) # 本を書きました。
# delete the word at index 3 ("まし"); indices are 0-based
doc.delete(3) # The remaining words are re-conjugated as needed.
print(doc.get_text()) # 本を書いた。
# update a word: replace the verb at index 2 with "読む"
word = doc.conjugation.tokenize("読む")
# update() conjugates the new word and also adjusts the surrounding words as needed.
doc.update(2, word)
print(doc.get_text()) # 本を読んだ。
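To make the index arguments above concrete, here is a plain-Python sketch that does not use Jadoc or MeCab. It models the tokenized sentence as a list of surface forms and applies a naive delete and update by index. The helpers `get_text`, `delete`, and `update` are hypothetical stand-ins, not Jadoc's implementation, and they deliberately skip conjugation.

```python
# Surface forms of "本を書きました。" as listed in the example above.
tokens = ["本", "を", "書き", "まし", "た", "。"]


def get_text(words):
    """Join surface forms back into plain text."""
    return "".join(words)


def delete(words, index):
    """Return a copy of `words` without the word at `index` (0-based)."""
    return words[:index] + words[index + 1:]


def update(words, index, new_words):
    """Return a copy of `words` with the word at `index` replaced by `new_words`."""
    return words[:index] + new_words + words[index + 1:]


# Show which index refers to which token.
for i, surface in enumerate(tokens):
    print(i, surface)

# Naive delete: removes "まし" but leaves "書き" unconjugated.
without_masu = delete(tokens, 3)
print(get_text(without_masu))  # 本を書きた。

# Naive update: swaps in "読む" without conjugating it or adjusting "た".
with_yomu = update(without_masu, 2, ["読む"])
print(get_text(with_yomu))  # 本を読むた。
```

The naive results (本を書きた。 and 本を読むた。) are ungrammatical. Comparing them with Jadoc's outputs above (本を書いた。 and 本を読んだ。) shows what the conjugation step adds on top of plain list edits.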
Download files
Source Distribution
jadoc-0.2.5.tar.gz (16.6 kB)
Built Distribution
jadoc-0.2.5-py3-none-any.whl (19.0 kB)
File details
Details for the file jadoc-0.2.5.tar.gz.
File metadata
- Download URL: jadoc-0.2.5.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.9.1 Linux/4.18.0-240.1.1.el8_3.x86_64
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e627fcf9ad00c5b179bb885fd845168f96b0c9945a006d5d466694d14ce1417
MD5 | 778e2a344cc67a273f322bc650bf52fa
BLAKE2b-256 | d45c93a0656a94df309bdfc05e30d45923db2225921a45869b38a7b3b0d91ba5
File details
Details for the file jadoc-0.2.5-py3-none-any.whl.
File metadata
- Download URL: jadoc-0.2.5-py3-none-any.whl
- Upload date:
- Size: 19.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.9.1 Linux/4.18.0-240.1.1.el8_3.x86_64
File hashes
Algorithm | Hash digest
---|---
SHA256 | 85c7ffd8652eeb96089e204bbafd4b383fc068e49270c4880331e2c32a54d7ef
MD5 | 5c9b176b13c8af00775f14b05702aec1
BLAKE2b-256 | 52dd7c32cb55247b1c3e6747943910ed68cc177b7ab59de8015ca364b324b58b