jagger-python
Python binding for Jagger, a C++ implementation of a pattern-based Japanese morphological analyzer: https://www.tkl.iis.u-tokyo.ac.jp/~ynaga/jagger/index.en.html
Install
$ python -m pip install jagger
This does not install model files.
You can download a precompiled KWDLC model from https://github.com/lighttransport/jagger-python/releases/download/v0.1.0/model_kwdlc.tar.gz (Note that KWDLC has an unclear license and terms of use. Use it at your own risk.)
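The download step can also be scripted. The following is a minimal sketch, not part of jagger-python: fetch_model is a hypothetical helper name, and the layout of the extracted archive is an assumption, so check where the patterns file ends up before passing a path to load_model.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the text above.
MODEL_URL = (
    "https://github.com/lighttransport/jagger-python/"
    "releases/download/v0.1.0/model_kwdlc.tar.gz"
)


def fetch_model(url: str = MODEL_URL, dest_dir: str = ".") -> Path:
    """Download a .tar.gz model archive and extract it under dest_dir.

    Hypothetical helper; returns the directory the archive was extracted into.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Name the local archive after the last path component of the URL.
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))

    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where the running
        # Python version provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

Calling fetch_model() with no arguments downloads the KWDLC archive into the current directory. The license caveat above still applies to the downloaded model.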
Example
import jagger
model_path = "model/kwdlc/patterns"
tokenizer = jagger.Jagger()
tokenizer.load_model(model_path)
text = "吾輩は猫である。名前はまだない。"
toks = tokenizer.tokenize(text)
for tok in toks:
    print(tok.surface(), tok.feature())
print("EOL")
"""
吾輩 名詞,普通名詞,*,*,吾輩,わがはい,代表表記:我が輩/わがはい カテゴリ:人
は 助詞,副助詞,*,*,は,は,*
猫 名詞,普通名詞,*,*,猫,ねこ,*
である 判定詞,*,判定詞,デアル列基本形,だ,である,*
。 特殊,句点,*,*,。,。,*
名前 名詞,普通名詞,*,*,名前,なまえ,*
は 助詞,副助詞,*,*,は,は,*
まだ 副詞,*,*,*,まだ,まだ,*
ない 形容詞,*,イ形容詞アウオ段,基本形,ない,ない,*
。 特殊,句点,*,*,。,。,*
"""
# print tags
for tok in toks:
    # print each tag (feature() split by comma)
    print(tok.surface())
    for i in range(tok.n_tags()):
        print("  tag[{}] = {}".format(i, tok.tag(i)))
print("EOL")
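The tag accessors can be used to filter tokens. The sketch below assumes the KWDLC-style output shown above, where the first tag is the part of speech. surfaces_by_pos is a hypothetical helper, not part of the binding; it only relies on the surface(), n_tags() and tag(i) methods used in the example.

```python
def surfaces_by_pos(toks, pos):
    """Return the surface strings of tokens whose first tag equals pos.

    Assumes tag(0) holds the part of speech, as in the KWDLC output above.
    """
    return [
        tok.surface()
        for tok in toks
        if tok.n_tags() > 0 and tok.tag(0) == pos
    ]
```

For example, surfaces_by_pos(toks, "名詞") would collect the nouns from the token list produced by tokenizer.tokenize(text).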
Batch processing (experimental)
tokenize_batch tokenizes multiple lines (delimited by a newline: '\n', '\r', or '\r\n') at once.
Line splitting is done on the C++ side.
import jagger
model_path = "model/kwdlc/patterns"
tokenizer = jagger.Jagger()
tokenizer.load_model(model_path)
text = """
吾輩は猫である。
名前はまだない。
明日の天気は晴れです。
"""
# optional: set the number of C++ threads (CPU cores) to use
# default: use all CPU cores
# tokenizer.set_threads(4)
toks_list = tokenizer.tokenize_batch(text)
for toks in toks_list:
    for tok in toks:
        print(tok.surface(), tok.feature())
Train a model
A Python interface for training a model is not provided yet.
For the time being, you can build the C++ trainer CLI using CMake (Windows is supported).
See train/ for details.
Limitation
A single line must be less than 262,144 bytes (about 87,000 Japanese characters in UTF-8, at 3 bytes per character).
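A caller can guard against this limit before tokenizing. The following sketch is not part of the binding: fits_limit and split_to_limit are hypothetical helpers, and splitting at an arbitrary character boundary may cut a word in two, so splitting at sentence boundaries is preferable when the text allows it.

```python
MAX_LINE_BYTES = 262_144  # limit stated above


def fits_limit(line: str, max_bytes: int = MAX_LINE_BYTES) -> bool:
    """Return True if the UTF-8 encoding of line is below max_bytes."""
    return len(line.encode("utf-8")) < max_bytes


def split_to_limit(line: str, max_bytes: int = MAX_LINE_BYTES) -> list:
    """Split line into chunks that each encode to fewer than max_bytes.

    Chunks are cut between characters, never inside a multi-byte sequence.
    """
    chunks = []
    current = []
    size = 0
    for ch in line:
        n = len(ch.encode("utf-8"))
        # Start a new chunk if adding this character would reach the limit.
        if current and size + n >= max_bytes:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks
```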
Jagger version
The Jagger version used in this Python binding is 2023-02-18.
For developers
Edit dev_mode=True to enable an ASan + debug build.
Run the Python script with
$ LD_PRELOAD=$(gcc -print-file-name=libasan.so) python FILE.py
or
$ LD_PRELOAD=$(clang -print-file-name=libclang_rt.asan-x86_64.so) python FILE.py
TODO
- Provide a model file trained on Wikipedia, UniDic, etc. (with clearer and more permissive licensing and terms of use).
- Use GiNZA for morphological analysis.
- Split the feature vector (CSV) considering the quote character when extracting tags.
- e.g. 'a,b,"c,d",e' => ["a", "b", "c,d", "e"]
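Until quote-aware splitting is available in the binding, the feature string can be split on the Python side. This is a minimal sketch using the standard csv module; split_feature is a hypothetical helper name, and it assumes feature strings follow ordinary CSV quoting with double quotes.

```python
import csv


def split_feature(feature: str) -> list:
    """Split a comma-separated feature string, honoring double quotes."""
    # csv.reader expects an iterable of lines; wrap the single string.
    return next(csv.reader([feature]))


print(split_feature('a,b,"c,d",e'))  # ['a', 'b', 'c,d', 'e']
```

It can be applied to tok.feature() in place of the per-index tok.tag(i) calls when a feature may contain quoted commas.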
License
The Python binding is available under the 2-clause BSD license.
Jagger and ccedar_core.h are licensed under the GPLv2/LGPLv2.1/BSD triple license.
Third-party licenses
- stack_container.h: BSD-like license.
- nanocsv.h: MIT license.