This product is a bugfix for Plone's splitter when handling Japanese text.
It monkey-patches the following functions:
Products.CMFPlone.UnicodeSplitter.splitter.bigram
Products.CMFPlone.UnicodeSplitter.splitter.process_unicode
Products.CMFPlone.UnicodeSplitter.splitter.process_unicode_glob
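As a rough illustration of what monkey patching means here, the sketch below replaces a function attribute on a module object at runtime. The `splitter` module is a hypothetical stand-in built with `types.ModuleType`, and `apply_patch` is an illustrative name; this is not necessarily how the package itself applies its patches.

```python
import types

# Hypothetical stand-in for Products.CMFPlone.UnicodeSplitter.splitter,
# so the sketch runs without Plone installed.
splitter = types.ModuleType("splitter")


def _original_bigram(u, limit):
    # Same expression as the unpatched function shown under Details.
    return [u[i:i + 2] for i in range(len(u) - limit)]


splitter.bigram = _original_bigram


def patched_bigram(u, limit):
    # Return a single character as-is, otherwise defer to the original.
    if len(u) == 1:
        return [u]
    return _original_bigram(u, limit)


def apply_patch(module):
    # Monkey patch: rebind the module attribute to the new function.
    module.bigram = patched_bigram


apply_patch(splitter)
```

Code that looks up `splitter.bigram` after `apply_patch` runs gets the patched function; code that imported the function by name beforehand keeps the old reference.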
Details
bigram
    return [u[i : i + 2] for i in range(len(u) - limit)]
to
    if len(u) == 1:
        return [u]
    else:
        return [u[i:i + 2] for i in range(len(u) - limit)]
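The effect of this change can be sketched by putting the two bodies side by side. The function names `bigram_original` and `bigram_patched` and the two-argument signature are assumptions made for illustration, inferred from the call sites in this document.

```python
def bigram_original(u, limit):
    # Body of the unpatched function.
    return [u[i:i + 2] for i in range(len(u) - limit)]


def bigram_patched(u, limit):
    # Body of the patched function: a single character is returned as-is.
    if len(u) == 1:
        return [u]
    else:
        return [u[i:i + 2] for i in range(len(u) - limit)]


# With limit=1, a one-character string produces no tokens before the patch.
single_before = bigram_original("金", 1)
single_after = bigram_patched("金", 1)

# Longer strings are split the same way by both versions.
multi_before = bigram_original("日本人", 1)
multi_after = bigram_patched("日本人", 1)
```

Under these assumptions, `single_before` is an empty list while `single_after` is `["金"]`, and both versions return `["日本", "本人"]` for the three-character input.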
process_unicode
    swords = [g.group() for g in pattern.finditer(word)]
    for sword in swords:
        if not rx_all.match(sword[0]):
            yield sword
        else:
            yield from bigram(sword, 0)
to
    swords = [g.group() for g in pattern.finditer(word)]
    for sword in swords:
        if not rx_all.match(sword[0]):
            yield sword
        else:
            for x in bigram(sword, 1):  # modified
                yield x
process_unicode_glob
    if i == len(swords) - 1:
        limit = 1
    else:
        limit = 0
to
    limit = 1
Installation
Install c2.patch.jasplitter by adding it to your buildout:
    [buildout]
    ...
    eggs =
        c2.patch.jasplitter
and then running bin/buildout.
Contribute
Issue Tracker: https://bitbucket.org/cmscom/c2.patch.jasplitter/admin/issues
Source Code: https://bitbucket.org/cmscom/c2.patch.jasplitter
Support
If you are having issues, please let us know on the issue tracker.
License
The project is licensed under the GPLv2.
Contributors
Manabu TERADA, terada@cmscom.jp
Changelog
1.0a5 (2023-08-24)
Upload to PyPI. [terapyon]
1.0a4 (2023-08-17)
Support Python 3. [terapyon]
1.0a3 (2017-03-09)
Support consecutive CJK and ASCII words. [terapyon]
Fix missing MANIFEST in packaging. [terapyon]
1.0a2 (2016-11-17)
Package bugfix. [terapyon]
1.0a1 (unreleased)
Initial release. [terapyon]
Download files
Source Distribution
Hashes for c2.patch.jasplitter-1.0a5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 357d0331b2a019a8ee9d857b673f30e389fa973d4a8c2e7977fdd43cbc66e809
MD5 | ce4470a148a9a0b2903d4bd733434a14
BLAKE2b-256 | 0be03ef2062348c66b6e9093d22d589cde284b8953035aa20dd96e8a8c92cec7