'BigramSplitter' is an add-on search product for Plone 3.x. It supports non-English languages, especially East and Southeast Asian languages.
Introduction
Specification: Text normalization uses Python's unicodedata module. Full-width digits and Latin letters are converted to their half-width equivalents, and half-width Katakana is converted to its full-width equivalent. As a result, all of these character variants are recognized as the same characters.
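The normalization described above can be sketched with the standard library alone. The source only names unicodedata, so the choice of the NFKC form and the helper name normalize_text are assumptions made for illustration; the product's actual code may differ.

```python
import unicodedata


def normalize_text(text):
    """Fold width variants of characters into a single form.

    Assumption: NFKC (compatibility composition) is used here because it
    maps full-width Latin letters and digits to half-width ones and
    half-width Katakana to full-width Katakana.
    """
    return unicodedata.normalize("NFKC", text)


print(normalize_text("Ｐｌｏｎｅ３"))  # full-width letters and a full-width digit
print(normalize_text("ｶﾀｶﾅ"))  # half-width Katakana
```

Applying the same function to both indexed text and search queries is what lets the variants match each other.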
Language Specifications:
Chinese
No space between words.
Uses only Kanji (Chinese) characters
Processed with the bigram (2-gram) model
Japanese
No space between words
Combination of Kanji (Chinese), Katakana, and Hiragana characters
Korean
Words are separated by spaces, but a space-delimited word may include an attached particle
Combination of Korean alphabet and Kanji (Chinese) characters
Korean alphabet and Kanji (Chinese) characters are distinguished from each other and processed with the bigram (2-gram) model
Thai
No space between words
This language is very difficult to process by computer
Vowels and consonants are encoded as separate Unicode characters, which makes it difficult to recognize them together as one word
However, Thai text can possibly be handled with the bigram (2-gram) model
Other languages (Including English)
Words are separated by spaces
Each word is indexed individually
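The rules listed above can be illustrated with a small sketch: runs of characters from scripts written without spaces are cut into overlapping two-character terms, while space-delimited words are kept whole. The function name split_terms and the Unicode ranges below are hypothetical choices for this example, not the product's actual implementation. As a simplification, the sketch treats all listed scripts as one class and does not reproduce the separate handling of Korean alphabet and Kanji characters.

```python
import re

# Assumed ranges: Hiragana and Katakana, CJK unified ideographs,
# Hangul syllables, and Thai. These are illustrative, not exhaustive.
_BIGRAM_CHARS = "\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3\u0e00-\u0e7f"

# A token is either a run of bigram-script characters or a run of
# other word characters (letters, digits, underscore).
_TOKEN = re.compile("[%s]+|[^\\W%s]+" % (_BIGRAM_CHARS, _BIGRAM_CHARS))
_IS_BIGRAM_CHAR = re.compile("[%s]" % _BIGRAM_CHARS)


def split_terms(text):
    """Return index terms: bigrams for unspaced scripts, words otherwise."""
    terms = []
    for match in _TOKEN.finditer(text):
        run = match.group()
        if _IS_BIGRAM_CHAR.match(run):
            if len(run) == 1:
                # A single character cannot form a bigram; keep it as is.
                terms.append(run)
            else:
                # Overlapping pairs: "ABC" yields "AB" and "BC".
                terms.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            terms.append(run)
    return terms


print(split_terms("東京都 Plone search"))
```

Because the pairs overlap, a query for any two adjacent characters in the original text matches without needing a dictionary of word boundaries.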
Notes:
Source Code
Since no documentation is available on how to develop a word splitter, we referred to the source code of other splitters. We still have a number of open questions, so if you have any further information, please feel free to let us know.
Hotfix to Plone 3.0 source code
Because the Plone 3.x catalog configuration, catalog.xml, has no mechanism for overwriting an existing index, we developed a hotfix that adds an XML attribute for this purpose. We took this approach because we believe the Plone 3 XML definition mechanism is simple and clear. Any comments are appreciated.
Installation
Use zc.buildout
Add Products.BigramSplitter to the list of eggs to install, e.g.:
    [buildout]
    ...
    eggs =
        ...
        Products.BigramSplitter
Tell the plone.recipe.zope2instance recipe to install a ZCML slug:
    [instance]
    recipe = plone.recipe.zope2instance
    ...
    zcml =
        Products.BigramSplitter
Re-run buildout, e.g. with:
$ ./bin/buildout
Restart Zope
In the Plone settings, open Add-on Products and install the product with Quick install
Old Style
Untar the downloaded file, then copy it to the 'Products' directory of your Plone instance.
Restart Zope
In the Plone settings, open Add-on Products and install the product with Quick install
Required
Plone 3.0.x or higher
License
See docs/LICENSE.txt
1.0 (2010-12-06)
Added uninstall script
1.0b4 (2010-06-07)
Fixed missing skin folder name
1.0b3 (2010-03-20)
Added keyword highlighting (JavaScript)
1.0a2 (2010-01-29)
Fixed handling of full-width spaces in AND searches
1.0a1 (2009-12-05)
Initial release