'BigramSplitter' is an add-on search product for Plone 3.x. It supports non-English languages, especially East and Southeast Asian languages.

Project description

Introduction

Description: ‘BigramSplitter’ is an add-on search product for Plone 3.x. It supports non-English languages, especially East and Southeast Asian languages.

Specification: Character normalization uses Python's unicodedata module. Full-width digits and letters are converted to their half-width equivalents, and half-width Katakana is converted to its full-width equivalent. As a result, all of these character variants are recognized as the same characters.
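The normalization described above can be sketched as follows. This is a minimal illustration, not the product's actual code: it assumes the NFKC normalization form, which the description does not name but which produces exactly the conversions listed. The function name `normalize_text` is hypothetical, and the snippet is written for a current Python 3 rather than the Python 2.4 that Plone 3 runs on.

```python
import unicodedata


def normalize_text(text):
    """Fold width variants into one canonical form.

    Assumption: NFKC is used. It maps full-width letters and digits to
    half-width, and half-width Katakana to full-width.
    """
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters and digits become ordinary ASCII.
print(normalize_text("Ｐｌｏｎｅ３"))   # Plone3
# Half-width Katakana becomes full-width Katakana.
print(normalize_text("ｶﾀｶﾅ"))          # カタカナ
```

Applying the same function to both indexed text and search queries is what lets the variants match each other.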

Language Specifications:

  • Chinese

    • No spaces between words

    • Uses only Kanji (Chinese) characters

    • Processed with a bigram (2-gram) model

  • Japanese

    • No spaces between words

    • A combination of Kanji (Chinese), Katakana, and Hiragana characters

  • Korean

    • Words are separated by spaces, but a word can contain an attached particle

    • A combination of Korean alphabet and Kanji (Chinese) characters

    • Korean alphabet and Kanji (Chinese) characters are distinguished and processed with a bigram (2-gram) model

  • Thai

    • No spaces between words

    • Very difficult to process by computer

    • Vowels and consonants are registered separately in Unicode, which makes it difficult to recognize them as one word

    • However, it may be possible to handle Thai text with a bigram (2-gram) model

  • Other languages (including English)

    • Words are separated by spaces

    • Each word is indexed
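The splitting rules above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the product's splitter: the Unicode ranges are approximate and not exhaustive, the names `BIGRAM_CHARS`, `bigrams`, and `split_terms` are hypothetical, and Korean alphabet and Kanji characters are treated as one class here rather than being distinguished. It targets a current Python 3.

```python
import re

# Assumed, non-exhaustive ranges for scripts written without word spaces.
BIGRAM_CHARS = (
    "\u0e00-\u0e7f"   # Thai
    "\u3040-\u30ff"   # Hiragana and Katakana
    "\u4e00-\u9fff"   # CJK unified ideographs (Kanji/Chinese)
    "\uac00-\ud7a3"   # Hangul syllables
)
# A token is a run of bigram-script characters, or a run of other word characters.
TOKEN_RE = re.compile("[%s]+|[^%s\\W]+" % (BIGRAM_CHARS, BIGRAM_CHARS))
BIGRAM_RE = re.compile("[%s]" % BIGRAM_CHARS)


def bigrams(run):
    """Return overlapping two-character terms; a single character stays as-is."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def split_terms(text):
    """Split text into index terms: bigrams for CJK/Thai runs, whole words otherwise."""
    terms = []
    for token in TOKEN_RE.findall(text):
        if BIGRAM_RE.match(token):
            terms.extend(bigrams(token))
        else:
            terms.append(token)
    return terms


print(split_terms("Plone 検索エンジン"))
print(split_terms("full text search"))
```

Because the bigrams overlap, any query of two or more characters can be matched against the index without knowing where the word boundaries are.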

Notes:

  • Source Code

    Since no documentation is available on how to develop a ‘word splitter’, we referred to the source code of other splitters. We still have a number of open questions. If you have any more information, please feel free to let us know.

  • Hotfix to Plone 3.0 source code

    Because the Plone 3.x catalog settings file, catalog.xml, has no mechanism for overwriting an existing index, we developed a hotfix that adds an XML attribute. We took this approach because we believe the Plone 3 XML definition mechanism is simple and clear. We appreciate any comments.

Installation

Use zc.buildout

  • Add Products.BigramSplitter to the list of eggs to install, e.g.:

    [buildout]
    ...
    eggs =
        ...
        Products.BigramSplitter
  • Tell the plone.recipe.zope2instance recipe to install a ZCML slug:

    [instance]
    recipe = plone.recipe.zope2instance
    ...
    zcml =
        Products.BigramSplitter
  • Re-run buildout, e.g. with:

    $ ./bin/buildout
  • Restart Zope

  • In the Plone site setup, open Add-on Products and quick-install the product

Old Style

  • Untar the downloaded file, then copy it to the ‘Products’ directory of your Plone instance.

  • Restart Zope

  • In the Plone site setup, open Add-on Products and quick-install the product

Required

  • Plone 3.0.x or higher

License

  • See docs/LICENSE.txt

Author

  • Manabu Terada (e-mail: terada@cmscom.jp)

  • Mikio Hokari

  • Naoki Nakanishi

  • Naotaka Hotta

  • Takashi Nagai

To Do

  • Add a re-install mechanism

  • Support more languages

Changelog

1.0 - Unreleased

  • Initial release

Download files

Source Distribution

Products.BigramSplitter-1.0b1.tar.gz (25.6 kB)

Uploaded Source

Built Distribution

Products.BigramSplitter-1.0b1-py2.4.egg (35.5 kB)

Uploaded Egg

File details

Details for the file Products.BigramSplitter-1.0b1.tar.gz.

File hashes

Hashes for Products.BigramSplitter-1.0b1.tar.gz
Algorithm Hash digest
SHA256 92a2522dbd7a3c30e8406da9987b73d4b6ab07bff0a342599e6342e5d4b9f0d2
MD5 20d2ad4a606ee992a7b66a764a2c5df1
BLAKE2b-256 683bf69cede4f9543112661db4fad74717dbaeac742030ecdfe983650767be73

File details

Details for the file Products.BigramSplitter-1.0b1-py2.4.egg.

File hashes

Hashes for Products.BigramSplitter-1.0b1-py2.4.egg
Algorithm Hash digest
SHA256 d99a2d2809abbc99444d2d42c88dcc585579c546559462a5070904e28ffd6577
MD5 d2fc55c2a856581786b6848313b1c072
BLAKE2b-256 2c496cda95b3f5313e6da5cd58f74696c235f65571ff637448453ff0a116bb09