'BigramSplitter' is an add-on search product for Plone 3.x. It supports non-English languages, especially Asian languages such as Chinese, Japanese, Korean, and Thai.

Project description


Specification: Text character normalization uses Python's unicodedata module. Full-width digits and alphabetic characters are converted to their half-width equivalents, and half-width Katakana is converted to its full-width equivalent. As a result, all of these character variants are recognized as the same characters.
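
The normalization described above can be sketched with the standard library. This is a minimal illustration, not the product's own code: the helper name normalize_text is hypothetical, and the choice of the NFKC form is an assumption (the description names unicodedata but not a normalization form). NFKC is used here because it performs both width conversions mentioned. The sketch is written for current Python 3 rather than the Python 2.4 this release targets.

```python
import unicodedata


def normalize_text(text):
    """Fold width variants into one canonical form.

    Assumption: NFKC is used because it maps full-width digits and
    letters to half-width, and half-width Katakana to full-width.
    """
    return unicodedata.normalize("NFKC", text)


# Full-width letters and a full-width digit become plain ASCII.
print(normalize_text("Ｐｌｏｎｅ３"))

# Half-width Katakana becomes full-width Katakana.
print(normalize_text("ｶﾀｶﾅ"))
```

Applying the same function to both indexed text and search queries is what lets the variants match each other.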

Language Specifications:

  • Chinese
      • No spaces between words.
      • Written only in Kanji (Chinese) characters.
      • Processed with the bigram (2-gram) model.
  • Japanese
      • No spaces between words.
      • A combination of Kanji (Chinese), Katakana, and Hiragana characters.
  • Korean
      • Words are separated by spaces, but a word may include a particle.
      • A combination of the Korean alphabet and Kanji (Chinese) characters.
      • Korean alphabet and Kanji (Chinese) characters are distinguished from each other and processed with the bigram (2-gram) model.
  • Thai
      • No spaces between words.
      • This language is very difficult to process by computer.
      • Vowels and consonants are registered in Unicode separately, which makes it difficult to recognize them as one word.
      • However, the bigram (2-gram) model may make it possible to handle Thai characters.
  • Other languages (including English)
      • Words are separated by spaces.
      • Each word is indexed.
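
The splitting rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the product's splitter: the names NO_SPACE_RUN, bigrams, and split_terms are hypothetical, the character ranges are a simplified approximation, and Hangul and Kanji are placed in one class here even though the list above says the product distinguishes them.

```python
import re
import unicodedata

# Assumption: a simplified set of scripts that get bigram treatment.
NO_SPACE_RUN = re.compile(
    "["
    "\u0e00-\u0e7f"  # Thai
    "\u3040-\u309f"  # Hiragana
    "\u30a0-\u30ff"  # Katakana
    "\u4e00-\u9fff"  # CJK Unified Ideographs (Kanji)
    "\uac00-\ud7a3"  # Hangul syllables
    "]+"
)


def bigrams(run):
    """Return overlapping two-character terms; a single character stays as is."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def split_terms(text):
    """Split on whitespace, then break CJK/Thai runs into bigrams."""
    terms = []
    for word in unicodedata.normalize("NFKC", text).split():
        pos = 0
        for match in NO_SPACE_RUN.finditer(word):
            if match.start() > pos:
                # Text in other scripts is kept as a whole word.
                terms.append(word[pos:match.start()])
            terms.extend(bigrams(match.group()))
            pos = match.end()
        if pos < len(word):
            terms.append(word[pos:])
    return terms


print(split_terms("Plone 日本語検索"))
```

Because every two-character window is indexed, a query such as 検索 can match inside a longer run without any dictionary-based word segmentation.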


  • Source Code

    Since no documentation is available on how to develop a ‘word splitter’, we referred to the source code of other splitters. We still have a number of open questions, so if you have any further information, please feel free to let us know.

  • Hotfix to Plone 3.0 source code

    Because the Plone 3.x catalog settings file, catalog.xml, has no mechanism for overwriting an existing index, we developed a hotfix that adds an XML attribute. We believe the Plone 3 XML definition mechanism is simple and clear, so we took this approach. We appreciate any comments.


Use zc.buildout

  • Add Products.BigramSplitter to the list of eggs to install, e.g.:

    eggs =
  • Tell the plone.recipe.zope2instance recipe to install a ZCML slug:

    recipe = plone.recipe.zope2instance
    zcml =
  • Re-run buildout, e.g. with:

    $ ./bin/buildout
  • Restart Zope

  • Plone setting – Add on products – Quick install

Old Style

  • Untar the downloaded file, then copy it to the ‘Products’ directory of your Plone instance.
  • Restart Zope
  • Plone setting – Add on products – Quick install


  • Plone 3.0.x or higher


  • See docs/LICENSE.txt


  • Manabu Terada e-mail :
  • Mikio Hokari
  • Naoki Nakanishi
  • Naotaka Hotta
  • Takashi Nagai


1.0 (2010-12-06)

  • Adding uninstall script

1.0b4 (2010-06-07)

  • Fixed missing skin folder name

1.0b3 (2010-03-20)

  • Adding keyword highlight (JavaScript)

1.0a2 (2010-01-29)

  • Fixed handling of full-width spaces in AND searches

1.0a1 (2009-12-05)

  • Initial release

Download files

Filename                                Size     File type  Python version  Upload date
Products.BigramSplitter-1.0-py2.4.egg   39.6 kB  Egg        2.4             Dec 6, 2010
Products.BigramSplitter-1.0.tar.gz      22.1 kB  Source     None            Dec 6, 2010