Cateye

A hint-enabled search engine framework for biomedical classification systems

Features

  • Hint: Shows hints for the search terms, which help narrow down the results quickly.
  • Fallback: If no result satisfies the query, the system automatically drops the less important search terms.
  • Spelling correction: Built-in spelling correction for query terms.
  • Abbreviation expansion: A predefined abbreviation list is applied automatically during the search.
  • Sorted results: Results are sorted according to the search history.
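
To make the fallback idea concrete, here is a minimal sketch and not Cateye's actual implementation. It assumes a toy in-memory index (token to set of document ids) and assumes that a term matching more documents counts as "less important"; the names INDEX and search_with_fallback are hypothetical.

```python
# Hypothetical toy index: token -> set of document ids.
INDEX = {
    "allergic": {"d1", "d2"},
    "rhinitis": {"d1", "d2", "d3"},
    "chronic": {"d3"},
}


def search_with_fallback(terms, index):
    """Return (doc_ids, terms_used), dropping terms until something matches."""
    # Terms missing from the index can never match, so they are dropped first.
    known = [t for t in terms if t in index]
    # Assumption: rarer terms are more important. Sort rarest first, so the
    # most common (least important) term sits at the end of the list.
    ranked = sorted(known, key=lambda t: len(index[t]))
    while ranked:
        # Documents that contain every remaining term.
        hits = set.intersection(*(index[t] for t in ranked))
        if hits:
            return hits, ranked
        ranked.pop()  # eliminate the least important remaining term
    return set(), []
```

With this toy index, a query for "chronic" and "allergic" has no common document, so the sketch drops "allergic" and returns the documents that match "chronic" alone.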

Installation

$ git clone https://github.com/jeroyang/cateye.git
$ cd cateye
$ pip install -e .

Usage

1. Run the Demo Site:

$ FLASK_APP=application.py FLASK_ENV=development flask run

Then browse the local site at http://127.0.0.1:5000/ and try searching for "rhinitis".

2. Make your own site:

2-1. Check constants.py:

Set up the essential variables in constants.py: SITE_TITLE, SITE_SUBTITLE, TOKEN_FOLDER, SNIPPET_FOLDER, HINT_FOLDER, SPELLING_FILE, ABBREVIATION_FILE, INDEX_URL

The INDEX_URL is used by the Shove object. It can be a local URL that starts with file://; see the Shove documentation for details.
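
As an illustration, a constants.py could look like the sketch below. Only the variable names come from the list above; every value is an assumption chosen to match the folder layout of this README, and the path after file:// is a made-up example.

```python
# constants.py: illustrative values only; adapt every entry to your project.
SITE_TITLE = "My Classification Search"          # assumed example title
SITE_SUBTITLE = "A search demo built on Cateye"  # assumed example subtitle

# Data locations, following the data/ folder layout described in this README.
TOKEN_FOLDER = "data/token"
SNIPPET_FOLDER = "data/snippet"
HINT_FOLDER = "data/hint"
SPELLING_FILE = "data/spelling.txt"
ABBREVIATION_FILE = "data/abbreviation.txt"

# Store location handed to Shove. "file://" selects a local store; the
# "index" path is an assumption, not a value required by Cateye.
INDEX_URL = "file://index"
```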

2-2. Prepare the data:

Folders overview:

  • data: The data source for the search engine. All files in its subfolders use the term id as their filename.
  • data/token: The tokens of the documents, after lemmatization
  • data/snippet: The HTML snippets of the documents, which are shown in the search results
  • data/hint: The hints for each entity
  • data/spelling.txt: The formal spelling of your tokens (before normalization). If possible, sort the tokens by frequency of use, most common first.
  • data/abbreviation.txt: The abbreviations, one abbreviation pair per line, with a tab separating the short form from the long form
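
The two text files can be generated with a few lines of Python. The sketch below is one possible way to do it, not part of Cateye: the helper names are hypothetical, and it assumes spelling.txt holds one token per line, which this README does not state explicitly.

```python
from collections import Counter
from pathlib import Path


def spelling_lines(raw_tokens):
    """Unique surface forms, most frequent first (ties keep first-seen order)."""
    return [token for token, _count in Counter(raw_tokens).most_common()]


def abbreviation_lines(pairs):
    """One 'short<TAB>long' line per abbreviation pair."""
    return [f"{short}\t{long_form}" for short, long_form in pairs]


def write_lines(path, lines):
    """Write the lines to `path`, creating parent folders when needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, write_lines("data/spelling.txt", spelling_lines(tokens)) would write the tokens of your corpus before normalization, ordered from most to least common.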

Cateye includes some very basic text processing tools: a tokenizer (cateye.tokenize) and a lemmatizer (cateye.lemmatize).

The tokenize function is used in two places: first to split your documents into tokens, and then to split each query into tokens.

The lemmatize function normalizes your tokens. If you wish to build a case-insensitive search engine, you may apply a lowercase lemmatizer to the tokens.
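
The sketch below shows why documents and queries should go through the same two steps. The functions are hypothetical stand-ins written for this example; they are not cateye.tokenize or cateye.lemmatize, whose exact signatures are not described here.

```python
import re


def simple_tokenize(text):
    """Stand-in tokenizer: split the text into word-like chunks."""
    return re.findall(r"\w+", text)


def lowercase_lemmatize(token):
    """Stand-in lemmatizer: lowercase only, for case-insensitive search."""
    return token.lower()


def normalize(text):
    """Apply the same tokenize + lemmatize steps to a document or a query."""
    return [lowercase_lemmatize(token) for token in simple_tokenize(text)]


document_tokens = normalize("Allergic Rhinitis, unspecified")
query_tokens = normalize("RHINITIS")
# Both sides were normalized the same way, so the query token can match.
matched = all(token in document_tokens for token in query_tokens)
```

Because both the document and the query are lowercased, the query "RHINITIS" matches the document token "rhinitis".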

2-3. Build the index:

Run the following command on the command line:

$ cateye newindex

This command reads the files in the token_folder and builds an on-disk index at the index_url. How long it takes depends on the size of your data.
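
Conceptually, an index of this kind maps each token to the documents that contain it. The sketch below illustrates that idea only; it is not the code behind cateye newindex. It assumes each token file holds whitespace-separated tokens, and a plain dict stands in for the on-disk store; build_index is a hypothetical name.

```python
from pathlib import Path


def build_index(token_folder, store):
    """Fill `store` (any dict-like object) with token -> list of term ids."""
    postings = {}
    for path in sorted(Path(token_folder).iterdir()):
        if not path.is_file():
            continue
        doc_id = path.name  # each file is named after its term id
        # Assumption: tokens in a file are separated by whitespace.
        for token in set(path.read_text(encoding="utf-8").split()):
            postings.setdefault(token, []).append(doc_id)
    store.update(postings)
    return store
```

Calling build_index("data/token", {}) returns the mapping in memory; an on-disk, dict-like store could be passed in place of the empty dict.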

2-4. Run your application:

$ FLASK_APP=application.py FLASK_ENV=development flask run

License

  • Free software: MIT license
