Simple search engine for a list of dictionaries with search and filter.

4k Search Engine

Nowhere near as cool or powerful as Google, so named accordingly!

In short, you load a list of dictionaries into it, and it lets you filter and do substring searches on keys. As a bonus, it helps you produce dropdowns for your filters. It is intended for web applications searching datasets small enough to fit into memory.

Features

  • In Memory
  • No Additional Dependencies
  • Filtering
    • In the list
    • Not in the list
  • Substring Search
  • 100% Python
  • NO DB needed
  • NO file storage needed

Common Usage

  • Grab dataset
    • RaaS
    • File
    • DB Query
  • Load data into search
    • Optionally Build Dropdown Options for Keys
  • Setup endpoints
    • Search
    • Optionally Filter Options

Example

Below is a contrived example to show basic app usage. tests/test_search_engine.py is also great for looking at individual features and examples.

Step One: Setup Example

Start by downloading example.py, along with the sample dataset HPCharactersDataRaw.json from the Data Explorer section of the "Characters in Harry Potter Books" dataset, and place both in the same directory.

Step Two: Load Necessary Packages

We import the helpers for our example (json and Flask) and the classes from four_k_search_engine_nosrednakram.

import json
from flask import Flask, jsonify

from four_k_search_engine_nosrednakram.LoadSearchData import LoadSearchData
from four_k_search_engine_nosrednakram.FilterSet import FilterSet
from four_k_search_engine_nosrednakram.Search import Search

Step Three: Load Data to Search Into a List of Dictionaries

The search uses an in-memory list of dictionaries. In this example we load the data to search from a JSON file.

with open('HPCharactersDataRaw.json') as search_json:
   search_dict = json.load(search_json)
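
Since everything downstream expects a list of dictionaries, it can help to check the shape of the data right after loading. Below is a minimal sketch of such a check using only the standard library. The helper name load_records is hypothetical and is not part of four_k_search_engine_nosrednakram.

```python
import json


def load_records(json_text):
    """Parse JSON text and confirm it is a list of dictionaries.

    Hypothetical helper for illustration; not part of the package.
    """
    data = json.loads(json_text)
    if not isinstance(data, list) or not all(isinstance(row, dict) for row in data):
        raise ValueError("expected a JSON array of objects")
    return data


# Made-up record for illustration, not taken from the sample dataset.
sample = '[{"Name": "Example Person", "Gender": "Female"}]'
records = load_records(sample)
print(len(records))
```

If the check fails, a ValueError is raised before any search object is built, which tends to be easier to debug than a key error later on.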

Step Four: Generate Search Object

This step requires the data loaded in the previous step and a list containing the keys you would like to have returned in filter_options. An empty list [] can be passed if you do not wish to populate filter_options with the unique values for the provided keys. You can still filter, but you're more likely to have key naming issues that are harder to sort out programmatically. In our example we get two lists returned, one with the unique Gender values and one with the unique Profession values. This makes generating dropdown lists easy.

SearchData = LoadSearchData(search_dict, ['Gender', 'Profession'])

Below is a truncated version of the filter_options output for the dropdowns in our example.

"filter_options": {
    "Gender": [
      "Female", 
      "Male", 
      "NaN"
    ],
    "Profession": [
      "\"Agony Aunt\" advice columnist", 
      "20th-century Scourer who preached against Magic", 
      ...
    ]
}
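
To make the idea of filter options concrete, here is a minimal pure-Python sketch of collecting the sorted unique values for a set of keys. It is only an illustration of the concept: the function build_filter_options is hypothetical, and this is not necessarily how LoadSearchData builds filter_options internally.

```python
def build_filter_options(records, keys):
    """Collect the sorted unique values of each requested key.

    Hypothetical stand-in for illustration; not the package's implementation.
    """
    options = {}
    for key in keys:
        # str() keeps mixed value types sortable together.
        values = {str(row[key]) for row in records if key in row}
        options[key] = sorted(values)
    return options


# Made-up records for illustration, not taken from the sample dataset.
records = [
    {"Name": "A", "Gender": "Female"},
    {"Name": "B", "Gender": "Male"},
    {"Name": "C", "Gender": "Female"},
]
print(build_filter_options(records, ["Gender"]))
# → {'Gender': ['Female', 'Male']}
```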

Step Five: Setup Web Server

The example uses Flask to keep things quick and easy. If you plan to use Flask in production, you should read about how to run it safely.

app = Flask(__name__)

Step Six: Filter Options End Point

Providing the filter options to a front-end app is as simple as returning the filter_options attribute from the LoadSearchData/SearchData object. It is a dictionary with one entry for each key you requested; see above. http://127.0.0.1:5000/filter_options

@app.route('/filter_options', methods=['GET'])
def query_subjects():
   if len(SearchData.filter_options) > 0:
       return jsonify({'filter_options': SearchData.filter_options})
   else:
       return jsonify({'error': 'data not found'})

Step Seven: Search End Point

I've hardcoded some values and made this a GET instead of a POST for a quick example. In reality you'll want to convert this to a POST and provide the filters and search strings from your front-end application. The app should provide the filters and/or substring searches as lists of dictionaries. In each filter, field is the key from your dictionary and value is the selected filter value. If you want the records that match, include is True. If you want to exclude the matching records, include is False. The filters are applied first to limit the amount of string searching needed.

The filtered list is then fed into the search, and the results are returned. The field is the key, and the value is the case-insensitive substring to search for. It would be easy to add a True/False attribute for optional case sensitivity, but I didn't need it and case-insensitive matching seems closer to what users expect, so I didn't bother.

IMPORTANT: The filters are combined with AND, i.e., a record must satisfy every filter to be in the list.
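
The AND behavior of the filters, together with the include flag, can be sketched in plain Python as follows. This is an illustration only: apply_filters is a hypothetical function, and it assumes a filter matches when the record's value equals the filter value exactly, which may differ from what FilterSet does internally.

```python
def apply_filters(records, filters):
    """Keep only the records that satisfy every filter (AND).

    Hypothetical stand-in for illustration; assumes exact equality matching.
    """
    kept = []
    for row in records:
        # include=True keeps matching records; include=False keeps the rest.
        if all((row.get(f["field"]) == f["value"]) == f["include"] for f in filters):
            kept.append(row)
    return kept


# Made-up records for illustration, not taken from the sample dataset.
records = [
    {"Name": "A", "Gender": "Female", "Profession": "Auror"},
    {"Name": "B", "Gender": "Male", "Profession": "Auror"},
    {"Name": "C", "Gender": "Female", "Profession": "Professor"},
]
filters = [
    {"field": "Gender", "value": "Female", "include": True},
    {"field": "Profession", "value": "Auror", "include": False},
]
print([row["Name"] for row in apply_filters(records, filters)])
# → ['C']
```

Here only record C survives: it is Female (first filter) and is not an Auror (second filter, with include set to False).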

Now we return the results. With a few lines of code and a dataset, you have a filtering search that is fast because it's all in memory. I initially wrote this for a course catalogue search with just over 1k courses and a variety of attributes to search and filter on.

IMPORTANT: The searches are combined with OR, not AND.

This seems more natural, and it makes more sense when you consider that you may be looking up the same string in several fields. Hopefully it will be the correct behavior for your application as well. http://127.0.0.1:5000/search

@app.route('/search', methods=['GET'])
def query_search():
   # With a real front end (needs `from flask import request`):
   # filters = json.loads(request.args.get('filters', type=str))
   filters = [{
                   "field": "Gender",
                   "value": "Female",
                   "include": True
             },
             {
                   "field": "Profession",
                   "value": "Auror",
                   "include": True
             }]
   filtered_list = FilterSet(SearchData.master_index, SearchData.master_list, filters).results
   # searches = json.loads(request.args.get('searches', type=str))
   searches = [{
                   "field": 'Name',
                   "value": "ton"
               }]
   # This matches any search, not all of them. That is easy to change by looping here and re-feeding
   # each result into the next search. I think matching any substring search after filters may be desirable.
   if len(searches) > 0:
       filtered_list = Search(filtered_list, searches).results

   if len(filtered_list) > 0:
       return jsonify(filtered_list)
   else:
       return jsonify({'error': 'data not found'})
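
The OR behavior of the searches, and the "loop and re-feed" variant mentioned in the code comment, can be sketched in plain Python. Both functions are hypothetical stand-ins written for illustration; they do not use the package's Search class and may differ from its internals.

```python
def search_any(records, searches):
    """Keep records where at least one search matches (OR).

    Matching is a case-insensitive substring test on the named field.
    """
    return [
        row
        for row in records
        if any(
            s["value"].lower() in str(row.get(s["field"], "")).lower()
            for s in searches
        )
    ]


def search_all(records, searches):
    """Keep records where every search matches (AND) by re-feeding results."""
    for s in searches:
        records = search_any(records, [s])
    return records


# Made-up records for illustration, not taken from the sample dataset.
records = [
    {"Name": "Alton", "Profession": "Auror"},
    {"Name": "Bea", "Profession": "Auror"},
    {"Name": "Clayton", "Profession": "Teacher"},
    {"Name": "Dee", "Profession": "Teacher"},
]
searches = [
    {"field": "Name", "value": "ton"},
    {"field": "Profession", "value": "AUR"},
]
print([row["Name"] for row in search_any(records, searches)])
# → ['Alton', 'Bea', 'Clayton']
print([row["Name"] for row in search_all(records, searches)])
# → ['Alton']
```

Note that in this sketch search_any returns an empty list when no searches are given, which is one reason to guard the call with a length check, as the endpoint above does.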

Misc

I run this in debug mode to help while developing.

if __name__ == '__main__':
   app.run(debug=True)
