Simple search engine for a list of dictionaries with search and filter.
Project description
4k Search Engine
Nowhere near as cool or powerful as Google so named accordingly!
In short, load a list of dictionaries into it and it allows you to filter and do substring searching on keys. As a bonus it will help you produce drop downs for your filters and is intended to be used with web applications searching datasets small enough to fit into memory.
Features
- In Memory
- No Additional Dependencies
- Filtering
  - In the list
  - Not in the list
- Substring Search
- 100% Python
- NO DB needed
- NO file storage needed
Common Usage
- Grab dataset
  - RaaS
  - File
  - DB Query
- Load data into search
  - Optionally Build Dropdown Options for Keys
- Setup endpoints
  - Search
  - Optionally Filter Options
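Whatever the source, the dataset only needs to end up as a list of dictionaries. For the "DB Query" case, a minimal sketch using the standard library's sqlite3 module might look like the following. The `characters` table, its columns, and the `rows_as_dicts` helper are made up for illustration and are not part of this package.

```python
import sqlite3


def rows_as_dicts(connection, query):
    """Run a query and return every row as a plain dictionary."""
    connection.row_factory = sqlite3.Row  # rows become addressable by column name
    cursor = connection.execute(query)
    return [dict(row) for row in cursor.fetchall()]


# Hypothetical in-memory table standing in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE characters (Name TEXT, Gender TEXT)")
conn.executemany(
    "INSERT INTO characters VALUES (?, ?)",
    [("Ann Stanton", "Female"), ("Bob Reed", "Male")],
)
search_dict = rows_as_dicts(
    conn, "SELECT Name, Gender FROM characters ORDER BY rowid"
)
```

The resulting `search_dict` has the same shape as data loaded from a JSON file, so it can be handed to the loading step in the same way.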
Example
Below is a contrived example showing basic app usage. tests/test_search_engine.py is also a good place to look at individual features and examples.
Step One: Setup Example
You can start by downloading example.py and the sample dataset HPCharactersDataRaw.json from the Data Explorer section of the Characters in Harry Potter Books and placing them in the same directory.
Step Two: Load Necessary Packages
We include the helpers for our example and four_k_search_engine_nosrednakram.
import json
from flask import Flask, jsonify
from four_k_search_engine_nosrednakram.LoadSearchData import LoadSearchData
from four_k_search_engine_nosrednakram.FilterSet import FilterSet
from four_k_search_engine_nosrednakram.Search import Search
Step Three: Load Data to Search Into a Dictionary
The search uses an in-memory list of dictionaries. In this example we load the data to search from a JSON file.
with open('HPCharactersDataRaw.json') as search_json:
    search_dict = json.load(search_json)
Step Four: Generate Search Object
This step requires the data from the previous step and a list of the keys whose values you would like returned in filter_options. Pass an empty list [] if you do not wish to populate filter_options with the unique values for those keys. You can still filter, but you are more likely to run into key naming issues, and they are harder to sort out programmatically. In our example two lists are returned, one with the unique Gender values and one with the unique Profession values. This makes generating dropdown lists easy.
SearchData = LoadSearchData(search_dict, ['Gender', 'Profession'])
Below is a truncated version of the output for our filter dropdowns for example.
"filter_options": {
"Gender": [
"Female",
"Male",
"NaN"
],
"Profession": [
"\"Agony Aunt\" advice columnist",
"20th-century Scourer who preached against Magic",
...
]
}
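To make the idea of per-key dropdown options concrete, here is a plain-Python sketch of collecting the sorted unique values for a list of keys. It only illustrates the concept. The `unique_values` helper and the sample records are hypothetical, and this is not necessarily how LoadSearchData builds filter_options.

```python
def unique_values(records, keys):
    """Collect the sorted unique values of each requested key."""
    options = {}
    for key in keys:
        # str() keeps mixed types (for example a float NaN next to text) sortable.
        values = {str(record[key]) for record in records if key in record}
        options[key] = sorted(values)
    return options


# Made-up sample records, not taken from the example dataset.
records = [
    {"Name": "A", "Gender": "Female"},
    {"Name": "B", "Gender": "Male"},
    {"Name": "C", "Gender": "Female"},
]
filter_options = unique_values(records, ["Gender"])
```

Passing an empty list of keys yields an empty dictionary, which mirrors the "no dropdown options requested" case.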
Step Five: Setup Web Server
The example uses Flask for a very quick and easy example. You should read about how to safely use Flask for a production server if you plan to use it.
app = Flask(__name__)
Step Six: Filter Options End Point
Providing the filter options to a front-end app is as simple as returning the filter_options attribute from the LoadSearchData object (SearchData in this example). It is a dictionary with an entry for each key you requested, see above. http://127.0.0.1:5000/filter_options
@app.route('/filter_options', methods=['GET'])
def query_subjects():
    if len(SearchData.filter_options) > 0:
        return json.dumps({'filter_options': SearchData.filter_options})
    else:
        return jsonify({'error': 'data not found'})
Step Seven: Search End Point
I've hardcoded some values and made this a GET instead of a POST for a quick example. In a real application you'll want to convert this to a POST and provide the filters and search strings from your front end. The app should provide the filters and/or substring searches as lists of dictionaries. In a filter, field is the key from your dictionary and value is the selected filter value. If you want the matching records, include is True. If you want to exclude the matching records, include is False. The filters are applied first to limit the amount of string searching needed.
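When the filters and searches arrive from a front end as a JSON body, it helps to check their shape before using them. The sketch below assumes a payload with top-level `filters` and `searches` keys. Both that layout and the `parse_search_payload` helper are assumptions made for illustration, not something this package defines.

```python
import json


def parse_search_payload(body):
    """Parse a JSON request body into (filters, searches), checking the keys."""
    payload = json.loads(body)
    filters = payload.get("filters", [])    # assumed top-level key
    searches = payload.get("searches", [])  # assumed top-level key

    for item in filters:
        if not {"field", "value", "include"}.issubset(item):
            raise ValueError(f"filter is missing a key: {item}")
        if not isinstance(item["include"], bool):
            raise ValueError(f"include must be true or false: {item}")
    for item in searches:
        if not {"field", "value"}.issubset(item):
            raise ValueError(f"search is missing a key: {item}")
    return filters, searches


body = json.dumps({
    "filters": [{"field": "Gender", "value": "Female", "include": True}],
    "searches": [{"field": "Name", "value": "ton"}],
})
filters, searches = parse_search_payload(body)
```

In a Flask POST endpoint the body string would come from the incoming request, and a ValueError could be turned into a 400 response.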
The filtered list is then fed into the search and the results are returned. In a search, field is the key and value is the case-insensitive substring to search for. It would be easy to add a True/False attribute for optional case sensitivity, but I didn't need it and case-insensitive matching seems closer to what users expect, so I didn't bother.
IMPORTANT: The filters are combined with AND, i.e., a record must meet all filters to be in the list.
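The AND behaviour of the filters can be pictured in plain Python. This is only a sketch of the semantics, assuming a filter matches on exact equality of the field value. The `apply_filters` helper and the sample records are made up here and do not show how FilterSet is implemented.

```python
def apply_filters(records, filters):
    """Keep the records that satisfy every filter (logical AND)."""
    def passes(record, flt):
        matches = record.get(flt["field"]) == flt["value"]
        # include=True keeps matching records, include=False keeps the rest.
        return matches == flt["include"]

    return [r for r in records if all(passes(r, f) for f in filters)]


# Made-up sample records, not taken from the example dataset.
records = [
    {"Name": "Ann Stanton", "Gender": "Female", "Profession": "Auror"},
    {"Name": "Bob Reed", "Gender": "Male", "Profession": "Auror"},
    {"Name": "Cara Lee", "Gender": "Female", "Profession": "Teacher"},
]
female_aurors = apply_filters(records, [
    {"field": "Gender", "value": "Female", "include": True},
    {"field": "Profession", "value": "Auror", "include": True},
])
not_aurors = apply_filters(records, [
    {"field": "Profession", "value": "Auror", "include": False},
])
```

With an empty filter list every record passes, so filtering is effectively optional.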
Now we return the results. With a few lines of code and a data set, you have a filtering search that is fast because it's all in memory. I initially wrote this for a course catalogue search with just over 1k courses and a variety of attributes to search and filter on.
IMPORTANT: The searches are combined with OR, not AND.
This seems more natural, and it becomes more understandable when you consider that you may be looking up the same string in several fields. Hopefully it will be the correct behavior for your application as well. http://127.0.0.1:5000/search
@app.route('/search', methods=['GET'])
def query_search():
    # To read the filters from the query string (needs: from flask import request):
    # filters = json.loads(request.args.get('filters', type=str))
    filters = [{
        "field": "Gender",
        "value": "Female",
        "include": True
    },
    {
        "field": "Profession",
        "value": "Auror",
        "include": True
    }]
    filtered_list = FilterSet(SearchData.master_index, SearchData.master_list, filters).results
    # searches = json.loads(request.args.get('searches', type=str))
    searches = [{
        "field": 'Name',
        "value": "ton"
    }]
    # This matches any search, not all of them. That is easy to change by looping here and
    # re-feeding the result into the next search. Matching any substring search after the
    # filters may be desirable.
    if len(searches) > 0:
        filtered_list = Search(filtered_list, searches).results
    if len(filtered_list) > 0:
        return jsonify(filtered_list)
    else:
        return jsonify({'error': 'data not found'})
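The OR behaviour of the substring searches, and the optional case-sensitivity flag mentioned earlier, can be sketched in plain Python. The `search_any` helper, its `case_sensitive` parameter, and the sample records are hypothetical. This illustrates the semantics only and is not the Search class itself.

```python
def search_any(records, searches, case_sensitive=False):
    """Keep records where at least one search matches (logical OR)."""
    def matches(record, search):
        haystack = str(record.get(search["field"], ""))
        needle = search["value"]
        if not case_sensitive:
            haystack, needle = haystack.lower(), needle.lower()
        return needle in haystack

    return [r for r in records if any(matches(r, s) for s in searches)]


# Made-up sample records, not taken from the example dataset.
records = [
    {"Name": "Ann Stanton", "Notes": "none"},
    {"Name": "Bob Reed", "Notes": "Lives in Boston"},
    {"Name": "Cara Lee", "Notes": "none"},
]
searches = [
    {"field": "Name", "value": "TON"},
    {"field": "Notes", "value": "TON"},
]
any_match = search_any(records, searches)

# Requiring every search to match: feed each result into the next search.
all_match = records
for search in searches:
    all_match = search_any(all_match, [search])
```

Here `any_match` keeps a record if either field contains the substring, while the re-feeding loop narrows the list one search at a time.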
Misc
I run this in debug mode to help while developing.
if __name__ == '__main__':
    app.run(debug=True)
File details
Details for the file four_k_search_engine-0.1.1.tar.gz.
File metadata
- Download URL: four_k_search_engine-0.1.1.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2458ee87da5400d6f17fa09e8935e9ff7b9118747078ee86609698a9b3efe49 |
| MD5 | dfd55b48b4d4963dd5ffed16d37cef37 |
| BLAKE2b-256 | 0752876c4750662809022193e5a352628e83c7f6287b32b399b5ab16d4a8e4e8 |
File details
Details for the file four_k_search_engine-0.1.1-py2.py3-none-any.whl.
File metadata
- Download URL: four_k_search_engine-0.1.1-py2.py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a78939e8e5d0798e430db5d23d926184e69f57f2ea7e052b3db58b34355df3e9 |
| MD5 | 22f2b29b1a91a9e5d6dbccd4ccd671bf |
| BLAKE2b-256 | c440f58d6faae4908b29370849f1cc6e59824bca886b1315b25eed712ac31221 |