A search algorithm for efficient searching in PDFs
The source code is in my GitHub repo. Contact me for support; PRs are welcome.
- Pip install the package.
$ pip3 install smart-search
- NOTE: Keep the pickle file in the same folder as the Python script that uses this pip package.
Here I use the glove.6B.zip file from Stanford's GitHub repository.
- Import the library.
>> import smart_search
- Create an object of the class smart_search.model. For example:
>> functioncaller = smart_search.model()
- To convert a PDF into a list of lists containing the page number and the words left after stop-word removal, use the built-in function
getting_list_of_words(). It accepts one argument, the path to the PDF, and returns the list to be fed to the model.
>> pdf_list = functioncaller.getting_list_of_words('path to your pdf')
- Pass this list to the model, along with the word you want to search for, using the
perform_skip() function. It accepts two arguments, the list produced by the previous function and the search word, and returns the top 5 relevant locations of that word.
>> locations = functioncaller.perform_skip(pdf_list, input_word)
- You can use Python's subprocess library to open the PDF at the returned page if you want to.
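As a rough sketch of that last step, the snippet below builds a viewer command for a given page and launches it with subprocess. It is not part of smart_search: the helper names build_viewer_command and open_pdf_at_page are hypothetical, and it assumes that either Evince or Okular is installed and that you already have a page number taken from a search result.

```python
import shutil
import subprocess


def build_viewer_command(pdf_path, page_number, viewer="evince"):
    """Return the argument list that opens pdf_path at page_number.

    Hypothetical helper, not part of smart_search. Only two viewers are
    handled here; add your own viewer's page flag as needed.
    """
    if viewer == "evince":
        # Evince takes the page as --page-index=NUMBER.
        return ["evince", f"--page-index={page_number}", pdf_path]
    if viewer == "okular":
        # Okular takes the page as --page NUMBER.
        return ["okular", "--page", str(page_number), pdf_path]
    raise ValueError(f"unsupported viewer: {viewer}")


def open_pdf_at_page(pdf_path, page_number, viewer="evince"):
    """Launch the viewer without blocking the calling script."""
    if shutil.which(viewer) is None:
        raise FileNotFoundError(f"{viewer} is not installed or not on PATH")
    return subprocess.Popen(build_viewer_command(pdf_path, page_number, viewer))


# Example (assumes the page number was read from a search result):
# open_pdf_at_page("path to your pdf", 12)
```

Keeping the command construction separate from the launch makes it easy to check the arguments before anything is opened. Check your viewer's documentation for how it counts pages, since the numbering in the PDF list may not match the viewer's.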