A search algorithm for efficient searching in PDFs
Project description
Concept-Search
For the source code, see my GitHub repo. Contact me for support; PRs are welcome.
Usage
- Pip install the package.
$ pip3 install smart-search
- NOTE: Keep the pickle file in the same folder as the Python script that uses this pip package.
Here I use the glove.6B.zip file from Stanford's GitHub repository.
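The note above does not say how the pickle file is produced from the GloVe download. The following is a minimal sketch of one way to do it, not the package's own procedure. It relies on the GloVe text format (one word per line, followed by space-separated vector components). The helper name glove_text_to_pickle, the output file name, and the dictionary layout are assumptions made for illustration; the file name and layout that smart_search actually expects may differ.

```python
import pickle


def glove_text_to_pickle(glove_txt_path, pickle_path):
    """Parse a GloVe .txt file and pickle it as {word: [float, ...]}.

    Hypothetical helper: the layout smart_search expects is not documented here.
    """
    vectors = {}
    with open(glove_txt_path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            # First token is the word, the rest are the vector components.
            vectors[parts[0]] = [float(value) for value in parts[1:]]
    with open(pickle_path, "wb") as handle:
        pickle.dump(vectors, handle)
    return len(vectors)
```

For example, after unzipping glove.6B.zip, glove_text_to_pickle("glove.6B.50d.txt", "glove.pkl") would write the pickle next to your script; "glove.pkl" is a placeholder name.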
Syntax
- Import the library.
>> import smart_search
- Create an object of the class smart_search.model(), say functioncaller.
>> functioncaller = smart_search.model()
- To convert a PDF into a list of lists containing the page number and the words left after stop-word removal, use the built-in function getting_list_of_words(). It accepts one argument, the path to the PDF, and returns the list to be fed to the model.
>> pdf_list = functioncaller.getting_list_of_words('path to your pdf')
- Pass this list to the model, along with the word you want to search for, using the perform_skip() function. It accepts two arguments, the list produced by the previous function and the search word, and returns the top 5 relevant locations of that word.
>> location = functioncaller.perform_skip(pdf_list, input_word)
- If you want to, you can use Python's subprocess module to open the PDF at the returned page.
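The last step can be sketched as follows. This is an illustration under stated assumptions, not part of smart_search: it assumes the Evince viewer is installed (its --page-index option takes a 1-based page number) and that a search location gives you an integer page number. The helper names build_viewer_command and open_pdf_at_page are hypothetical. Other viewers use different flags.

```python
import shutil
import subprocess

VIEWER = "evince"  # assumption: Evince is the installed PDF viewer


def build_viewer_command(pdf_path, page_number):
    """Return the argument list that opens pdf_path at a 1-based page number."""
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    return [VIEWER, f"--page-index={page_number}", str(pdf_path)]


def open_pdf_at_page(pdf_path, page_number):
    """Launch the viewer without blocking.

    Returns the Popen handle, or None when the viewer is not on PATH.
    """
    if shutil.which(VIEWER) is None:
        return None
    return subprocess.Popen(build_viewer_command(pdf_path, page_number))
```

Calling open_pdf_at_page('path to your pdf', 12) would then open the document at page 12. Passing the command as a list, rather than a shell string, avoids quoting problems with paths that contain spaces.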
LICENSE
File details
Details for the file smart_search-0.0.5.tar.gz.
File metadata
- Download URL: smart_search-0.0.5.tar.gz
- Upload date:
- Size: 3.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7aa5400f64a17116784d2204245201728eeac524b8e58479729d1658f69fe89b
MD5 | ee3f5db2b1c93039b8b4300ac61dba57
BLAKE2b-256 | f1f42189366bbf67e1e86408da4628520334a742eb8c7b22503cb8d47608630f
File details
Details for the file smart_search-0.0.5-py3-none-any.whl.
File metadata
- Download URL: smart_search-0.0.5-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 11b4ada4f75ae9d16bee09afd30609f15e871daba94eb62d55af9ffab328240f
MD5 | 891cfb3d33602b571f42dd29cb2cc04d
BLAKE2b-256 | 4ace8f6882ccfae35f4f4ceb4abdb578513636b0b69da4bceaadc4a159e8ed0a
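A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only Python's standard hashlib module; the helper names sha256_of_file and matches_published_digest are chosen here for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("smart_search-0.0.5.tar.gz", "7aa5400f64a17116784d2204245201728eeac524b8e58479729d1658f69fe89b") should return True for an intact copy of the source distribution.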