A library for searching different kinds of files

SimpleSearch

SimpleSearch lets you index and search your documents. It was designed to manipulate different types of documents transparently.

Note

I developed simplesearch out of curiosity, so don't run it in production. You may still find the code helpful, as it is well documented.

Installation

You can install simplesearch using pip

$ pip install simplesearch

or install it from source

$ git clone https://github.com/youben11/simplesearch
$ cd simplesearch
$ python3 setup.py install

Usage

For basic use, only two functions are needed: add_file_to_index() and search_keywords(). Together they let you build an index of your documents and search it using a list of keywords.

Note

Keep in mind that add_file_to_index() creates an sqlite3 database file (.simplesearch.db) in your current directory, and searches use this same file. Running simplesearch from another directory therefore creates a separate index and returns different results.
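Because the index lives in the current directory, one way to always hit the same index is to switch to a fixed directory before calling simplesearch. The sketch below is not part of simplesearch: the helper name index_directory and the INDEX_DIR location are hypothetical, and it only relies on the behavior described in the note above.

```python
import contextlib
import os
from pathlib import Path

# Hypothetical location for the index; any writable directory would do.
INDEX_DIR = Path.home() / ".simplesearch_index"


@contextlib.contextmanager
def index_directory(path):
    """Temporarily make `path` the working directory, then restore the old one."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # Always go back, even if indexing or searching raised an error.
        os.chdir(previous)


# Intended usage (assumes simplesearch is installed):
#
#     import simplesearch
#     with index_directory(INDEX_DIR):
#         simplesearch.add_file_to_index("/home/youben/ml.pdf")
#         simplesearch.search_keywords(["python"])
```

Pass absolute file paths when using a helper like this, since relative paths would be resolved against the index directory rather than the directory you started in.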

Example

Below is a code snippet that indexes some local files and then runs a few searches. PDFs are used here because they were the only supported document type when this example was written.

import simplesearch

# We assume that this file contains words like
# programming python indexing
simplesearch.add_file_to_index("/home/youben/simplesearch.pdf")

# We assume that this file contains words like
# machine-learning deep-learning python
simplesearch.add_file_to_index("/home/youben/ml.pdf")

# Both files have been indexed now, we can do some search operations

# Search for a keyword found only in the second indexed document
simplesearch.search_keywords(["machine-learning"])
# returns ['/home/youben/ml.pdf']

# Now search for a keyword common to both documents
simplesearch.search_keywords(["python"])
# returns ['/home/youben/simplesearch.pdf', '/home/youben/ml.pdf']

# We can also use multiple keywords
simplesearch.search_keywords(["python", "machine-learning"])
# returns ['/home/youben/ml.pdf', '/home/youben/simplesearch.pdf']

# The last result is sorted by best match: the first document matches two keywords,
# while the second matches only one
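To illustrate the "best match" ordering, here is a minimal in-memory sketch that ranks documents by how many of the given keywords they contain. It is not simplesearch's implementation (which stores its index in sqlite3). The ToyIndex class, its method names, and the alphabetical tie-break are assumptions made for this illustration only.

```python
from collections import Counter, defaultdict


class ToyIndex:
    """Toy inverted index: maps each word to the set of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_document(self, path, text):
        # Split on whitespace and lowercase; real text extraction is out of scope.
        for word in text.lower().split():
            self._postings[word].add(path)

    def search_keywords(self, keywords):
        # Count, for each document, how many distinct keywords it matches.
        hits = Counter()
        for keyword in {k.lower() for k in keywords}:
            for path in self._postings.get(keyword, ()):
                hits[path] += 1
        # Most matches first; ties are broken alphabetically by path (an assumption).
        ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
        return [path for path, _ in ranked]


index = ToyIndex()
index.add_document("/docs/simplesearch.pdf", "programming python indexing")
index.add_document("/docs/ml.pdf", "machine-learning deep-learning python")

# The document matching both keywords is ranked ahead of the one matching a single keyword.
print(index.search_keywords(["python", "machine-learning"]))
```

Counting distinct keywords keeps a repeated keyword from inflating a document's score.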
