
Local retriever search for your use

Project description

Retriever

Visually search and analyze your documents, entirely locally.


Install

Options:

  1. Install with pip (Stable Release)
$ pip install retriever-search
  2. Install from GitHub Repo (Latest Release)
$ git clone https://github.com/GovML/retriever.git
$ cd retriever
$ pip install -e .

We recommend using a virtual environment for all dependency installations. Before installing Retriever, you can use venv to create an isolated environment so that its packages do not conflict with versions already installed on your computer.

$ python -m venv new_env
$ source new_env/bin/activate

Quickstart - Launching Retriever

Retriever is composed of two parts that you'll need to launch.

  1. Backend: The backend server ingests and returns search results. This server is exposed locally via Flask.
  2. Frontend: The frontend is the user interface (UI) you use to input searches and visualize your results. The frontend sends requests to the backend server.

First you'll need to ensure you have a folder of PDFs on your computer. If you don't have PDFs handy, we've provided a script under tutorials to download a few example papers from arXiv.
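Before launching the backend, it can help to confirm that the folder really contains PDFs. The following is a minimal sketch using only the standard library; `list_pdfs` is a hypothetical helper, not part of Retriever, and it assumes the PDFs sit directly in the folder rather than in subfolders.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the PDF files found directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder is treated the same as an empty one.
        return []
    # Compare the extension case-insensitively so "paper.PDF" is also found.
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdfs = list_pdfs("./pdfs_folder/")
    print(f"Found {len(pdfs)} PDF file(s)")
```

If the count is zero, check the path before starting the search server.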

Once you have your folder of PDFs, you can start up the backend search server by opening a terminal window, starting a Python session, and running:

>>> from retriever_search import search_server
>>> search_server.run_search_server('./pdfs_folder/', json_save_path='save_results.json', device='cpu')

If your computer has a CUDA-compatible GPU, you can set device='cuda'; if you are on a Mac, use device='mps'.
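If you are unsure which value to pass, the choice can be made programmatically. This sketch assumes PyTorch is installed in the same environment (the README does not state which framework Retriever uses, so treat that as an assumption); `pick_device` is a hypothetical helper that falls back to 'cpu' when PyTorch is unavailable.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch  # assumption: PyTorch is the installed backend
    except ImportError:
        # Without PyTorch there is nothing to query, so fall back to the CPU.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in newer PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The returned string can then be passed as the device argument of run_search_server.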

Next, open up a second terminal window and run the following:

>>> from retriever_search import frontend_app as fp
>>> fp.run_frontend()

Retriever should now be up and running! You can access the UI at http://127.0.0.1:7860. This URL works only on your local machine: paste it into your browser to render the UI. You won't need an external internet connection to use it.
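If the page does not load, a quick way to tell whether the frontend is listening at all is to try a TCP connection to the port. This is a minimal standard-library sketch; `ui_is_listening` is a hypothetical helper, and it only checks that something accepts connections on the port, not that it is Retriever.

```python
import socket


def ui_is_listening(host="127.0.0.1", port=7860, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the connection is refused
        # or times out; the socket is closed when the with block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("UI reachable" if ui_is_listening() else "Nothing listening on port 7860")
```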

Next time you run Retriever, you can load the JSON file you just saved instead of re-processing your PDFs, which saves time:

>>> search_server.run_search_server(input_json='save_results.json', device='cpu')
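The first-run and later-run calls can be folded into one launcher that loads the saved JSON when it exists and ingests the PDF folder otherwise. This is a sketch, assuming the names in the parameter guide (input_directory, input_json, json_save_path, device) are accepted as keyword arguments; `server_kwargs` is a hypothetical helper, not part of Retriever.

```python
from pathlib import Path


def server_kwargs(pdf_dir, cache_json, device="cpu"):
    """Build keyword arguments for run_search_server, preferring a saved JSON."""
    if Path(cache_json).is_file():
        # A previous run saved its results, so load them instead of re-ingesting.
        return {"input_json": str(cache_json), "device": device}
    # First run: ingest the PDFs and save the results for next time.
    return {
        "input_directory": str(pdf_dir),
        "json_save_path": str(cache_json),
        "device": device,
    }


if __name__ == "__main__":
    from retriever_search import search_server

    kwargs = server_kwargs("./pdfs_folder/", "save_results.json")
    search_server.run_search_server(**kwargs)
```

Delete the JSON file whenever the contents of the PDF folder change, so that the next launch ingests the folder again.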

Full Parameter Guide for Search server

>>> search_server.run_search_server('input_directory', 'input_json', 'json_save_path', 'embedding_model', 'qa_model', device='cpu')

Search parameter definitions

  • input_directory -- The directory holding your files; optional if input_json is passed instead
  • input_json -- A JSON file saved by an earlier run, used for faster loading; optional if input_directory is passed instead
  • json_save_path -- (optional) Path for saving the embeddings to a JSON file, which can later be passed as input_json
  • embedding_model -- (optional) The embedding model to use; the Spectre model is the default
  • qa_model -- (optional) The QA model to use; you can currently pick between tiny, medium and large
  • device -- (optional) Can be set to cpu, mps or cuda

Tickets

1.1.0

  • Make LDA visualization update
  • QA Model Improvements
  • Add support for HTML, txt

1.0.2

  • Add Quickstart
