Search through a document using a chat interface
Doc Search
Converse with a book (PDF)
See tweet for full demo.
Documentation: https://namuan.github.io/dr-doc-search
Source Code: https://github.com/namuan/dr-doc-search
PyPI: https://pypi.org/project/dr-doc-search/
Pre-requisites
Installation
pip install dr-doc-search
Example Usage
There are two steps to use this application:
1. First, you need to create the index and generate embeddings for the PDF file. Here I'm using a PDF file generated from the page Parable of a Monetary Economy.
Before running this, you need to set up your OpenAI API key. You can get it from OpenAI.
export OPENAI_API_KEY=<your-openai-api-key>
Then run the following command to start the training process:
dr-doc-search --train -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
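If you script the training step, a small pre-flight check can catch the two most common problems before a long run: the CLI not being on your PATH and the API key not being exported. This is only an illustrative sketch. preflight and train_command are hypothetical helper names, not part of dr-doc-search, and the command uses nothing but the --train and -i flags shown above.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, List, Mapping, Optional


def preflight(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the problems that would stop training (an empty list means ready).

    Hypothetical helper: it only checks what this README asks for.
    """
    problems = []
    if which("dr-doc-search") is None:
        problems.append("dr-doc-search is not on PATH; run: pip install dr-doc-search")
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set; export it before training")
    return problems


def train_command(pdf_path: str) -> List[str]:
    """Build the argument list for the training command shown above."""
    pdf = Path(pdf_path).expanduser()  # "~" is not expanded without a shell
    return ["dr-doc-search", "--train", "-i", str(pdf)]
```

Once preflight() returns an empty list, you could start training with subprocess.run(train_command(pdf), check=True). Passing an argument list instead of a shell string means paths with spaces need no extra quoting.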
The training process writes its working files (page images, extracted text and the index) to the OutputDir/dr-doc-search/<pdf-name> folder under your home directory.
Here is what it looks like:
~/OutputDir/dr-doc-search/parable-of-a-monetary-economy-heteconomist
$ tree
.
├── images
│ ├── output-1.png
│ ├── output-10.png
│ ├── output-11.png
...
│ └── output-9.png
├── index
│ ├── docsearch.index
│ └── index.pkl
├── parable-of-a-monetary-economy-heteconomist.pdf
└── scanned
├── output-1.txt
...
└── output-9.txt
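The listing above is sorted as plain strings, which is why output-10.png comes before output-9.png. If you post-process the text in scanned/ yourself, sort on the number in the file name instead. A minimal sketch, assuming the N in output-N follows the page order of the PDF; page_number, scanned_pages_in_order and read_book_text are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import List

# Matches the file stems seen in images/ and scanned/, e.g. "output-10".
_PAGE_STEM = re.compile(r"output-(\d+)")


def page_number(path: Path) -> int:
    """Extract N from a file named output-N.<ext>."""
    match = _PAGE_STEM.fullmatch(path.stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    return int(match.group(1))


def scanned_pages_in_order(output_dir: Path) -> List[Path]:
    """Return scanned/output-*.txt in numeric (1, 2, ..., 10) rather than string order."""
    return sorted((output_dir / "scanned").glob("output-*.txt"), key=page_number)


def read_book_text(output_dir: Path) -> str:
    """Concatenate the text of every page, in page order."""
    return "\n".join(
        page.read_text(encoding="utf-8") for page in scanned_pages_in_order(output_dir)
    )
```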
Note: It is possible to change the base of the output directory by providing the --app-dir argument.
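To check from a script whether a PDF has already been trained, you can look for the two index files. This sketch assumes the default layout from the tree above (the folder is named after the PDF without its .pdf extension) and assumes that --app-dir simply replaces ~/OutputDir as the base; that mapping is not confirmed here, so check dr-doc-search --help. output_dir_for and is_trained are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def output_dir_for(pdf_path: str, base: Optional[str] = None) -> Path:
    """Folder where training artifacts for pdf_path are expected.

    Assumption: <base>/dr-doc-search/<pdf name without .pdf>, with <base>
    defaulting to ~/OutputDir. How --app-dir maps onto <base> is a guess.
    """
    root = Path(base).expanduser() if base else Path.home() / "OutputDir"
    return root / "dr-doc-search" / Path(pdf_path).expanduser().stem


def is_trained(pdf_path: str, base: Optional[str] = None) -> bool:
    """True if both index files from the tree above exist."""
    index_dir = output_dir_for(pdf_path, base) / "index"
    return all((index_dir / name).is_file() for name in ("docsearch.index", "index.pkl"))
```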
2. Now that we have the index, we can use it to start asking questions.
dr-doc-search -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf --input-question "How did the attempt to reduce the debt result in a decrease in employment?"
Or you can open a web interface (on port 5006) to ask questions:
dr-doc-search --web-app -i ~/Downloads/parable-of-a-monetary-economy-heteconomist.pdf
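Both query modes can also be driven from Python, for example to ask a batch of questions against the same index. The sketch below only wraps the flags shown above (-i, --input-question, --web-app) in the same order as the commands. query_command and ask are hypothetical helpers, and ask assumes the answer is printed to standard output, which this README does not state explicitly.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def query_command(pdf_path: str, question: Optional[str] = None, web_app: bool = False) -> List[str]:
    """Build the argument list for a single question or for the web interface."""
    if web_app == (question is not None):
        raise ValueError("pass either a question or web_app=True, not both or neither")
    pdf = str(Path(pdf_path).expanduser())
    if web_app:
        return ["dr-doc-search", "--web-app", "-i", pdf]  # UI served on port 5006
    return ["dr-doc-search", "-i", pdf, "--input-question", question]


def ask(pdf_path: str, question: str) -> str:
    """Run one question and return whatever the CLI prints (assumed to contain the answer)."""
    result = subprocess.run(
        query_command(pdf_path, question=question),
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the question as its own list element means characters such as ? or quotes reach the CLI unchanged, with no shell quoting involved.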
There are more options, such as choosing the start and end pages of the PDF file. See the help for more details:
dr-doc-search --help
Acknowledgements
- anton/@abacaj for the idea
- LangChain
- HoloViz Panel
- OpenAI
Development
- Clone this repository
- Requirements:
  - Python 3.7+
  - Poetry
- Create a virtual environment and install the dependencies:
poetry install
- Activate the virtual environment:
poetry shell
Validating build
make build
Release process
A release is automatically published when a new version is bumped using make bump. See .github/workflows/build.yml for more details.
Once the release is published, .github/workflows/publish.yml will automatically publish it to PyPI.
Disclaimer
This project is not affiliated with OpenAI. The OpenAI API and GPT-3 language model are not free after the trial period.
Download files
Hashes for dr_doc_search-1.3.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b0aecdd77461feeef1f1350adcf6ec4ca7ff8fce18a511a9ae01f2ee14d104c4
MD5 | 38ffaddedc1d1eef16875ffd71f42949
BLAKE2b-256 | 7314b440b65fe7bc108f3f8d4c8d687d3630fbdb127780274c44075d5d0414f5
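If you download the wheel manually, you can check it against the SHA256 digest listed above before installing it. A minimal sketch using only the standard hashlib module; sha256_of and matches_expected are hypothetical helpers, and pip can also enforce digests itself in its hash-checking mode.

```python
import hashlib
from pathlib import Path

# SHA256 for dr_doc_search-1.3.0-py3-none-any.whl, copied from the table above.
EXPECTED_SHA256 = "b0aecdd77461feeef1f1350adcf6ec4ca7ff8fce18a511a9ae01f2ee14d104c4"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare case-insensitively, since digests are sometimes printed in upper case."""
    return sha256_of(path) == expected.lower()
```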