Load a PDF file and ask questions via llama_index and GPT.
About Python Chat PDF (GPT Index) Project
Load a folder of PDFs and ask questions about them via llama_index and GPT.
What is LlamaIndex?
LlamaIndex (GPT Index) is a data framework for your LLM application.
Context
- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
- To augment LLMs with our own private data, we need a comprehensive toolkit to perform this data augmentation for LLMs.
Proposed Solution
That’s where LlamaIndex comes in. LlamaIndex is a “data framework” to help you build LLM apps. It provides the following tools:
- Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
- Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.
- Provides an advanced retrieval/query interface over your data: feed in any LLM input prompt, get back retrieved context and a knowledge-augmented output.
- Allows easy integrations with your outer application framework (e.g. with LangChain, Flask, Docker, ChatGPT, anything else).
LlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules), to fit their needs.
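As an illustrative sketch of that high-level API (the folder path and question below are placeholders; actually running this requires the llama_index package and an OpenAI API key):

```python
def build_and_query(data_folder: str, question: str) -> str:
    """Ingest every document in data_folder, build an in-memory vector
    index, and answer the question with the default LLM.

    Requires the llama_index package and an OpenAI API key at call time.
    """
    from llama_index import SimpleDirectoryReader, VectorStoreIndex

    # Load all documents from the folder, build the index, and query it.
    documents = SimpleDirectoryReader(data_folder).load_data()
    index = VectorStoreIndex.from_documents(documents)
    return str(index.as_query_engine().query(question))
```

For example, `build_and_query("data", "What is this PDF about?")` would ingest everything under `data/` and answer the question.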
What does load_index_from_storage do and how does it work?
load_index_from_storage is a function that loads a single index from a StorageContext object. It takes a StorageContext and an optional index_id as parameters; if index_id is not specified, it assumes the index store holds only one index and loads that one. Internally it passes the index_ids and any additional keyword arguments to load_indices_from_storage, which retrieves the index structs from the index store and builds a list of BaseGPTIndex objects; when index_ids are specified, only the indices with those ids are loaded. Finally, load_index_from_storage verifies that exactly one index was loaded and returns it.
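A simplified, self-contained sketch of that dispatch logic (this is not the actual library code; `IndexStore` and the string "indices" here are stand-ins for LlamaIndex's storage and index types):

```python
from typing import Any, Dict, List, Optional, Sequence


class IndexStore:
    """Stand-in for the index store inside a StorageContext."""

    def __init__(self, indices: Dict[str, Any]):
        self.indices = indices  # maps index_id -> index struct


def load_indices_from_storage(store: IndexStore,
                              index_ids: Optional[Sequence[str]] = None) -> List[Any]:
    # No ids given: load every index struct in the store.
    if index_ids is None:
        return list(store.indices.values())
    # Otherwise load only the requested ids.
    return [store.indices[i] for i in index_ids]


def load_index_from_storage(store: IndexStore,
                            index_id: Optional[str] = None) -> Any:
    # Wrap the single optional id as a list for the plural loader.
    ids = None if index_id is None else [index_id]
    indices = load_indices_from_storage(store, ids)
    # The single-index loader expects exactly one result.
    if len(indices) != 1:
        raise ValueError(f"Expected exactly one index, got {len(indices)}")
    return indices[0]
```

The design mirrors the description above: the singular function is a thin wrapper that delegates to the plural one and enforces the "exactly one index" invariant.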
Getting Started
Instructions
- Install the requirements: pip install -r requirements.txt
- Get a GPT API key from OpenAI if you don't have one already.
- Run the script: python3 chat_with_pdfs.py <"data_folder_path"> <"open_api_key">
- Ask any questions about the content of the PDF.
How to use our package:

import sys
from chat_pdf.chat_with_pdfs import ask_a_question

folder_name = sys.argv[1]  # path to the folder of PDFs
api_key = sys.argv[2]      # your OpenAI API key
print(ask_a_question(folder_name, api_key))
Hashes for chatting-with-pdfs-0.0.12.tar.gz

Algorithm | Hash digest
---|---
SHA256 | 2f5826b7c264a4567e762fa1772445480d1b7c122b6c711d5d1baf71629e2ecc
MD5 | 4253599cfc14bf93fb7c2bee5bb384d6
BLAKE2b-256 | dbb2906f68501d4ea471df439eec3d2584bcc358aa872e6bb9a5ffc5e8dea4c6
Hashes for chatting_with_pdfs-0.0.12-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | 503aa801491e0ca3c07d275dec0461d326716b55340ace63afbebdee27ae646a
MD5 | c77d72f42372bbff2b1dbc715a15f0ed
BLAKE2b-256 | 6e452fc642092bf7b8c4fada889350a4040314ce1c3f633aec8ccf60e6e039c9