Load a PDF file and ask questions via llama_index and GPT.
About Python Chat PDF (GPT Index) Project
Load your PDF data folder and ask questions via llama_index and GPT.
What is LlamaIndex?
LlamaIndex (GPT Index) is a data framework for your LLM application.
Context
- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
- To augment LLMs with our own private data, we need a comprehensive toolkit to perform this data augmentation.
Proposed Solution
That’s where LlamaIndex comes in. LlamaIndex is a “data framework” to help you build LLM apps. It provides the following tools:
- Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
- Provides ways to structure your data (indices, graphs) so that this data can be easily used with LLMs.
- Provides an advanced retrieval/query interface over your data: feed in any LLM input prompt, get back retrieved context and knowledge-augmented output.
- Allows easy integration with your outer application framework (e.g. LangChain, Flask, Docker, ChatGPT, anything else).
LlamaIndex provides tools for both beginner users and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in 5 lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules), to fit their needs.
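As an illustration of the high-level API, ingesting and querying a data folder can look like the sketch below. This is a GPT Index-era example; the exact import names vary between llama_index versions, and running it requires a valid OpenAI API key in the environment.

```python
# Sketch of the high-level ingest-and-query flow (names vary by version).
# Requires the OPENAI_API_KEY environment variable to be set.
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()   # ingest the data folder
index = GPTSimpleVectorIndex.from_documents(documents)  # structure it as a vector index
response = index.query("What is this document about?")  # knowledge-augmented query
print(response)
```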
What does load_index_from_storage do and how does it work?
load_index_from_storage is a function that loads an index from a StorageContext object. It takes a StorageContext and an optional index_id as parameters. If no index_id is specified, it assumes the index store contains exactly one index and loads it. It passes the index ids and any additional keyword arguments on to the load_indices_from_storage function, which retrieves the index structs from the index store and builds a list of BaseGPTIndex objects; if index ids are specified, only the indices with those ids are loaded. load_indices_from_storage returns that list, and load_index_from_storage returns the single index from it.
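The control flow described above can be sketched in simplified form. This is an illustrative stand-in, not the real llama_index source: FakeIndexStore is a hypothetical substitute for the index store inside a StorageContext, and the real code wraps each index struct in a BaseGPTIndex object.

```python
# Simplified sketch of how load_index_from_storage delegates to
# load_indices_from_storage (illustrative; not the real llama_index code).

class FakeIndexStore:
    """Hypothetical stand-in for the index store inside a StorageContext."""
    def __init__(self, structs):
        self._structs = structs  # maps index_id -> index struct

    def index_structs(self):
        return list(self._structs.values())

    def get_index_struct(self, index_id):
        return self._structs[index_id]

def load_indices_from_storage(index_store, index_ids=None):
    # With no ids given, load every index struct in the store;
    # otherwise load only the structs with the requested ids.
    if index_ids is None:
        return index_store.index_structs()
    return [index_store.get_index_struct(i) for i in index_ids]

def load_index_from_storage(index_store, index_id=None):
    # No index_id means we expect exactly one index in the store.
    index_ids = None if index_id is None else [index_id]
    indices = load_indices_from_storage(index_store, index_ids)
    if len(indices) != 1:
        raise ValueError("Expected exactly one index in storage.")
    return indices[0]

store = FakeIndexStore({"vector": "vector_struct"})
print(load_index_from_storage(store))  # → vector_struct
```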
Getting Started
Instructions
- Install the requirements
pip install -r requirements.txt
- Get a GPT API key from OpenAI if you don't have one already.
- Run the script:
python3 chat_with_pdfs.py <"data_folder_path"> <"open_api_key">
- Ask any questions about the content of the PDF.
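The command line above takes the data folder path and the API key as its two positional arguments. As a hypothetical sketch of that argument handling (parse_args is an illustrative helper; the real chat_with_pdfs.py may be written differently):

```python
import os

def parse_args(argv):
    # Hypothetical sketch of the CLI: chat_with_pdfs.py <data_folder_path> <open_api_key>.
    # The real script's internals may differ.
    if len(argv) != 3:
        raise SystemExit('usage: python3 chat_with_pdfs.py "data_folder_path" "open_api_key"')
    data_folder, api_key = argv[1], argv[2]
    return data_folder, api_key

# Example invocation matching the instructions above:
data_folder, api_key = parse_args(["chat_with_pdfs.py", "./data", "sk-example-key"])
os.environ["OPENAI_API_KEY"] = api_key  # llama_index reads the key from the environment
```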