Load a PDF file and ask questions via llama_index and GPT.
About Python Chat PDF (GPT Index) Project
Load your PDF data folder and ask questions via llama_index and GPT.
What is LlamaIndex?
LlamaIndex (GPT Index) is a data framework for your LLM application.
Context
- LLMs are a phenomenal piece of technology for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.
- To augment LLMs with our own private data, we need a comprehensive toolkit to perform this data augmentation for LLMs.
Proposed Solution
That’s where LlamaIndex comes in. LlamaIndex is a “data framework” to help you build LLM apps. It provides the following tools:
- Offers data connectors to ingest your existing data sources and data formats (APIs, PDFs, docs, SQL, etc.).
- Provides ways to structure your data (indices, graphs) so that it can be easily used with LLMs.
- Provides an advanced retrieval/query interface over your data: feed in any LLM input prompt and get back retrieved context and a knowledge-augmented output.
- Allows easy integration with your outer application framework (e.g. LangChain, Flask, Docker, ChatGPT, or anything else).
LlamaIndex provides tools for both beginner and advanced users. Our high-level API allows beginner users to use LlamaIndex to ingest and query their data in five lines of code. Our lower-level APIs allow advanced users to customize and extend any module (data connectors, indices, retrievers, query engines, reranking modules) to fit their needs.
What does load_index_from_storage do and how does it work?
load_index_from_storage is a function that loads an index from a StorageContext object. It takes a StorageContext and an optional index_id as parameters; if index_id is not specified, it assumes the index store contains exactly one index and loads it. It passes the index_ids and any additional keyword arguments to load_indices_from_storage, which retrieves the index structs from the index store and builds a list of BaseGPTIndex objects. If index_ids are specified, only the indices with those ids are loaded. load_indices_from_storage returns the list of BaseGPTIndex objects, and load_index_from_storage returns the single resulting index.
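The dispatch described above can be sketched in plain Python. This is a simplified illustration with hypothetical signatures (a dict stands in for the index store); the real llama_index functions operate on a StorageContext and construct BaseGPTIndex objects rather than returning stored values directly.

```python
def load_indices_from_storage(index_store, index_ids=None, **kwargs):
    """Load every index in the store, or only those whose ids are listed."""
    if index_ids is None:
        # No ids given: load all indices found in the store.
        return list(index_store.values())
    return [index_store[index_id] for index_id in index_ids]


def load_index_from_storage(index_store, index_id=None, **kwargs):
    """Load a single index, assuming the store holds exactly one
    when no index_id is given."""
    index_ids = None if index_id is None else [index_id]
    indices = load_indices_from_storage(index_store, index_ids, **kwargs)
    if len(indices) != 1:
        raise ValueError(f"Expected exactly one index, found {len(indices)}.")
    return indices[0]
```

With a single-entry store, load_index_from_storage returns that index without an id; with multiple indices, an explicit index_id selects one, mirroring the behavior described above.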
Getting Started
Instructions
- Install the requirements:
pip install -r requirements.txt
- Get a GPT API key from OpenAI if you don't have one already.
- Run the script:
python3 chat_with_pdfs.py <"data_folder_path"> <"openai_api_key">
- Ask any questions about the content of the PDFs.
Hashes for chatting_with_pdfs-0.0.10.tar.gz

Algorithm | Hash digest
---|---
SHA256 | ff4b1463c4133518b48ba9202e541e94cc2bb5ddeddf3eb96cdeecae3714be55
MD5 | c4c1448c944c66d4fa6531d5dbd58f22
BLAKE2b-256 | 8b70718713b7752efa9569ba9e0a1089bb3b23ddbd08e977019c2656b88d082b

Hashes for chatting_with_pdfs-0.0.10-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | d2284d4cb069ecb0dd02d25f4cba09aadbcfa19d1c9d1d0f4250f3cce140c7b0
MD5 | 69c0d55732ff982be29e92e9351ecb3f
BLAKE2b-256 | b0bcd940a5ffc6975b5cb7a9a7a008eb4839b710b6aec523b47ef44e28380408