Python library to abbreviate a PDF file to fit within GPT-4's 8k-token prompt length
Project description
PDFtoPrompt
Existing libraries for using GPT-4 to extract information from a PDF file typically combine GPT-4 with word searching, indexing, and segmentation. Those strategies work reasonably well, but they have one significant limitation: they deprive the LLM of "big picture" context.
PDFtoPrompt takes a different strategy. Inspired by Twitter user @gfodor's experiments with text compression, it uses GPT-4 to compress or distill a PDF file's entire informational content to below the length limit of a single ChatGPT prompt.
It achieves this by first calculating what compression factor is needed to get the text to the right length, then segmenting the PDF file and asking GPT-4 to compress each segment, and finally stitching the compressed segments back together. You should then be able to fit the full compressed text into a single ChatGPT prompt, with some room left over to ask a question.
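To make that concrete, here is a minimal sketch of the compress-and-stitch loop, assuming the openai Python package and an already-extracted text string. This illustrates the strategy rather than the library's actual implementation; the chunk size, character budget, and prompt wording below are all placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TARGET_CHARS = 8000 * 4  # rough budget: ~4 characters per token in an 8k window
CHUNK_CHARS = 6000       # illustrative segment size

def compress_text(text: str) -> str:
    """Compress text below TARGET_CHARS by compressing fixed-size segments."""
    # Work out how aggressively each segment must shrink.
    factor = max(1.0, len(text) / TARGET_CHARS)
    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    compressed = []
    for chunk in chunks:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": (
                    f"Compress the following text to 1/{factor:.1f} of its "
                    "length. It does not need to be human-readable, only "
                    "interpretable by GPT-4:\n\n" + chunk
                ),
            }],
        )
        compressed.append(response.choices[0].message.content)
    # Stitch the compressed segments back together.
    return "".join(compressed)
```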
The process is, as @gfodor notes, pretty "lossy," especially for longer texts. This tool may be best used in combination with others built on other strategies.
Installation
- Install with pip:
pip install pdftoprompt
Usage
Setting your API Key
Make sure to first set an OpenAI API key with GPT-4 access using the set_openai_api_key function:
from pdftoprompt import set_openai_api_key
set_openai_api_key()
This function either takes your API key as a string argument or looks for an OPENAI_API_KEY variable in the .env file in the current working directory. I recommend saving your API key in your project's .env file so you can share your code without worrying about key security. If you're uploading code to GitHub, make sure to add .env to .gitignore.
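Both modes look like this (the key string is a placeholder):

```python
from pdftoprompt import set_openai_api_key

# Option 1: pass the key directly (fine for quick local experiments)
set_openai_api_key("sk-...")

# Option 2: call with no argument to read OPENAI_API_KEY from your .env file,
# e.g. a file containing the single line: OPENAI_API_KEY=sk-...
set_openai_api_key()
```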
Compressing a PDF to an LLM Prompt
Next, import the compress_pdf function from the pdftoprompt library and call it with a PDF URL or file path:
from pdftoprompt import compress_pdf
file_path = "https://arxiv.org/pdf/2203.06566.pdf"
compressed_text = compress_pdf(file_path)
print(compressed_text)
The above code distills an academic paper titled "PromptChainer: Chaining Large Language Model Prompts through Visual Programming" down to the following GPT-interpretable prompt:
PromptChainer: Chaining LLM Prompts via Visual Programming by Tongshuang Wu, Ellen Jiang, Aaron Donsbach, Jeff Gray, Alejandra Molina, Michael Terry, and Carrie J.Cai explores LLM chain authoring. Pilot studies show users need support transforming data between steps and debugging chains. PromptChainer is designed to address these needs, providing an interactive interface for visually programming chains. Case studies with four designers and developers demonstrate its ability to support building prototypes for various applications. Open questions remain on scaling chains to more complex tasks and supporting low-fi chain prototyping.3.2 Interface Design Designing the interface in Figure 1 addresses challenges with Chain View (Figure 1A) for chain structure authoring, Node View (Figure 1B) for single step authoring, and chain debugging support. Chain View is a visual panel for building and viewing chains, with nodes representing steps and edges denoting connections. Node visualization (Figure 4) includes named inputs/outputs, status icons, and data views. Node types (Figure 3) cover diverse user needs, including Generic LLM nodes, LLM Classifier nodes, helper nodes, and communication nodes. Example gallery helps users develop mental models and prompting patterns.Node View enables node inspection, implementation, testing, and automatic input name parsing based on LLM prompts or JavaScript function signatures. Global chain consistency is ensured by automatically updating input handles when prompt templates change. Interactive debugging functionalities address cascading error challenges and enable unit testing, end-to-end assessments, and breakpoint debugging.
4 USER FEEDBACK SESSIONS Preliminary study aimed to understand users' desired chains, PromptChainer support, and challenges faced. Users proposed diverse tasks, some with branching logic and others with iterative content. Chaining patterns included parallel logic branches and incremental iterations on content. Chaining rationales included addressing LLM limitations and making prototypes more generalizable. PromptChainer supported various chain construction strategies and multi-level debugging. Participants used predefined helper nodes more than customized JS nodes.Q: Remaining challenges in chain authoring? A: 1. Ensuring coherence in interdependent sub-tasks; 2. Tracking chains with complex logic.
Challenges include maintaining coherence in chains with interdependent parallel tasks and tracking complex decomposition. P4's story writing chain generated a paragraph for each outline point, resulting in a final essay lacking coherence. One user created an input node to manually track previous outputs. Future work could investigate methods considering inter-dependency between parallel sub-tasks and enhancing PromptChainer's tracing capabilities. Customized chain grouping and execution visualizations may help address these issues.
Study limitations: Participants may have felt invested in their pre-created prompts, making them less inclined to consider other chain structures. Prior prototyping work suggests concurrent consideration of multiple alternatives can lead to better outcomes. Future work could explore low-fi prototyping of multiple chains and task decomposition strategies for larger, more complex tasks. Encouraging users to create "half-baked" chain constructions without investing too much time in prompting upfront may also be beneficial.
Note that when we ask GPT to compress the text, we specifically instruct it that the text doesn't have to be human-readable. The goal here isn't to get a shortened version that works for humans. It's to get a shortened version that works as a Large Language Model prompt.
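The library's exact prompt wording isn't reproduced here, but an instruction in that spirit might look like this (purely illustrative):

```python
# Hypothetical wording, not pdftoprompt's actual prompt:
instruction = (
    "Compress the following text as much as possible. The result does not "
    "need to be human-readable; it only needs to retain enough information "
    "for GPT-4 to answer questions about the original."
)
```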
OCR
In theory, you can enable OCR by setting the compress_pdf function's use_ocr argument to True. That functionality requires installing Tesseract OCR and adding it to your system path, and I can't vouch for it because I haven't tested it yet.
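For reference, a typical Tesseract pipeline for PDFs looks like the sketch below. This illustrates the general approach rather than pdftoprompt's internals, and it assumes the pytesseract and pdf2image packages (pdf2image additionally needs the Poppler utilities installed).

```python
import pytesseract                       # Python wrapper around the Tesseract binary
from pdf2image import convert_from_path  # renders PDF pages as PIL images

def ocr_pdf(path: str) -> str:
    # Render each page to an image, then run Tesseract over each one.
    pages = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)

text = ocr_pdf("scanned_paper.pdf")  # hypothetical input file
```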
Contributing
If you'd like to contribute to this library, please submit a pull request on GitHub.
License
This library is released under the MIT License.
File details
Details for the file pdftoprompt-0.1.2.tar.gz.
File metadata
- Download URL: pdftoprompt-0.1.2.tar.gz
- Upload date:
- Size: 6.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.9.13 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | ed3b503cc2408779501816de3ff11521c610ca812ba5b0b553231e670f4917dd
MD5 | 1326be7ae95171aedcd4f792d4eb6c4c
BLAKE2b-256 | 2bba1912b23978d3c732af2073807e7f3ce8a2c19788dce0516cbfea2aa0a4c1
File details
Details for the file pdftoprompt-0.1.2-py3-none-any.whl.
File metadata
- Download URL: pdftoprompt-0.1.2-py3-none-any.whl
- Upload date:
- Size: 6.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.1 CPython/3.9.13 Windows/10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 12d2d1ad0f1d626bad6435559c1cfff51bb699497e4110c86bd7d900c02cffd5
MD5 | 3c5742d61107a7cdd8f5157a07bd431c
BLAKE2b-256 | 1ef66f0cca327bad1ca3469ac1c2e869b1d7a71d8d99ba5972dbb466d3c55490