Utilities built upon the langchain library
langchain-utils
LangChain Utilities
Prompt generation using LangChain document loaders
Do you find yourself frequently copy-pasting text from the web, PDFs, or other documents into ChatGPT?
If so, these tools are for you!
They generate prompts optimized for pasting manually into a chat interface (like ChatGPT), either in one go or in several parts to get around context length limits.
The generated prompts look like this:
```python
REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:
"""
{content}
"""
'''.strip()
```
You can paste the resulting prompt directly into a chat interface like ChatGPT and then ask follow-up questions about the content.
See `prompts.py` for other variations.
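As a rough illustration of how the template turns into a prompt, the sketch below fills it with `str.format`. The `build_prompt` helper and the sample text are hypothetical and not part of the package; only the template itself comes from this README.

```python
# Template copied from this README.
REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:
"""
{content}
"""
'''.strip()


def build_prompt(content: str, what: str = "the content of a document") -> str:
    """Fill the template. Hypothetical helper, not the package's API."""
    return REPLY_OK_IF_YOU_READ_TEMPLATE.format(what=what, content=content)


prompt = build_prompt(
    "LangChain is a framework for building LLM applications.",
    what="the content of a webpage",
)
print(prompt)
```

The `what` placeholder corresponds to the `-w/--what` option documented in the help output of the individual tools.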
Demos
- Load https://github.com/tddschn/langchain-utils and copy the prompt to the clipboard.
- Load 3 pages of a PDF file, open each part for inspection before copying, and optionally merge the 3 pages into 2 prompts that stay within `gpt-3.5-turbo`'s context length limit, using langchain's `TokenTextSplitter`.
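The splitting used in the second demo can be approximated as follows. This is a minimal sketch, not the package's implementation: it counts whitespace-separated words instead of real tokens (the tools use langchain's `TokenTextSplitter`), and the context lengths in `CONTEXT_LENGTH` are assumed values for illustration. The half-of-context default mirrors the `--chunk-size` help text.

```python
# Assumed context lengths, for illustration only; the package's values may differ.
CONTEXT_LENGTH = {"gpt-3.5-turbo": 4096, "gpt-4-32k": 32768}


def default_chunk_size(model: str) -> int:
    """Half of the model's context length, following the --chunk-size help text."""
    return CONTEXT_LENGTH[model] // 2


def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into pieces of at most chunk_size whitespace-separated words.

    Words stand in for tokens here; a real tokenizer would count differently.
    """
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


text = " ".join(f"word{i}" for i in range(5000))
chunks = split_into_chunks(text, default_chunk_size("gpt-3.5-turbo"))
print(len(chunks))  # 3 chunks of 2048, 2048 and 904 words
```

Each chunk would then be wrapped in its own prompt and pasted into the chat one at a time.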
pandocprompt
```
$ pandocprompt --help

usage: pandocprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                    [-P PARTS [PARTS ...]] [-r] [-R]
                    [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                    [-w WHAT] [-M] [--from PANDOC_FROM_FORMAT]
                    [--to PANDOC_TO_FORMAT]
                    [PATH ...]

Get prompts from arbitrary files. You need to have `pandoc` installed and in
$PATH, it will be used to convert source files to desired (hopefully textual)
format. Common use cases: Getting prompts from EPub books or several TeX
files.

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  --from PANDOC_FROM_FORMAT
                        The format that is passed to -f in pandoc (default:
                        None)
  --to PANDOC_TO_FORMAT
                        The format that is passed to -t in pandoc. gfm-
                        raw_html means GitHub Flavored Markdown with raw HTML
                        stripped. (default: gfm-raw_html)
```
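The three output modes implied by the default behavior, `-r/--raw`, and `-R/--raw-no-quotes` could be sketched like this. The `render` function is hypothetical, and giving `raw_no_quotes` precedence when both flags are set is an assumption; the help text does not say how the real tools resolve that case.

```python
FULL_TEMPLATE = 'Below is {what}, reply "OK" if you read:\n"""\n{content}\n"""'
RAW_TEMPLATE = '"""\n{content}\n"""'


def render(
    content: str,
    what: str = "the content of a document",
    *,
    raw: bool = False,
    raw_no_quotes: bool = False,
) -> str:
    """Render content in one of three modes. Hypothetical helper."""
    if raw_no_quotes:
        # -R: output the content only (assumed to win over -r).
        return content
    if raw:
        # -r: triple quotes around the content, no extra text.
        return RAW_TEMPLATE.format(content=content)
    # Default: the full "reply OK" prompt.
    return FULL_TEMPLATE.format(what=what, content=content)


print(render("some text", raw=True))
```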
urlprompt
```
$ urlprompt --help

usage: urlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT] [-w WHAT]
                 [-M] [-j] [-g] [--github-path GITHUB_PATH]
                 [--github-revision GITHUB_REVISION] [--substack]
                 URL

Get a prompt consisting the text content of a webpage

positional arguments:
  URL                   URL to the webpage

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        webpage)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -j, --javascript      Use JavaScript to render the page (default: False)
  -g, --github          Load the raw file from a GitHub URL (default: False)
  --github-path GITHUB_PATH
                        Path to the GitHub file (default: README.md)
  --github-revision GITHUB_REVISION
                        Revision for the GitHub file (default: master)
  --substack            Load from a Substack URL and convert it to Markdown
                        (default: False)
```
pdfprompt
```
$ pdfprompt --help

usage: pdfprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT]
                 [-p PAGES [PAGES ...]] [-l PAGE_SLICE] [-M] [-w WHAT] [-o]
                 [-O] [-L OCR_LANGUAGE]
                 PDF Path

Get a prompt consisting the text content of a PDF file

positional arguments:
  PDF Path              Path to the PDF file

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        Only include specified page numbers (default: None)
  -l PAGE_SLICE, --page-slice PAGE_SLICE
                        Use Python slice syntax to select page numbers (e.g.
                        1:3, 1:10:2, etc.) (default: None)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a PDF
                        file)
  -o, --fallback-ocr    Use OCR as fallback if no text detected on page,
                        please set TESSDATA_PREFIX environment variable to the
                        path of your tesseract data directory (default: False)
  -O, --force-ocr       Force OCR on all pages (default: False)
  -L OCR_LANGUAGE, --ocr-language OCR_LANGUAGE
                        Language to use for Tesseract OCR (like eng, chi_sim,
                        chi_tra, chi_tra_vert etc.)) (default: eng)
```
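To make the `-l/--page-slice` syntax concrete, here is one way a string such as `1:10:2` could be turned into a Python `slice` and applied to a list of pages. The `parse_slice` helper is hypothetical, and treating the numbers as 0-based list indices is an assumption; the help text does not state how `pdfprompt` numbers pages internally.

```python
def parse_slice(spec: str) -> slice:
    """Turn 'start:stop' or 'start:stop:step' into a slice object.

    Empty fields become None, as in ordinary Python slicing.
    """
    parts = spec.split(":")
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"expected start:stop[:step], got {spec!r}")
    return slice(*(int(p) if p.strip() else None for p in parts))


# Stand-ins for loaded pages, indexed from 0 (an assumption).
pages = [f"page {n}" for n in range(12)]

print(pages[parse_slice("1:3")])     # ['page 1', 'page 2']
print(pages[parse_slice("1:10:2")])  # ['page 1', 'page 3', 'page 5', 'page 7', 'page 9']
```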
ytprompt
```
$ ytprompt --help

usage: ytprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                [-P PARTS [PARTS ...]] [-r] [-R]
                [--print-percentage-non-ascii] [-n] [--out OUT]
                YouTube URL

Get a prompt consisting Title and Transcript of a YouTube Video

positional arguments:
  YouTube URL           YouTube URL

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
```
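The `--print-percentage-non-ascii` option shared by all the tools reports how much of the loaded text falls outside ASCII, which can hint at a badly extracted transcript or document. A minimal sketch of such a measure is shown below; the `percentage_non_ascii` function is hypothetical, and counting by character rather than by byte is an assumption about what the tools compute.

```python
def percentage_non_ascii(text: str) -> float:
    """Percentage of characters whose code point is above 127."""
    if not text:
        return 0.0
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return 100 * non_ascii / len(text)


print(percentage_non_ascii("abcé"))  # 25.0
```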
textprompt
```
$ textprompt --help

usage: textprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from text files

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
```
htmlprompt
```
$ htmlprompt --help

usage: htmlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from html files

positional arguments:
  PATH                  Paths to the html files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the text content of a
                        html file)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
```
Installation
pipx
This is the recommended installation method.
```
$ pipx install langchain-utils
```
pip
```
$ pip install langchain-utils
```
Develop
```
$ git clone https://github.com/tddschn/langchain-utils.git
$ cd langchain-utils
$ poetry install
```