
langchain-utils

LangChain Utilities

Prompt generation using LangChain document loaders

Do you find yourself frequently copy-pasting texts from the web / PDFs / other documents into ChatGPT?

If yes, these tools are for you!

Optimized for feeding into a chat interface (like ChatGPT) manually, in one go or in multiple goes (to get around context length limits).

Basically, the prompts generated look like this:

REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:

"""
{content}
"""
'''.strip()

You can feed it directly to a chat interface like ChatGPT, and ask follow-up questions about it.

See prompts.py for other variations.
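For instance, here is a minimal sketch of filling that template yourself with plain Python string formatting (page_text is a hypothetical variable holding text you extracted; the tools below do all of this for you):

# Minimal sketch: fill the template with ordinary str.format().
prompt = REPLY_OK_IF_YOU_READ_TEMPLATE.format(
    what='the content of a webpage',
    content=page_text,  # hypothetical: the extracted text
)
print(prompt)  # paste into ChatGPT, then ask follow-up questions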

Demos

  • Loading https://github.com/tddschn/langchain-utils and copying the prompt to the clipboard

  • Loading 3 pages of a PDF file, opening each part for inspection before copying, and optionally merging the 3 pages into 2 prompts that won't exceed gpt-3.5-turbo's context length limit, using LangChain's TokenTextSplitter (see the sketch after this list)
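A minimal sketch of the splitting involved (assuming a langchain version that exposes TokenTextSplitter at this import path; not the tools' exact internals):

from langchain.text_splitter import TokenTextSplitter

# Split long text into token-bounded chunks sized for gpt-3.5-turbo;
# one prompt is then generated per chunk.
splitter = TokenTextSplitter(model_name='gpt-3.5-turbo', chunk_size=2000, chunk_overlap=0)
chunks = splitter.split_text(long_text)  # long_text: e.g. the extracted PDF text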

pandocprompt

$ pandocprompt --help

usage: pandocprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                    [-P PARTS [PARTS ...]] [-r] [-R]
                    [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                    [-w WHAT] [-M] [--from PANDOC_FROM_FORMAT]
                    [--to PANDOC_TO_FORMAT]
                    [PATH ...]

Get prompts from arbitrary files. You need to have `pandoc` installed and in
$PATH; it will be used to convert source files to the desired (hopefully
textual) format. Common use cases: getting prompts from EPub books or several
TeX files.

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the
                        document content in the prompt (default: the content
                        of a document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  --from PANDOC_FROM_FORMAT
                        The format that is passed to -f in pandoc (default:
                        None)
  --to PANDOC_TO_FORMAT
                        The format that is passed to -t in pandoc. gfm-
                        raw_html means GitHub Flavored Markdown with raw HTML
                        stripped. (default: gfm-raw_html)

urlprompt

$ urlprompt --help

usage: urlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT] [-w WHAT]
                 [-M] [-j] [-g] [--github-path GITHUB_PATH]
                 [--github-revision GITHUB_REVISION] [--substack]
                 URL

Get a prompt consisting of the text content of a webpage

positional arguments:
  URL                   URL to the webpage

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the
                        webpage content in the prompt (default: the content of
                        a webpage)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -j, --javascript      Use JavaScript to render the page (default: False)
  -g, --github          Load the raw file from a GitHub URL (default: False)
  --github-path GITHUB_PATH
                        Path to the GitHub file (default: README.md)
  --github-revision GITHUB_REVISION
                        Revision for the GitHub file (default: master)
  --substack            Load from a Substack URL and convert it to Markdown
                        (default: False)
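
For example (the non-repo URL is hypothetical):

$ urlprompt -c 'https://example.com/article'
$ urlprompt -c -g https://github.com/tddschn/langchain-utils  # loads README.md at master, per the defaults above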

pdfprompt

$ pdfprompt --help

usage: pdfprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT]
                 [-p PAGES [PAGES ...]] [-l PAGE_SLICE] [-M] [-w WHAT] [-o]
                 [-O] [-L OCR_LANGUAGE]
                 PDF Path

Get a prompt consisting of the text content of a PDF file

positional arguments:
  PDF Path              Path to the PDF file

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        Only include specified page numbers (default: None)
  -l PAGE_SLICE, --page-slice PAGE_SLICE
                        Use Python slice syntax to select page numbers (e.g.
                        1:3, 1:10:2, etc.) (default: None)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a PDF
                        file)
  -o, --fallback-ocr    Use OCR as a fallback if no text is detected on a
                        page; set the TESSDATA_PREFIX environment variable to
                        the path of your tesseract data directory (default:
                        False)
  -O, --force-ocr       Force OCR on all pages (default: False)
  -L OCR_LANGUAGE, --ocr-language OCR_LANGUAGE
                        Language to use for Tesseract OCR (like eng, chi_sim,
                        chi_tra, chi_tra_vert, etc.) (default: eng)
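
For example, mirroring the demo above (paper.pdf is a hypothetical file): load 3 pages, open each part in an editor for inspection before copying, and merge the pages so they are split to fit gpt-3.5-turbo's context length:

$ pdfprompt -e -p 1 2 3 -M -m gpt-3.5-turbo paper.pdf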

ytprompt

$ ytprompt --help

usage: ytprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                [-P PARTS [PARTS ...]] [-r] [-R]
                [--print-percentage-non-ascii] [-n] [--out OUT]
                YouTube URL

Get a prompt consisting of the Title and Transcript of a YouTube Video

positional arguments:
  YouTube URL           YouTube URL

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the transcript, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
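
For example (VIDEO_ID is a placeholder):

$ ytprompt -c 'https://www.youtube.com/watch?v=VIDEO_ID'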

textprompt

$ textprompt --help

usage: textprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from text files

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the
                        document content in the prompt (default: the content
                        of a document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
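
For example (notes.txt is a hypothetical file; -C reads from the clipboard instead of files):

$ textprompt -c notes.txt
$ textprompt -c -C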

htmlprompt

$ htmlprompt --help

usage: htmlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from html files

positional arguments:
  PATH                  Paths to the html files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the HTML
                        content in the prompt (default: the text content of a
                        html file)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
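
For example (page.html is a hypothetical saved page):

$ htmlprompt -c page.html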

Installation

pipx

This is the recommended installation method.

$ pipx install langchain-utils

pip

$ pip install langchain-utils

Develop

$ git clone https://github.com/tddschn/langchain-utils.git
$ cd langchain-utils
$ poetry install
