Utilities built upon the langchain library


langchain-utils

LangChain Utilities


Prompt generation using LangChain document loaders

Do you find yourself frequently copy-pasting text from the web, PDFs, or other documents into ChatGPT?

If yes, these tools are for you!

The output is optimized for pasting manually into a chat interface (like ChatGPT), either in one go or in multiple parts to get around context length limits.

Basically, the prompts generated look like this:

REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:

"""
{content}
"""
'''.strip()

You can feed it directly to a chat interface like ChatGPT and then ask follow-up questions about the content.

See prompts.py for other variations.
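The template is an ordinary Python format string with two placeholders, `{what}` and `{content}`. A minimal sketch of filling it by hand, assuming plain `str.format` substitution; the `what` and `content` values below are made-up examples, and the tools' own assembly code lives in prompts.py:

```python
# Template copied from the README above.
REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:

"""
{content}
"""
'''.strip()

# Hypothetical inputs: a short description and the extracted document text.
prompt = REPLY_OK_IF_YOU_READ_TEMPLATE.format(
    what="the content of a webpage",
    content="LangChain document loaders turn sources into text.",
)
print(prompt)
```

Because of `.strip()`, the prompt starts directly with "Below is" and ends with the closing triple quotes, with no stray blank lines to trim before pasting.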

Demos

Loading https://github.com/tddschn/langchain-utils and copying the prompt to the clipboard.

Loading 3 pages of a PDF file, opening each part for inspection before copying, and optionally merging the 3 pages into 2 prompts that stay within gpt-3.5-turbo's context length limit, using LangChain's TokenTextSplitter.
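Merging pages into as few prompts as fit the context window is the key idea in the PDF demo. The tool uses LangChain's TokenTextSplitter for this; the sketch below swaps in a simpler greedy packer to illustrate the idea, and approximates token counts by whitespace-separated words (an assumption: real tokenizers such as tiktoken count differently). `count_tokens` and `pack_pages` are hypothetical names, not part of langchain-utils.

```python
def count_tokens(text: str) -> int:
    """Rough stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack_pages(pages: list[str], chunk_size: int) -> list[str]:
    """Greedily merge consecutive pages into prompts of at most chunk_size tokens.

    A single page larger than chunk_size is kept on its own here;
    a real splitter would cut it further.
    """
    prompts: list[str] = []
    current: list[str] = []
    current_tokens = 0
    for page in pages:
        tokens = count_tokens(page)
        # Start a new prompt if adding this page would exceed the budget.
        if current and current_tokens + tokens > chunk_size:
            prompts.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(page)
        current_tokens += tokens
    if current:
        prompts.append("\n\n".join(current))
    return prompts


# Three pages of 1200, 600 and 1500 "tokens" with a 2048-token budget.
pages = ["word " * 1200, "word " * 600, "word " * 1500]
parts = pack_pages(pages, chunk_size=2048)
print(len(parts))  # 2
```

The first two pages (1800 tokens) share a prompt; the third would push it past 2048, so it starts a second prompt.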

`urlprompt`

$ urlprompt --help

usage: urlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [-w WHAT] [-M] [-j] [-g]
                 [--github-path GITHUB_PATH]
                 [--github-revision GITHUB_REVISION]
                 URL

Get a prompt consisting the text content of a webpage

positional arguments:
  URL                   URL to the webpage

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use (default: gpt-3.5-turbo)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        webpage)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -j, --javascript      Use JavaScript to render the page (default: False)
  -g, --github          Load the raw file from a GitHub URL (default: False)
  --github-path GITHUB_PATH
                        Path to the GitHub file (default: README.md)
  --github-revision GITHUB_REVISION
                        Revision for the GitHub file (default: master)
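With `-g`, urlprompt loads a raw file from a GitHub URL, defaulting to `README.md` at revision `master`. A minimal sketch of how such a raw-file URL could be derived from a repository URL, assuming the `https://raw.githubusercontent.com/<owner>/<repo>/<revision>/<path>` layout; `github_raw_url` is a hypothetical helper, not the tool's actual code.

```python
from urllib.parse import urlparse


def github_raw_url(repo_url: str, path: str = "README.md", revision: str = "master") -> str:
    """Map a github.com repository URL to the raw URL of one file (hypothetical helper)."""
    parsed = urlparse(repo_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {repo_url}")
    # Keep only the owner and repository name from the URL path.
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"URL has no owner/repo: {repo_url}")
    owner, repo = parts[0], parts[1].removesuffix(".git")
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{revision}/{path.lstrip('/')}"


print(github_raw_url("https://github.com/tddschn/langchain-utils"))
# https://raw.githubusercontent.com/tddschn/langchain-utils/master/README.md
```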

`pdfprompt`

$ pdfprompt --help

usage: pdfprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [-p PAGES [PAGES ...]]
                 [-l PAGE_SLICE] [-M] [-w WHAT] [-o] [-L OCR_LANGUAGE]
                 PDF Path

Get a prompt consisting the text content of a PDF file

positional arguments:
  PDF Path              Path to the PDF file

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use (default: gpt-3.5-turbo)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        Only include specified page numbers (default: None)
  -l PAGE_SLICE, --page-slice PAGE_SLICE
                        Use Python slice syntax to select page numbers (e.g.
                        1:3, 1:10:2, etc.) (default: None)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a PDF
                        file)
  -o, --fallback-ocr    Use OCR as fallback if no text detected on page,
                        please set TESSDATA_PREFIX environment variable to the
                        path of your tesseract data directory (default: False)
  -L OCR_LANGUAGE, --ocr-language OCR_LANGUAGE
                        Language to use for Tesseract OCR (default: chi_sim)
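`-l/--page-slice` accepts Python slice syntax such as `1:3` or `1:10:2`. A sketch of turning such a string into a real `slice` object and applying it to a list of pages; `parse_slice` is a hypothetical helper, and since the help text does not say whether pdfprompt counts pages from 0 or 1, the sketch simply uses Python list indices.

```python
def parse_slice(spec: str) -> slice:
    """Turn 'start:stop[:step]' into a slice; empty fields become None (hypothetical helper)."""
    fields = spec.split(":")
    if not 1 < len(fields) <= 3:
        raise ValueError(f"expected start:stop[:step], got {spec!r}")
    values = [int(f) if f.strip() else None for f in fields]
    return slice(*values)


# Stand-in for the pages of a PDF, indexed like a Python list.
pages = [f"page {i}" for i in range(12)]
print(pages[parse_slice("1:3")])     # ['page 1', 'page 2']
print(pages[parse_slice("1:10:2")])  # ['page 1', 'page 3', 'page 5', 'page 7', 'page 9']
```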

`ytprompt`

$ ytprompt --help

usage: ytprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                [-P PARTS [PARTS ...]] [-r] [-R]
                [--print-percentage-non-ascii] [-n]
                YouTube URL

Get a prompt consisting Title and Transcript of a YouTube Video

positional arguments:
  YouTube URL           YouTube URL

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use (default: gpt-3.5-turbo)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
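All of the tools share `-s/--chunk-size`, which defaults to half the model's context length and also decides whether the text is split at all, while `-S/--no-split` turns splitting off. A sketch of that rule as described in the help text, assuming a 4,096-token context for gpt-3.5-turbo and 8,192 for gpt-4 (their limits when this version was released; check current values) and a token count supplied by the caller. `CONTEXT_LENGTHS`, `default_chunk_size`, and `should_split` are hypothetical names.

```python
from typing import Optional

# Assumed context lengths in tokens; these are illustrative, not read from the tool.
CONTEXT_LENGTHS = {"gpt-3.5-turbo": 4096, "gpt-4": 8192}


def default_chunk_size(model: str) -> int:
    """Half of the model's context length, as the -s help text describes."""
    return CONTEXT_LENGTHS[model] // 2


def should_split(num_tokens: int, model: str,
                 chunk_size: Optional[int] = None, no_split: bool = False) -> bool:
    """Split only when splitting is allowed and the text exceeds the chunk size."""
    if no_split:  # mirrors -S/--no-split
        return False
    size = chunk_size if chunk_size is not None else default_chunk_size(model)
    return num_tokens > size


print(default_chunk_size("gpt-3.5-turbo"))                 # 2048
print(should_split(3000, "gpt-3.5-turbo"))                 # True
print(should_split(3000, "gpt-3.5-turbo", no_split=True))  # False
```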

`textprompt`

$ textprompt --help

usage: textprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [-C] [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from text files

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use (default: gpt-3.5-turbo)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a
                        document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
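textprompt reads the given text files, or standard input when no PATH is passed, and `-M` merges everything into one document. A sketch of that input handling with the standard library; `load_documents` is a hypothetical name, and treating each file as one document and joining merged files with a blank line are assumptions.

```python
import io
import sys
from pathlib import Path
from typing import Optional, TextIO


def load_documents(paths: list[str], stdin: Optional[TextIO] = None,
                   merge: bool = False) -> list[str]:
    """Return one text per file, or all of stdin when no paths are given (hypothetical helper)."""
    if paths:
        texts = [Path(p).read_text(encoding="utf-8") for p in paths]
    else:
        # No PATH arguments: fall back to standard input, as the help text describes.
        source = stdin if stdin is not None else sys.stdin
        texts = [source.read()]
    # merge=True mirrors -M/--merge by joining everything into a single document.
    return ["\n\n".join(texts)] if merge else texts


# Simulate piped input with an in-memory stream instead of a real terminal.
print(load_documents([], stdin=io.StringIO("piped text")))  # ['piped text']
```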

`htmlprompt`

$ htmlprompt --help

usage: htmlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [-C] [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from html files

positional arguments:
  PATH                  Paths to the html files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use (default: gpt-3.5-turbo)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting transcript, also used to
                        determine whether to split, defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processes list of Documents
                        (default: None)
  -r, --raw             Wraps the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the text content of a
                        html file)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
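htmlprompt keeps only the text content of HTML files. The tool builds on LangChain document loaders; as a dependency-free illustration, the sketch below swaps in the standard library's `html.parser` to collect text while skipping `<script>` and `<style>` contents. `HTMLTextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser


class HTMLTextExtractor(HTMLParser):
    """Collect non-empty text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text node per line."""
    parser = HTMLTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Title</h1><p>Hello <b>world</b></p><script>alert(1)</script></body></html>")
print(html_to_text(page))
```

Emitting one text node per line is a simplification; a loader that tracks block-level elements would keep inline runs like "Hello world" on one line.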

Installation

pipx

This is the recommended installation method.

$ pipx install langchain-utils

pip

$ pip install langchain-utils

Develop

$ git clone https://github.com/tddschn/langchain-utils.git
$ cd langchain-utils
$ poetry install


Release history

0.9.0 (Sep 15, 2025)
0.8.0 (Jun 5, 2024)
0.7.2 (May 22, 2024)
0.7.1 (May 22, 2024)
0.6.0 (Feb 2, 2024)
0.5.7 (Dec 19, 2023)
0.5.6 (Dec 9, 2023)
0.5.5 (Dec 9, 2023)
0.5.4 (Oct 23, 2023)
0.5.3 (Oct 14, 2023)
0.5.2 (Oct 11, 2023)
0.5.1 (Oct 11, 2023)
0.5.0 (Oct 11, 2023)
0.4.4 (Jun 4, 2023, this version)
0.4.3 (May 31, 2023)
0.4.2 (May 29, 2023)
0.4.1 (May 29, 2023)
0.4.0 (May 29, 2023)
0.3.25 (May 23, 2023)
0.3.24 (May 3, 2023)
0.3.23 (May 3, 2023)
0.3.22 (Apr 17, 2023)
0.3.21 (Apr 17, 2023)
0.3.20 (Apr 17, 2023)
0.3.19 (Apr 17, 2023)
0.3.18 (Apr 16, 2023)
0.3.17 (Apr 16, 2023)
0.3.16 (Apr 13, 2023)
0.3.15 (Apr 13, 2023)
0.3.14 (Apr 13, 2023)
0.3.13 (Apr 13, 2023)
0.3.12 (Apr 13, 2023)
0.3.11 (Apr 12, 2023)
0.3.10 (Apr 12, 2023)
0.3.9 (Apr 12, 2023)
0.3.8 (Apr 12, 2023)
0.3.7 (Apr 12, 2023)
0.3.6 (Apr 12, 2023)
0.3.5 (Apr 12, 2023)
0.3.4 (Apr 10, 2023)
0.3.3 (Apr 10, 2023)
0.3.2 (Apr 10, 2023)
0.3.1 (Apr 10, 2023)
0.3.0 (Apr 10, 2023)
0.2.2 (Apr 10, 2023)
0.2.1 (Apr 10, 2023)
0.2.0 (Apr 10, 2023)
0.1.8 (Apr 10, 2023)
0.1.7 (Apr 10, 2023)
0.1.6 (Apr 10, 2023)
0.1.5 (Apr 10, 2023)
0.1.4 (Apr 10, 2023)
0.1.3 (Apr 9, 2023)
0.1.2 (Apr 9, 2023)
0.1.1 (Apr 9, 2023)

Download files


Source Distribution

langchain_utils-0.4.4.tar.gz (17.0 kB)

Uploaded Jun 4, 2023 Source

Built Distribution


langchain_utils-0.4.4-py3-none-any.whl (23.9 kB)

Uploaded Jun 4, 2023 Python 3

File details

Details for the file langchain_utils-0.4.4.tar.gz.

File metadata

Download URL: langchain_utils-0.4.4.tar.gz
Upload date: Jun 4, 2023
Size: 17.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.11.3 Darwin/22.5.0

File hashes

Hashes for langchain_utils-0.4.4.tar.gz
Algorithm	Hash digest
SHA256	`10144f184c112527c4393279c56b3e0da98df9e3f81a7588fef81640725808bb`
MD5	`167c937828c62ccd14be3ed12fcab4e1`
BLAKE2b-256	`a44957b10c5749178d35aecd993812f01cfcd60e9f69017333f63403f89ddaa1`


File details

Details for the file langchain_utils-0.4.4-py3-none-any.whl.

File metadata

Download URL: langchain_utils-0.4.4-py3-none-any.whl
Upload date: Jun 4, 2023
Size: 23.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.2 CPython/3.11.3 Darwin/22.5.0

File hashes

Hashes for langchain_utils-0.4.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5aedfe78bb1697ce60f6c45861716124dfd301ad6021f3476c71e683c01b3d56`
MD5	`7cff6f0a16b57bc890d7bb64ab1440d5`
BLAKE2b-256	`3ec083da91cb12f132cb14ccc03666b6c46b4e39630ea93f12c16c32ee5dee1c`

