
langchain-utils

LangChain Utilities

Prompt generation using LangChain document loaders

Do you find yourself frequently copy-pasting text from the web / PDFs / other documents into ChatGPT?

If yes, these tools are for you!

The prompts are optimized to be fed into a chat interface (like ChatGPT) manually, in one go or in multiple goes (to get around context length limits).

Basically, the prompts generated look like this:

REPLY_OK_IF_YOU_READ_TEMPLATE = '''
Below is {what}, reply "OK" if you read:

"""
{content}
"""
'''.strip()

You can feed it directly to a chat interface like ChatGPT and ask follow-up questions about it.

See prompts.py for other variations.
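
For example, with {what} left at its default and a pasted document as {content}, the rendered prompt looks like this (illustrative values):

Below is the content of a document, reply "OK" if you read:

"""
(pasted document text here)
"""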

Demos

  • Loading https://github.com/tddschn/langchain-utils and copying the generated prompt to the clipboard.

  • Loading 3 pages of a PDF file, opening each part for inspection before copying, and optionally merging the 3 pages into 2 prompts that stay under gpt-3.5-turbo's context length limit, using LangChain's TokenTextSplitter.

pandocprompt

$ pandocprompt --help

usage: pandocprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                    [-P PARTS [PARTS ...]] [-r] [-R]
                    [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                    [-w WHAT] [-M] [--from PANDOC_FROM_FORMAT]
                    [--to PANDOC_TO_FORMAT]
                    [PATH ...]

Get prompts from arbitrary files. You need to have `pandoc` installed and in
$PATH; it will be used to convert source files to the desired (hopefully
textual) format. Common use cases: getting prompts from EPub books or several
TeX files.

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the
                        document content in the prompt (default: the content
                        of a document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  --from PANDOC_FROM_FORMAT
                        The format that is passed to -f in pandoc (default:
                        None)
  --to PANDOC_TO_FORMAT
                        The format that is passed to -t in pandoc. gfm-
                        raw_html means GitHub Flavored Markdown with raw HTML
                        stripped. (default: gfm-raw_html)

urlprompt

$ urlprompt --help

usage: urlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT] [-w WHAT]
                 [-M] [-j] [-g] [--github-path GITHUB_PATH]
                 [--github-revision GITHUB_REVISION] [--substack]
                 URL

Get a prompt consisting of the text content of a webpage

positional arguments:
  URL                   URL to the webpage

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the
                        webpage content in the prompt (default: the content
                        of a webpage)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -j, --javascript      Use JavaScript to render the page (default: False)
  -g, --github          Load the raw file from a GitHub URL (default: False)
  --github-path GITHUB_PATH
                        Path to the GitHub file (default: README.md)
  --github-revision GITHUB_REVISION
                        Revision for the GitHub file (default: master)
  --substack            Load from a Substack URL and convert it to Markdown
                        (default: False)
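
For example, to reproduce the first demo above and copy a prompt built from this repository's page to the clipboard:

$ urlprompt https://github.com/tddschn/langchain-utils -c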

pdfprompt

$ pdfprompt --help

usage: pdfprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                 [-P PARTS [PARTS ...]] [-r] [-R]
                 [--print-percentage-non-ascii] [-n] [--out OUT]
                 [-p PAGES [PAGES ...]] [-l PAGE_SLICE] [-M] [-w WHAT] [-o]
                 [-O] [-L OCR_LANGUAGE]
                 PDF Path

Get a prompt consisting of the text content of a PDF file

positional arguments:
  PDF Path              Path to the PDF file

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -p PAGES [PAGES ...], --pages PAGES [PAGES ...]
                        Only include specified page numbers (default: None)
  -l PAGE_SLICE, --page-slice PAGE_SLICE
                        Use Python slice syntax to select page numbers (e.g.
                        1:3, 1:10:2, etc.) (default: None)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the PDF
                        content in the prompt (default: the content of a PDF
                        file)
  -o, --fallback-ocr    Use OCR as a fallback if no text is detected on a
                        page; set the TESSDATA_PREFIX environment variable to
                        the path of your tesseract data directory (default:
                        False)
  -O, --force-ocr       Force OCR on all pages (default: False)
  -L OCR_LANGUAGE, --ocr-language OCR_LANGUAGE
                        Language to use for Tesseract OCR (like eng, chi_sim,
                        chi_tra, chi_tra_vert, etc.) (default: eng)
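
For example, to mirror the PDF demo above: select 3 pages, split for gpt-3.5-turbo's context length, and open each part for inspection before copying manually (paper.pdf is a hypothetical path):

$ pdfprompt paper.pdf -p 1 2 3 -m gpt-3.5-turbo -e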

ytprompt

$ ytprompt --help

usage: ytprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                [-P PARTS [PARTS ...]] [-r] [-R]
                [--print-percentage-non-ascii] [-n] [--out OUT]
                YouTube URL

Get a prompt consisting of the Title and Transcript of a YouTube Video

positional arguments:
  YouTube URL           YouTube URL

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the transcript, also used
                        to determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
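
For example, to build a prompt from a video's title and transcript and copy it to the clipboard (placeholder URL):

$ ytprompt 'https://www.youtube.com/watch?v=VIDEO_ID' -c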

textprompt

$ textprompt --help

usage: textprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from text files

positional arguments:
  PATH                  Paths to the text files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the
                        document content in the prompt (default: the content
                        of a document)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
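
For example, since PATH falls back to stdin, you can pipe text straight in and copy the resulting prompt (notes.txt is a hypothetical file):

$ cat notes.txt | textprompt -c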

htmlprompt

$ htmlprompt --help

usage: htmlprompt [-h] [-V] [-c] [-e] [-m model] [-S] [-s chunk_size]
                  [-P PARTS [PARTS ...]] [-r] [-R]
                  [--print-percentage-non-ascii] [-n] [--out OUT] [-C]
                  [-w WHAT] [-M]
                  [PATH ...]

Get a prompt from HTML files

positional arguments:
  PATH                  Paths to the HTML files, or stdin if not provided
                        (default: None)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -c, --copy            Copy the prompt to clipboard (default: False)
  -e, --edit            Edit the prompt and copy manually (default: False)
  -m model, --model model
                        Model to use. This only affects the chunk size. Use -S
                        to disable splitting (infinite chunk size). (default:
                        gpt-4-32k)
  -S, --no-split        Do not split the prompt into multiple parts (use this
                        if the model has a really large context size)
                        (default: False)
  -s chunk_size, --chunk-size chunk_size
                        Chunk size when splitting the content, also used to
                        determine whether to split; defaults to 1/2 of the
                        context length limit of the model (default: None)
  -P PARTS [PARTS ...], --parts PARTS [PARTS ...]
                        Parts to select in the processed list of Documents
                        (default: None)
  -r, --raw             Wrap the content in triple quotes with no extra text
                        (default: False)
  -R, --raw-no-quotes   Output the content only (default: False)
  --print-percentage-non-ascii
                        Print percentage of non-ascii characters (default:
                        False)
  -n, --dry-run         Dry run (default: False)
  --out OUT             Output file (default: None)
  -C, --from-clipboard  Load text from clipboard (default: False)
  -w WHAT, --what WHAT  Initial knowledge you want to insert before the HTML
                        content in the prompt (default: the text content of
                        an HTML file)
  -M, --merge           Merge contents of all pages before processing
                        (default: False)
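
For example, to build a prompt from a saved page and edit it before copying manually (page.html is a hypothetical file):

$ htmlprompt page.html -e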

Installation

pipx

This is the recommended installation method.

$ pipx install langchain-utils

pip

$ pip install langchain-utils

Develop

$ git clone https://github.com/tddschn/langchain-utils.git
$ cd langchain-utils
$ poetry install
