LLM prompt/context preparation utility

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

contextualize

contextualize is a package to quickly retrieve and format file contents for use with LLMs.

Installation

You can install the package using pip:

pip install contextualize

or pipx for using the CLI globally:

pipx install contextualize

Usage (`reference.py`)

Define FileReference objects for the specified file paths, optionally limited to a line range.

- Set range to a tuple of line numbers to include only a portion of the file, e.g. range=(1, 10).
- Set format to "md" (default) or "xml" to wrap the file contents in a Markdown code block or <file> tags.
- Set label to "relative" (default), "name", or "ext" to choose the label affixed to the enclosing Markdown/XML string:
  - "relative" uses the path relative to the current working directory
  - "name" uses the file name only
  - "ext" uses the file extension only

Retrieve the wrapped contents from the output attribute.
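
The options above can be illustrated with a small standalone sketch. This is not the package's implementation: wrap_file and make_label are hypothetical helpers, the range is assumed to be 1-indexed and inclusive, the "ext" label is assumed to keep the leading dot, and the path attribute on the <file> tag is an assumption about how the label is attached.

```python
import os
from pathlib import Path

FENCE = "`" * 3  # Markdown code fence delimiter


def make_label(path, style="relative"):
    """Return the label for a file (hypothetical helper, not the package API)."""
    path = Path(path)
    if style == "relative":
        # Path relative to the current working directory
        return os.path.relpath(path)
    if style == "name":
        return path.name
    if style == "ext":
        # Assumption: the extension keeps its leading dot, e.g. ".py"
        return path.suffix
    raise ValueError(f"unknown label style: {style!r}")


def wrap_file(path, line_range=None, fmt="md", label="relative"):
    """Read a file, optionally slice it by line range, and wrap the result."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if line_range is not None:
        # Assumption: (start, end) is 1-indexed and inclusive
        start, end = line_range
        lines = lines[start - 1:end]
    body = "\n".join(lines)
    tag = make_label(path, label)
    if fmt == "md":
        return f"{FENCE}{tag}\n{body}\n{FENCE}"
    if fmt == "xml":
        # Assumption: the label is carried in a "path" attribute
        return f'<file path="{tag}">\n{body}\n</file>'
    raise ValueError(f"unknown format: {fmt!r}")
```

For example, wrap_file("README.md", line_range=(1, 10), fmt="xml", label="name") would return the first ten lines of README.md between <file> tags labeled with the file name.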

CLI

A CLI (cli.py) is provided for preparing file contents and counting tokens from the command line.

cat: Prepare and concatenate file references
- paths: Positional arguments for target file(s) or directories
- --ignore: File(s) to ignore (optional)
- --format: Output format (md or xml, default is md)
- --label: Label style (relative for relative file path, name for file name only, ext for file extension only; default is relative)
- --output: Output target (console or clipboard; default is console)
- --output-file: Output file path (optional, compatible with --output clipboard)
ls: List token counts
- paths: Positional arguments for target file(s) or directories
- --encoding: Encoding to use for tokenization, e.g., cl100k_base (default), p50k_base, r50k_base
- --model: Model used to select the tokenization encoding, e.g., gpt-3.5-turbo/gpt-4 (default), text-davinci-003, code-davinci-002. Ignored if --encoding is provided.
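
The precedence between --encoding and --model described for ls can be sketched as follows. This is an illustration under stated assumptions, not the CLI's code: resolve_encoding and MODEL_TO_ENCODING are hypothetical names, and the table covers only the models mentioned here. In practice a tokenizer library such as tiktoken provides this lookup through encoding_for_model.

```python
from typing import Optional

# Illustrative subset only; hypothetical table covering the models named above.
MODEL_TO_ENCODING = {
    "gpt-4": "cl100k_base",
    "gpt-3.5-turbo": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "code-davinci-002": "p50k_base",
}

DEFAULT_ENCODING = "cl100k_base"


def resolve_encoding(encoding: Optional[str] = None,
                     model: Optional[str] = None) -> str:
    """Pick the encoding name: an explicit encoding wins over a model."""
    if encoding:
        # An explicit encoding is used as-is and the model is ignored
        return encoding
    if model:
        try:
            return MODEL_TO_ENCODING[model]
        except KeyError:
            raise ValueError(f"unknown model: {model!r}") from None
    # Neither option given: fall back to the default encoding
    return DEFAULT_ENCODING
```

With this sketch, resolve_encoding(model="text-davinci-003") selects p50k_base, while resolve_encoding(encoding="r50k_base", model="gpt-4") keeps r50k_base because the explicit encoding takes priority.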

Examples

cat:
- contextualize cat README.md will print the wrapped contents of README.md to the console with default settings (Markdown format, relative path label).
- contextualize cat README.md --format xml will print the wrapped contents of README.md to the console with XML format.
- contextualize cat contextualize/ dev/ README.md --format xml will prepare file references for files in the contextualize/ and dev/ directories and README.md, and print each file's contents (wrapped in corresponding XML tags) to the console.
ls:
- contextualize ls README.md will count and print the number of tokens in README.md using the default cl100k_base encoding.
- contextualize ls contextualize/ --model text-davinci-003 will count and print the number of tokens in each file in the contextualize/ directory using the p50k_base encoding associated with the text-davinci-003 model, then print the total tokens for all processed files.

Related projects

lumpenspace/jamall

Release history

- 0.0.3 (this version): Apr 18, 2024
- 0.0.2: Mar 21, 2024
- 0.0.1: Mar 21, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

contextualize-0.0.3.tar.gz (9.0 kB)

Uploaded Apr 18, 2024 Source

Built Distribution

contextualize-0.0.3-py3-none-any.whl (9.3 kB)

Uploaded Apr 18, 2024 Python 3

Hashes for contextualize-0.0.3.tar.gz

Algorithm	Hash digest
SHA256	`7ea987012fb6c35d2123415a621817b34565fcaa84ec512575007b8624d3426d`
MD5	`d5d89fae984ff1b0a398c15d24150359`
BLAKE2b-256	`bd624f2336887b4e8453b6d9284fcc61f4f4b611a69d58cdc5eb30bc6e562716`

Hashes for contextualize-0.0.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`0e0f76e342b435ba840f946f80e5ccbbcb3305d7a41c165a1ae6e5679026aa9d`
MD5	`11ae024e0df32ac4ea00ca329351969f`
BLAKE2b-256	`eccd7646787d831414436499fdf7533eeacb62005c9622b4f48f4787b7983cce`
