LLM prompt/context preparation utility

contextualize

contextualize is a package to quickly retrieve and format file contents for use with LLMs.

Installation

You can install the package using pip:

pip install contextualize

or with pipx, to make the CLI available globally:

pipx install contextualize

Usage (reference.py)

Define FileReference objects for specified file paths and optional ranges.

  • set range to a tuple of line numbers to include only a portion of the file, e.g. range=(1, 10)
  • set format to "md" (default) or "xml" to wrap file contents in Markdown code blocks or <file> tags
  • set label to "relative" (default), "name", or "ext" to determine what label is affixed to the enclosing Markdown/XML string
    • "relative" will use the relative path from the current working directory
    • "name" will use the file name only
    • "ext" will use the file extension only

Retrieve wrapped contents from the output attribute.
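
The options above can be illustrated with a small stand-alone sketch. This is not the package's implementation: the helper names (select_lines, make_label, wrap), the 1-based inclusive treatment of range, the `path` attribute on the <file> tag, and the placement of the label on the Markdown fence line are all assumptions made for illustration.

```python
import os
from typing import Optional, Tuple

FENCE = "`" * 3  # Markdown code fence, built here to keep this example readable


def select_lines(text: str, line_range: Optional[Tuple[int, int]] = None) -> str:
    """Return only the requested lines (assumed 1-based and inclusive)."""
    if line_range is None:
        return text
    start, end = line_range
    return "\n".join(text.splitlines()[start - 1:end])


def make_label(path: str, style: str = "relative") -> str:
    """Build the label for a path according to the chosen style."""
    if style == "relative":
        # Path relative to the current working directory
        return os.path.relpath(path)
    if style == "name":
        return os.path.basename(path)
    if style == "ext":
        # Assumption: the extension is used without its leading dot
        return os.path.splitext(path)[1].lstrip(".")
    raise ValueError(f"unknown label style: {style!r}")


def wrap(text: str, path: str, fmt: str = "md", label: str = "relative",
         line_range: Optional[Tuple[int, int]] = None) -> str:
    """Wrap file contents in a Markdown code block or <file> tags."""
    body = select_lines(text, line_range)
    tag = make_label(path, label)
    if fmt == "md":
        return f"{FENCE}{tag}\n{body}\n{FENCE}"
    if fmt == "xml":
        return f'<file path="{tag}">\n{body}\n</file>'
    raise ValueError(f"unknown format: {fmt!r}")


if __name__ == "__main__":
    print(wrap("a\nb\nc", "src/app.py", fmt="xml", label="name", line_range=(1, 2)))
```

With the package itself, the equivalent would presumably be a FileReference created with the same range, format, and label values, with the result read from its output attribute; the exact constructor signature is not documented here.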

CLI

A CLI (cli.py) is provided to print wrapped file contents and token counts to the console.

  • cat: Prepare and concatenate file references
    • paths: Positional arguments for target file(s) or directories
    • --ignore: File(s) to ignore (optional)
    • --format: Output format (md or xml, default is md)
    • --label: Label style: relative (relative file path), name (file name only), or ext (file extension only); default is relative
    • --output: Output target: console (default) or clipboard
    • --output-file: Output file path (optional, compatible with --output clipboard)
  • ls: List token counts
    • paths: Positional arguments for target file(s) or directories
    • --encoding: Encoding to use for tokenization, e.g., cl100k_base (default), p50k_base, r50k_base
    • --model: Model used to select the tokenization encoding, e.g., gpt-3.5-turbo/gpt-4 (default), text-davinci-003, code-davinci-002. Ignored if --encoding is provided.
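
The precedence between --encoding and --model described for ls can be sketched as follows. This is an illustrative stand-in, not the CLI's actual code: resolve_encoding and MODEL_TO_ENCODING are hypothetical names, and the table is hand-written from the model/encoding pairs mentioned in this README (the code-davinci-002 entry is an assumption based on the commonly published pairing with p50k_base).

```python
from typing import Optional

DEFAULT_ENCODING = "cl100k_base"

# Hand-written lookup table for illustration only (not read from the package)
MODEL_TO_ENCODING = {
    "gpt-3.5-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
    "text-davinci-003": "p50k_base",
    "code-davinci-002": "p50k_base",  # assumption, see the note above
}


def resolve_encoding(encoding: Optional[str] = None,
                     model: Optional[str] = None) -> str:
    """Pick the tokenizer encoding name; an explicit encoding always wins."""
    if encoding:
        return encoding
    if model:
        if model not in MODEL_TO_ENCODING:
            raise ValueError(f"unknown model: {model!r}")
        return MODEL_TO_ENCODING[model]
    return DEFAULT_ENCODING


if __name__ == "__main__":
    print(resolve_encoding())
    print(resolve_encoding(model="text-davinci-003"))
    print(resolve_encoding(encoding="r50k_base", model="gpt-4"))
```

Checking the explicit encoding first keeps the rule "not used if encoding is provided" in one place, so the model lookup only runs when no encoding was given.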

Examples

  • cat:
    • contextualize cat README.md will print the wrapped contents of README.md to the console with default settings (Markdown format, relative path label).
    • contextualize cat README.md --format xml will print the wrapped contents of README.md to the console with XML format.
    • contextualize cat contextualize/ dev/ README.md --format xml will prepare file references for files in the contextualize/ and dev/ directories and README.md, and print each file's contents (wrapped in corresponding XML tags) to the console.
  • ls:
    • contextualize ls README.md will count and print the number of tokens in README.md using the default cl100k_base encoding.
    • contextualize ls contextualize/ --model text-davinci-003 will count and print the number of tokens in each file in the contextualize/ directory using the p50k_base encoding associated with the text-davinci-003 model, then print the total tokens for all processed files.
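
Several of the examples above mix directories and single files as positional arguments. A minimal sketch of how such arguments might be expanded into a flat file list is shown here. The function name collect_files is hypothetical, and treating --ignore values as shell-style patterns matched with fnmatch is an assumption; the package may handle ignores differently.

```python
import fnmatch
import os
from typing import Iterable, List


def collect_files(paths: Iterable[str], ignore: Iterable[str] = ()) -> List[str]:
    """Expand files and directories into a flat, ordered list of file paths."""
    patterns = list(ignore)

    def is_ignored(file_path: str) -> bool:
        # Assumption: a pattern may match either the file name or the full path
        name = os.path.basename(file_path)
        return any(
            fnmatch.fnmatch(name, pattern) or fnmatch.fnmatch(file_path, pattern)
            for pattern in patterns
        )

    found: List[str] = []
    for path in paths:
        if os.path.isdir(path):
            # Walk directories recursively, in a stable (sorted) order
            for root, dirs, files in os.walk(path):
                dirs.sort()
                for name in sorted(files):
                    found.append(os.path.join(root, name))
        else:
            found.append(path)
    return [file_path for file_path in found if not is_ignored(file_path)]


if __name__ == "__main__":
    for file_path in collect_files(["."], ignore=["*.pyc"]):
        print(file_path)
```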

Related projects
