explore

Interactively explore a codebase with an LLM
explore is a command-line tool for interactively exploring a codebase by chatting with an LLM. It uses retrieval-augmented generation, backed by a chromadb vector store, to supply the LLM with the source code relevant to each question.

explore uses OpenAI models by default, so you'll need an OpenAI API key.

Installation

explore is available on PyPI. I recommend installing it with pipx:

pipx install explore-cli
export OPENAI_API_KEY=<your OpenAI API key>
explore <directory>

Alternatively, you can clone this repository and run the tool with Poetry:

poetry install
poetry build
export OPENAI_API_KEY=<your OpenAI API key>
poetry run explore <directory>

Usage

usage: explore [-h] [-l LLM] [-m MODEL] directory

Interactively explore a codebase with an LLM.

positional arguments:
  directory             The directory to index and explore.

options:
  -h, --help            show this help message and exit
  -l LLM, --llm LLM     The LLM backend, one of openai, ollama, or azure. Default: openai. If using Azure, make sure to
                        set the AZURE_OPENAI_ENDPOINT and OPENAI_API_VERSION environment variables.
  -m MODEL, --model MODEL
                        The LLM model to use. Default: gpt-4o-mini for openai, mistral-nemo:latest for ollama, or
                        gpt-4o for azure.

How it works

  1. The codebase is indexed into a local Chroma store. Each file is split into chunks using language-specific separators.
  2. Documents relevant to the query are collected using multiple retrieval strategies:
    • Primary retrieval is done through vector similarity using the indexed embeddings.
    • A multi-query retriever issues multiple variations of the query to increase the diversity and relevance of retrieved documents.
    • Additionally, a history-aware retriever reformulates the user query, considering the conversation history to better capture context.
  3. Retrieved documents are deduplicated, concatenated, and added as context to the LLM, which generates an answer to the user's question. Answers include specific references to the files and code pertinent to the query.
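The steps above can be sketched in pure Python. This is only an illustration of the pipeline's shape, not the actual implementation: explore scores chunks with chromadb vector embeddings, whereas this sketch substitutes naive keyword-overlap scoring so it runs with no dependencies, and the chunking and function names here are hypothetical.

```python
# Sketch of the index -> retrieve -> deduplicate -> prompt pipeline.
# Real retrieval uses vector similarity over chromadb embeddings; keyword
# overlap stands in for it here so the example is self-contained.

def split_into_chunks(text, separator="\n\n", max_len=200):
    """Step 1: split a file into chunks (explore uses language-specific separators)."""
    chunks, current = [], ""
    for part in text.split(separator):
        if current and len(current) + len(part) > max_len:
            chunks.append(current)
            current = part
        else:
            current = current + separator + part if current else part
    if current:
        chunks.append(current)
    return chunks

def retrieve(query, indexed, k=2):
    """Step 2: rank chunks by word overlap with the query (a stand-in for
    vector similarity) and return the top k."""
    qwords = set(query.lower().split())
    ranked = sorted(indexed, key=lambda c: -len(qwords & set(c.lower().split())))
    return ranked[:k]

def build_context(query, query_variants, indexed):
    """Steps 2-3: run the query and its variants, deduplicate the hits,
    and concatenate them into one context string for the LLM prompt."""
    seen, context = set(), []
    for q in [query] + query_variants:
        for chunk in retrieve(q, indexed):
            if chunk not in seen:
                seen.add(chunk)
                context.append(chunk)
    return "\n---\n".join(context)
```

In the real tool, the query variants are themselves produced by the LLM (the multi-query retriever), and the history-aware retriever rewrites the query using the conversation so far before retrieval runs.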

Using Azure OpenAI

explore can connect to an Azure OpenAI instance using Azure Active Directory authentication. First, set the relevant environment variables (you can find the values to use for these in the Azure portal):

export AZURE_OPENAI_ENDPOINT=https://your-resource-name.openai.azure.com # <- endpoint for the Azure OpenAI instance
export OPENAI_API_VERSION=2024-10-01-preview # <- API version for the deployment you want to use

Make sure you are authenticated via the Azure CLI:

az login

When you invoke explore, pass the Azure OpenAI deployment name as the --model argument and specify azure as the --llm:

explore --llm azure --model gpt-4o /some/directory
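Before invoking explore this way, it can be handy to confirm both required environment variables are actually set. A minimal stdlib-only sketch (the variable names come from the instructions above; the helper itself is hypothetical, not part of explore):

```python
import os

# The two variables the Azure backend requires, per the setup steps above.
REQUIRED = ("AZURE_OPENAI_ENDPOINT", "OPENAI_API_VERSION")

def check_azure_env(env=os.environ):
    """Return the names of any required Azure OpenAI variables that are
    missing or empty in the given environment mapping."""
    return [name for name in REQUIRED if not env.get(name)]
```

For example, `check_azure_env()` returns an empty list when both variables are exported, so a launcher script could refuse to start and print the missing names otherwise.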

Download files

Source Distribution

explore_cli-0.4.1.tar.gz (109.9 kB)

Built Distribution

explore_cli-0.4.1-py3-none-any.whl (7.9 kB)

File details

Details for the file explore_cli-0.4.1.tar.gz.

File metadata

  • Filename: explore_cli-0.4.1.tar.gz
  • Size: 109.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.13

File hashes

  • SHA256: 6d2905e30b2d4525f04f621ca1ef3b2f997c598715c00943935f954070b84bb6
  • MD5: 72da62d0a60f05f538080f1f2b70d460
  • BLAKE2b-256: 5f2671921946a1bdd848ad225da92d896e93d191372b6ec8d7ba1a380fcab630

File details

Details for the file explore_cli-0.4.1-py3-none-any.whl.

File hashes

  • SHA256: 105e06ecbbdd03878ad47d6e08868335a66f0ec38a605073087448f1e7d80845
  • MD5: 64666e7994ce05fa0267b252bdc0554d
  • BLAKE2b-256: a3bee6c2e2fa627af99137858728efa38babeb37231b2a2015835f41aefbcc73
