dir-assistant
Chat with your current directory's files using a local or API LLM.
Dir-assistant has local platform support for CPU (OpenBLAS), CUDA, ROCm, Metal, Vulkan, and SYCL.
Dir-assistant has API support for all major LLM APIs. More info in the LiteLLM Docs.
Dir-assistant uses a unique method for finding the most important files to include when submitting your prompt to an LLM called CGRAG (Contextually Guided Retrieval-Augmented Generation). You can read this blog post for more information about how it works.
This project runs local LLMs via the fantastic llama-cpp-python package and runs API LLMs using the also fantastic LiteLLM package.
New Features
- Now installable via pip
- Thorough CLI functionality including platform installation, model downloading, and config editing.
- User files have been moved to appropriate home hidden directories.
- Config now has llama.cpp completion options exposed (top_k, frequency_penalty, etc.)
Quickstart
This section contains recipes for running dir-assistant in a basic capacity to get you started quickly.
Quickstart with Local Default Model (Phi 3 128k)
To get started locally, you can download a default LLM model. The default configuration with this model requires 14GB of memory, but you can adjust the configuration to fit lower memory requirements. To run via CPU:
pip install dir-assistant
dir-assistant models download-embed
dir-assistant models download-llm
cd directory/to/chat/with
dir-assistant
To run with hardware acceleration, use the platform subcommand:
...
dir-assistant platform cuda
cd directory/to/chat/with
dir-assistant
See which platforms are supported using -h:
dir-assistant platform -h
Quickstart with API Model
To get started using an API model, you can use Google Gemini 1.5 Flash, which is currently free. To begin, you need to sign up for Google AI Studio and create an API key. After you create your API key, enter the following commands:
pip install dir-assistant
dir-assistant models download-embed
dir-assistant setkey GEMINI_API_KEY xxxxxYOURAPIKEYHERExxxxx
cd directory/to/chat/with
dir-assistant
You can optionally hardware-accelerate your local embedding model so indexing is quicker:
...
dir-assistant platform cuda
cd directory/to/chat/with
dir-assistant
See which platforms are supported using -h:
dir-assistant platform -h
Install
Install with pip:
pip install dir-assistant
The default configuration for dir-assistant is API-mode. If you download an LLM model with download-llm, local-mode will automatically be set. To change from API-mode to local-mode, set the ACTIVE_MODEL_IS_LOCAL setting.
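For example, switching back to API-mode is a one-line change in the config file (a minimal sketch; the exact boolean formatting is an assumption, so check the default your generated config uses):
[DIR_ASSISTANT]
ACTIVE_MODEL_IS_LOCAL = false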
Embedding Model Download
You must download an embedding model regardless of whether you are running in local or API mode. You can download a good default embedding model with:
dir-assistant models download-embed
If you would like to use another embedding model, open the models directory with:
dir-assistant models
Note: The embedding model will be hardware accelerated after using the platform subcommand. To disable hardware acceleration, change n_gpu_layers = -1 to n_gpu_layers = 0 in the config.
Optional: Select A Hardware Platform
By default dir-assistant is installed with CPU-only compute support. It will work properly without this step, but if you would like to hardware accelerate dir-assistant, use the command below to compile llama-cpp-python with your hardware's support.
dir-assistant platform cuda
Available options: cpu, cuda, rocm, metal, vulkan, sycl
Note: The embedding model and the local LLM model will be run with acceleration after selecting a platform. To disable hardware acceleration, change n_gpu_layers = -1 to n_gpu_layers = 0 in the config.
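For example, to keep both models on the CPU after selecting a platform, the change would look roughly like this (a sketch; placing n_gpu_layers under these two sections is an assumption based on the option sections described later in this README):
[DIR_ASSISTANT.LLAMA_CPP_OPTIONS]
n_gpu_layers = 0

[DIR_ASSISTANT.LLAMA_CPP_EMBED_OPTIONS]
n_gpu_layers = 0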
For Platform Install Issues
System dependencies may be required for the platform command and are outside the scope of these instructions. If you have any issues building llama-cpp-python, the project's install instructions may offer more info: https://github.com/abetlen/llama-cpp-python
API Configuration
If you wish to use an API LLM, you will need to configure it. To configure which LLM API dir-assistant uses, you must edit LITELLM_MODEL and the appropriate API key in your configuration. To open your configuration file, enter:
dir-assistant config open
Once the file is open, change:
[DIR_ASSISTANT]
LITELLM_MODEL = "gemini/gemini-1.5-flash-latest"
LITELLM_CONTEXT_SIZE = 500000
...
[DIR_ASSISTANT.LITELLM_API_KEYS]
GEMINI_API_KEY = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
LiteLLM supports all major LLM APIs, including APIs hosted locally. View the available options in the LiteLLM providers list.
There is a convenience subcommand for modifying and adding API keys:
dir-assistant setkey GEMINI_API_KEY xxxxxYOURAPIKEYHERExxxxx
However, in most cases you will need to modify other options when changing APIs.
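As an illustration, switching from Gemini to OpenAI would typically mean updating the model string, the context size, and the key together (a hypothetical sketch; confirm the exact model identifier and key name in the LiteLLM providers list):
[DIR_ASSISTANT]
LITELLM_MODEL = "gpt-4o-mini"
LITELLM_CONTEXT_SIZE = 128000
...
[DIR_ASSISTANT.LITELLM_API_KEYS]
OPENAI_API_KEY = "xxxxxYOURAPIKEYHERExxxxx"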
Local LLM Model Download
If you want to use a local LLM, you can download a low-requirements default model (Phi 3 128k) with:
dir-assistant models download-llm
Note: The local LLM model will be hardware accelerated after using the platform subcommand. To disable hardware acceleration, change n_gpu_layers = -1 to n_gpu_layers = 0 in the config.
Configuring A Custom Local Model
If you would like to use a custom local LLM model, download a GGUF model and place it in your models directory. Hugging Face has numerous GGUF models to choose from. The models directory can be opened in a file browser using this command:
dir-assistant models
After putting your GGUF in the models directory, you must configure dir-assistant to use it:
dir-assistant config open
Edit the following setting:
[DIR_ASSISTANT]
LLM_MODEL = "Mistral-Nemo-Instruct-2407.Q6_K.gguf"
Llama.cpp Options
Llama.cpp provides a large number of options to customize how your local model is run. Most of these options are exposed via llama-cpp-python. You can configure them with the [DIR_ASSISTANT.LLAMA_CPP_OPTIONS], [DIR_ASSISTANT.LLAMA_CPP_EMBED_OPTIONS], and [DIR_ASSISTANT.LLAMA_CPP_COMPLETION_OPTIONS] sections in the config file.
The options available for llama-cpp-python are documented in the Llama constructor documentation. What the options do is also documented in the llama.cpp CLI documentation.
The most important llama-cpp-python options are related to tuning the LLM to your system's VRAM:
- Setting n_ctx lower will reduce the amount of VRAM required to run, but will decrease the amount of file text that can be included when running a prompt.
- CONTEXT_FILE_RATIO sets the proportion of prompt history to file text to be included when sent to the LLM. Higher ratios mean more file text and less prompt history. More file text generally improves comprehension.
- If your LLM n_ctx is smaller than your embed n_ctx times CONTEXT_FILE_RATIO, your file text chunks have the potential to be larger than your LLM context, and thus will not be included. To ensure all files can be included, make sure your embed context is smaller than n_ctx times CONTEXT_FILE_RATIO.
- Larger embed n_ctx will chunk your files into larger sizes, which allows LLMs to understand them more easily.
- n_batch must be smaller than the n_ctx of a model, but setting it higher will probably improve performance.
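Putting these together, a VRAM-tuning sketch might look like the following (the section placement of each option and the specific numbers are illustrative assumptions, not recommendations):
[DIR_ASSISTANT]
CONTEXT_FILE_RATIO = 0.5

[DIR_ASSISTANT.LLAMA_CPP_OPTIONS]
n_ctx = 16384    # LLM context; lower this to reduce VRAM usage
n_batch = 512    # must stay smaller than n_ctx

[DIR_ASSISTANT.LLAMA_CPP_EMBED_OPTIONS]
n_ctx = 4096     # chunk size; keep below LLM n_ctx * CONTEXT_FILE_RATIO (16384 * 0.5 = 8192)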
For other tips about tuning Llama.cpp, explore their documentation and do some Google searches.
Running
dir-assistant
Running dir-assistant will scan all files recursively in your current directory. The most relevant files will automatically be sent to the LLM when you enter a prompt.
Upgrading
Some version upgrades may have incompatibility issues in the embedding index cache. Use this command to delete the index cache so it may be regenerated:
dir-assistant clear
Additional Help
Use the -h argument with any command or subcommand to view more information. If your problem is beyond the scope of the helptext, please report a GitHub issue.
Contributors
We appreciate contributions from the community! For a list of contributors and how you can contribute, please see CONTRIBUTORS.md.
Limitations
- Only tested on Ubuntu 22.04. Please let us know if you run it successfully on other platforms by submitting an issue.
- Dir-assistant only detects and reads text files at this time.
Todos
- API LLMs
- RAG
- File caching (improve startup time)
- CGRAG (Contextually-Guided Retrieval-Augmented Generation)
- Multi-line input
- File watching (automatically reindex changed files)
- Single-step pip install
- Model download
- Web search
- API Embedding models
- Simple mode for better compatibility with external script automations