DebGPT -- General Purpose Terminal LLM Tool with Some Debian-Specific Design
% DebGPT(1) | General Purpose Terminal LLM Tool with Some Debian-Specific Design
% Copyright (C) 2024 Mo Zhou <lumin@debian.org>; GNU LGPL-3.0+ License
NAME
DebGPT - General Purpose Terminal LLM Tool with Some Debian-Specific Design
"AI" = "Artificial Idiot"
SYNOPSIS
debgpt [arguments] [subcommand [subcommand-arguments]]
DESCRIPTION
DebGPT is a lightweight terminal tool designed for everyday use with Large Language Models (LLMs), aiming to explore their potential in aiding Debian/Linux development. The possible use cases include code generation, documentation writing, code editing, and more, far beyond the capabilities of traditional software.
To achieve that, DebGPT gathers relevant information from various sources like files, directories, and URLs, and compiles it into a prompt for the LLM. It also supports Retrieval-Augmented Generation (RAG) using language embedding models. DebGPT supports a range of LLM service providers, both commercial and self-hosted, including OpenAI, Anthropic, Google Gemini, Ollama, LlamaFile, vLLM, and ZMQ (DebGPT's built-in backend for self-containment).
QUICK START
First, install DebGPT from PyPI or from the Git repository:
pip3 install debgpt
pip3 install git+https://salsa.debian.org/deeplearning-team/debgpt.git
The bare minimum "configuration" required to make debgpt work is:
export OPENAI_API_KEY="your-api-key"
If anything else is needed, use the TUI-based wizard to (re-)configure:
debgpt config
Or use debgpt genconfig to generate a configuration template and place it at $HOME/.debgpt/config.toml. Both config and genconfig will inherit any existing configuration.
Upon completion, you can start an interactive chat with the LLM:
debgpt
Enjoy the chat!
Hint: A collection of samples generated using DebGPT can be found at this repository.
FRONTENDS
The frontend is a client that communicates with an LLM inference backend. It is responsible for sending user input to the backend and receiving responses while maintaining a history of interactions.
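Conceptually, a frontend is just a loop that keeps the message history and exchanges it with a provider's API. The following is a minimal sketch of that idea (not DebGPT's actual code), assuming the OpenAI Python client and an arbitrary example model name:
from openai import OpenAI  # reads OPENAI_API_KEY from the environment

client = OpenAI()
history = []  # the frontend keeps the whole conversation as context

while True:
    user = input("> ")
    history.append({"role": "user", "content": user})
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name, not necessarily DebGPT's default
        messages=history,
    )
    text = reply.choices[0].message.content
    history.append({"role": "assistant", "content": text})
    print(text)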
Available frontend options (specified by the --frontend|-F argument) are: openai, anthropic, google, xai, llamafile, ollama, vllm, zmq (DebGPT built-in), and dryrun (for debugging or copy-pasting information).
Note: For non-self-hosted backends, review third-party user agreements and refrain from sending sensitive information.
TUTORIAL
The following examples are carefully ordered. You can start from the first example and gradually move to the next one.
1. Basic Usage: Chatting with LLM and CLI Behavior
When no arguments are given, debgpt starts a general terminal chatting client connected to the LLM backend. Use debgpt -h to see detailed options.
debgpt
In the interactive chatting mode, you may press / to see a list of available escape commands, which are not sent to the LLM as part of the prompt.
- /save <path.txt>: save the last LLM response to the specified file.
- /reset: clear the context, so you can start a new conversation without quitting.
- /quit: quit the chatting mode. You can also press Ctrl-D to quit.
The first user prompt can be provided through the --ask|-A|-a argument:
debgpt -A "Who are you? And what can LLM do?"
With the --quit|-Q option, the program will quit after receiving the first response from the LLM. For instance, we can let it mimic fortune with temperature 1.0 (--temperature|-T 1.0) for higher randomness:
debgpt -T 1.0 -QA 'Greet with me, and tell me a joke.'
After each session, the chat history is saved in ~/.debgpt as a JSON file with a unique name. The command debgpt replay can replay the last session if you forgot what the LLM replied.
The program can write the last LLM response to a file through -o <file>, and read the question from stdin:
debgpt -Qa 'write a hello world in rakudo for me' -o hello.raku
debgpt -HQ stdin < question.txt | tee result.txt
After getting familiar with the fundamental usage and CLI behavior, we can move on to the most important feature of this tool, namely the special prompt reader: MapReduce.
2. Context Readers for Additional Information
A context reader is a function that reads plain-text contents from a specified resource and wraps them as part of the prompt for the LLM. Context readers can be arbitrarily combined or specified multiple times through the unified argument --file|-f.
It can read from a file, a directory, a URL, a Debian Policy section, a Debian Developer Reference section, a Debian BTS page, a Debian build status page (buildd), a Google search result, etc.
For example, we can ask the LLM to explain the contents of a file, or mimic the licensecheck command:
# read a plain text file and ask a question
debgpt -Hf README.md -a 'very briefly teach me how to use this software.'
debgpt -Hf debgpt/policy.py -A 'explain this file' # --file|-f for small file
debgpt -Hf debgpt/frontend.py -A 'Briefly tell me an SPDX identifier of this file.'
# PDF file is supported as well
debgpt -Hf my-resume.pdf -a 'Does this person have any foss-related experience?'
It can also read from a directory, or a URL:
debgpt -Hf 'https://www.debian.org/vote/2022/vote_003' -A 'Please explain the differences among the above choices.'
The unified reader --file|-f can also read from other sources with a special syntax:
-f bts:<bug_number> for the Debian bug tracking system
debgpt -Hf bts:src:pytorch -A 'Please summarize the above information. Make a table to organize it.'
debgpt -Hf bts:1056388 -A 'Please summarize the above information.'
-f buildd:<package> for the Debian buildd status
debgpt -Hf buildd:glibc -A 'Please summarize the above information. Make a table to organize it.'
-f cmd:<command_line> for piping another command's stdout
debgpt -Hf cmd:'apt list --upgradable' -A 'Briefly summarize the upgradable packages. You can categorize these packages.'
debgpt -Hf cmd:'git diff --staged' -A 'Briefly describe the change as a git commit message.'
-f man:<man_page> and -f tldr:<tldr_page> for reading system manual pages
debgpt -Hf man:debhelper-compat-upgrade-checklist -A "what's the change between compat 13 and compat 14?"
debgpt -H -f tldr:curl -f cmd:'curl -h' -A "download https://localhost/bigfile.iso to /tmp/workspace, in silent mode"
-f policy:<section> and -f devref:<section> for reading the Debian Policy and the Developer Reference
debgpt -Hf policy:7.2 -A "what is the difference between Depends: and Pre-Depends: ?"
debgpt -Hf devref:5.5 -A 'Please summarize the above information.'
# when section is not specified, it will read the whole document. This may exceed the LLM context size limit.
debgpt -Hf policy: -A 'what is the latest changes in this policy?'
# more examples
debgpt -Hf pytorch/debian/control -f policy:7.4 -A "Explain what Conflicts+Replaces means in pytorch/debian/control based on the provided policy document"
debgpt -Hf pytorch/debian/rules -f policy:4.9.1 -A "Implement the support for the 'nocheck' tag based on the example provided in the policy document."
3. Inplace Editing of a File
The argument --inplace|-i is for in-place editing of a file. It is a read-write reader that does the same as --file|-f (read-only), but additionally writes the LLM response back to the file. This feature is intended for editing a file with the LLM.
If specified, the edits (in UNIX diff format) will be printed to the screen.
The --inplace|-i option implies the --quit|-Q behavior, and turns off markdown rendering.
The following example asks the LLM to edit the pyproject.toml file, adding pygments to its dependencies. This really works correctly.
debgpt -Hi pyproject.toml -a 'edit this file, adding pygments to its dependencies.'
If working in a Git repository, we can make things more automated. You may further append --inplace-git-add-commit to automatically add and commit the changes to the Git repository. If you want to review before committing, specify the --inplace-git-p-add-commit|-I argument instead.
debgpt -Hi pyproject.toml -a 'edit this file, adding pygments to its dependencies.' --inplace-git-add-commit
The commit resulting from the above example can be seen at this link. Recent LLMs are strong enough to easily and correctly add type annotations and docstrings to DebGPT's Python codebase; see the example here.
4. Vector Retriever for Most Relevant Information
This is work in progress (WIP): leveraging embeddings to retrieve the most relevant pieces of context, i.e., Retrieval-Augmented Generation (RAG).
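As a rough illustration of the idea (not DebGPT's implementation; the embed() helper below is a hypothetical placeholder for whatever embedding model is configured, e.g., via OpenAI or Ollama), retrieval boils down to ranking chunks by embedding similarity and placing only the top few into the prompt:
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical placeholder: return the embedding vector of `text`
    from whatever embedding model is configured."""
    raise NotImplementedError

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    # Embed the question and every candidate chunk.
    q = embed(question)
    vecs = [embed(c) for c in chunks]
    # Rank chunks by cosine similarity to the question.
    sims = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))) for v in vecs]
    top = sorted(range(len(chunks)), key=lambda i: sims[i], reverse=True)[:k]
    # Only the top-k chunks are placed into the LLM prompt.
    return [chunks[i] for i in top]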
5. MapReduce for Any Length Context
The "MapReduce" feature is the choice if you want the LLM to read bulk documentations.
Generally, LLMs have a limited context length. If you want to ask a question regarding a very long context, you can split the context into multiple parts, and extract the relevant information from each part. Then, you can ask the LLM to answer the question based on the extracted information.
The implementation is fairly simple: split the gathered text until the pre-defined maximum chunk size is satisfied, ask the LLM to extract relevant information from each chunk, and then repeatedly merge the extracted information through LLM summarization until only one chunk is left. As a result, this functionality can be very quota-consuming if you are dealing with long texts. Please keep an eye on your bill when you try this on a paid API service.
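Conceptually, the map and reduce steps look like the following sketch (simplified, not DebGPT's actual code; ask_llm() is a hypothetical placeholder for a call to the configured frontend, and the chunk size here is arbitrary; the real value is controlled by --mapreduce_chunksize):
def ask_llm(prompt: str) -> str:
    """Hypothetical placeholder for a single call to the configured LLM frontend."""
    raise NotImplementedError

def mapreduce(text: str, question: str, chunk_size: int = 8192) -> str:
    # Map: split the long text into chunks and extract relevant information from each.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    notes = [ask_llm(f"Extract information relevant to: {question}\n\n{c}") for c in chunks]
    # Reduce: repeatedly merge pairs of extracted notes by LLM summarization
    # until only one chunk of notes is left.
    while len(notes) > 1:
        pairs = ["\n\n".join(notes[i:i + 2]) for i in range(0, len(notes), 2)]
        notes = [ask_llm(f"Merge and summarize, keeping what matters for: {question}\n\n{p}")
                 for p in pairs]
    # Finally, answer the question based on the single remaining summary.
    return ask_llm(f"Answer this question: {question}\n\nBased on:\n{notes[0]}")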
This functionality is implemented as the --mapreduce|-x argument. The user should specify the --ask|-A|-a argument to tell the LLM what question we want to ask, so it can extract the right information. If the --ask|-A|-a argument is missing, it will simply summarize.
The key difference between MapReduce and the Vector Retriever is that MapReduce makes the language model actually read all of the information you pass to it, while the vector retriever only makes it read the several most relevant pieces of information stored in the database.
Some usage examples of MapReduce are as follows:
- Load a file and ask a question
debgpt -Hx resume.pdf -A 'Does this person know AI? To what extent?'
- Load a directory and ask a question
debgpt -Hx . -a 'which file implemented mapreduce? how does it work?'
debgpt -Hx . -a 'teach me how to use this software. Is there any hidden functionality that is not written in its readme?'
debgpt -Hx ./debian -A 'how is this package built? how many binary packages will be produced?'
- Load a URL and ask a question
debgpt -Hx 'https://www.debian.org/doc/debian-policy/policy.txt' -A 'what is the purpose of the archive?'
- Load the whole Debian Policy document (plain text) and ask a question
debgpt -Hx policy:all -a "what is the latest changes in this policy?"
debgpt -Hx policy:all -A 'what package should enter contrib instead of main or non-free?'
- Load the whole Debian Developer Reference document (plain text) and ask a question
debgpt -Hx devref:all -A 'How can I become a debian developer?'
debgpt -Hx devref:all -a 'how does general resolution work?'
- If you can't be bothered to read policy: and devref:, or forgot which one talks about the question in your mind, for instance:
debgpt -H -x policy:all -x devref:all -a 'which document (and which section) talk about Multi-Arch: ?'
- Summarize the mailing list discussions within a month (MapReduce is more suitable than the retrieval (RAG) for this purpose):
debgpt -Hx ldo:debian-project/2024/10 -a 'write a news report based on the provided information. Cover as many topics as possible. You may expand a little bit on important matter. You must include links for every topic to the report.' --no-render
- Load the latest sbuild log file and ask a question
debgpt -Hx sbuild: -A 'why does the build fail? do you have any suggestion?'
- Google search: -x google: will use your prompt as the search query, and answer your question after reading the search results
debgpt -Hx google: -a 'how to start python programming?'
- Google search: -x google:<search_query> gives more control over the search query. Here we let the LLM answer the question provided by -a based on the search results for "debian packaging".
debgpt -Hx google:'debian packaging' -a 'how to learn debian packaging?'
The -H argument skips printing the first prompt generated by debgpt, because it is typically very lengthy and only useful for debugging and development purposes. To further tweak the mapreduce behavior, you may want to check the --mapreduce_chunksize <int> and --mapreduce_parallelism <int> arguments.
6. Piping through Everywhere
Being able to pipe the inputs and outputs among different programs is one of the reasons why I love the UNIX philosophy.
The pipe mode is useful when you want to use debgpt in a shell script. Try the following on the Makefile in the debgpt repo. The in-place editing functionality introduced earlier is more convenient than this one.
cat Makefile | debgpt -a 'delete the deprecated targets' pipe | tee tmp ; mv tmp Makefile; git diff
The pipe mode can be used for editing something in vim in-place.
# In vim debgpt/task.py, use 'V' mode to select the task_backend function, then
:'<,'>!debgpt -a 'add type annotations and comments to this function' pipe
This looks interesting, right? debgpt has a git wrapper that automatically generates a git commit message for the staged contents and commits it. Just try debgpt git commit --amend to see how it works. This will also be mentioned in the subcommands section.
7. DebGPT Subcommands
Git subcommand: let the LLM automatically generate the git commit message and call git to commit it:
debgpt git commit --amend
If you do not want to amend the committed message, just remove --amend from the command.
8. Prompt Engineering
As you may have seen, the biggest variation in LLM usage happens in the context, including how you provide the context readers and how you ask the question through --ask|-A|-a. By adjusting the way you provide that information and ask the question, you can get significantly different results. To properly make the LLM work for you, you may need to go through some basic prompt engineering methods.
The following are some references on this topic:
- OpenAI's Guide https://platform.openai.com/docs/guides/prompt-engineering
Advanced usage of LLMs such as Chain-of-Thought (CoT) will not be covered in this document. Please refer to external resources for more information.
The usage of LLMs is limited only by our imagination. I am glad to hear from you if you have more good ideas on how we can make LLMs useful for Debian development: https://salsa.debian.org/deeplearning-team/debgpt/-/issues
TROUBLESHOOTING
- Context overlength: If the result from the context readers (such as feeding --file with a huge text file) is too long, you can switch to the --mapreduce|-x special reader, or switch to a model or service provider that supports a longer context.
BACKEND
Available Backend Implementations
This tool provides one backend implementation: zmq. It is only needed when you choose the ZMQ frontend for a self-hosted LLM inference server.
If you plan to use the openai or dryrun frontends, there is no specific hardware requirement. If you would like to self-host the LLM inference backend (the ZMQ backend), powerful hardware is required.
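To give a feeling for what a self-hosted inference server involves, the following is a purely illustrative sketch of a ZMQ request-reply loop around a transformers pipeline. It is not DebGPT's actual wire protocol or code; the port number and JSON message layout are made up for the example:
import zmq
from transformers import pipeline

# Load an instruction-tuned model (example choice; pick one that fits your hardware).
generator = pipeline("text-generation",
                     model="mistralai/Mistral-7B-Instruct-v0.2",
                     device_map="auto")

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)   # reply socket: exactly one reply per request
sock.bind("tcp://*:11177")   # example port, not necessarily DebGPT's default

while True:
    request = sock.recv_json()   # e.g. {"prompt": "...", "max_new_tokens": 512}
    output = generator(request["prompt"],
                       max_new_tokens=request.get("max_new_tokens", 512))
    sock.send_json({"response": output[0]["generated_text"]})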
LLM Selections
The concrete hardware requirement depends on the LLM you would like to use. A variety of open-access LLMs can be found at https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard
Generally, when trying to do prompt engineering only, the "instruction-tuned" LLMs and "RL-tuned" (RL is reinforcement learning) LLMs are recommended.
The pretrained (raw) LLMs are not very useful in this case, as they have not yet gone through instruction tuning or a reinforcement learning tuning procedure. These pretrained LLMs are more likely to generate garbage, ignore your instructions, or simply repeat your instructions. We will only revisit the pretrained LLMs if we start collecting data and fine-tuning (e.g., with LoRA) a model in the far future.
The following is a list of supported LLMs for self-hosting (this list will be updated when there are new state-of-the-art open-access LLMs available):
- Mistral7B (Mistral-7B-Instruct-v0.2) (default): This model requires roughly 15GB of disk space to download.
- Mixtral8x7B (Mixtral-8x7B-Instruct-v0.1): This model is larger yet more powerful than the default LLM. In exchange, it poses even higher hardware requirements. It takes roughly 60~100GB of disk space (I forgot this number. Will check later).
Different LLMs will pose different hardware requirements. Please see the "Hardware Requirements" subsection below.
Hardware Requirements
By default, we recommend doing LLM inference in fp16 precision. If the VRAM (such as CUDA memory) is limited, you may also switch to even lower precisions such as 8bit and 4bit. For pure CPU inference, only fp32 precision is supported for now.
Note: multi-GPU inference is supported by the underlying transformers library. If you have multiple GPUs, the memory requirement is roughly divided by the number of GPUs.
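As a back-of-the-envelope sanity check for the numbers below (weights only; the KV cache for long contexts, activations, and framework overhead come on top, which is why the preferred amounts are larger than these estimates), the weight memory is simply the parameter count times the bytes per parameter. The parameter counts below are approximate:
# Rough weight-memory estimate; real usage is higher due to the KV cache and overhead.
def weight_gb(params_billion: float, bits: int) -> float:
    return params_billion * 1e9 * (bits / 8) / 1e9

for name, params in [("Mistral7B", 7.3), ("Mixtral8x7B", 46.7)]:   # approximate parameter counts
    for label, bits in [("fp16", 16), ("8bit", 8), ("4bit", 4)]:
        print(f"{name} {label}: ~{weight_gb(params, bits):.0f} GB for the weights alone")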
Hardware requirements for the Mistral7B LLM:
- Mistral7B + fp16 (cuda): 24GB+ VRAM preferred, but a 48GB GPU is needed to run all the demos (some of them have a context as long as 8k). Examples: Nvidia RTX A5000, Nvidia RTX 4090.
- Mistral7B + 8bit (cuda): 12GB+ VRAM at minimum, but 24GB+ preferred so you can run all demos.
- Mistral7B + 4bit (cuda): 6GB+ VRAM at minimum, but 12GB+ preferred so you can run all demos. Example: Nvidia RTX 4070 (mobile) 8GB.
- Mistral7B + fp32 (cpu): requires 64GB+ of RAM, but a CPU is 100~400 times slower than a GPU for this workload and thus not recommended.
Hardware requirements for the Mixtral8x7B LLM:
- Mixtral8x7B + fp16 (cuda): 90GB+ VRAM.
- Mixtral8x7B + 8bit (cuda): 45GB+ VRAM.
- Mixtral8x7B + 4bit (cuda): 23GB+ VRAM, but in order to make it work with a long context such as 8k tokens, you still need 2x 48GB GPUs even in 4bit precision.
See https://huggingface.co/blog/mixtral for more.
Usage of the ZMQ Backend
If you want to run the default LLM with different precisions:
debgpt backend --max_new_tokens=1024 --device cuda --precision fp16
debgpt backend --max_new_tokens=1024 --device cuda --precision bf16
debgpt backend --max_new_tokens=1024 --device cuda --precision 8bit
debgpt backend --max_new_tokens=1024 --device cuda --precision 4bit
The only supported precision on CPU is fp32 (for now). If you want to fall back to CPU computation (very slow):
debgpt backend --max_new_tokens=1024 --device cpu --precision fp32
If you want to run a different LLM, such as Mixtral8x7B instead of the default Mistral7B:
debgpt backend --max_new_tokens=1024 --device cuda --precision 4bit --llm Mixtral8x7B
The argument --max_new_tokens (the maximum length of each LLM reply) does not matter much; you can adjust it as you wish.
REFERENCES
[1] Access large language models from the command-line : https://github.com/simonw/llm
[2] Turn your task descriptions into precise shell commands : https://github.com/sderev/shellgenius
[3] the AI-native open-source embedding database : https://github.com/chroma-core/chroma
[4] LangChain: Build context-aware reasoning applications : https://python.langchain.com/docs/introduction/
[5] Ollama: Embedding Models : https://ollama.com/blog/embedding-models
[6] OpenAI: Embedding Models : https://platform.openai.com/docs/guides/embeddings
[7] Moonshot - A simple and modular tool to evaluate and red-team any LLM application. : https://github.com/aiverify-foundation/moonshot?tab=readme-ov-file
[8] LLMxMapReduce (Concurrent work. Their method is more advanced than mine) : https://arxiv.org/abs/2410.09342
LICENSE and ACKNOWLEDGEMENT
DebGPT development was aided by various open-access and commercial LLMs for code suggestion, code writing, code editing, and document writing, with human review and modification.
Copyright (C) 2024 Mo Zhou <lumin@debian.org>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.