# open-jet

Offline local-LLM terminal app for Jetson and edge Linux: chat with on-device models, run agent tools, and manage context safely.
open-jet is an offline-first terminal app for running local LLM workflows on edge Linux devices (including Jetson-class hardware).
It provides:
- local chat with your on-device model
- safe file-context loading with token/memory guards
- slash commands for session control
- first-run setup for model/runtime configuration
- optional session logging and resume
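The file-context guard idea can be pictured as a token-budget check before content enters the prompt. This is an illustrative sketch only: the 4-characters-per-token heuristic and every name here (`approx_tokens`, `guarded_load`) are assumptions, not open-jet's actual implementation.

```python
# Illustrative sketch of a token/memory guard for file-context loading.
# The chars-per-token heuristic and all names are assumptions, not
# open-jet's real code.

def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def guarded_load(file_text: str, budget_tokens: int) -> tuple[str, bool]:
    """Return (content, truncated), keeping content within the token budget."""
    if approx_tokens(file_text) <= budget_tokens:
        return file_text, False
    # keep roughly budget_tokens worth of characters
    return file_text[: budget_tokens * 4], True

content, truncated = guarded_load("x" * 100, budget_tokens=10)
print(truncated, len(content))  # → True 40
```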
## Requirements

- `llama-server` from `llama.cpp`, built for your device (see below)
- a local `.gguf` model file, or `ollama` installed for model download
## Building llama-server

open-jet uses `llama-server` from `llama.cpp` as its inference backend. Pre-built binaries are available for x86, but on Jetson/ARM64 you need to build from source with the right flags.
### Jetson (Orin Nano, Orin NX, AGX Orin)

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
cmake .. \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=87 \
  -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build . --target llama-server -j$(nproc)
```
Key flags:

| Flag | Why |
|---|---|
| `GGML_CUDA=ON` | Enable CUDA backend |
| `CMAKE_CUDA_ARCHITECTURES=87` | Target SM 8.7 (Orin). Use 72 for Xavier. |
| `GGML_CUDA_FA_ALL_QUANTS=ON` | Enable flash attention for all KV-cache quantizations (q8_0, q4_0), not just f16. Required for fast inference with a quantized KV cache. |
The built binary will be at `build/bin/llama-server`. Either add it to your `PATH` or leave it at `~/llama.cpp/build/bin/`, where open-jet will find it automatically.
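If you prefer the `PATH` route, something like this works, assuming the default clone location used above:

```shell
# make the freshly built binary visible in this shell session
export PATH="$HOME/llama.cpp/build/bin:$PATH"
# prints the resolved path if the build succeeded and PATH is set
command -v llama-server || echo "llama-server not on PATH yet"
```

Add the `export` line to your `~/.bashrc` to make it permanent.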
### Other Linux (x86_64 with NVIDIA GPU)

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
cmake .. -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build . --target llama-server -j$(nproc)
```
### CPU only

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build
cmake ..
cmake --build . --target llama-server -j$(nproc)
```
## Install

```bash
pip install open-jet
```

## Start

```bash
open-jet
```

Optional setup screen on launch:

```bash
open-jet --setup
```
## First-Run Setup
On first run, open-jet guides you through:
- hardware detection/profile
- model source selection
- model path or download choice
- context window size
- GPU offload configuration
It then saves your configuration and starts the runtime.
## Basic Use

- Type normally and press Enter to chat
- Use `@file` or `@[path with spaces]` to add file content to context
- Type `/` to open slash-command suggestions
- `Tab`/`Enter` can autocomplete slash commands and file mentions
- `Ctrl+C` or `/exit` quits
## Slash Commands

- `/help`: show commands
- `/exit`: quit app
- `/clear`: clear chat and restart runtime (flush KV cache)
- `/clear-chat`: clear chat only
- `/status`: show context/RAM status
- `/condense`: condense older context
- `/load <path>`: load a file into context
- `/resume`: load previous saved session
- `/setup`: reopen setup wizard
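Tab-completion over these commands amounts to a prefix match. A minimal sketch; the command list comes from the table above, but `complete()` is an invented helper, not open-jet's actual API:

```python
# Minimal prefix-completion sketch for the slash commands listed above.
# complete() is an invented helper, not open-jet's actual API.

COMMANDS = [
    "/help", "/exit", "/clear", "/clear-chat", "/status",
    "/condense", "/load", "/resume", "/setup",
]

def complete(prefix: str) -> list[str]:
    """Return all slash commands starting with the typed prefix, sorted."""
    return sorted(c for c in COMMANDS if c.startswith(prefix))

print(complete("/cl"))  # → ['/clear', '/clear-chat']
```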
## Configuration

Main settings are stored in `config.yaml`, including:
- context window size
- memory guard limits
- logging settings
- session state/resume settings
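A hypothetical sketch of what such a file might contain; only `model`, `context_window_tokens`, and `gpu_layers` appear elsewhere in this README (in the TensorRT-LLM example later on), and the remaining comments are guesses, not documented options:

```yaml
# hypothetical config.yaml sketch; only the keys shown in the
# TensorRT-LLM example in this README are documented
model: /home/you/models/model.gguf
context_window_tokens: 4096
gpu_layers: 99   # offload all layers when built with CUDA
```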
## TensorRT-LLM (PyTorch runtime) with Qwen

open-jet can run against `trtllm-serve` instead of `llama-server`.

1. Install TensorRT-LLM so `trtllm-serve` is on your `PATH`.
2. Set your config to use the TensorRT-LLM runtime.
Example `config.yaml` values:

```yaml
runtime: trtllm_pytorch
model: Qwen/Qwen2.5-7B-Instruct
trtllm_backend: pytorch
trtllm_trust_remote_code: true
# optional: pass a trtllm-serve YAML file
# trtllm_config_path: /home/you/qwen-fast.yml
context_window_tokens: 4096
gpu_layers: 0
```
When `runtime` is `trtllm_pytorch`, open-jet launches:

```bash
trtllm-serve <model> --backend pytorch --host 127.0.0.1 --port 8080
```

and then connects through the same OpenAI-compatible chat API path.
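Because both backends expose an OpenAI-compatible endpoint on `127.0.0.1:8080`, you can also talk to the running server directly from Python. A stdlib-only sketch; it assumes a server is already listening, and `/v1/chat/completions` is the standard OpenAI-compatible route rather than anything open-jet-specific:

```python
# Talk to the OpenAI-compatible endpoint that llama-server / trtllm-serve
# expose. Stdlib only; assumes a server is already listening on port 8080.
import json
import urllib.request

def build_chat_request(prompt: str, base_url: str = "http://127.0.0.1:8080"):
    """Build a POST request for the standard /v1/chat/completions route."""
    payload = {
        "model": "local",  # placeholder; local servers often ignore this
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    req = build_chat_request("Hello from the open-jet backend!")
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])
```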
## Logging and Session State

When enabled:

- session events are written to `session_logs/*.events.jsonl`
- system metrics are written to `session_logs/*.metrics.jsonl`
- conversation state is saved to `session_state.json`
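The `*.events.jsonl` files are ordinary JSON Lines (one JSON object per line), so they are easy to post-process. A small sketch; the `type` field and event names here are invented for illustration, so check your own `session_logs/*.events.jsonl` for the real schema:

```python
# Summarize a session event log. JSONL = one JSON object per line.
# The "type" field and event names are invented for illustration.
import json
from collections import Counter
from typing import Iterable

def count_event_types(lines: Iterable[str]) -> Counter:
    """Count events per 'type' field, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line).get("type", "unknown")] += 1
    return counts

sample = [
    '{"type": "user_message", "text": "hi"}',
    '{"type": "assistant_message", "text": "hello"}',
    '{"type": "user_message", "text": "bye"}',
]
print(count_event_types(sample))  # → Counter({'user_message': 2, 'assistant_message': 1})
```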
## Contact
- Website: https://www.openjet.dev/
- X: https://x.com/flouislf