Offline local-LLM terminal app for Jetson and edge Linux: chat with on-device models, run agent tools, and manage context safely.

open-jet

open-jet is an offline-first terminal app for running local LLM workflows on edge Linux devices (including Jetson-class hardware).

It provides:

  • local chat with your on-device model
  • safe file-context loading with token/memory guards
  • slash commands for session control
  • first-run setup for model/runtime configuration
  • optional session logging and resume

Requirements

  • llama-server from llama.cpp built for your device (see below)
  • a local .gguf model file, or ollama installed for model download

Building llama-server

open-jet uses llama-server from llama.cpp as its inference backend. Pre-built binaries are available for x86, but on Jetson/ARM64 you need to build from source with the right flags.

Jetson (Orin Nano, Orin NX, AGX Orin)

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build

cmake .. \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=87 \
  -DGGML_CUDA_FA_ALL_QUANTS=ON

cmake --build . --target llama-server -j$(nproc)

Key flags:

  • GGML_CUDA=ON: enable the CUDA backend
  • CMAKE_CUDA_ARCHITECTURES=87: target SM 8.7 (Orin); use 72 for Xavier
  • GGML_CUDA_FA_ALL_QUANTS=ON: enable flash attention for all KV cache quantizations (q8_0, q4_0), not just f16. Required for fast inference with a quantized KV cache.

The built binary will be at build/bin/llama-server. Either add it to your PATH or leave it at ~/llama.cpp/build/bin/ where open-jet will find it automatically.
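
The lookup order described above can be sketched roughly like this (a minimal illustration; `find_llama_server` is a hypothetical helper, not open-jet's actual API):

```python
import os
import shutil

# Default build location checked when llama-server is not on PATH.
DEFAULT_BUILD_PATH = os.path.expanduser("~/llama.cpp/build/bin/llama-server")

def find_llama_server(default_path=DEFAULT_BUILD_PATH):
    """Return a usable llama-server binary path, or None if not found."""
    # 1. Anything on PATH wins.
    on_path = shutil.which("llama-server")
    if on_path:
        return on_path
    # 2. Fall back to the default llama.cpp build output.
    if os.path.isfile(default_path) and os.access(default_path, os.X_OK):
        return default_path
    return None
```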

Other Linux (x86_64 with NVIDIA GPU)

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build

cmake .. -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build . --target llama-server -j$(nproc)

CPU only

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build

cmake ..
cmake --build . --target llama-server -j$(nproc)

Install

pip install open-jet

Start

open-jet

Optional setup screen on launch:

open-jet --setup

First-Run Setup

On first run, open-jet guides you through:

  1. hardware detection/profile
  2. model source selection
  3. model path or download choice
  4. context window size
  5. GPU offload configuration

It then saves your configuration and starts the runtime.

Basic Use

  • Type normally and press Enter to chat
  • Use @file or @[path with spaces] to add file content to context
  • Type / to open slash-command suggestions
  • Tab/Enter can autocomplete slash commands and file mentions
  • Ctrl+C or /exit quits
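
The @-mention syntax above could be parsed along these lines (an illustrative sketch; `extract_mentions` and the regex are assumptions, not open-jet's real parser):

```python
import re

# @[path with spaces] is tried first, then bare @path (no spaces).
MENTION_RE = re.compile(r"@\[([^\]]+)\]|@(\S+)")

def extract_mentions(line):
    """Return file paths referenced with @path or @[path with spaces]."""
    return [bracketed or bare for bracketed, bare in MENTION_RE.findall(line)]
```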

Slash Commands

  • /help: show commands
  • /exit: quit the app
  • /clear: clear chat and restart the runtime (flushes the KV cache)
  • /clear-chat: clear chat only
  • /status: show context/RAM status
  • /condense: condense older context
  • /load <path>: load a file into context
  • /resume: load the previous saved session
  • /setup: reopen the setup wizard
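
A dispatch table is one natural way to wire up commands like these (a hypothetical sketch; open-jet's internals may differ):

```python
# Each handler takes the argument list that followed the command name.
def cmd_help(args):
    return "commands: /help /exit /clear /clear-chat /status /condense /load /resume /setup"

def cmd_load(args):
    return f"loading {args[0]}" if args else "usage: /load <path>"

COMMANDS = {"/help": cmd_help, "/load": cmd_load}

def dispatch(line):
    """Route a slash-command line to its handler."""
    name, *args = line.split()
    handler = COMMANDS.get(name)
    return handler(args) if handler else f"unknown command: {name}"
```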

Configuration

Main settings are stored in config.yaml, including:

  • context window size
  • memory guard limits
  • logging settings
  • session state/resume settings

TensorRT-LLM (PyTorch runtime) with Qwen

open-jet can run against trtllm-serve instead of llama-server.

  1. Install TensorRT-LLM so trtllm-serve is on your PATH.
  2. Set your config to use the TensorRT-LLM runtime.

Example config.yaml values:

runtime: trtllm_pytorch
model: Qwen/Qwen2.5-7B-Instruct
trtllm_backend: pytorch
trtllm_trust_remote_code: true
# optional: pass a trtllm-serve YAML file
# trtllm_config_path: /home/you/qwen-fast.yml
context_window_tokens: 4096
gpu_layers: 0

When runtime is trtllm_pytorch, open-jet launches:

trtllm-serve <model> --backend pytorch --host 127.0.0.1 --port 8080

and then connects through the same OpenAI-compatible chat API path.
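
Assembling that command line from the config values might look like this (illustrative only; `build_trtllm_command` is a made-up helper, not open-jet's API):

```python
def build_trtllm_command(config, host="127.0.0.1", port=8080):
    """Build the trtllm-serve argv list from config.yaml values."""
    return [
        "trtllm-serve", config["model"],
        "--backend", config.get("trtllm_backend", "pytorch"),
        "--host", host,
        "--port", str(port),
    ]
```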

Logging and Session State

When enabled:

  • session events are written to session_logs/*.events.jsonl
  • system metrics are written to session_logs/*.metrics.jsonl
  • conversation state is saved to session_state.json
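
Each *.jsonl file holds one JSON object per line, so a session log can be replayed with a few lines of Python (a sketch assuming the standard JSONL convention):

```python
import json

def read_events(path):
    """Parse a *.events.jsonl file into a list of event dicts."""
    events = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                events.append(json.loads(line))
    return events
```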

Download files

Built distribution: open_jet-0.1.5-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl (6.1 MB), CPython 3.10, manylinux (glibc 2.17+) ARM64. No source distribution is available for this release.

