Offline local-LLM terminal app for Jetson and edge Linux: chat with on-device models, run agent tools, and manage context safely.

open-jet

open-jet is an offline-first agent runtime for edge Linux devices with tight memory budgets.

It is built for Jetson-class and other edge systems where the hard part is not just running a local model, but keeping the agent useful under constrained RAM, limited context windows, interrupted sessions, and hardware-specific failure modes.

It provides:

bounded-context local chat with your on-device model
safe file and doc loading with token and memory guards
automatic context condensing under pressure
session resume and harness state recovery
replayable JSONL event traces for evals and debugging
hardware-aware runtime setup for Jetson and edge Linux
slash commands and harness modes for controlled workflows
a Python SDK for driving the same agent backend without the TUI

open-jet is positioned around five practical problems:

managing limited prompt memory on-device
resuming interrupted work instead of starting over
enforcing deterministic tool and approval boundaries
capturing real traces for evaluation instead of guessing from vibes
turning constrained local models into reliable operator workflows

Requirements

llama-server from llama.cpp built for your device (see below)
a local .gguf model file, or ollama installed for model download

Building llama-server

open-jet uses llama-server from llama.cpp as its inference backend. Pre-built binaries are available for x86, but on Jetson/ARM64 you need to build from source with the right flags.

Jetson (Orin Nano, Orin NX, AGX Orin)

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build

cmake .. \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=87 \
  -DGGML_CUDA_FA_ALL_QUANTS=ON

cmake --build . --target llama-server -j$(nproc)

Key flags:

Flag	Why
`GGML_CUDA=ON`	Enable CUDA backend
`CMAKE_CUDA_ARCHITECTURES=87`	Target SM 8.7 (Orin). Use `72` for Xavier.
`GGML_CUDA_FA_ALL_QUANTS=ON`	Compile flash-attention CUDA kernels for all KV cache quantization type combinations (for example mixed q8_0/q4_0), not only the default set. Useful with a quantized KV cache; increases build time.
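The table above pairs each Jetson family with a CUDA architecture. If you script the build, that choice can be derived from the board's model string. The sketch below assumes the string is read from /proc/device-tree/model, which is standard on Jetson L4T. The helper names and the substring matching on "Orin"/"Xavier" are hypothetical illustrations, not part of open-jet or llama.cpp. It is meant for Jetson boards only, because it always enables CUDA.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from a substring of the Jetson model string to the
# CMAKE_CUDA_ARCHITECTURES value given in the table above.
_ARCH_BY_FAMILY = {
    "Orin": "87",    # Orin Nano, Orin NX, AGX Orin (SM 8.7)
    "Xavier": "72",  # Xavier NX, AGX Xavier (SM 7.2)
}


def cuda_arch_for_model(model: str) -> Optional[str]:
    """Return the CUDA architecture for a Jetson model string, or None if unknown."""
    for family, arch in _ARCH_BY_FAMILY.items():
        if family in model:
            return arch
    return None


def read_device_model(path: str = "/proc/device-tree/model") -> str:
    """Read the board model string (NUL-terminated on Jetson); '' if unavailable."""
    try:
        return Path(path).read_bytes().rstrip(b"\x00").decode(errors="replace").strip()
    except OSError:
        return ""


def cmake_args(model: str) -> list[str]:
    """Build the cmake invocation used in the Jetson build above."""
    args = ["cmake", "..", "-DGGML_CUDA=ON", "-DGGML_CUDA_FA_ALL_QUANTS=ON"]
    arch = cuda_arch_for_model(model)
    if arch is not None:
        # Keep the same flag order as the manual build instructions.
        args.insert(3, f"-DCMAKE_CUDA_ARCHITECTURES={arch}")
    return args


if __name__ == "__main__":
    print(" ".join(cmake_args(read_device_model())))
```

When the model is not recognized, the architecture flag is omitted and CMake falls back to its own default detection.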

The built binary will be at build/bin/llama-server. Either add it to your PATH or, if you cloned llama.cpp into your home directory, leave it at ~/llama.cpp/build/bin/, where open-jet will find it automatically.
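The lookup order described above (PATH first, then ~/llama.cpp/build/bin/) can be sketched as follows. This is an illustration of that order, not open-jet's actual discovery code. The `home` and `search_path` parameters are hypothetical hooks added so the function can be tested.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_llama_server(
    home: Optional[Path] = None,
    search_path: Optional[str] = None,
) -> Optional[Path]:
    """Return a usable llama-server path, or None if none is found.

    1. Any `llama-server` on PATH (or on `search_path` if given).
    2. The default build location ~/llama.cpp/build/bin/llama-server.
    """
    on_path = shutil.which("llama-server", path=search_path)
    if on_path:
        return Path(on_path)
    home = home if home is not None else Path.home()
    candidate = home / "llama.cpp" / "build" / "bin" / "llama-server"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None
```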

Other Linux (x86_64 with NVIDIA GPU)

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build

cmake .. -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build . --target llama-server -j$(nproc)

CPU only

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
mkdir build && cd build

cmake ..
cmake --build . --target llama-server -j$(nproc)

Install

pip install open-jet

Start

open-jet

To open the setup screen at launch (optional):

open-jet --setup

License

open-jet is licensed under AGPL-3.0-only.

Commercial licensing is also available for organizations that want to use open-jet under different terms.

See LICENSE.

Why It Exists

Most local LLM tools stop at "chat with a model on your box." That breaks down quickly on edge hardware:

context windows are small relative to the task
available RAM moves around under real workloads
long tasks get interrupted
shell and file actions need deterministic approval paths
failures differ by runtime, model, quant, and device profile

open-jet is designed around those constraints. The goal is to keep an on-device agent productive when memory is bounded and recovery matters more than demos.

Python SDK

open-jet also exposes a programmatic session API so you can drive the same bounded-memory agent backend from your own scripts.

import asyncio

from src import OpenJetSession


async def main() -> None:
    session = await OpenJetSession.create()
    try:
        response = await session.run("Summarize the current README")
        print(response.text)

        vision = await session.run("Describe this image", image_paths=["./example.png"])
        print(vision.text)

        async for event in session.stream("Inspect README.md with tools if needed"):
            if event.text:
                print(event.text, end="")
            if event.tool_result:
                print(f"\n[{event.tool_result.tool_call.name}] {event.tool_result.output}")
    finally:
        await session.close()


asyncio.run(main())

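Tools that mutate state or run shell commands require an approval handler. The handler receives each tool call and returns True to approve it; this one approves shell calls and rejects the rest. Like the calls above, these snippets must run inside an async function: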

session = await OpenJetSession.create(
    approval_handler=lambda tool_call: tool_call.name == "shell"
)

You can also restrict the tool surface for embedded use:

session = await OpenJetSession.create(allowed_tools={"read_file", "load_file", "grep"})
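For interactive scripts, an approval handler can auto-approve a small allow-list and ask on the terminal for anything else. This is a minimal sketch based on assumptions drawn from the lambda above: the handler is assumed to be a plain callable that receives a tool-call object with a `.name` attribute and returns a bool. `make_approval_handler` is a hypothetical helper, not part of the open-jet SDK.

```python
from typing import Callable, Iterable


def make_approval_handler(
    auto_approve: Iterable[str] = (),
    ask: Callable[[str], str] = input,
) -> Callable[[object], bool]:
    """Build an approval handler (hypothetical helper).

    Tools named in `auto_approve` are approved without prompting; every other
    tool call is approved only if the user answers "y" or "yes".
    """
    allowed = frozenset(auto_approve)

    def handler(tool_call) -> bool:
        # Assumes the tool-call object exposes `.name`, as in the lambda example.
        if tool_call.name in allowed:
            return True
        answer = ask(f"Allow tool '{tool_call.name}'? [y/N] ")
        return answer.strip().lower() in {"y", "yes"}

    return handler
```

Inside an async function, combine it with a restricted tool surface, for example `await OpenJetSession.create(approval_handler=make_approval_handler(), allowed_tools={"read_file", "load_file", "grep", "shell"})`. Every shell call is then confirmed interactively.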

First-Run Setup

On first run, open-jet guides you through:

hardware detection/profile
model source selection
model path or download choice
context window size
GPU offload configuration

It then saves your configuration and starts the runtime with a device-appropriate memory profile.
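Choosing a context window size is mostly a memory trade-off, because the KV cache grows linearly with it. The sketch below gives a rough estimate using the standard formula 2 (K and V) × layers × context tokens × KV heads × head dimension × bytes per element. The model dimensions in the example are placeholders for a hypothetical 8B-class model with grouped-query attention; read the real values from your model's metadata. The q8_0/q4_0 sizes assume llama.cpp's 32-element blocks of 34 and 18 bytes. This is not open-jet's own memory profiling, and it ignores weights and compute buffers.

```python
# Approximate bytes per element for common llama.cpp KV cache types.
# q8_0 and q4_0 pack 32 values per block (34 and 18 bytes respectively).
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,
    "q4_0": 18 / 32,
}


def kv_cache_bytes(
    n_layers: int,
    n_ctx: int,
    n_kv_heads: int,
    head_dim: int,
    cache_type: str = "f16",
) -> int:
    """Estimate KV cache size: one K and one V vector per layer, head and token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * BYTES_PER_ELEMENT[cache_type]
    return int(per_token * n_ctx)


def gib(n_bytes: int) -> float:
    return n_bytes / 2**30


if __name__ == "__main__":
    # Placeholder dimensions: 32 layers, 8 KV heads, head_dim 128.
    for ctx in (4096, 8192):
        for kind in BYTES_PER_ELEMENT:
            size = kv_cache_bytes(32, ctx, 8, 128, kind)
            print(f"ctx={ctx:5d} {kind:5s} {gib(size):.2f} GiB")
```

With those placeholder dimensions, an 8192-token f16 cache takes exactly 1 GiB. A q8_0 cache takes a little over half of that.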

Basic Use

Type normally and press Enter to chat
Use @file or @[path with spaces] to add file content to context
Use @image.png or paste local image file paths into the prompt to attach images to the next turn
Type / to open slash-command suggestions
Tab/Enter can autocomplete slash commands and file mentions
Ctrl+C or /exit quits
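The two mention forms above (@file and @[path with spaces]) can be recognized with a single regular expression. The sketch below is a hypothetical parser, not open-jet's implementation. It only treats @ as a mention at the start of the prompt or after whitespace, so addresses like user@example.com are ignored. It does not strip trailing punctuation from bare mentions.

```python
import re

# A mention is "@" at the start of the text or after whitespace, followed by
# either a bracketed path (which may contain spaces) or a run of non-space chars.
_MENTION = re.compile(r"(?<!\S)@(?:\[([^\]]+)\]|(\S+))")


def extract_mentions(prompt: str) -> list[str]:
    """Return the paths referenced with @ in a prompt, in order of appearance."""
    return [bracketed or bare for bracketed, bare in _MENTION.findall(prompt)]
```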

The app is designed to keep work decomposed into small recoverable turns instead of trying to hold an entire task in prompt memory at once.

Slash Commands

/help show commands
/exit quit app
/clear clear chat and restart runtime (flush KV cache)
/clear-chat clear chat only
/status show context/RAM status
/condense condense older context
/load <path> load a file into context
/resume load the previously saved session
/setup reopen setup wizard

Workflow Harness

open-jet includes a lightweight harness layer for keeping agent work structured under constrained context:

modes for chat, code, review, and debug
step-oriented state so the agent can continue work across turns
skill docs and project docs loaded into bounded turn context
persistent harness state stored under .openjet/

It exists to reduce prompt drift and keep limited-context models focused on the current task.

Configuration

Main settings are stored in config.yaml, including:

context window size
memory guard limits
logging settings
session state/resume settings

SGLang

open-jet can also connect to SGLang through its OpenAI-compatible server.

On Jetson, prefer running SGLang in a local container. This keeps inference fully local while avoiding host-side Python dependency issues, such as a missing triton package.

Example config.yaml values:

runtime: sglang
model: /home/you/models/Qwen3.5-4B-AWQ-4bit
sglang_model: /home/you/models/Qwen3.5-4B-AWQ-4bit
sglang_launch_mode: docker
sglang_base_url: http://127.0.0.1:30000
sglang_docker_image: your-local-sglang-image
sglang_docker_container_name: open-jet-sglang
sglang_docker_runtime: nvidia
sglang_served_model_name: local
sglang_reasoning_parser: qwen3
sglang_tool_call_parser: qwen3_coder
sglang_mem_fraction_static: 0.8
context_window_tokens: 8192
gpu_layers: 0

In docker mode, open-jet starts the local container itself and waits for the OpenAI-compatible API on 127.0.0.1.

In external mode, open-jet does not import or launch SGLang from the host environment. It only connects to an already-running local server at sglang_base_url (http://127.0.0.1:30000 in the example above).

Use managed mode only when SGLang is installed in the same Python environment as open-jet.
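Before pointing open-jet at an external-mode server, you can check that the OpenAI-compatible API is answering. The sketch below polls `GET /v1/models`, the standard OpenAI-compatible model-listing route that SGLang serves, until it responds or a timeout expires. It is a standalone illustration and not the wait logic open-jet uses in docker mode. `wait_for_openai_api` is a hypothetical name.

```python
import json
import time
import urllib.request


def wait_for_openai_api(base_url: str, timeout: float = 120.0, interval: float = 1.0) -> list[str]:
    """Poll {base_url}/v1/models until it answers; return the served model ids.

    Raises TimeoutError if the server is still unreachable after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/v1/models"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                payload = json.load(resp)
                return [model["id"] for model in payload.get("data", [])]
        except OSError:
            # Connection refused, HTTP errors while the model loads, socket timeouts.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"no OpenAI-compatible API at {url} after {timeout}s")
            time.sleep(interval)
```

For the example config, `wait_for_openai_api("http://127.0.0.1:30000")` should return a list containing the `sglang_served_model_name` value once the server is up.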

TensorRT-LLM (PyTorch runtime) with Qwen

open-jet can run against trtllm-serve instead of llama-server.

Install TensorRT-LLM so trtllm-serve is on your PATH.
Set your config to use the TensorRT-LLM runtime.

Example config.yaml values:

runtime: trtllm_pytorch
model: Qwen/Qwen2.5-7B-Instruct
trtllm_backend: pytorch
trtllm_trust_remote_code: true
# optional: pass a trtllm-serve YAML file
# trtllm_config_path: /home/you/qwen-fast.yml
context_window_tokens: 4096
gpu_layers: 0

When runtime is trtllm_pytorch, open-jet launches:

trtllm-serve <model> --backend pytorch --host 127.0.0.1 --port 8080

and then connects through the same OpenAI-compatible chat API path.
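The launch command above can be derived from the config.yaml values. The sketch below builds the documented argv from a plain dict, so the YAML parsing step is left out. The mapping of `trtllm_trust_remote_code` to `--trust_remote_code` and of `trtllm_config_path` to `--extra_llm_api_options` is an assumption about how those keys reach trtllm-serve; check `trtllm-serve --help` for your version. `trtllm_serve_command` is a hypothetical helper.

```python
import shlex
from typing import Any, Mapping


def trtllm_serve_command(cfg: Mapping[str, Any], host: str = "127.0.0.1", port: int = 8080) -> list[str]:
    """Build the trtllm-serve argv described above from config.yaml-style values."""
    if cfg.get("runtime") != "trtllm_pytorch":
        raise ValueError("runtime must be 'trtllm_pytorch'")
    cmd = [
        "trtllm-serve", str(cfg["model"]),
        "--backend", str(cfg.get("trtllm_backend", "pytorch")),
        "--host", host,
        "--port", str(port),
    ]
    # ASSUMPTION: these config keys map onto the following trtllm-serve options.
    if cfg.get("trtllm_trust_remote_code"):
        cmd.append("--trust_remote_code")
    if cfg.get("trtllm_config_path"):
        cmd += ["--extra_llm_api_options", str(cfg["trtllm_config_path"])]
    return cmd


if __name__ == "__main__":
    example = {"runtime": "trtllm_pytorch", "model": "Qwen/Qwen2.5-7B-Instruct", "trtllm_backend": "pytorch"}
    print(shlex.join(trtllm_serve_command(example)))
```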

Logging and Session State

When enabled:

session events are written to session_logs/*.events.jsonl
system metrics are written to session_logs/*.metrics.jsonl
conversation state is saved to session_state.json

The event log is the main reliability artifact. It captures replayable traces for things like:

tool call success rate
approval and denial decisions
interrupted generation and resumed sessions
time-to-resolution
token usage for successful tasks
hallucinated or low-value command proposals
hardware and runtime-specific failure analysis

This is meant to support evaluation from real traces, not just subjective testing.
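Because each events file is JSONL (one JSON object per line), metrics like tool call success rate and approval decisions can be computed with a short script. The event schema below is hypothetical: it assumes records such as `{"type": "tool_result", "ok": true}` and `{"type": "approval", "approved": false}`. Inspect your own session_logs/*.events.jsonl files and adjust the field names before relying on the numbers.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator, Optional


def iter_events(path: Path) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of a .events.jsonl file."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(paths: list[Path]) -> dict:
    """Tool success rate and approval decisions across sessions.

    ASSUMED (hypothetical) schema: {"type": "tool_result", "ok": bool} and
    {"type": "approval", "approved": bool}; other event types are ignored.
    """
    counts: Counter = Counter()
    for path in paths:
        for event in iter_events(path):
            kind = event.get("type")
            if kind == "tool_result":
                counts["tool_calls"] += 1
                counts["tool_ok"] += bool(event.get("ok"))
            elif kind == "approval":
                counts["approved" if event.get("approved") else "denied"] += 1
    calls = counts["tool_calls"]
    rate: Optional[float] = counts["tool_ok"] / calls if calls else None
    return {
        "tool_calls": calls,
        "tool_success_rate": rate,
        "approved": counts["approved"],
        "denied": counts["denied"],
    }


if __name__ == "__main__":
    print(summarize(sorted(Path("session_logs").glob("*.events.jsonl"))))
```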

Contact

Website: https://www.openjet.dev/
X: https://x.com/flouislf

Release history

0.4.1 (Apr 16, 2026)
0.3.13 (Apr 9, 2026)
0.3.12 (Apr 6, 2026)
0.3.11 (Apr 6, 2026)
0.3.10 (Apr 4, 2026)
0.3.9 (Apr 4, 2026)
0.3.8 (Apr 4, 2026)
0.3.7 (Apr 4, 2026)
0.3.6 (Mar 31, 2026)
0.3.5 (Mar 31, 2026)
0.3.3 (Mar 31, 2026)
0.1.16 (Mar 20, 2026)
0.1.14 (Mar 17, 2026)
0.1.13 (Mar 17, 2026)
0.1.12 (Mar 15, 2026)
0.1.10 (Mar 13, 2026), this version
0.1.8 (Mar 13, 2026)
0.1.5 (Feb 27, 2026)
0.1.4 (Feb 26, 2026)
0.1.3 (Feb 25, 2026)
0.1.2 (Feb 25, 2026)
0.1.0 (Feb 25, 2026)

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

open_jet-0.1.10-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl (10.0 MB)

Uploaded Mar 13, 2026; CPython 3.10, manylinux: glibc 2.17+ ARM64

File details

Details for the file open_jet-0.1.10-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.

File metadata

Download URL: open_jet-0.1.10-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl
Upload date: Mar 13, 2026
Size: 10.0 MB
Tags: CPython 3.10, manylinux: glibc 2.17+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for open_jet-0.1.10-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.whl
Algorithm	Hash digest
SHA256	`a71d0ecf61b351c601bb695c580cd92a3409d94b58fa075219d3305e3b68650a`
MD5	`c5b753edcf08aba3496c3922d7bf918e`
BLAKE2b-256	`a422d23023e30734d284f0dd3e4fe5a3d84b96426646fdf58a0278402ddb657c`
