Local-first agentic inference engine with tier-based model routing

These details have not been verified by PyPI

Project description

Entropic

Local-first agentic inference engine with tier-based model routing

This started as "I want to build a local-first Claude Code" — which turned out to be quite the undertaking. The initial build was a tightly coupled TUI, but it became clear pretty quickly that I was duplicating the same core inference engine across other local projects wrapping llama-cpp-python. So it evolved into a library: the inference engine, model orchestration, agentic loop, and tool framework are all importable and reusable without dragging in a UI. The TUI ships alongside it as one consumer, and doubles as a testbed for new ideas. There's also a very broken voice interface via PersonaPlex that I'll get to eventually.

The name is a nod to how this actually works. Every handoff — human intent to prompt, prompt to model, model to model across tiers — is a lossy translation. Information decays at each boundary. That's the entropic process this engine tries to manage: structured routing, context management, and tool-augmented reasoning to lose as little as possible along the way. A bit of a nihilistic naming convention, but the tier routing and model management do earn their keep in practice. There's optimization work ahead, but the foundation is solid and I'm always open to new directions.

Architecture

Entropic is a library first, application second. The inference engine (orchestrator, agentic loop, adapters, tool providers) is fully separable from any UI. The bundled TUI is one consumer; headless automation, CI/CD agents, and custom applications are equally supported.

pip install entropic-engine          # Core library (inference, engine, tools)
pip install entropic-engine[app]     # TUI application (includes tui + storage deps)
pip install entropic-engine[voice]   # Voice interface (PersonaPlex)
pip install entropic-engine[all]     # Everything

+-----------------------------------------------------+
|  Application Layer (TUI / Headless / Custom)        |
+-----------------------------------------------------+
|  Engine          |  Orchestrator    |  Tools         |
|  - Agentic loop  |  - Tier routing  |  - Filesystem  |
|  - Directives    |  - Model swap    |  - Bash        |
|  - Compaction    |  - VRAM mgmt     |  - Diagnostics |
|  - Context mgmt  |  - Adapters      |  - Git / Todo  |
+-----------------------------------------------------+
|  Inference Backend (llama-cpp-python)                |
|  - GGUF models, single-GPU, in-process              |
+-----------------------------------------------------+

Tier-Based Routing

A lightweight router model classifies each prompt and routes to the appropriate tier. Only one main model is loaded at a time (VRAM constraint) — the orchestrator handles dynamic swapping with lock-protected state transitions.

Tier	Purpose	Typical Model
Thinking	Complex reasoning, architecture, multi-step analysis	Qwen3-14B Q4_K_M
Normal	General conversation and tasks	Falcon-H1R-7B Q8_0
Code	Code generation, editing, refactoring	Falcon-H1R-7B Q8_0
Simple	Greetings, acknowledgments, short responses	(shares normal model)
Router	Prompt classification only	Qwen3-0.6B Q8_0

Agentic Loop

The engine runs an autonomous tool-calling loop: generate -> parse tool calls -> execute tools -> feed results back -> generate again. The loop continues until the model produces a complete response or hits the iteration limit. Tiers can auto-chain — when a tier exhausts its token budget without acting, the engine hands off to the next tier via configurable handoff rules.

Tools communicate back to the engine via directives — structured signals embedded in tool results that can trigger tier handoffs, context anchoring, and state management without the model needing to orchestrate these concerns.

Features

Fully Local — All inference on your hardware via llama-cpp-python. No API keys.
Library API — Embed the engine in your own application with LibraryConfig
Intelligent Routing — Sub-second prompt classification routes to the right model tier
Auto-Chain — Automatic tier handoff on token exhaustion or grammar completion
GBNF Grammar — Per-tier output constraints via GBNF grammars (streaming and non-streaming)
Single-GPU Orchestration — Dynamic model swapping with VRAM-aware loading
VRAM Lifecycle — Three-state model lifecycle (COLD→WARM→ACTIVE): warm models pin to CPU RAM via mlock, activate to GPU on demand — no reload from disk on tier swap
Per-Model Adapters — Model-specific chat templates, tool parsing, thinking block handling
Auto-Compaction — Context summarization for long conversations
MCP Tools — Filesystem, bash, diagnostics, git, and extensible tool servers
Runtime MCP — Register and unregister MCP servers at runtime via connect_server() / disconnect_server(); .mcp.json auto-discovered at startup
Benchmark CLI — Layer 1 benchmarks (load time, tok/s, VRAM, tier swap latency) via entropic benchmark run
Headless Mode — Full engine without TUI for automation and testing
TUI — Terminal interface built on Textual with streaming, tool approval, voice input

Requirements

Linux (tested on Ubuntu 22.04, 24.04)
NVIDIA GPU with 16GB+ VRAM
CUDA 12.4+
Python 3.10+

Installation

From source (recommended for GPU users)

git clone https://github.com/tvanfossen/entropic.git
cd entropic
./install.sh          # auto-detects GPU, builds CUDA support

The install script creates a virtual environment, clones and builds llama-cpp-python with CUDA support (if a GPU is detected), and installs entropic with the [app] extras.

# Place GGUF models in ~/models/gguf/ (or configure paths in .entropic/config.local.yaml)

# Run interactive TUI
.venv/bin/entropic

# Or headless
.venv/bin/entropic --headless

From PyPI

pip install entropic-engine
entropic setup-cuda   # build llama-cpp-python with CUDA + latest model support

What `setup-cuda` does

Clones llama-cpp-python v0.3.25 (JamePeng fork — upstream is abandoned)
Includes llama.cpp with Qwen3.5-MoE and other recent architectures
Builds with CUDA support (requires nvidia-smi, cmake, CUDA toolkit)
Installs into the current Python environment
Cached at ~/.entropic/.build/ — re-run is fast, use --force to rebuild

CPU-only (no GPU)

pip install entropic-engine

Models will run on CPU. Significantly slower but functional.

CLI

entropic                    # Interactive TUI
entropic --headless         # Headless mode (automation/testing)
entropic status             # Show model and system status
entropic ask "question"     # Single-shot question
entropic init               # Initialize .entropic/ in current directory
entropic download <model>   # Download model files
entropic setup-cuda         # Build llama-cpp-python with CUDA
entropic mcp-bridge         # Stdio→socket bridge for Claude Code integration
entropic benchmark run <model.gguf> --layer1-only   # Raw inference benchmarks

Configuration

Configuration loads in priority order (highest wins):

Built-in defaults
Global config (~/.entropic/config.yaml)
Project config (.entropic/config.local.yaml)
CLI arguments

Project context is provided via .entropic/ENTROPIC.md — a markdown file describing the project that gets included in the system prompt.

Library Usage

See examples/ for complete integrations (hello-world/, pychess/).

Privacy

Entropic runs entirely on your local hardware. No data is sent to external servers. No telemetry is collected. Your prompts, conversations, and model outputs never leave your machine.

Disclaimer

Entropic runs AI models locally on your hardware. AI-generated outputs may be inaccurate, biased, or inappropriate. Users are solely responsible for evaluating and using any generated content. This software does not provide professional, legal, medical, or financial advice.

License

Apache-2.0

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

2.3.8

May 22, 2026

2.3.7

May 21, 2026

2.3.6

May 20, 2026

2.3.5

May 20, 2026

2.2.9

May 19, 2026

2.2.4

May 18, 2026

2.2.2

May 18, 2026

2.2.1

May 15, 2026

2.2.0

May 14, 2026

2.1.7

May 13, 2026

2.1.6

May 13, 2026

2.1.5

May 12, 2026

2.1.4

May 4, 2026

2.1.3

Apr 29, 2026

2.1.1

Apr 29, 2026

2.1.0

Apr 28, 2026

This version

1.7.1

Mar 17, 2026

1.7.0

Mar 16, 2026

1.0.0

Feb 19, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

entropic_engine-1.7.1.tar.gz (281.1 kB view details)

Uploaded Mar 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

entropic_engine-1.7.1-py3-none-any.whl (349.2 kB view details)

Uploaded Mar 17, 2026 Python 3

File details

Details for the file entropic_engine-1.7.1.tar.gz.

File metadata

Download URL: entropic_engine-1.7.1.tar.gz
Upload date: Mar 17, 2026
Size: 281.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for entropic_engine-1.7.1.tar.gz
Algorithm	Hash digest
SHA256	`dbe8fd91911a06dc29c7a6f2e98bd54b1998e5f5f52b71b96f1d09e35e6362c3`
MD5	`9182c140de8dea31c49edd81cb07fe73`
BLAKE2b-256	`011a507712a2423b666b2cd4d95908431d316a4657a5a5968876a68a3617f519`

See more details on using hashes here.

Provenance

The following attestation bundles were made for entropic_engine-1.7.1.tar.gz:

Publisher: release.yaml on tvanfossen/entropic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: entropic_engine-1.7.1.tar.gz
- Subject digest: dbe8fd91911a06dc29c7a6f2e98bd54b1998e5f5f52b71b96f1d09e35e6362c3
- Sigstore transparency entry: 1115425744
- Sigstore integration time: Mar 17, 2026
Source repository:
- Permalink: tvanfossen/entropic@cf53f737249faaf26909705689a8b6285ba524a9
- Branch / Tag: refs/tags/v1.7.1
- Owner: https://github.com/tvanfossen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@cf53f737249faaf26909705689a8b6285ba524a9
- Trigger Event: push

File details

Details for the file entropic_engine-1.7.1-py3-none-any.whl.

File metadata

Download URL: entropic_engine-1.7.1-py3-none-any.whl
Upload date: Mar 17, 2026
Size: 349.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for entropic_engine-1.7.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c77f8a35f8912121995428a122219b803f4c9c7ff1570f106b89c09c835ea2b`
MD5	`8e3a05ccf939575799e3518905996465`
BLAKE2b-256	`0994e9959c483240ba80a823d595075329e3769bcd08c79de66d71c47ac117fc`

See more details on using hashes here.

Provenance

The following attestation bundles were made for entropic_engine-1.7.1-py3-none-any.whl:

Publisher: release.yaml on tvanfossen/entropic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: entropic_engine-1.7.1-py3-none-any.whl
- Subject digest: 1c77f8a35f8912121995428a122219b803f4c9c7ff1570f106b89c09c835ea2b
- Sigstore transparency entry: 1115425763
- Sigstore integration time: Mar 17, 2026
Source repository:
- Permalink: tvanfossen/entropic@cf53f737249faaf26909705689a8b6285ba524a9
- Branch / Tag: refs/tags/v1.7.1
- Owner: https://github.com/tvanfossen
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@cf53f737249faaf26909705689a8b6285ba524a9
- Trigger Event: push

entropic-engine 1.7.1

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

Entropic

Architecture

Tier-Based Routing

Agentic Loop

Features

Requirements

Installation

From source (recommended for GPU users)

From PyPI

What setup-cuda does

CPU-only (no GPU)

CLI

Configuration

Library Usage

Privacy

Disclaimer

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

What `setup-cuda` does