T-LEX Edge
Edge-Decoupled LLM Inference - Run 32B+ models on ANY device!
The Problem
Running large language models locally requires expensive GPU hardware:
| Model | VRAM required | Typical GPU cost |
|---|---|---|
| 7B | ~14GB | $500+ |
| 32B | ~24GB | $1000+ |
| 70B | ~40GB | $2000+ |
Most edge devices (laptops, IoT, drones, phones) can't run these models.
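The VRAM column follows roughly from parameter count times bytes per weight (FP16 for the 7B row; the larger rows assume quantized weights). A quick back-of-the-envelope check, where the bits-per-weight values are our assumptions, not official figures:

```python
def vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-memory estimate: parameters x bytes per weight.
    Ignores KV cache and activations, which add several GB more."""
    return params_billion * bits_per_weight / 8

print(vram_gb(7, 16))    # ~14 GB at FP16
print(vram_gb(32, 6))    # ~24 GB at ~6-bit quantization
print(vram_gb(70, 4.5))  # ~39 GB at ~4.5-bit quantization
```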
The Solution: Split-Brain Architecture
T-LEX separates inference (GPU server) from decoding (edge device):
```
┌─────────────────────────┐          ┌─────────────────────────┐
│       GPU Server        │          │       Edge Device       │
│   ┌─────────────────┐   │  tokens  │   ┌─────────────────┐   │
│   │ Ollama/OomLlama │───┼─────────►│   │  T-LEX Decoder  │   │
│   │    32B model    │   │          │   │ vocab.db (6MB)  │   │
│   │    24GB VRAM    │   │          │   │     NO GPU!     │   │
│   └─────────────────┘   │          │   └─────────────────┘   │
└─────────────────────────┘          └─────────────────────────┘
```
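On the edge side, decoding reduces to a plain dictionary lookup. Here is a minimal sketch of that step, assuming vocab.db holds a single (id, token) table; the table name and schema are illustrative assumptions, not the package's actual on-disk format:

```python
import sqlite3

# Connect to the tiny vocabulary database - the only artifact the edge needs.
conn = sqlite3.connect("qwen_vocab.db")

def decode(token_ids: list[int]) -> str:
    """Map token IDs streamed from the GPU server back to text."""
    placeholders = ",".join("?" * len(token_ids))
    rows = conn.execute(
        f"SELECT id, token FROM vocab WHERE id IN ({placeholders})",
        token_ids,
    ).fetchall()
    id_to_token = dict(rows)
    # Reassemble in stream order (the SQL IN clause returns rows unordered).
    return "".join(id_to_token[i] for i in token_ids)

# IDs as they might arrive from the server (values are placeholders)
print(decode([40, 2776, 264]))
```

Because each step is a local SQLite lookup rather than a model forward pass, decode throughput can exceed generation throughput by orders of magnitude, which is what the decode-speed row in the table below reflects.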
Performance
| Metric | Local 32B | T-LEX (Remote 32B + Edge Decode) |
|---|---|---|
| GPU Required (Edge) | 24GB VRAM | None! |
| Edge Storage | 20GB+ | 6 MB |
| Generation Speed | 2 tok/s | 16 tok/s |
| Decode Speed | N/A | 45,000 tok/s |
Result: 8x faster generation with zero GPU on the edge device!
Installation
```bash
# Basic installation
pip install tlex-edge

# With server support (FastAPI)
pip install "tlex-edge[server]"

# With RAG support (ChromaDB)
pip install "tlex-edge[rag]"

# Full installation
pip install "tlex-edge[full]"
```
Quick Start
Python API
```python
from tlex import TLexClient

# Connect to the remote GPU server
client = TLexClient("http://gpu-server:11434")

# Generate with a 32B model - no local GPU needed!
result = client.generate(
    "Explain quantum computing",
    model="humotica-32b",
    max_tokens=100,
)
print(result.text)
print(f"Speed: {result.tokens_per_second:.1f} tok/s")

# Streaming output
for chunk in client.stream("Tell me a story"):
    print(chunk, end="", flush=True)
```
Command Line
```bash
# Generate text
tlex generate "What is AI?" --model qwen2.5:7b --server http://gpu-server:11434

# Interactive chat
tlex chat --model humotica-32b

# List available models
tlex models

# Benchmark decoder
tlex benchmark --vocab qwen_vocab.db
```
Docker
```bash
# Build
docker build -t tlex-edge .

# Run
docker run -it tlex-edge generate "Hello!" --model qwen2.5:7b

# With docker-compose
docker-compose up -d
docker-compose exec tlex chat
```
Building Vocabulary Database
The vocab database is all an edge device needs to decode tokens:
```bash
# From command line
tlex vocab Qwen/Qwen2.5-7B-Instruct --output qwen_vocab.db
```

```python
# From Python
from tlex import build_vocab_db

build_vocab_db("Qwen/Qwen2.5-7B-Instruct", "qwen_vocab.db")
```
Size comparison:
- Qwen 7B model: ~14 GB
- qwen_vocab.db: ~6 MB (2300x smaller!)
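The packaged build_vocab_db handles this for you; purely for intuition, a database like the decode sketch above could be produced from a Hugging Face tokenizer along these lines (again, the table name and schema are assumptions, not T-LEX's actual format):

```python
import sqlite3
from transformers import AutoTokenizer

def build_vocab_db_sketch(model_id: str, out_path: str) -> None:
    """Illustrative only: dump a tokenizer's vocabulary into SQLite."""
    tok = AutoTokenizer.from_pretrained(model_id)
    conn = sqlite3.connect(out_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vocab (id INTEGER PRIMARY KEY, token TEXT)"
    )
    # convert_ids_to_tokens returns the raw vocabulary string for each ID
    rows = [(i, tok.convert_ids_to_tokens(i)) for i in range(tok.vocab_size)]
    conn.executemany("INSERT OR REPLACE INTO vocab VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

build_vocab_db_sketch("Qwen/Qwen2.5-7B-Instruct", "qwen_vocab_sketch.db")
```

The tokenizer download is a one-time cost on a machine with internet access; only the resulting few-megabyte database ships to the edge device.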
Server Setup
T-LEX works with any Ollama-compatible server:
```bash
# On your GPU server (P520, etc.)
ollama serve

# Pull models
ollama pull qwen2.5:7b
ollama pull qwen2.5:32b
```
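Before pointing clients at it, you can sanity-check the server from any edge device with Ollama's standard REST API (no T-LEX needed for this step):

```python
import requests

# /api/tags is Ollama's built-in endpoint for listing pulled models.
resp = requests.get("http://gpu-server:11434/api/tags", timeout=5)
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])
```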
Architecture
```
            ┌─────────────────────────────────────────────────┐
            │                   GPU SERVER                    │
            │   ┌─────────────────────────────────────────┐   │
            │   │            Ollama / OomLlama            │   │
            │   │  - Qwen 7B/32B/72B                      │   │
            │   │  - LLaMA 70B                            │   │
            │   │  - Any GGUF model                       │   │
            │   └─────────────────────────────────────────┘   │
            │                        │                        │
            │                        │ HTTP Stream            │
            │                        │ (token chunks)         │
            └────────────────────────┼────────────────────────┘
                                     │
                    ┌────────────────┴────────────────┐
                    │             NETWORK             │
                    │    (LAN / Internet / I-Poll)    │
                    └────────────────┬────────────────┘
                                     │
┌────────────────────────────────────┴────────────────────────────────────┐
│                              EDGE DEVICES                               │
│                                                                         │
│ ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│ │    Laptop    │  │  Raspberry   │  │    Phone     │  │    Drone     │  │
│ │              │  │      Pi      │  │              │  │              │  │
│ │ vocab.db 6MB │  │ vocab.db 6MB │  │ vocab.db 6MB │  │ vocab.db 6MB │  │
│ │   NO GPU!    │  │   NO GPU!    │  │   NO GPU!    │  │   NO GPU!    │  │
│ └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
```
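The "HTTP Stream (token chunks)" hop in the diagram uses Ollama's newline-delimited JSON streaming. A bare-bones consumer, independent of TLexClient, looks roughly like this (T-LEX layers its vocab.db decode step on top of the same transport):

```python
import json
import requests

# Ollama streams one JSON object per line until it sends "done": true.
with requests.post(
    "http://gpu-server:11434/api/generate",
    json={"model": "qwen2.5:7b", "prompt": "What is AI?", "stream": True},
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break
```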
Use Cases
- IoT Devices: Smart home with AI, no cloud dependency
- Drones: On-board AI decisions, low latency
- Mobile Apps: Full LLM power without draining battery
- Air-gapped Networks: Self-hosted inference + edge decode
- Cost Reduction: One GPU server, unlimited edge clients
Part of HumoticaOS
T-LEX is part of the HumoticaOS ecosystem:
- TIBET: Trust & provenance for AI actions
- I-Poll: AI-to-AI messaging
- OomLlama: Native Rust inference engine
One love, one fAmIly! 🦙❤️
License
MIT License - see LICENSE for details.
Contributing
Contributions welcome! Please read CONTRIBUTING.md first.