
T-LEX Edge

Edge-Decoupled LLM Inference - Run 32B+ models on ANY device!


The Problem

Running large language models locally requires expensive GPU hardware:

Model   VRAM Required   Cost
7B      ~14 GB          $500+
32B     ~24 GB          $1000+
70B     ~40 GB          $2000+

Most edge devices (laptops, IoT, drones, phones) can't run these models.

The Solution: Split-Brain Architecture

T-LEX separates inference (GPU server) from decoding (edge device):

┌─────────────────────────┐         ┌─────────────────────────┐
│     GPU Server          │         │     Edge Device         │
│  ┌─────────────────┐    │ tokens  │  ┌─────────────────┐    │
│  │ Ollama/OomLlama │────┼────────►│  │ T-LEX Decoder   │    │
│  │ 32B model       │    │         │  │ vocab.db (6MB)  │    │
│  │ 24GB VRAM       │    │         │  │ NO GPU!         │    │
│  └─────────────────┘    │         │  └─────────────────┘    │
└─────────────────────────┘         └─────────────────────────┘
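The edge side of this split only has to map streamed token IDs back to text, which is a dictionary lookup, not a neural-network forward pass. A minimal sketch of that idea, assuming a simple SQLite table mapping token ID to token text (the real vocab.db schema may differ):

```python
import sqlite3

def build_demo_vocab(path=":memory:"):
    """Create a toy vocab table (id -> token text).
    Illustrative schema only, not the actual vocab.db layout."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE vocab (id INTEGER PRIMARY KEY, token TEXT)")
    db.executemany(
        "INSERT INTO vocab VALUES (?, ?)",
        [(0, "Hello"), (1, ","), (2, " world"), (3, "!")],
    )
    db.commit()
    return db

def decode(db, token_ids):
    """Resolve streamed token IDs locally -- no GPU, no model weights."""
    rows = dict(db.execute("SELECT id, token FROM vocab"))
    return "".join(rows[t] for t in token_ids)

db = build_demo_vocab()
print(decode(db, [0, 1, 2, 3]))  # Hello, world!
```

This is why the edge footprint can be a few megabytes: only the ID-to-text mapping travels to the device, while the weights stay on the server.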

Performance

Metric               Local 32B    T-LEX (Remote 32B + Edge Decode)
GPU Required (Edge)  24 GB VRAM   None!
Edge Storage         20 GB+       6 MB
Generation Speed     2 tok/s      16 tok/s
Decode Speed         N/A          45,000 tok/s

Result: 8x faster generation (16 vs. 2 tok/s) with zero GPU on the edge!
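To see why the decode stage stops mattering, compare where the time goes for, say, 100 generated tokens, using the throughput numbers from the table above (network latency ignored for simplicity):

```python
tokens = 100
gen_time = tokens / 16          # remote generation at 16 tok/s
decode_time = tokens / 45_000   # edge decode at 45,000 tok/s

print(f"generation: {gen_time:.2f} s, decode: {decode_time * 1000:.1f} ms")

# Decode is a vanishing fraction of end-to-end time
share = decode_time / (gen_time + decode_time)
print(f"decode share of total: {share:.4%}")
```

At these rates the edge CPU spends milliseconds decoding for every several seconds of remote generation, so the network-bound generation stream is the only real bottleneck.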

Installation

# Basic installation
pip install tlex-edge

# With server support (FastAPI)
pip install tlex-edge[server]

# With RAG support (ChromaDB)
pip install tlex-edge[rag]

# Full installation
pip install tlex-edge[full]

Quick Start

Python API

from tlex import TLexClient, EdgeDecoder

# Connect to remote GPU server
client = TLexClient("http://gpu-server:11434")

# Generate with 32B model - no local GPU needed!
result = client.generate(
    "Explain quantum computing",
    model="humotica-32b",
    max_tokens=100
)
print(result.text)
print(f"Speed: {result.tokens_per_second:.1f} tok/s")

# Streaming output
for chunk in client.stream("Tell me a story"):
    print(chunk, end="", flush=True)

Command Line

# Generate text
tlex generate "What is AI?" --model qwen2.5:7b --server http://gpu-server:11434

# Interactive chat
tlex chat --model humotica-32b

# List available models
tlex models

# Benchmark decoder
tlex benchmark --vocab qwen_vocab.db

Docker

# Build
docker build -t tlex-edge .

# Run
docker run -it tlex-edge generate "Hello!" --model qwen2.5:7b

# With docker-compose
docker-compose up -d
docker-compose exec tlex chat

Building Vocabulary Database

The vocab database is all an edge device needs to decode tokens:

# From command line
tlex vocab Qwen/Qwen2.5-7B-Instruct --output qwen_vocab.db

# From Python
from tlex import build_vocab_db
build_vocab_db("Qwen/Qwen2.5-7B-Instruct", "qwen_vocab.db")

Size comparison:

  • Qwen 7B model: ~14 GB
  • qwen_vocab.db: ~6 MB (2300x smaller!)
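The size gap is plausible from first principles: a vocabulary of roughly 152k entries (Qwen2.5's vocab size) at a few dozen bytes each fits in single-digit megabytes, while the weights scale with parameter count. A back-of-envelope check with assumed per-entry overhead, not measured figures:

```python
# Rough estimate of vocab-db vs. model size (illustrative numbers)
vocab_entries = 152_000   # Qwen2.5 vocabulary is ~152k tokens
bytes_per_entry = 40      # token text + id + index overhead (assumed)
vocab_mb = vocab_entries * bytes_per_entry / 1e6

model_gb = 14             # 7B model size, per the table above
ratio = model_gb * 1e9 / (vocab_mb * 1e6)
print(f"vocab db ~= {vocab_mb:.1f} MB, model ~= {model_gb} GB, "
      f"ratio ~= {ratio:.0f}x")
```

The estimate lands near 6 MB and a ~2300x ratio, consistent with the measured numbers above.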

Server Setup

T-LEX works with any Ollama-compatible server:

# On your GPU server (P520, etc.)
ollama serve

# Pull models
ollama pull qwen2.5:7b
ollama pull qwen2.5:32b
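Ollama-compatible endpoints stream newline-delimited JSON from /api/generate, each chunk carrying a `response` fragment until `done` is true; that stream is what the edge decoder consumes. A minimal parser for the format, shown against a canned sample rather than a live HTTP body:

```python
import json

# Sample of Ollama's /api/generate streaming output: one JSON object per line
sample = """\
{"model":"qwen2.5:7b","response":"Edge","done":false}
{"model":"qwen2.5:7b","response":" decoding","done":false}
{"model":"qwen2.5:7b","response":"!","done":true}
"""

def collect_stream(lines):
    """Concatenate 'response' fragments until a chunk reports done=true."""
    out = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        out.append(chunk["response"])
        if chunk.get("done"):
            break
    return "".join(out)

print(collect_stream(sample.splitlines()))  # Edge decoding!
```

A real client would iterate over the chunked HTTP response line by line instead of a string, but the per-line JSON handling is the same.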

Architecture

              ┌─────────────────────────────────────────────────┐
              │                  GPU SERVER                      │
              │  ┌─────────────────────────────────────────┐    │
              │  │  Ollama / OomLlama                      │    │
              │  │  - Qwen 7B/32B/72B                      │    │
              │  │  - LLaMA 70B                            │    │
              │  │  - Any GGUF model                       │    │
              │  └─────────────────────────────────────────┘    │
              │                      │                           │
              │                      │ HTTP Stream               │
              │                      │ (token chunks)            │
              └──────────────────────┼───────────────────────────┘
                                     │
                    ┌────────────────┴────────────────┐
                    │           NETWORK               │
                    │    (LAN / Internet / I-Poll)    │
                    └────────────────┬────────────────┘
                                     │
┌────────────────────────────────────┴────────────────────────────────────┐
│                           EDGE DEVICES                                   │
│                                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │   Laptop     │  │  Raspberry   │  │    Phone     │  │    Drone     │ │
│  │              │  │     Pi       │  │              │  │              │ │
│  │ vocab.db 6MB │  │ vocab.db 6MB │  │ vocab.db 6MB │  │ vocab.db 6MB │ │
│  │   NO GPU!    │  │   NO GPU!    │  │   NO GPU!    │  │   NO GPU!    │ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘

Use Cases

  • IoT Devices: Smart home with AI, no cloud dependency
  • Drones: On-board AI decisions, low latency
  • Mobile Apps: Full LLM power without draining battery
  • Air-gapped Networks: Self-hosted inference + edge decode
  • Cost Reduction: One GPU server, unlimited edge clients

Part of HumoticaOS

T-LEX is part of the HumoticaOS ecosystem:

  • TIBET: Trust & provenance for AI actions
  • I-Poll: AI-to-AI messaging
  • OomLlama: Native Rust inference engine

One love, one fAmIly! 🦙❤️

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

