UARC - Unified Adaptive Runtime Core: Production AI inference engine

These details have not been verified by PyPI

Project links

Project description

UARC — Unified Adaptive Runtime Core

Next-generation AI inference engine: 7 adaptive modules, portable, OpenAI-compatible.

pip install uarc

Modules

#	Module	Purpose	Key Algorithm
M1	TDE — Token Difficulty Estimator	Per-token compute routing	4-layer MLP → perplexity → draft/partial/full
M2	AI-VM — Virtual Memory Manager	3-tier memory (VRAM→RAM→NVMe)	CLOCK-Pro eviction + KV page table
M3	DPE — Dynamic Precision Engine	Per-layer bit-width allocation	Greedy knapsack (INT4→INT8→FP16→FP32)
M4	PLL — Predictive Layer Loader	Lookahead prefetch scheduling	EMA timing + DAG deadline analysis
M5	ACS — Adaptive Compute Scheduler	Priority batch formation + device routing	3-heap priority queue + roofline routing
M6	NSC — Neural Semantic Cache	Embedding-based prompt deduplication	HNSW ANN + bi-encoder + adaptive threshold
M7	Router — Hybrid CPU/GPU	Arithmetic-intensity routing	Roofline model + pipeline parallelism

Quick Start

1. Single Inference (CLI)

uarc run "Explain quantum computing in simple terms"
uarc run "Write Python code to sort a list" --max-tokens 256 --stream
uarc run "Quick question" --priority realtime --json

2. Inference Server (OpenAI-compatible)

uarc serve --port 8000 --vram 8 --ram 32

# Then call it like any OpenAI API:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "uarc-sim-7b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

3. Benchmark

uarc bench --requests 100 --vram 8 --seed-cache

4. Python API

from uarc import UARCRuntime, UARCConfig, InferenceRequest

cfg = UARCConfig.from_env()   # reads UARC_VRAM_GB, UARC_RAM_GB, etc.
rt  = UARCRuntime(cfg)
rt.start()

resp = rt.infer(InferenceRequest(
    request_id="req-001",
    prompt="What is machine learning?",
    max_new_tokens=256,
))

print(resp.text)
print(f"Route: {resp.route_taken}")          # draft / partial / full / cache
print(f"Latency: {resp.latency_ms:.1f}ms")
print(f"Compute saved: {resp.compute_saved_pct:.0f}%")

rt.stop()

5. Streaming

for chunk in rt.infer_stream(request):
    print(chunk, end="", flush=True)

6. Batch

responses = rt.infer_batch([req1, req2, req3])

Configuration

from uarc import UARCConfig

# Presets
cfg = UARCConfig.for_edge()          # 4GB RAM, no GPU
cfg = UARCConfig.for_gpu(vram_gb=24) # 24GB VRAM

# From environment variables
# UARC_VRAM_GB, UARC_RAM_GB, UARC_MAX_BATCH, UARC_PORT,
# UARC_NSC_THRESHOLD, UARC_DEVICE
cfg = UARCConfig.from_env()

# Manual
cfg = UARCConfig()
cfg.aivm.vram_mb = 8192         # 8 GB VRAM
cfg.tde.tau_easy = 2.5          # PPL threshold: easy tokens
cfg.tde.tau_hard = 8.0          # PPL threshold: hard tokens
cfg.nsc.similarity_threshold = 0.92
cfg.acs.max_batch_size = 32

Server Endpoints

Method	Path	Description
POST	`/v1/chat/completions`	OpenAI chat (streaming supported)
POST	`/v1/completions`	OpenAI completion (streaming supported)
GET	`/v1/models`	Model list
GET	`/health`	Health check
GET	`/status`	Full module status JSON
GET	`/metrics`	Prometheus metrics
POST	`/admin/cache/clear`	Flush NSC cache
POST	`/admin/evict`	Trigger memory eviction

Architecture

InferenceRequest
     │
     ▼
  M6 NSC ──── hit ────────────────────────────► Response (cache)
     │ miss
     ▼
  M1 TDE ──── estimate PPL ──────► route: draft / partial / full
     │
     ▼
  M5 ACS ──── form batch ─────► prefix-sharing reorder
     │
     ▼
  M3 DPE ──── precision plan ──► INT4/INT8/FP16/FP32 per layer
     │
     ▼
  M4 PLL ──── prefetch ────────► async layer loading from NVMe
     │
     ▼
  M2 AI-VM ── VRAM/RAM/NVMe ──► CLOCK-Pro eviction
     │
     ▼
  M7 Router ─ CPU or GPU? ────► intensity < 4 → CPU, else GPU
     │
     ▼
  Model forward pass
     │
     ▼
  M6 NSC store
     │
     ▼
  Response

Development

git clone <repo>
cd uarc
pip install -e ".[dev]"
pytest tests/ -v

Performance Targets

Scenario	Baseline	UARC Target
70B on 8GB VRAM	OOM	85 tok/s
70B on 24GB GPU	12 tok/s	38 tok/s (+217%)
TTFT	1.0×	0.38× (−62%)
Concurrent requests	4	18–32
Compute per token	1.0×	0.30–0.70×

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.3.0

Mar 11, 2026

This version

0.2.0

Mar 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

uarc-0.2.0.tar.gz (39.3 kB view details)

Uploaded Mar 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

uarc-0.2.0-py3-none-any.whl (40.1 kB view details)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file uarc-0.2.0.tar.gz.

File metadata

Download URL: uarc-0.2.0.tar.gz
Upload date: Mar 9, 2026
Size: 39.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for uarc-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`dba91f8f3cada4f6f3ec5d82a3da4f66db2feb75fc81d0850c34c7ff54db3989`
MD5	`94cc667fd4b251d988ca5a44402e7256`
BLAKE2b-256	`955f335853d31ecee529e330a5111dbeb660a88c53168f07f8b3a5750ee74835`

See more details on using hashes here.

File details

Details for the file uarc-0.2.0-py3-none-any.whl.

File metadata

Download URL: uarc-0.2.0-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 40.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for uarc-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3301945d36bdeb00c61b4e9b649641bfa70290825b71714933bd5f3bf390e95b`
MD5	`7bb329579dfe9502585b0b221b053552`
BLAKE2b-256	`ac2b2051936646528b9a1903bc25b8e0f783870f7fb999fe41b2f69ce14490b3`

See more details on using hashes here.

uarc 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

UARC — Unified Adaptive Runtime Core

Modules

Quick Start

1. Single Inference (CLI)

2. Inference Server (OpenAI-compatible)

3. Benchmark

4. Python API

5. Streaming

6. Batch

Configuration

Server Endpoints

Architecture

Development

Performance Targets

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes