Skip to main content

Run massive models on minimal hardware

Project description

DeepNetz

Run massive models on minimal hardware.

pip install deepnetz

deepnetz run model.gguf                         # auto-detect hardware
deepnetz run model.gguf --cpu                    # CPU-only
deepnetz run model.gguf --gpu 8GB                # GPU with budget
deepnetz run ollama://qwen3.5:35b                # from Ollama
deepnetz run hf://unsloth/Qwen3.5-35B-A3B-GGUF  # from HuggingFace
deepnetz run lmstudio://qwen3.5-35b             # from LM Studio
deepnetz serve model.gguf --port 8080            # OpenAI-compatible API

What it does

One framework. 6 backends. Any model. Any hardware.

You have Without DeepNetz With DeepNetz
RTX 4060 8GB + 32GB RAM 8B model, 4K context 122B model, 32K context
32GB RAM, no GPU 7B model, 4K context 35B model, 8K context
RTX 3090 24GB + 64GB RAM 70B model, 8K context 122B model, 128K context

Quick start

pip install deepnetz

# Show your hardware + available backends
deepnetz hardware
deepnetz backends

# Run a model (auto-detects everything)
deepnetz run ./model.gguf

# Load from anywhere
deepnetz run ollama://qwen3.5:35b
deepnetz run hf://unsloth/Qwen3.5-35B-A3B-GGUF
deepnetz run lmstudio://qwen3.5-35b

# CPU-only / GPU budget
deepnetz run model.gguf --cpu
deepnetz run model.gguf --gpu 8GB --context 32k

# Single prompt
deepnetz run model.gguf -p "Explain gravity"

# API server with Web UI
deepnetz serve model.gguf --port 8080
# Dashboard: http://localhost:8080/
# Chat:      http://localhost:8080/chat
# Models:    http://localhost:8080/models
# API:       http://localhost:8080/v1/chat/completions

# Download models
deepnetz download Qwen3.5-35B --quant Q4_K_M

Python API

from deepnetz import Model

# Auto everything
model = Model("model.gguf")
response = model.chat("Hello!")

# CPU-only
model = Model("model.gguf", cpu_only=True)

# Specific backend
model = Model("model.gguf", backend="ollama")

# Streaming
for token in model.stream("Tell me a story"):
    print(token, end="", flush=True)

6 Backends

DeepNetz auto-detects which backends are installed and uses the best one:

Backend Source How it connects
Native llama-cpp-python Direct GGUF inference (fastest)
Ollama Ollama REST API localhost:11434
vLLM vLLM Python/CLI vllm serve or running instance
LM Studio lms CLI / REST localhost:1234
HuggingFace transformers Pipeline (safetensors only)
Remote Any OpenAI API Custom endpoint
deepnetz backends   # shows what's available on your system

KV Cache Optimization

DeepNetz stacks compression techniques for up to 10x memory reduction:

122B model, 32K context:
  KV Cache (naive):        ~16 GB → doesn't fit
  + TurboQuant (3.6x):       4.4 GB
  + Token Eviction (2x):     2.2 GB
  + KV Merging (1.5x):       1.5 GB → fits!
Technique Based on Effect
TurboQuant Google, ICLR 2026 3.6x KV compression
Attention Sinks StreamingLLM Fixed memory for infinite context
Token Eviction PagedEviction Remove unimportant tokens
KV Merging CaM / D2O Merge similar tokens

Web UI

deepnetz serve model.gguf starts a web dashboard at http://localhost:8080/:

  • Dashboard — Live CPU, RAM, GPU, VRAM, temperature monitoring
  • Chat — Streaming chat interface
  • Models — Browse and manage models from all backends

Tool Calling

Built-in internet search, extensible tool framework:

from deepnetz.tools.registry import ToolRegistry

registry = ToolRegistry()  # web_search built-in
result = registry.execute("web_search", {"query": "latest news"})

OpenAI-compatible function calling via /v1/chat/completions.

Benchmarks

Tested on 9 models from 3B to 122B on RTX 4060 (8GB) + 32GB RAM:

Model PPL Delta Speed KV Compression
Llama-3.2-3B +0.4% 3.6x
Gemma-3-27B +2.0% 2.3 tok/s 3.6x
Qwen3.5-35B +2.7% 7.4 tok/s 3.6x
Llama-3.3-70B 0.7 tok/s
Qwen3.5-122B 1.3 tok/s

Architecture

deepnetz/
├── __init__.py                  # from deepnetz import Model
├── cli.py                       # CLI (run/serve/info/hardware/backends/download)
├── server.py                    # FastAPI + WebSocket + OpenAI API
├── errors.py                    # Error hierarchy
├── engine/
│   ├── model.py                 # Main orchestrator
│   ├── hardware.py              # GPU/CPU/RAM detection
│   ├── monitor.py               # Real-time system stats
│   ├── planner.py               # Budget → inference plan
│   ├── gguf_reader.py           # GGUF metadata extraction
│   ├── resolver.py              # Universal model resolver (8 sources)
│   ├── downloader.py            # HuggingFace download
│   ├── scanner.py               # Local model discovery
│   ├── session.py               # SQLite conversation persistence
│   └── evaluator.py             # Output quality scoring
├── backends/
│   ├── base.py                  # Adapter interface
│   ├── native.py                # llama-cpp-python
│   ├── ollama.py                # Ollama REST API
│   ├── vllm.py                  # vLLM
│   ├── lmstudio.py              # LM Studio
│   ├── huggingface.py           # transformers
│   ├── remote.py                # Any OpenAI API
│   └── discovery.py             # Auto-detect backends
├── cache/
│   ├── turboquant.py            # TurboQuant KV compression
│   ├── eviction.py              # Attention sink eviction
│   └── merging.py               # KV entry merging
├── tools/
│   ├── base.py                  # Tool protocol
│   ├── search.py                # Web search (DuckDuckGo)
│   └── registry.py              # Tool management + parser
└── ui/
    ├── routes.py                # Web UI routes
    ├── static/                  # JS, CSS
    └── templates/               # Dashboard, Chat, Models HTML

What makes it different

Feature Ollama LM Studio vLLM DeepNetz
Load from anywhere Own registry Own catalog HuggingFace All of them
KV Cache Compression No No No TurboQuant 3.6x
Multi-Backend No No No 6 backends
Hardware Auto-Tuning Basic Basic No Budget planner
Web UI + Monitoring No Yes (closed) No Yes
Tool Calling No No Yes Yes + Search
CPU Optimized Yes Yes No Yes + KV compression
Quality Scoring No No No Yes

Author

Keyvan Hardanikeyvan.ai | deepnetz.com | GitHub | LinkedIn

Contributing

PRs welcome! See open issues.

git clone https://github.com/Keyvanhardani/deepnetz.git
cd deepnetz
pip install -e ".[server]"
pytest tests/

License

MIT — use it, fork it, build on it.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

deepnetz-1.0.3-cp312-cp312-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.12Windows x86-64

deepnetz-1.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (6.2 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

deepnetz-1.0.3-cp312-cp312-macosx_10_13_universal2.whl (2.1 MB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

deepnetz-1.0.3-cp311-cp311-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.11Windows x86-64

deepnetz-1.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (5.7 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

deepnetz-1.0.3-cp311-cp311-macosx_10_9_universal2.whl (2.1 MB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

deepnetz-1.0.3-cp310-cp310-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.10Windows x86-64

deepnetz-1.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (5.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

deepnetz-1.0.3-cp310-cp310-macosx_10_9_universal2.whl (2.1 MB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file deepnetz-1.0.3-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.3-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 525490456172ff9e2ec105cf15010e03cc9f547e248ebafc06d54376e63e1bdb
MD5 ec6ae89b97603eec6795b763c8e21945
BLAKE2b-256 d73377f2888166c5717f0202933de2d6cdc226af23906e9e594588cd7584601e

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp312-cp312-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 ba3142df9cd9c1c49fb1c134127cd905aace9ae98adf332dfed00527c8bea5dc
MD5 130b31f837184f78e7a22a783c421a04
BLAKE2b-256 b03612aa80749a0d3420b831b024d17f5d9b9448d908b37398340e126472c962

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.3-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 46774588f40a574160ffd0f7a2821739298834a8fa099206443a7651cd23dd0c
MD5 41a53fe52c69aa09faf2562b5acbf0da
BLAKE2b-256 4e2a4f70cd9b8c82ebf0c17a399ed83269f1c96028a8538861e499d856b79ad0

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.3-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.3-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 108b7eb0b9259cb6e23fe51c573093025618b1a2341559ab7b073905d04bcb5c
MD5 5b868146a5209f85d47d040379c443bd
BLAKE2b-256 4aff09222d889f904bbc4210a11762e352e569ec5b1a15bc66d8a260fc4c8a1b

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp311-cp311-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 01e7d347a296a575115a55acf0e798c05bcd5a9d07be49f7c3e9484f3b57e22f
MD5 63cc3649363a5f41c566c66ed0bd3283
BLAKE2b-256 f6f4ea723398a04e815fd0cbdfe60403c2cfc76a809a716f5aa3122327129ef4

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.3-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 9b4047b8433bc5d32629f209cef29692bccffc6c58d96c00c4d09bbc38f8d3fa
MD5 bf53c07ba4e67ee8885f9ed83265adcd
BLAKE2b-256 9492cc3b853f9b8229dc8fdb21a4adcb7db64ef6d3ff2864a84e162150555ac9

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.3-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.3-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 e372b235602680523782bc753d6047ebcb36646df1080234c9963e1456866223
MD5 100c7e75cfa0160db4c5e477a7741fbb
BLAKE2b-256 f4440cec57dc815ec95f8f96675dfc744b3fd39e0884ea8e156f50133c20b207

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp310-cp310-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 c138438372f92b62856a151b726d76f91828c56f396bfd166c0e573fd27a3431
MD5 59a7e3080aaa94bdb9544dfa702736d5
BLAKE2b-256 e0f5b0c99b02cf6a9b707473e0438587fa84199d8ed638795d294514aa29c7f0

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.3-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.3-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 246acb08680f9e6e4d115ca38a1f90e665a64588545de110373f2c39c528057d
MD5 ebd8228cdd551ce65407b195eaa88579
BLAKE2b-256 8c060c6b5d89d824a0d7e2d36741a0d9e8aa66f2a42383d82236b1f357f4e352

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.3-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page