Skip to main content

Run massive models on minimal hardware

Project description

DeepNetz

Run massive models on minimal hardware.

pip install deepnetz

deepnetz run model.gguf                         # auto-detect hardware
deepnetz run model.gguf --cpu                    # CPU-only
deepnetz run model.gguf --gpu 8GB                # GPU with budget
deepnetz run ollama://qwen3.5:35b                # from Ollama
deepnetz run hf://unsloth/Qwen3.5-35B-A3B-GGUF  # from HuggingFace
deepnetz run lmstudio://qwen3.5-35b             # from LM Studio
deepnetz serve model.gguf --port 8080            # OpenAI-compatible API

What it does

One framework. 6 backends. Any model. Any hardware.

You have Typical setup With DeepNetz optimization
RTX 4060 8GB + 32GB RAM 35B model via Ollama Same model, 3.6x less KV cache, longer context
32GB RAM, no GPU 7B model, slow Auto-optimized CPU inference + KV compression
RTX 3090 24GB + 64GB RAM 70B model Same model, optimized layer split + cache

Quick start

pip install deepnetz

# Show your hardware + available backends
deepnetz hardware
deepnetz backends

# Run a model (auto-detects everything)
deepnetz run ./model.gguf

# Load from anywhere
deepnetz run ollama://qwen3.5:35b
deepnetz run hf://unsloth/Qwen3.5-35B-A3B-GGUF
deepnetz run lmstudio://qwen3.5-35b

# CPU-only / GPU budget
deepnetz run model.gguf --cpu
deepnetz run model.gguf --gpu 8GB --context 32k

# Single prompt
deepnetz run model.gguf -p "Explain gravity"

# API server with Web UI
deepnetz serve model.gguf --port 8080
# Dashboard: http://localhost:8080/
# Chat:      http://localhost:8080/chat
# Models:    http://localhost:8080/models
# API:       http://localhost:8080/v1/chat/completions

# Download models
deepnetz download Qwen3.5-35B --quant Q4_K_M

Python API

from deepnetz import Model

# Auto everything
model = Model("model.gguf")
response = model.chat("Hello!")

# CPU-only
model = Model("model.gguf", cpu_only=True)

# Specific backend
model = Model("model.gguf", backend="ollama")

# Streaming
for token in model.stream("Tell me a story"):
    print(token, end="", flush=True)

6 Backends

DeepNetz auto-detects which backends are installed and uses the best one:

Backend Source How it connects
Native llama-cpp-python Direct GGUF inference (fastest)
Ollama Ollama REST API localhost:11434
vLLM vLLM Python/CLI vllm serve or running instance
LM Studio lms CLI / REST localhost:1234
HuggingFace transformers Pipeline (safetensors only)
Remote Any OpenAI API Custom endpoint
deepnetz backends   # shows what's available on your system

KV Cache Optimization

DeepNetz stacks compression techniques for up to 10x memory reduction:

122B model, 32K context:
  KV Cache (naive):        ~16 GB → doesn't fit
  + TurboQuant (3.6x):       4.4 GB
  + Token Eviction (2x):     2.2 GB
  + KV Merging (1.5x):       1.5 GB → fits!
Technique Based on Effect
TurboQuant Google, ICLR 2026 3.6x KV compression
Attention Sinks StreamingLLM Fixed memory for infinite context
Token Eviction PagedEviction Remove unimportant tokens
KV Merging CaM / D2O Merge similar tokens

Web UI

deepnetz serve model.gguf starts a web dashboard at http://localhost:8080/:

  • Dashboard — Live CPU, RAM, GPU, VRAM, temperature monitoring
  • Chat — Streaming chat interface
  • Models — Browse and manage models from all backends

Tool Calling

Built-in internet search, extensible tool framework:

from deepnetz.tools.registry import ToolRegistry

registry = ToolRegistry()  # web_search built-in
result = registry.execute("web_search", {"query": "latest news"})

OpenAI-compatible function calling via /v1/chat/completions.

Benchmarks

Tested on 9 models from 3B to 122B on RTX 4060 (8GB) + 32GB RAM:

Model PPL Delta Speed KV Compression
Llama-3.2-3B +0.4% 3.6x
Gemma-3-27B +2.0% 2.3 tok/s 3.6x
Qwen3.5-35B +2.7% 7.4 tok/s 3.6x
Llama-3.3-70B 0.7 tok/s
Qwen3.5-122B 1.3 tok/s

Architecture

deepnetz/
├── __init__.py                  # from deepnetz import Model
├── cli.py                       # CLI (run/serve/info/hardware/backends/download)
├── server.py                    # FastAPI + WebSocket + OpenAI API
├── errors.py                    # Error hierarchy
├── engine/
│   ├── model.py                 # Main orchestrator
│   ├── hardware.py              # GPU/CPU/RAM detection
│   ├── monitor.py               # Real-time system stats
│   ├── planner.py               # Budget → inference plan
│   ├── gguf_reader.py           # GGUF metadata extraction
│   ├── resolver.py              # Universal model resolver (8 sources)
│   ├── downloader.py            # HuggingFace download
│   ├── scanner.py               # Local model discovery
│   ├── session.py               # SQLite conversation persistence
│   └── evaluator.py             # Output quality scoring
├── backends/
│   ├── base.py                  # Adapter interface
│   ├── native.py                # llama-cpp-python
│   ├── ollama.py                # Ollama REST API
│   ├── vllm.py                  # vLLM
│   ├── lmstudio.py              # LM Studio
│   ├── huggingface.py           # transformers
│   ├── remote.py                # Any OpenAI API
│   └── discovery.py             # Auto-detect backends
├── cache/
│   ├── turboquant.py            # TurboQuant KV compression
│   ├── eviction.py              # Attention sink eviction
│   └── merging.py               # KV entry merging
├── tools/
│   ├── base.py                  # Tool protocol
│   ├── search.py                # Web search (DuckDuckGo)
│   └── registry.py              # Tool management + parser
└── ui/
    ├── routes.py                # Web UI routes
    ├── static/                  # JS, CSS
    └── templates/               # Dashboard, Chat, Models HTML

What makes it different

Feature Ollama LM Studio vLLM DeepNetz
Load from anywhere Own registry Own catalog HuggingFace All of them
KV Cache Compression No No No TurboQuant 3.6x
Multi-Backend No No No 6 backends
Hardware Auto-Tuning Basic Basic No Budget planner
Web UI + Monitoring No Yes (closed) No Yes
Tool Calling No No Yes Yes + Search
CPU Optimized Yes Yes No Yes + KV compression
Quality Scoring No No No Yes

Author

Keyvan Hardanikeyvan.ai | deepnetz.com | GitHub | LinkedIn

Contributing

PRs welcome! See open issues.

git clone https://github.com/Keyvanhardani/deepnetz.git
cd deepnetz
pip install -e ".[server]"
pytest tests/

License

MIT — use it, fork it, build on it.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

deepnetz-1.0.4-cp312-cp312-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.12Windows x86-64

deepnetz-1.0.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (6.3 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

deepnetz-1.0.4-cp312-cp312-macosx_10_13_universal2.whl (2.1 MB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

deepnetz-1.0.4-cp311-cp311-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.11Windows x86-64

deepnetz-1.0.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (5.7 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

deepnetz-1.0.4-cp311-cp311-macosx_10_9_universal2.whl (2.1 MB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

deepnetz-1.0.4-cp310-cp310-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.10Windows x86-64

deepnetz-1.0.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (5.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

deepnetz-1.0.4-cp310-cp310-macosx_10_9_universal2.whl (2.1 MB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file deepnetz-1.0.4-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.4-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.4-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 0b23d71fe36cb843c84fe0c1e5eeb428b8cc9d60b0359cb0c9aba603c1c8edec
MD5 d187d3c98c752f48c4092f61bf66df6a
BLAKE2b-256 65448607eddde62877edd6c1c9c1e58bf201979b9cb596a8d8477e660d1bcb04

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp312-cp312-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 d3ef37e3d5ffa4010edfe02dd7068104f3769bcf51fee0f6fa8685775565fe49
MD5 f823cfcd67939409125bf744452c150d
BLAKE2b-256 17b5ed7b5cd88e835f5346509a5352ee16524b1071d637c41ccdd1485fdca7a8

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.4-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 3cdf7f9a4a4b202e2e49ae93f017d685dea53ff240bec9f35cbd76597eaad719
MD5 60be6eb48cac00ed01ddbb94406880dc
BLAKE2b-256 f05166d68a4e72265ad78e352969379a77608222b5fe6df3bf714588e66e6530

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.4-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.4-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 53b4771494c30f06882e46832aeac50e80e21be45e5e3300ab5cfbb186f179a3
MD5 988595ba4f008e2856cc5653cd187dd7
BLAKE2b-256 9cc1674b56e254fa545150d74735e05554a70ba08f91f9f698cf7a3df245a243

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp311-cp311-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 c66fcf3618a0e50fc4ad09e10ff4bfe76c7d66e1ae383f171624fc8dbd0975f7
MD5 e222112435f8b9c6dd2d21bea94017fb
BLAKE2b-256 f842f4070a726d2a30a93ac92f171bfab3601fe4f445bfdc4fee19ae67cd2826

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.4-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 b7dd4d1bdc9d994a72439a90eb64a34fc5b4ae4fd4c692ec0903e36501ce1967
MD5 406d22bce23869ad321e93e51fb653dc
BLAKE2b-256 f1dfbd117a069964e601229afe2721f014b2d1585b0ec7779b5d235c283cde30

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.4-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.4-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 bf192ea80c0ea81f74bbd6b7750e5f0490ceabbac6e368a8e9ab4ef5b6c03cf7
MD5 6cb9f6b85bba33f658e41ee7eb8379ae
BLAKE2b-256 42d84bf2a31d3dffcac321e6fce5c5529953e07b889a1a59e14744c6f253479e

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp310-cp310-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 91caabc954564f04159370dedcddf5b0c66910194e552cff39087b5c64637463
MD5 d3bbb5533096c4ba72e589fe84cb036b
BLAKE2b-256 3658af5801f4b7b22c382126add3d949c42b7bb525eff0557c48ad8ad021a439

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.4-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.4-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 ba579c13f58b9af872453c4cb67194dab0fb486bf5ac10531bfe1121ecf63353
MD5 508d960a9c0613373abe1d191d068d06
BLAKE2b-256 3200a819cfb9a1edc96090555672d2ce4ff50613282c6bf76e907f1c8a525af5

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.4-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page