Skip to main content

Run massive models on minimal hardware

Project description

DeepNetz

Run massive models on minimal hardware.

pip install deepnetz

deepnetz run model.gguf                         # auto-detect hardware
deepnetz run model.gguf --cpu                    # CPU-only
deepnetz run model.gguf --gpu 8GB                # GPU with budget
deepnetz run ollama://qwen3.5:35b                # load from Ollama
deepnetz run hf://unsloth/Qwen3.5-35B-A3B-GGUF  # load from HuggingFace
deepnetz serve model.gguf --port 8080            # OpenAI-compatible API

DeepNetz combines cutting-edge research into one framework that makes large language models run on consumer hardware — no A100 required.

Quick start

# Install
pip install deepnetz

# Check your hardware
deepnetz hardware

# Local GGUF file
deepnetz run ./model.gguf

# Load from Ollama (reads from ~/.ollama/models/)
deepnetz run ollama://qwen3.5:35b

# Load from HuggingFace (auto-downloads)
deepnetz run hf://unsloth/Qwen3.5-35B-A3B-GGUF

# Load from LM Studio cache
deepnetz run lmstudio://qwen3.5-35b

# CPU-only / GPU with budget
deepnetz run model.gguf --cpu
deepnetz run model.gguf --gpu 8GB --context 32k

# Interactive chat
deepnetz run model.gguf
#   You: What is quantum computing?
#   AI:  Quantum computing uses quantum mechanics to...

# Single prompt
deepnetz run model.gguf -p "Explain gravity in one sentence"

# OpenAI-compatible API server
deepnetz serve model.gguf --port 8080
# Then: curl http://localhost:8080/v1/chat/completions ...

# Download from HuggingFace
deepnetz download Qwen3.5-35B --quant Q4_K_M

Python API

from deepnetz import Model

# Auto-detect hardware, optimize automatically
model = Model("model.gguf")
response = model.chat("Hello!")

# CPU-only with custom context
model = Model("model.gguf", cpu_only=True, target_context=8192)

# GPU with budget
model = Model("model.gguf", gpu_budget="8GB", ram_budget="32GB")

# Streaming
for token in model.stream("Tell me a story"):
    print(token, end="", flush=True)

What it does

You have Without DeepNetz With DeepNetz
RTX 4060 8GB + 32GB RAM 8B model, 4K context 122B model, 32K context
32GB RAM, no GPU 7B model, 4K context 35B model, 8K context
RTX 3090 24GB + 64GB RAM 70B model, 8K context 122B model, 128K context

How it works

DeepNetz auto-detects your hardware, reads model metadata, and computes an optimal inference plan:

$ deepnetz info Qwen3.5-122B-A10B-IQ2_XXS.gguf --gpu 8GB

  DeepNetz Hardware Profile
  ────────────────────────────────────────
  OS:       Linux
  CPU:      16 cores
  RAM:      31 GB
  GPU 0:    NVIDIA GeForce RTX 4060 (8188 MB)

  Model: Qwen3.5-122B-A10B
  ────────────────────────────────────────
  Parameters:  ~122B (MoE, 10B active)
  Layers:      96
  Heads:       64 Q / 4 KV
  Head dim:    128
  Context:     262,144
  File size:   34.1 GB

  DeepNetz Inference Plan
  ──────────────────────────────────────────────────
  Layers:     0 GPU + 96 CPU
  KV Cache:   K=turbo4_0, V=turbo4_0 (compressed)
  Context:    4,096 tokens
  Memory:     ~34.2 GB total
  Est. Speed: ~1.3 tok/s generation

The optimization stack

DeepNetz stacks multiple techniques. Each gives 2-4x savings. Combined, they multiply:

122B model, 32K context:

KV Cache (naive):        ~16 GB  → doesn't fit
  + TurboQuant (3.6x):     4.4 GB
  + Token Eviction (2x):   2.2 GB
  + KV Merging (1.5x):     1.5 GB  → fits!
Layer Technique Based on Status
Cache Compression TurboQuant (WHT + Lloyd-Max) Google, ICLR 2026 Implemented
Smart Offload Dynamic GPU/CPU layer split Q-Infer Implemented
Token Eviction Attention-aware pruning PagedEviction, EACL 2026 Planned
Attention Sinks Keep first + recent tokens StreamingLLM Planned
KV Merging Merge similar tokens CaM / D2O Planned
Multi-Tier Cache Important tokens = high precision KVC-Q Planned

What makes it different

Ollama / LMStudio: Load model, hope it fits. No KV optimization, no smart offloading.

vLLM / SGLang: Server-focused, needs beefy GPUs, not for your laptop.

DeepNetz: One command. Detects your hardware, picks the right optimizations, runs the model. CPU and GPU. Consumer-first.

Benchmarks

Tested on 9 models from 3B to 122B on RTX 4060 (8GB) + 32GB RAM:

Model f16 PPL turbo4_0 PPL Delta Generation
Llama-3.2-3B Q4_K_M 9.77 9.82 +0.4%
Qwen3-4B Q4_K_M 17.78 16.61 -6.6%
Gemma-3-27B Q2_K 8.53 8.70 +2.0% 2.3 tok/s
Qwen3.5-35B-A3B Q4_K_XL 5.91 6.07 +2.7% 7.4 tok/s
Llama-3.3-70B IQ2_M 4.91 0.7 tok/s
Qwen3.5-122B-A10B IQ2_XXS 1.3 tok/s

Full benchmark data + TurboQuant standalone library

Architecture

deepnetz/
├── __init__.py              # from deepnetz import Model
├── cli.py                   # deepnetz run/serve/info/hardware/download
├── server.py                # OpenAI-compatible FastAPI server
└── engine/
    ├── model.py             # Main Model class
    ├── backend.py           # llama-cpp-python wrapper
    ├── hardware.py          # GPU/CPU/RAM auto-detection
    ├── planner.py           # Budget → optimal inference plan
    ├── gguf_reader.py       # Fast GGUF metadata extraction
    └── downloader.py        # HuggingFace model download

Roadmap

  • Hardware auto-detection + budget planner
  • GGUF metadata reader
  • llama-cpp-python inference backend
  • CLI tool (deepnetz run/info/serve/hardware/download)
  • CPU + GPU + hybrid mode
  • Interactive chat + single prompt + streaming
  • OpenAI-compatible API server
  • Model downloader with auto quant selection
  • TurboQuant KV cache compression (turboquant-ggml)
  • Token eviction (attention sinks + scoring)
  • KV merging (CaM/D2O)
  • Multi-tier adaptive cache
  • Web UI

Author

Keyvan Hardanikeyvan.ai | deepnetz.com | GitHub | LinkedIn

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

deepnetz-1.0.0-cp312-cp312-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.12Windows x86-64

deepnetz-1.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (6.2 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

deepnetz-1.0.0-cp312-cp312-macosx_10_13_universal2.whl (2.1 MB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

deepnetz-1.0.0-cp311-cp311-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.11Windows x86-64

deepnetz-1.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (5.7 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

deepnetz-1.0.0-cp311-cp311-macosx_10_9_universal2.whl (2.1 MB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

deepnetz-1.0.0-cp310-cp310-win_amd64.whl (3.4 MB view details)

Uploaded CPython 3.10Windows x86-64

deepnetz-1.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (5.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

deepnetz-1.0.0-cp310-cp310-macosx_10_9_universal2.whl (2.1 MB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file deepnetz-1.0.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.0-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 f3771b91dbaecc3cf40eee5b1087499b447f22651eb0372f91e013f99dcf8608
MD5 4b58554a5f3632c70f315ad6923b0c62
BLAKE2b-256 5c45192f41ed79bee88271b266c575f608605ed0af9dfb67058163940d78be47

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp312-cp312-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 0351a764944b693386c8e8c3684aaaf6e80b015ef4e1981fdfd5c08e2b01f462
MD5 c90646f19097a9be3245125cd8befb4e
BLAKE2b-256 e2cfccb7f3c231eeb32359c0d6cc222a31c077a3b65096d70b72b2f40d84cfb5

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.0-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 6e457cc60233d94b8dca165ccb97510365be19e75a85c42f991b0661076b5022
MD5 9923b6db53b002664ef6a72f94134237
BLAKE2b-256 da25fc4de9acdda29cebdff9c5099bbb5c8c795ef0f4884ac05bc90bef84fe86

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.0-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 fa726c8988f165d87288391b20a868a096f016dfcb7a2ed448e61340c00294fc
MD5 0d9f26ee665bc148b78ba371147560ea
BLAKE2b-256 ad696069216613a49b5c12f0bb42418616b8c8fa535d5268581c68f8d780f288

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp311-cp311-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 1ed7bcee9febcc89ec534848f2f8912159ad4d0027800f58437fcc99c8581c8c
MD5 ca37aca858772db92d5224bbc5b4ece9
BLAKE2b-256 360c3c80f38cafdf010c611401bccb20b03a7fd369b128c5249c20b633309bc1

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.0-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 afded1b56ea13e780ae6f8cac1cbc062056ceb2a41371d017d0e28f139207ea0
MD5 6b8f90eb1cfa3a5a8076570e55929162
BLAKE2b-256 9ae4c66ff9e0d8f7619b4c694338a692d5cd525bd92e27ecae28cb67e2dae8db

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.0.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 3.4 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.0.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 6b51ba253c5ed35bd288d3bc35b6a35d0100c02e8883548baa64356bff4d330a
MD5 96f807d7bdd130a801ed6e3d694a7d4b
BLAKE2b-256 ec77624833790b323be358313f89be1c9737c5c353490d9af9b8621e3dbb62c7

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp310-cp310-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 e3b6df1ba41788dbdb3ee509b3db2481419dba7ff7f550918fe7ef20c8a31730
MD5 d18a339eadff568be679eb8eb6c5e8d2
BLAKE2b-256 91e05396465fd52d22c57ff4d1ff3e87277cbf6f2e5947982ec0bf60c468601d

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.0.0-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.0.0-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 11d380752094113987b89be737b8daa04eac105efbd13849cb83f7332c3fa6cb
MD5 e8ba2d756fc21f6a3d353eafc55067fd
BLAKE2b-256 7929eba27c1fde7695d330cde9780983a2745cbdd4297720ea883c8af1969e7d

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.0.0-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page