Skip to main content

Run massive models on minimal hardware

Project description

DeepNetz Logo

DeepNetz

Run massive models on minimal hardware.

PyPI License Tests

pip install deepnetz

deepnetz pull Qwen3.5-35B                       # download from HuggingFace
deepnetz run Qwen3.5-35B                        # auto-detect hardware, run
deepnetz serve Qwen3.5-35B --port 8080          # OpenAI-compatible API + Web UI

Web App: deepnetz.com/app | Docs: deepnetz.com

What it does

One framework. 6 backends. Any model. Any hardware.

You have Typical setup With DeepNetz
RTX 4060 8GB + 32GB RAM 35B model via Ollama Same model, 3.6x less KV cache, longer context
32GB RAM, no GPU 7B model, slow Auto-optimized CPU inference + KV compression
RTX 3090 24GB + 64GB RAM 70B model Optimized layer split + cache

Quick Start

pip install deepnetz

# Search & download models
deepnetz search Qwen                            # search HuggingFace
deepnetz pull Qwen3.5-35B                       # download best quant for your hardware
deepnetz pull Qwen3.5-35B --quant Q8_0          # specific quantization
deepnetz list                                    # show local models

# Run
deepnetz run Qwen3.5-35B                        # from local store
deepnetz run ./model.gguf                        # local file
deepnetz run ollama://qwen3.5:35b               # from Ollama
deepnetz run hf://unsloth/Qwen3.5-35B-A3B-GGUF  # from HuggingFace
deepnetz run lmstudio://qwen3.5-35b             # from LM Studio

# Options
deepnetz run model.gguf --cpu                    # CPU-only
deepnetz run model.gguf --gpu 8GB --context 32k  # GPU budget + context
deepnetz run model.gguf -p "Explain gravity"     # single prompt

# API server + Web UI
deepnetz serve model.gguf --port 8080
# Web UI:  https://deepnetz.com/app  (connects to localhost)
# API:     http://localhost:8080/v1/chat/completions
# Docs:    http://localhost:8080/docs

# Hardware info
deepnetz hardware
deepnetz backends

Registry

DeepNetz has its own model registry at registry.deepnetz.com. Search and pull any GGUF model from HuggingFace through our server.

# Register & login (one time)
deepnetz register
deepnetz login

# Search models (via registry server → HuggingFace)
deepnetz search Qwen
deepnetz search "code llama"
deepnetz search deepseek

# Pull (auto-selects best quant for your hardware)
deepnetz pull Qwen3.5-35B
deepnetz pull Llama-3.3-70B --quant IQ2_M
deepnetz pull unsloth/Qwen3.5-35B-A3B-GGUF      # direct HF repo

Models are stored in ~/.cache/deepnetz/registry/blobs/ as content-addressed files.

Python API

from deepnetz import Model

# Auto everything
model = Model("model.gguf")
response = model.chat("Hello!")

# Streaming
for token in model.stream("Tell me a story"):
    print(token, end="", flush=True)

# Specific backend
model = Model("ollama://qwen3.5:35b")

# CPU-only with budget
model = Model("model.gguf", cpu_only=True, target_context=8192)

6 Backends

DeepNetz auto-detects which backends are installed and uses the best one:

Backend Source How it connects
Native llama-cpp-python Direct GGUF inference (fastest)
Ollama Ollama REST API localhost:11434
vLLM vLLM server vllm serve
LM Studio LM Studio API localhost:1234
HuggingFace transformers Pipeline (safetensors)
Remote Any OpenAI API Custom endpoint

KV Cache Optimization

Up to 10x memory reduction through stacked compression:

Technique Based on Effect
TurboQuant Google, ICLR 2026 3.6x KV compression
Attention Sinks StreamingLLM Fixed memory for infinite context
Token Eviction PagedEviction Remove unimportant tokens
KV Merging CaM / D2O Merge similar tokens

Web UI

Start the server and open deepnetz.com/app — it connects to your local instance:

deepnetz serve model.gguf --port 8080
# Open https://deepnetz.com/app → Connect to localhost:8080

Features: Chat with streaming, vision (image upload), reasoning mode, model search & pull, model switching, system monitor, settings.

Or use the built-in UI at http://localhost:8080/chat.

Vision & Multimodal

Send images to vision models (Gemma 4, Qwen-VL, LLaVA):

deepnetz run qwen3-vl:8b --image photo.jpg -p "What's in this image?"
deepnetz run qwen3-vl:8b   # interactive: use /image path.jpg

Reasoning Mode

Enable step-by-step reasoning (DeepSeek-R1, QwQ):

deepnetz run deepseek-r1:14b --reasoning -p "Solve: 2x + 5 = 13"

Speculative Decoding

Use a small draft model for 1.5-2x faster generation:

deepnetz run Qwen3.5-35B --draft Llama-3.2-3B

Model Optimizer

Analyze models and get optimization recommendations:

deepnetz optimize model.gguf          # Analysis + recommendations
deepnetz optimize --install-ik-llama  # 1.3-1.5x faster CUDA kernels
deepnetz convert hf://user/repo --quant Q4_K_M   # HF → GGUF

Benchmarks

Tested on RTX 4060 (8GB) + 32GB RAM:

Model PPL Delta Speed KV Compression
Llama-3.2-3B +0.4% 3.6x
Gemma-3-27B +2.0% 2.3 tok/s 3.6x
Qwen3.5-35B +2.7% 7.4 tok/s 3.6x
Llama-3.3-70B 0.7 tok/s
Qwen3.5-122B 1.3 tok/s

Architecture

deepnetz/
├── cli.py                       # CLI (run/serve/pull/search/list/register/login)
├── server.py                    # FastAPI + OpenAI API + WebSocket
├── engine/
│   ├── model.py                 # Main orchestrator
│   ├── manager.py               # Model lifecycle (load/unload/switch)
│   ├── hardware.py              # GPU/CPU/RAM detection
│   ├── monitor.py               # Real-time system stats
│   ├── planner.py               # Budget → inference plan
│   ├── session.py               # SQLite conversation persistence
│   ├── resolver.py              # Universal model resolver (8 sources)
│   ├── downloader.py            # Model download wrapper
│   ├── features.py              # Vision, Reasoning, Tool Calling, MoE detection
│   ├── speculative.py           # Token-level speculative decoding
│   ├── optimize.py              # Model analysis + optimization recommendations
│   ├── converter.py             # HF → GGUF converter
│   ├── gguf_reader.py           # GGUF metadata extraction
│   ├── scanner.py               # Local model discovery
│   └── evaluator.py             # Output quality scoring
├── registry/
│   ├── store.py                 # Local blob store + HF pull
│   ├── client.py                # Registry server client (auth, search)
│   ├── server.py                # Registry server (deploy on your infra)
│   └── config.py                # Model config format
├── backends/                    # 6 pluggable adapters
│   ├── native.py, ollama.py, vllm.py
│   ├── lmstudio.py, huggingface.py, remote.py
│   └── discovery.py             # Auto-detect backends
├── cache/                       # KV cache optimization
│   ├── turboquant.py, eviction.py, merging.py
├── tools/                       # Tool calling
│   ├── search.py, registry.py, base.py
└── ui/                          # Web UI templates

Comparison

Feature Ollama LM Studio vLLM DeepNetz
Load from anywhere Own registry Own catalog HuggingFace All of them
KV Cache Compression No No No q4_0/q8_0 (3.6x)
Multi-Backend No No No 6 backends
Hardware Auto-Tuning Basic Basic No Budget planner
Vision/Multimodal No Yes No Yes (API + UI)
Reasoning Mode No No No Yes (think tags)
Speculative Decoding No Experimental No Token-level
Model Optimizer No No No APEX + ik_llama
MoE Detection No No No APEX recommendations
Web UI No Yes (closed) No Yes (hosted + local)
Model Registry Proprietary No No Own + HuggingFace
OAuth Login No No No GitHub + Google
Tool Calling No No Yes Yes + Web Search

Contributing

git clone https://github.com/Keyvanhardani/deepnetz.git
cd deepnetz
pip install -e ".[server]"
pytest tests/

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

deepnetz-1.1.0-cp312-cp312-win_amd64.whl (4.6 MB view details)

Uploaded CPython 3.12Windows x86-64

deepnetz-1.1.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (12.4 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

deepnetz-1.1.0-cp312-cp312-macosx_10_13_universal2.whl (6.3 MB view details)

Uploaded CPython 3.12macOS 10.13+ universal2 (ARM64, x86-64)

deepnetz-1.1.0-cp311-cp311-win_amd64.whl (4.7 MB view details)

Uploaded CPython 3.11Windows x86-64

deepnetz-1.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (11.6 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

deepnetz-1.1.0-cp311-cp311-macosx_10_9_universal2.whl (6.3 MB view details)

Uploaded CPython 3.11macOS 10.9+ universal2 (ARM64, x86-64)

deepnetz-1.1.0-cp310-cp310-win_amd64.whl (4.7 MB view details)

Uploaded CPython 3.10Windows x86-64

deepnetz-1.1.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (11.2 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

deepnetz-1.1.0-cp310-cp310-macosx_10_9_universal2.whl (6.3 MB view details)

Uploaded CPython 3.10macOS 10.9+ universal2 (ARM64, x86-64)

File details

Details for the file deepnetz-1.1.0-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.1.0-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 4.6 MB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.1.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 94ee4dd3a03136112f777679e0541aa09c6e693234eca74336b8d9e5080fb292
MD5 f4bdd78d4ec84eecb4870006393f81ba
BLAKE2b-256 a578f17e337cf0fa26b471609ca96178bcd7abfe06a5eca0fa0c9a0cad526bd1

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp312-cp312-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.1.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 7a973a57d3698fbbccb88322fac6e21d94cc9f45a035e3f4bee5214ef2c70470
MD5 c5788f60bbae4765f3ea21b6774e96f0
BLAKE2b-256 980f1609f3c456e4427569fa6d624fd4dbb3af9b34cc620dedf4861c8e04310c

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp312-cp312-macosx_10_13_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.1.0-cp312-cp312-macosx_10_13_universal2.whl
Algorithm Hash digest
SHA256 f0e194347c75c2a5d37710eae165a30942a4595cdd7d71c87d77b9b32edf3093
MD5 5416bad138eefdf98a133b99115c0ab8
BLAKE2b-256 eec7bbbf170b64cc6e2d0749b05959c99582c74fddd780e9718fb76218909572

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp312-cp312-macosx_10_13_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.1.0-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 4.7 MB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.1.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 bfe8f5f6d8b939cf0de04a0cb1533ab5374e932375a56be7968cbfc26ee6dcb2
MD5 4383ab7475b4ad9af2b8757efaa6dd97
BLAKE2b-256 b6609215ff0783ac8ea08306526e32ebee9c87e72f50695c873b9935707e4e74

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp311-cp311-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 a67bd276fa87016d9b2f2c254ebfc7e6b4c51652b72c556b36a4924e4fde65bf
MD5 3406ac0c00a3594d223716357ff26f55
BLAKE2b-256 cea999cf710765e17e0b4098acd565e49253a1c96e9ed1eff6bf38edfd270b62

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp311-cp311-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.1.0-cp311-cp311-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 c8799756e50b16dd6dfd1e42dbea12c8f73c3b784adaff2ebbcb2363554819c9
MD5 099ae1d7d7ee6c68f7c923b784bbe4f3
BLAKE2b-256 ca24124b416b2bdc533dbdf48a2c206b893f6123075995ff0b2c4dc9616f85c1

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp311-cp311-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: deepnetz-1.1.0-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 4.7 MB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for deepnetz-1.1.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 b40b35befb9d43fc1d02596c127c5bd73873a10651d98f29804582005a783a64
MD5 5c3b9f817d76e06e217f2e56f8c2d453
BLAKE2b-256 ddcb696919a6602c2690457d0654368ef3b93798e63e5df34e35793c621d28ab

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp310-cp310-win_amd64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File metadata

File hashes

Hashes for deepnetz-1.1.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 6948d4882fabf8b02babc7112c04f76275516bdd3cd8e82c815bd8db635fa90c
MD5 ba5c79283476d20357c6e9bef8567971
BLAKE2b-256 1fbb3af49dd607d27aec643eb9d72879c887eb5257f97aeef5adc0be60c030f3

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file deepnetz-1.1.0-cp310-cp310-macosx_10_9_universal2.whl.

File metadata

File hashes

Hashes for deepnetz-1.1.0-cp310-cp310-macosx_10_9_universal2.whl
Algorithm Hash digest
SHA256 c72f9894b38a1c382c88badfc4c2121dcad213fe9bc81cc0ad01eba9f9ce90ca
MD5 de0143aeb3a0084ab77bc078ca5ca5a1
BLAKE2b-256 75d2b50def4b6607d756c01fc0a4f481c51437d19e9cc84ffcad2b7994d471f9

See more details on using hashes here.

Provenance

The following attestation bundles were made for deepnetz-1.1.0-cp310-cp310-macosx_10_9_universal2.whl:

Publisher: workflow.yml on Keyvanhardani/deepnetz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page