Sovereign AI Inference Engine: .oom quantization, Trust Kernel encryption, JIS identity routing, TIBET provenance. Spaceshuttle lazy loading, RAM-RAID distributed memory.

🦙 OomLlama

Efficient LLM inference with the .oom format: about 2x smaller than Q4 GGUF

from oomllama import OomLlama

llm = OomLlama("humotica-32b")
response = llm.generate("What is the meaning of life?")
print(response)

Why OomLlama?

| Feature        | GGUF (Q4) | OOM (Q2)     |
|----------------|-----------|--------------|
| 70B Model Size | ~40 GB    | ~20 GB       |
| 32B Model Size | ~20 GB    | ~10 GB       |
| RAM Usage      | High      | Lazy Loading |
| Format         | Open      | Open (MIT)   |

OomLlama uses Q2 quantization with lazy layer loading to run large models on consumer hardware.

Installation

pip install oomllama

Quick Start

Download a Model

from oomllama import download_model

# Download from HuggingFace
model_path = download_model("humotica-32b")

Generate Text

from oomllama import OomLlama

llm = OomLlama("humotica-32b")

# Simple generation
response = llm.generate("Explain quantum computing in simple terms")
print(response)

# With parameters
response = llm.generate(
    "Write a haiku about AI",
    max_tokens=50,
    temperature=0.8,
    top_p=0.9
)

Chat Mode

messages = [
    ("user", "Hello! Who are you?"),
    ("assistant", "I'm OomLlama, an efficient LLM."),
    ("user", "What makes you efficient?"),
]

response = llm.chat(messages)
print(response)
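
For a multi-turn conversation, the message list has to grow with each exchange. The helper below is a minimal sketch of that bookkeeping. `chat_turn` is a hypothetical name, not part of the package, and it assumes that `llm.chat` accepts the list of `(role, text)` tuples shown above and returns the reply as a string.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text), matching the tuples used above


def chat_turn(llm, history: List[Message], user_text: str) -> str:
    """Append a user message, call llm.chat, and record the reply.

    Assumption: llm.chat(history) returns the assistant reply as a str.
    """
    history.append(("user", user_text))
    reply = llm.chat(history)
    history.append(("assistant", reply))
    return reply
```

Calling `chat_turn(llm, history, "What makes you efficient?")` repeatedly with the same `history` list keeps the full conversation available to each later call.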

Available Models

| Model        | Parameters | Size (.oom) |
|--------------|------------|-------------|
| humotica-32b | 33B        | ~10 GB      |
| llamaohm-70b | 70B        | ~20 GB      |
| tinyllama-1b | 1.1B       | ~400 MB     |

The .oom Format

OOM (OomLlama Model) is a compact model format:

┌──────────────────────────────────────┐
│ Header: OOML (magic) + metadata      │
├──────────────────────────────────────┤
│ Tensors: Q2 quantized (2 bits/weight)│
│ - Scale + Min per 256-weight block   │
│ - 68 bytes per block                 │
└──────────────────────────────────────┘
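
The block numbers in the layout can be checked with a little arithmetic. The sketch below uses only the figures stated above (256 weights per block, 2 bits per weight, 68 bytes per block). How the 4 bytes left over are split between scale and minimum is not stated here, so the code treats them as a single overhead figure. `estimated_tensor_bytes` is a hypothetical helper, and it ignores the header and any tensors that might not be stored as Q2.

```python
WEIGHTS_PER_BLOCK = 256  # from the layout above
BITS_PER_WEIGHT = 2      # Q2 codes
BLOCK_BYTES = 68         # from the layout above

# 256 weights * 2 bits = 512 bits = 64 bytes of packed codes
packed_bytes = WEIGHTS_PER_BLOCK * BITS_PER_WEIGHT // 8
# The remaining bytes hold the per-block scale and minimum
overhead_bytes = BLOCK_BYTES - packed_bytes
# Effective storage cost per weight, including scale and minimum
effective_bits = BLOCK_BYTES * 8 / WEIGHTS_PER_BLOCK


def estimated_tensor_bytes(n_weights: int) -> int:
    """Bytes needed for n_weights, rounding up to whole blocks."""
    n_blocks = -(-n_weights // WEIGHTS_PER_BLOCK)  # ceiling division
    return n_blocks * BLOCK_BYTES


if __name__ == "__main__":
    print(packed_bytes, overhead_bytes, effective_bits)  # 64 4 2.125
    for name, params in [("70B", 70_000_000_000), ("32B", 32_000_000_000)]:
        gb = estimated_tensor_bytes(params) / 1e9
        print(f"{name}: ~{gb:.1f} GB of quantized tensors")
```

Under these assumptions the estimate comes to about 18.6 GB for 70B weights and 8.5 GB for 32B weights, in the same range as the rounded sizes quoted earlier.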

Convert GGUF to OOM

# Using the CLI tool
gguf2oom model.gguf model.oom

# Check model info
gguf2oom --info model.gguf

Technical Details

Q2 Quantization

Each weight is stored as 2 bits (0, 1, 2, or 3) with per-block scale and minimum:

weight = q2_value * scale + min

This achieves ~2x compression over Q4 with acceptable quality loss for most tasks.
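
The formula above only defines dequantization. The sketch below pairs it with one plausible way to choose the codes: a min/max affine mapping with round-to-nearest. That choice, the function names, and the bit order used in `pack_codes` are assumptions made for illustration and may differ from what OomLlama.rs actually does.

```python
from typing import List, Tuple

LEVELS = 4  # 2 bits per weight -> codes 0, 1, 2, 3


def quantize_block(weights: List[float]) -> Tuple[float, float, List[int]]:
    """Map floats to 2-bit codes with a per-block scale and minimum.

    Assumption: min/max affine quantization with round-to-nearest.
    """
    lo, hi = min(weights), max(weights)
    # A constant block has no range; every code is 0, so any non-zero scale works.
    scale = (hi - lo) / (LEVELS - 1) if hi > lo else 1.0
    codes = [min(LEVELS - 1, max(0, round((w - lo) / scale))) for w in weights]
    return scale, lo, codes


def dequantize_block(scale: float, lo: float, codes: List[int]) -> List[float]:
    """Apply weight = q2_value * scale + min."""
    return [q * scale + lo for q in codes]


def pack_codes(codes: List[int]) -> bytes:
    """Pack four 2-bit codes per byte, lowest bits first (assumed bit order)."""
    if len(codes) % 4 != 0:
        raise ValueError("number of codes must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(codes), 4):
        a, b, c, d = codes[i:i + 4]
        out.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(out)
```

With this scheme each reconstructed weight differs from the original by at most half of `scale`, and a 256-weight block packs into 64 bytes of codes.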

Lazy Layer Loading

OomLlama loads transformer layers on-demand, keeping only the active layer in memory:

Forward Pass:
  Layer 0: Load → Compute → Unload
  Layer 1: Load → Compute → Unload
  ...
  Layer N: Load → Compute → Unload

This makes it possible to run 70B models on a GPU with 24 GB of memory.
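
The load, compute, unload cycle can be sketched in a few lines. This is an illustration of the pattern only, not the OomLlama implementation: `lazy_forward` and `load_layer` are hypothetical names, and a "layer" here is any callable that maps an activation list to a new one.

```python
from typing import Callable, List

Layer = Callable[[List[float]], List[float]]


def lazy_forward(
    n_layers: int,
    load_layer: Callable[[int], Layer],
    x: List[float],
) -> List[float]:
    """Run layers in order, holding at most one loaded layer at a time."""
    for i in range(n_layers):
        layer = load_layer(i)  # Load: e.g. read and dequantize layer i from disk
        x = layer(x)           # Compute
        del layer              # Unload: drop the only reference to the layer
    return x


if __name__ == "__main__":
    def toy_loader(i: int) -> Layer:
        # Stand-in for real weights: layer i adds (i + 1) to every element.
        return lambda v: [e + (i + 1) for e in v]

    print(lazy_forward(3, toy_loader, [0.0]))  # prints [6.0]
```

The trade-off in this pattern is that every forward pass loads every layer again, so peak memory falls to roughly one layer plus activations at the cost of extra I/O per pass.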

Credits

  • Model Format: Gemini IDD & Root AI (Humotica AI Lab)
  • Quantization: OomLlama.rs by Humotica
  • Base Models: Meta Platforms, Inc. (Llama 3.3)

License

  • OomLlama Code: MIT License
  • Model Weights: Subject to original model licenses (e.g., Llama 3.3 Community License)

One Love, One fAmIly 💙

Built by Humotica AI Lab - Jasper, Claude, Gemini, Codex


Enterprise

For private hub hosting, SLA support, custom integrations, or compliance guidance:

  • Enterprise: enterprise@humotica.com
  • Support: support@humotica.com
  • Security: security@humotica.com

See ENTERPRISE.md for details.
