AlienSky-optimized inference engine for Qwen on Apple Silicon

Project description

AlienSkyQwen

10x longer conversations. Better accuracy. On your Mac.

AlienSkyQwen brings enterprise-grade inference optimization to your Mac. Built on the same AlienSky technology licensed to hyperscalers and inference providers, it reduces the memory needed for conversation history by up to 16x while improving reasoning accuracy by up to +7.8 percentage points — no model changes, no quality trade-offs.

AlienSky improves reasoning accuracy on every model tested. Perplexity improves or shows near-zero impact — the optimized model predicts text as well or better than baseline.

AlienSky.ai  |  @AlienSkyAI

[Charts: KV Cache Memory — 16x Reduction · Reasoning Accuracy — AlienSky Improves Every Model · Perplexity — Minimal Impact, Often Improves]

Quick Start

```bash
pip install alienskyqwen
```

```python
from alienskyqwen import aliensky_load
import mlx_lm

# Load any supported Qwen model — AlienSky optimizes it automatically
model, tokenizer = aliensky_load("mlx-community/Qwen3.5-27B-4bit")

# Use it exactly like normal
response = mlx_lm.generate(model, tokenizer, "Explain quantum computing", max_tokens=500)
print(response)
```

Or as an OpenAI-compatible server:

```bash
alienskyqwen-serve --model mlx-community/Qwen3.5-27B-4bit --port 8080
```

Connect any client (Open WebUI, chatbot-ui, curl) to http://localhost:8080/v1.
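The server speaks the standard OpenAI chat-completions format, so any request you would send to `/v1/chat/completions` works unchanged. A minimal sketch of building such a request with only the standard library; the model string accepted by the server is an assumption here (it may also accept a short alias):

```python
import json
import urllib.request

# Hypothetical request against a locally running alienskyqwen-serve instance.
# The payload follows the OpenAI /v1/chat/completions schema.
payload = {
    "model": "mlx-community/Qwen3.5-27B-4bit",  # assumption: server accepts the HF model ID
    "messages": [{"role": "user", "content": "Explain quantum computing in one sentence."}],
    "max_tokens": 100,
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with the server running
print(json.dumps(payload, indent=2))
```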

Optimization Profiles

| Profile | KV Compression | Best For |
| --- | --- | --- |
| `aliensky_v1_std` | 9.85x | General use with dense models (default) |
| `aliensky_v1_fast` | 16x | Maximum memory savings, long contexts |
| `aliensky_v1_quality` | 6.56x | Peak accuracy on dense models |
| `aliensky_v1_balance` | 10.67x | MoE models (recommended for 35B-A3B, 122B) |

```python
model, tokenizer = aliensky_load("mlx-community/Qwen3.6-35B-A3B-4bit",
                                  profile="aliensky_v1_balance")
```

Supported Models

| Model | Model Size (Q4) | Recommended Profile |
| --- | --- | --- |
| Qwen3.6-35B-A3B | ~18 GB | `aliensky_v1_balance` |
| Qwen3.5-27B | ~14 GB | `aliensky_v1_std` |
| Qwen3.5-27B-Claude-Distilled | ~15 GB | `aliensky_v1_std` |
| Qwen3.5-122B-A10B | ~68 GB | `aliensky_v1_balance` |
| Qwen3.5-9B | ~5 GB | `aliensky_v1_std` |
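The recommendations above can be captured in a small lookup. The helper below is illustrative only (`recommended_profile` is not part of the alienskyqwen API); it matches the longest model name first so variants like the Claude-Distilled build resolve correctly:

```python
# Hypothetical helper mapping supported models to their recommended profiles,
# taken from the table above. Not part of the alienskyqwen package.
RECOMMENDED_PROFILES = {
    "Qwen3.6-35B-A3B": "aliensky_v1_balance",
    "Qwen3.5-27B": "aliensky_v1_std",
    "Qwen3.5-27B-Claude-Distilled": "aliensky_v1_std",
    "Qwen3.5-122B-A10B": "aliensky_v1_balance",
    "Qwen3.5-9B": "aliensky_v1_std",
}

def recommended_profile(model_id: str) -> str:
    """Return the recommended profile for a HuggingFace model ID, else the default."""
    for name in sorted(RECOMMENDED_PROFILES, key=len, reverse=True):
        if name in model_id:
            return RECOMMENDED_PROFILES[name]
    return "aliensky_v1_std"

print(recommended_profile("mlx-community/Qwen3.6-35B-A3B-4bit"))  # aliensky_v1_balance
```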

Results at a Glance

All benchmarks run on Mac Studio M3 Ultra (512 GB). Full methodology and per-context breakdowns available in BENCHMARKS.

Accuracy

| Model | Profile | ARC-Challenge | MMLU (5-shot) | HellaSwag | Perplexity (C4) |
| --- | --- | --- | --- | --- | --- |
| Qwen3.6-35B-A3B (Q4) | v1_balance | +0.34pp | +0.20pp | -0.50pp | +2.1% |
| Qwen3.5-27B (Q4) | v1_std | +7.76pp | -2.00pp | -1.90pp | -4.3% |
| Qwen3.5-27B-Claude-Distilled (Q4) | v1_quality | +2.22pp | — | — | -3.5% |
| Qwen3.5-27B (8-bit) | v1_quality | +2.82pp | -1.00pp | -1.30pp | +0.5% |
| Qwen3.5-122B-A10B (Q4) | v1_quality | +1.28pp | -2.20pp | -1.45pp | +1.7% |
| Qwen3.5-9B (Q4) | v1_std | +1.54pp | -2.25pp | — | — |

AlienSky improves ARC-Challenge reasoning accuracy on every model tested. Perplexity improves on both 27B variants (lower is better) — the optimized model predicts text more accurately than baseline. The 8-bit and MoE models show near-zero PPL impact.

Memory Savings (KV Cache)

| Context Length | Baseline (27B) | AlienSky (v1_std, 9.85x) | AlienSky (v1_fast, 16x) |
| --- | --- | --- | --- |
| 32K tokens | 2.0 GB | 208 MB | 128 MB |
| 131K tokens | 8.0 GB | 832 MB | 512 MB |
| 262K tokens | 16.0 GB | 1.6 GB | 1.0 GB |

MoE models (35B-A3B, 122B-A10B) achieve 10.67x savings with the v1_balance profile.
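The table's figures are consistent with a fixed per-token KV footprint of roughly 64 KB for the 27B model (2.0 GB / 32K tokens), with each profile dividing that by its compression ratio. A sketch of the arithmetic; the 64 KB/token figure is inferred from the table above, not stated by the vendor:

```python
# KV-cache sizing inferred from the memory table above.
# Assumption: the 27B baseline uses ~64 KB of KV cache per token
# (2.0 GB / 32,768 tokens), and each profile divides that by its ratio.
KV_BYTES_PER_TOKEN = 2.0 * 1024**3 / 32_768  # 65,536 bytes = 64 KB

def kv_cache_mb(context_tokens: int, compression: float = 1.0) -> float:
    """KV cache size in MB for a given context length and compression ratio."""
    return context_tokens * KV_BYTES_PER_TOKEN / compression / 1024**2

print(round(kv_cache_mb(32_768)))         # 2048  (baseline, 2.0 GB)
print(round(kv_cache_mb(32_768, 9.85)))   # 208   (v1_std)
print(round(kv_cache_mb(32_768, 16.0)))   # 128   (v1_fast)
```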

Decode Speed

| Context | Baseline | AlienSky (v1_std) | Ratio |
| --- | --- | --- | --- |
| 4K tokens | 103.7 tok/s | 88.6 tok/s | 0.85x |
| 8K tokens | 99.3 tok/s | 85.8 tok/s | 0.86x |
| 32K tokens | 87.1 tok/s | 72.0 tok/s | 0.83x |

On Apple Silicon, decode speed is bottlenecked by model weight loading, not the KV cache. AlienSky's overhead is minimal in practice.

How It Works

When an LLM generates text, it stores the entire conversation in a structure called the KV cache. This cache grows with every token and can consume gigabytes of RAM.

AlienSky optimizes this at multiple levels:

  1. Compresses conversation memory to a fraction of its original size
  2. Computes attention directly on the compressed form — no decompression needed, up to 5x faster attention at long contexts
  3. Runs on Apple Metal via a custom GPU kernel for maximum speed

The result: 10-16x less memory, faster attention, and in many cases better answers — the optimization acts as noise reduction for the model's internal state.
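AlienSky's actual kernel and compression scheme are proprietary. As a generic illustration of "attention on the compressed form", here is a low-rank KV-compression sketch in NumPy: keys, values, and the query are all projected into a shared rank-r subspace, so attention scores and outputs are computed without ever decompressing the cache. All names and the projection scheme are illustrative assumptions, not AlienSky's method:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, T = 64, 8, 128  # head dim, compressed rank (8x smaller), context length

W = rng.standard_normal((d, r)) / np.sqrt(d)  # shared down-projection

K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
q = rng.standard_normal(d)

# Cache only the compressed keys/values: T x r instead of T x d (8x less memory).
K_c, V_c = K @ W, V @ W

# Attention computed directly in the compressed space — no decompression step.
scores = (q @ W) @ K_c.T / np.sqrt(r)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
out_c = weights @ V_c  # attention output, still rank-r

print(K_c.shape, out_c.shape)  # (128, 8) (8,)
```

In this toy version the cache shrinks from T×d to T×r floats, and the score computation touches only the small matrices, which is the structural point behind "no decompression needed".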

What this unlocks: With 16x less memory per conversation, you can run multiple conversations in parallel, keep several models loaded simultaneously, or handle book-length contexts that would otherwise exhaust your Mac's RAM.

Requirements

  • Apple Silicon Mac (M1 or newer)
  • macOS 14.0+
  • Python 3.11-3.14
  • 16 GB+ unified memory (64 GB+ recommended for 27B models)

API Reference

aliensky_load(model, profile, data_path)

Load a model with AlienSky optimization.

  • model — HuggingFace model ID or local path
  • profile — Optimization profile (default: "aliensky_v1_std")
  • data_path — Path to AlienSky data file (auto-detected by default)

Returns (model, tokenizer) for use with mlx_lm.generate() or mlx_lm.server.

alienskyqwen-serve

```bash
alienskyqwen-serve [--model MODEL] [--profile PROFILE] [--port PORT] [--host HOST]
```

OpenAI-compatible API server. Default model: Qwen3.5-27B-4bit. Default port: 8080.

FAQ

Does AlienSky modify the model weights? No. It only optimizes how the model stores and retrieves conversation memory.

Why does reasoning accuracy improve? The compression acts as regularization, filtering noise in the model's internal representations. This effect is strongest on reasoning benchmarks and distilled models.

Can I use this with LM Studio? Not directly. Run alienskyqwen-serve and point your chat UI to http://localhost:8080/v1.

What about Llama, Gemma, or other models? AlienSky currently supports Qwen3.5 and Qwen3.6. Additional architectures are planned.

License

Free for personal use. Commercial use requires a license. No modifications permitted. See LICENSE for full terms.

Copyright (c) 2026 AlienSky LLC.

"Qwen" is a trademark of Alibaba Group. AlienSky LLC is not affiliated with, endorsed by, or sponsored by Alibaba Group.





Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions


  • alienskyqwen-0.1.1-cp314-cp314-macosx_14_0_arm64.whl (74.4 MB; CPython 3.14, macOS 14.0+, ARM64)
  • alienskyqwen-0.1.1-cp313-cp313-macosx_14_0_arm64.whl (74.4 MB; CPython 3.13, macOS 14.0+, ARM64)
  • alienskyqwen-0.1.1-cp312-cp312-macosx_14_0_arm64.whl (74.4 MB; CPython 3.12, macOS 14.0+, ARM64)
  • alienskyqwen-0.1.1-cp311-cp311-macosx_14_0_arm64.whl (74.4 MB; CPython 3.11, macOS 14.0+, ARM64)

File hashes

alienskyqwen-0.1.1-cp314-cp314-macosx_14_0_arm64.whl
  SHA256: 6e45c1cdd25d4bb0a480b4e7fe9e12fc5a9af6decf35ada4339e3d49551e1009
  MD5: c2bd3286060c6a64c4ed27111880fb3d
  BLAKE2b-256: 6670467e1ee8784093a3be7383757a7a74ff88e91f948c3dcf12ae46ae056e03

alienskyqwen-0.1.1-cp313-cp313-macosx_14_0_arm64.whl
  SHA256: 0c7b2a2f24ebc0476f68c89078938fc58fec4fabce453be7c3d403483064131c
  MD5: faa79d91f74eafc7d79185b9a17cd600
  BLAKE2b-256: 8d82d21131b50842e49df1aa9a8f45648f4b3458e26edd7629d11537336ee8cf

alienskyqwen-0.1.1-cp312-cp312-macosx_14_0_arm64.whl
  SHA256: 628a4509200a9acd0e879abf221dfd9d4cf54ed303cb7a7f2599ef5feeb1e693
  MD5: 482340da81d5f1c9634eca3d8aa4c218
  BLAKE2b-256: 074988f2c52b2e38cdf63003a4b64ba185e5222176ef6c52716bf7f6f32003c7

alienskyqwen-0.1.1-cp311-cp311-macosx_14_0_arm64.whl
  SHA256: a0f8d8a15c2c4ff9f1b8a9fbf22ac24bc6097fc8313a4112be56c6b9d54260d6
  MD5: 93cb95b3f2276ac83f9cd6d9625901a6
  BLAKE2b-256: b00fc618ecb44e8ebe0ed4ccdffbd72e735f81e0dff3f57c6485cb6ba4352629
