Skip to main content

GPU-Accelerated LLM Terminal for Apple Silicon

Project description

Cortex

GPU-accelerated local LLMs on Apple Silicon, built for the terminal.

Cortex preview

Cortex is a fast, native CLI for running and fine-tuning LLMs on Apple Silicon using MLX and Metal. It automatically detects chat templates, supports multiple model formats, and keeps your workflow inside the terminal.

Highlights

  • Apple Silicon GPU acceleration via MLX (primary) and PyTorch MPS
  • Multi-format model support: MLX, GGUF, SafeTensors, PyTorch, GPTQ, AWQ
  • Built-in LoRA fine-tuning wizard
  • Chat template auto-detection (ChatML, Llama, Alpaca, Gemma, Reasoning)
  • Conversation history with autosave and export

Quick Start

pipx install cortex-llm
cortex

Inside Cortex:

  • /download to fetch a model from HuggingFace
  • /model to load or manage models
  • /status to confirm GPU acceleration and current settings

Installation

Option A: pipx (recommended)

pipx install cortex-llm

Option B: from source

git clone https://github.com/faisalmumtaz/Cortex.git
cd Cortex
./install.sh

The installer checks Apple Silicon compatibility, creates a venv, installs dependencies from pyproject.toml, and sets up the cortex command.

Requirements

  • Apple Silicon Mac (M1/M2/M3/M4)
  • macOS 13.3+
  • Python 3.11+
  • 16GB+ unified memory (24GB+ recommended for larger models)
  • Xcode Command Line Tools

Model Support

Cortex supports:

  • MLX (recommended)
  • GGUF (llama.cpp + Metal)
  • SafeTensors
  • PyTorch (Transformers + MPS)
  • GPTQ / AWQ quantized models

Advanced Features

  • Dynamic quantization fallback for PyTorch/SafeTensors models that do not fit GPU memory (INT8 preferred, INT4 fallback)
    • docs/dynamic-quantization.md
  • MLX conversion with quantization recipes (4/5/8-bit, mixed precision) for speed vs quality control
    • docs/mlx-acceleration.md
  • LoRA fine-tuning wizard for local adapters (/finetune)
    • docs/fine-tuning.md
  • Template registry and auto-detection for chat formatting (ChatML, Llama, Alpaca, Gemma, Reasoning)
    • docs/template-registry.md
  • Inference engine details and backend behavior
    • docs/inference-engine.md
  • Tooling (experimental, WIP) for repo-scoped read/search and optional file edits with explicit confirmation
    • docs/cli.md

Important (Work in Progress): Tooling is actively evolving and should be considered experimental. Behavior, output format, and available actions may change; tool calls can fail; and UI presentation may be adjusted. Use tooling on non-critical work first, and always review any proposed file changes before approving them.

Configuration

Cortex reads config.yaml from the current working directory. For tuning GPU memory limits, quantization defaults, and inference parameters, see:

  • docs/configuration.md

Documentation

Start here:

  • docs/installation.md
  • docs/cli.md
  • docs/model-management.md
  • docs/troubleshooting.md

Advanced topics:

  • docs/mlx-acceleration.md
  • docs/inference-engine.md
  • docs/dynamic-quantization.md
  • docs/template-registry.md
  • docs/fine-tuning.md
  • docs/development.md

Contributing

Contributions are welcome. See docs/development.md for setup and workflow.

License

MIT License. See LICENSE.


Note: Cortex requires Apple Silicon. Intel Macs are not supported.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cortex_llm-1.0.12.tar.gz (162.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cortex_llm-1.0.12-py3-none-any.whl (182.3 kB view details)

Uploaded Python 3

File details

Details for the file cortex_llm-1.0.12.tar.gz.

File metadata

  • Download URL: cortex_llm-1.0.12.tar.gz
  • Upload date:
  • Size: 162.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for cortex_llm-1.0.12.tar.gz
Algorithm Hash digest
SHA256 9a8d698714295132df475f2fda53c4e3e3dc51b660fcf477c4409218baa140e4
MD5 53e55ff454f0cf42ccde18b934c0d78c
BLAKE2b-256 6dacea0ebc3effacbbca0804b9e772293ef0fda645a0583dd3d6a1c0d964adb9

See more details on using hashes here.

File details

Details for the file cortex_llm-1.0.12-py3-none-any.whl.

File metadata

  • Download URL: cortex_llm-1.0.12-py3-none-any.whl
  • Upload date:
  • Size: 182.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for cortex_llm-1.0.12-py3-none-any.whl
Algorithm Hash digest
SHA256 04d14664f6b1ec2a8e12414bda17f51339f6c2ab3db80fb67e3d7647d89608d9
MD5 aee49ae734a3e928f1c210284a9a947a
BLAKE2b-256 7f18abe15bcbc6f1ed8e7a636c0a98f0043272fc7005849aa3ff361ca80077a5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page