GPU-Accelerated LLM Terminal for Apple Silicon
Cortex
GPU-accelerated local LLMs on Apple Silicon, built for the terminal.
Cortex is a fast, native CLI for running and fine-tuning LLMs on Apple Silicon using MLX and Metal. It automatically detects chat templates, supports multiple model formats, and keeps your workflow inside the terminal.
Highlights
- Apple Silicon GPU acceleration via MLX (primary) and PyTorch MPS
- Multi-format model support: MLX, GGUF, SafeTensors, PyTorch, GPTQ, AWQ
- Built-in LoRA fine-tuning wizard
- Chat template auto-detection (ChatML, Llama, Alpaca, Gemma, Reasoning)
- Conversation history with autosave and export
Quick Start
pipx install cortex-llm
cortex
Inside Cortex:
- /download to fetch a model from HuggingFace
- /model to load or manage models
- /status to confirm GPU acceleration and current settings
Installation
Option A: pipx (recommended)
pipx install cortex-llm
Option B: from source
git clone https://github.com/faisalmumtaz/Cortex.git
cd Cortex
./install.sh
The installer checks Apple Silicon compatibility, creates a venv, installs dependencies from pyproject.toml, and sets up the cortex command.
Requirements
- Apple Silicon Mac (M1/M2/M3/M4)
- macOS 13.3+
- Python 3.11+
- 16GB+ unified memory (24GB+ recommended for larger models)
- Xcode Command Line Tools
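A quick way to pre-check a machine against the list above is a few lines of Python. This is a minimal sketch and not the check that install.sh performs; the function names `parse_version` and `unmet_requirements` are hypothetical. It covers only the chip, macOS, and Python requirements, not unified memory or the Xcode Command Line Tools. Note that a Python interpreter running under Rosetta may report `x86_64` even on Apple Silicon.

```python
import platform
import sys


def parse_version(text):
    """Turn a dotted version such as '13.3.1' into (13, 3, 1); '' gives ()."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def unmet_requirements(system, machine, macos_version, python_version):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper: thresholds are taken from the Requirements list.
    """
    problems = []
    if system != "Darwin" or machine != "arm64":
        problems.append("Apple Silicon Mac required")
    elif parse_version(macos_version) < (13, 3):
        problems.append("macOS 13.3 or newer required")
    if tuple(python_version[:2]) < (3, 11):
        problems.append("Python 3.11 or newer required")
    return problems


if __name__ == "__main__":
    issues = unmet_requirements(
        platform.system(),      # "Darwin" on macOS
        platform.machine(),     # "arm64" on native Apple Silicon
        platform.mac_ver()[0],  # e.g. "14.5"; empty string off macOS
        sys.version_info,
    )
    print("OK" if not issues else "\n".join(issues))
```

Keeping the check as a pure function of its inputs makes it easy to exercise on any platform before running it against the real machine.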
Model Support
Cortex supports:
- MLX (recommended)
- GGUF (llama.cpp + Metal)
- SafeTensors
- PyTorch (Transformers + MPS)
- GPTQ / AWQ quantized models
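To see roughly how these formats differ on disk, the sketch below guesses a format from file names and an optional parsed config.json. It is an illustration under stated assumptions, not Cortex's own detection logic: `guess_model_format` is a hypothetical helper, and the GPTQ/AWQ branch relies on the `quantization_config.quant_method` field that Hugging Face Transformers writes for those models. MLX checkpoints are commonly stored as .safetensors files too, so this sketch does not try to tell them apart.

```python
from pathlib import PurePath


def guess_model_format(file_names, config=None):
    """Best-effort format guess (hypothetical helper, illustrative only).

    file_names: iterable of file names found in the model directory.
    config: parsed config.json as a dict, or None if there is none.
    """
    suffixes = {PurePath(name).suffix.lower() for name in file_names}

    # GGUF models ship as a single .gguf file.
    if ".gguf" in suffixes:
        return "gguf"

    # GPTQ/AWQ models are usually SafeTensors plus a quantization_config entry.
    quant = (config or {}).get("quantization_config") or {}
    method = str(quant.get("quant_method", "")).lower()
    if method in ("gptq", "awq"):
        return method

    if ".safetensors" in suffixes:
        return "safetensors"
    if suffixes & {".bin", ".pt", ".pth"}:
        return "pytorch"
    return "unknown"
```

For a local directory, the file names can be collected with `[p.name for p in Path(model_dir).iterdir()]` and the config loaded with `json.load`.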
Advanced Features
- Dynamic quantization fallback for PyTorch/SafeTensors models that do not fit GPU memory (INT8 preferred, INT4 fallback)
docs/dynamic-quantization.md
- MLX conversion with quantization recipes (4/5/8-bit, mixed precision) for speed vs quality control
docs/mlx-acceleration.md
- LoRA fine-tuning wizard for local adapters (/finetune)
docs/fine-tuning.md
- Template registry and auto-detection for chat formatting (ChatML, Llama, Alpaca, Gemma, Reasoning)
docs/template-registry.md
- Inference engine details and backend behavior
docs/inference-engine.md
- Tooling (experimental, WIP) for repo-scoped read/search and optional file edits with explicit confirmation
docs/cli.md
Important (Work in Progress): Tooling is actively evolving and should be considered experimental. Behavior, output format, and available actions may change; tool calls can fail; and UI presentation may be adjusted. Use tooling on non-critical work first, and always review any proposed file changes before approving them.
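To make the dynamic quantization fallback listed above more concrete, the sketch below picks the highest precision whose estimated weight size fits a memory budget, preferring INT8 over INT4. It is a rough illustration, not the logic in Cortex: `pick_precision`, the bytes-per-parameter table, and the 1.2 overhead factor are all assumptions made for the example. See docs/dynamic-quantization.md for the actual behavior.

```python
# Approximate storage cost per weight (assumed values for illustration).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def pick_precision(param_count, budget_bytes, overhead=1.2):
    """Return the first precision that fits the budget, or None if none fits.

    Hypothetical helper. `overhead` is an assumed multiplier that leaves
    headroom for activations and caches on top of the raw weights.
    """
    for name in ("fp16", "int8", "int4"):  # try full precision, then INT8, then INT4
        needed = param_count * BYTES_PER_PARAM[name] * overhead
        if needed <= budget_bytes:
            return name
    return None


GIB = 1024 ** 3
print(pick_precision(7_000_000_000, 12 * GIB))  # fp16 does not fit, INT8 does
```

With these assumed numbers, a 7B-parameter model needs about 16.8 GB at FP16, 8.4 GB at INT8, and 4.2 GB at INT4, so a 12 GiB budget selects INT8 and a 6 GiB budget falls through to INT4.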
Configuration
Cortex reads config.yaml from the current working directory. For tuning GPU memory limits, quantization defaults, and inference parameters, see:
docs/configuration.md
Documentation
Start here:
- docs/installation.md
- docs/cli.md
- docs/model-management.md
- docs/troubleshooting.md
Advanced topics:
- docs/mlx-acceleration.md
- docs/inference-engine.md
- docs/dynamic-quantization.md
- docs/template-registry.md
- docs/fine-tuning.md
- docs/development.md
Contributing
Contributions are welcome. See docs/development.md for setup and workflow.
License
MIT License. See LICENSE.
Note: Cortex requires Apple Silicon. Intel Macs are not supported.
Download files
File details
Details for the file cortex_llm-1.0.10.tar.gz.
File metadata
- Download URL: cortex_llm-1.0.10.tar.gz
- Upload date:
- Size: 159.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ef365cc6cf2c7f9f31db09a1acaf5b97c162069d3768d7728af7b655601fb29 |
| MD5 | 534ef61bd82e4ce54ca71ab8f342976c |
| BLAKE2b-256 | ad31413e391802442439d61d9fe49f650a3824c71b0f4c5e19fb05ca3f5dc0e6 |
File details
Details for the file cortex_llm-1.0.10-py3-none-any.whl.
File metadata
- Download URL: cortex_llm-1.0.10-py3-none-any.whl
- Upload date:
- Size: 173.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | afde10e8c9702fe5e95b1b867e762de9ac7e9f61be0ce5534ddfe3c23ed35b66 |
| MD5 | e1be1bc430232c8cb9af1eda92754c09 |
| BLAKE2b-256 | a5b5da736f05d66e9e2d4bd69dd20794462c95bdb0f58280e409b070312bd8b7 |