Offline AI coding assistant for Apple Silicon. Run LLMs locally with an OpenAI-compatible API.
Project description
local-ai
Run AI models locally on your Mac with zero cloud dependencies.
local-ai brings the power of large language models to your Apple Silicon Mac, completely offline. No API keys, no usage limits, no data leaving your machine.
Why local-ai?
The Problem
- Privacy concerns: Cloud AI services see all your code, prompts, and data
- API costs: Pay-per-token pricing adds up quickly for heavy usage
- Rate limits: Cloud providers throttle requests during peak times
- Internet dependency: No connection = no AI assistance
- Latency: Round-trip to cloud servers adds delay to every interaction
The Solution
local-ai runs models directly on your Mac's GPU using Apple's MLX framework:
- 100% Private: Your data never leaves your machine
- Zero Cost: No API fees, subscriptions, or usage limits
- Always Available: Works offline, on planes, in secure environments
- Low Latency: Direct GPU inference, no network round-trips
- OpenAI Compatible: Works with existing tools that support OpenAI's API
Features
- One-Command Server: Start a local LLM server with local-ai server start
- OpenAI-Compatible API: Drop-in replacement for OpenAI clients
- Model Browser: Discover and download optimized MLX models from Hugging Face
- Hardware Detection: Automatically detects your Mac's capabilities
- Smart Recommendations: Suggests models that fit your available memory
- Web Interface: Built-in chat UI for testing models at http://localhost:8080
- Tool Calling: Function calling support for agentic workflows (see the sketch after this list)
- Rich CLI: Beautiful terminal output with progress bars and status panels
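Tool calling follows the standard OpenAI function-calling schema. A minimal sketch with the openai Python client, assuming the server from the Quick Start below is already running and the loaded model supports tool calls (the get_weather tool is purely illustrative, not part of local-ai):
# Minimal tool-calling sketch against the local server.
# Assumes http://localhost:8080/v1 accepts the standard OpenAI "tools" parameter.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, defined only for this example
            "description": "Return the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="mlx-community/Qwen3-0.6B-4bit",
    messages=[{"role": "user", "content": "What's the weather in Milan?"}],
    tools=tools,
)

# If the model decided to call the tool, the call arrives as structured JSON.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)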
Quick Start
Installation from GitHub
# Clone the repository
git clone https://github.com/tumma72/local-ai.git
cd local-ai
# Install with uv (recommended)
uv sync
# Or with pip
pip install -e .
Basic Usage
# Start the server (models load dynamically)
local-ai server start
# Open http://localhost:8080 in your browser for the web UI
# Or use with any OpenAI-compatible client
curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "mlx-community/Qwen3-0.6B-4bit",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Discover Models
# See recommended models for your hardware
local-ai models recommend
# Search for specific models
local-ai models search "llama 8b"
# Get detailed model info
local-ai models info mlx-community/Llama-3.2-3B-Instruct-4bit
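The catalogue behind these commands is the mlx-community organisation on Hugging Face. If you want to browse it directly, a sketch using the huggingface_hub package (not required by local-ai, shown only for orientation):
# Sketch: query the mlx-community catalogue directly with huggingface_hub.
# Independent of local-ai's own `models` commands.
from huggingface_hub import list_models

for m in list_models(author="mlx-community", search="llama 8b", limit=5):
    print(m.id)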
Server Management
# Check server status
local-ai server status
# View server logs
local-ai server logs --follow
# Restart with new settings
local-ai server restart --port 9000
# Stop the server
local-ai server stop
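From scripts or health checks you can also probe the server through its OpenAI-compatible API. A small sketch, assuming the standard GET /v1/models route is exposed:
# Quick readiness check from a script: list models via the OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

try:
    models = client.models.list()
    print("Server is up. Available models:")
    for m in models.data:
        print(" -", m.id)
except Exception as exc:
    print(f"Server not reachable: {exc}")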
Configuration
Create a config.toml file for persistent settings:
[server]
host = "127.0.0.1"
port = 8080
log_level = "INFO"
[model]
# Default model (optional - models load dynamically)
path = "mlx-community/Qwen3-0.6B-4bit"
Use with CLI:
local-ai server start --config config.toml
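The file is plain TOML, so the keys above map directly onto nested tables. A short illustration of the expected layout using Python 3.11's built-in tomllib (local-ai parses the file itself; this snippet only shows the structure):
# Illustration only: read the config.toml shown above with the standard library.
import tomllib

with open("config.toml", "rb") as f:
    cfg = tomllib.load(f)

print(cfg["server"]["host"], cfg["server"]["port"])   # 127.0.0.1 8080
print(cfg["model"].get("path"))                       # mlx-community/Qwen3-0.6B-4bit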
Requirements
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.11+
- 8GB+ RAM recommended (16GB+ for larger models)
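As a rough rule of thumb (an estimate only, not the logic behind local-ai's recommendations): a 4-bit quantised model needs about half a byte per parameter for its weights, plus headroom for the KV cache, the runtime, and macOS itself:
# Back-of-envelope memory estimate for a 4-bit quantised model.
def approx_weight_gb(params_billions: float, bits: int = 4) -> float:
    bytes_per_param = bits / 8
    return params_billions * bytes_per_param  # 1e9 params * bytes/param ≈ GB

for size in (0.6, 3, 8, 70):
    print(f"{size:>5}B params @ 4-bit ≈ {approx_weight_gb(size):.1f} GB of weights")
# Add a few GB of headroom for the KV cache, runtime, and the OS.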
Use Cases
IDE Integration
local-ai works with any IDE that supports OpenAI-compatible endpoints:
- VS Code with Continue extension
- Cursor (set custom API endpoint)
- Zed editor (configure assistant)
- JetBrains IDEs with AI plugins
Claude Code / Aider / Other Tools
# Set environment variables
export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=not-needed
# Use your favorite AI coding tool
aider --model mlx-community/Qwen3-0.6B-4bit
Python Integration
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed"
)
response = client.chat.completions.create(
model="mlx-community/Qwen3-0.6B-4bit",
messages=[{"role": "user", "content": "Explain Python decorators"}]
)
print(response.choices[0].message.content)
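Streaming works the same way, assuming the server supports the standard OpenAI streaming protocol; reusing the client from the snippet above:
# Streaming variant of the same call.
stream = client.chat.completions.create(
    model="mlx-community/Qwen3-0.6B-4bit",
    messages=[{"role": "user", "content": "Explain Python decorators"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()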
Development
# Install development dependencies
uv sync
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=local_ai
# Type checking
uv run mypy src/
# Linting
uv run ruff check src/
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
License
Apache License 2.0 - see LICENSE.md for details.
Acknowledgments
- MLX - Apple's machine learning framework
- mlx-omni-server - OpenAI-compatible server (with local patches)
- Hugging Face - Model hosting and community
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file local_ai_server-0.2.0a0.tar.gz.
File metadata
- Download URL: local_ai_server-0.2.0a0.tar.gz
- Upload date:
- Size: 129.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b2eda382c6b53006bb9bce7c2bed6a4315fc17252da56daa966c19235002ecdf |
| MD5 | 9739b38f792b13318efd0d96461126bc |
| BLAKE2b-256 | fa9b6a1cee47f378cd394f62ac3b56a34b759e297980ab8ae21cfa7501f4e7ae |
Provenance
The following attestation bundles were made for local_ai_server-0.2.0a0.tar.gz:
Publisher: release-and-publish.yml on tumma72/local-ai
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: local_ai_server-0.2.0a0.tar.gz
- Subject digest: b2eda382c6b53006bb9bce7c2bed6a4315fc17252da56daa966c19235002ecdf
- Sigstore transparency entry: 771489449
- Sigstore integration time:
- Permalink: tumma72/local-ai@444b87fd3fdea1cbfe34d44b1800c7f5ff88643d
- Branch / Tag: refs/tags/v0.2.0a0
- Owner: https://github.com/tumma72
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-and-publish.yml@444b87fd3fdea1cbfe34d44b1800c7f5ff88643d
- Trigger Event: push
File details
Details for the file local_ai_server-0.2.0a0-py3-none-any.whl.
File metadata
- Download URL: local_ai_server-0.2.0a0-py3-none-any.whl
- Upload date:
- Size: 90.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4ea1678f423ed9dc642f4e727cc0c08b59d6ef179bef9ebfa561fc5d168318d |
| MD5 | 6ec4a8b61afb89ee4f14d94663b7d250 |
| BLAKE2b-256 | a48f27eb213000484286275b6b9905f6739885a7206e8f37c0e1a188e272cf52 |
Provenance
The following attestation bundles were made for local_ai_server-0.2.0a0-py3-none-any.whl:
Publisher: release-and-publish.yml on tumma72/local-ai
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: local_ai_server-0.2.0a0-py3-none-any.whl
- Subject digest: a4ea1678f423ed9dc642f4e727cc0c08b59d6ef179bef9ebfa561fc5d168318d
- Sigstore transparency entry: 771489450
- Sigstore integration time:
- Permalink: tumma72/local-ai@444b87fd3fdea1cbfe34d44b1800c7f5ff88643d
- Branch / Tag: refs/tags/v0.2.0a0
- Owner: https://github.com/tumma72
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-and-publish.yml@444b87fd3fdea1cbfe34d44b1800c7f5ff88643d
- Trigger Event: push