
Offline AI coding assistant for Apple Silicon. Run LLMs locally with an OpenAI-compatible API.

Project description

local-ai


Run AI models locally on your Mac with zero cloud dependencies.

local-ai brings the power of large language models to your Apple Silicon Mac, completely offline. No API keys, no usage limits, no data leaving your machine.

Why local-ai?

The Problem

  • Privacy concerns: Cloud AI services see all your code, prompts, and data
  • API costs: Pay-per-token pricing adds up quickly for heavy usage
  • Rate limits: Cloud providers throttle requests during peak times
  • Internet dependency: No connection = no AI assistance
  • Latency: Round-trip to cloud servers adds delay to every interaction

The Solution

local-ai runs models directly on your Mac's GPU using Apple's MLX framework:

  • 100% Private: Your data never leaves your machine
  • Zero Cost: No API fees, subscriptions, or usage limits
  • Always Available: Works offline, on planes, in secure environments
  • Low Latency: Direct GPU inference, no network round-trips
  • OpenAI Compatible: Works with existing tools that support OpenAI's API

Features

  • One-Command Server: Start a local LLM server with local-ai server start
  • OpenAI-Compatible API: Drop-in replacement for OpenAI clients
  • Model Browser: Discover and download optimized MLX models from Hugging Face
  • Hardware Detection: Automatically detects your Mac's capabilities
  • Smart Recommendations: Suggests models that fit your available memory
  • Web Interface: Built-in chat UI for testing models at http://localhost:8080
  • Tool Calling: Function calling support for agentic workflows (see the sketch after this list)
  • Rich CLI: Beautiful terminal output with progress bars and status panels
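
Tool calling follows OpenAI's function-calling schema. The sketch below assumes the server accepts the standard tools parameter; the get_weather function is hypothetical and used only for illustration (see Quick Start below to get a server running first):

# Sketch: function calling via the OpenAI SDK against the local server.
# Assumes standard OpenAI `tools` support; `get_weather` is hypothetical.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mlx-community/Qwen3-0.6B-4bit",
    messages=[{"role": "user", "content": "What's the weather in Turin?"}],
    tools=tools,
)

# The model may answer directly or request a tool call.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)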

Quick Start

Installation from GitHub

# Clone the repository
git clone https://github.com/tumma72/local-ai.git
cd local-ai

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Basic Usage

# Start the server (models load dynamically)
local-ai server start

# Open http://localhost:8080 in your browser for the web UI

# Or use with any OpenAI-compatible client
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3-0.6B-4bit",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
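
Since the API mirrors OpenAI's, streaming should work the same way; a sketch assuming the server implements the standard stream flag (tokens arrive as server-sent events):

# Stream tokens as they are generated (-N disables curl's output buffering)
curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Qwen3-0.6B-4bit",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'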

Discover Models

# See recommended models for your hardware
local-ai models recommend

# Search for specific models
local-ai models search "llama 8b"

# Get detailed model info
local-ai models info mlx-community/Llama-3.2-3B-Instruct-4bit
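
Given the OpenAI compatibility, the server may also expose the standard model-listing endpoint; a sketch assuming /v1/models is implemented:

# List models over HTTP (assumes the standard OpenAI /v1/models endpoint)
curl http://localhost:8080/v1/models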

Server Management

# Check server status
local-ai server status

# View server logs
local-ai server logs --follow

# Restart with new settings
local-ai server restart --port 9000

# Stop the server
local-ai server stop
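
For scripting, it helps to wait until the API actually answers before sending requests. A minimal readiness check, assuming the /v1/models endpoint responds once the server is up:

# Start the server, then poll until the API is reachable
local-ai server start
until curl -sf http://localhost:8080/v1/models > /dev/null; do
  sleep 1
done
echo "server is ready"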

Configuration

Create a config.toml file for persistent settings:

[server]
host = "127.0.0.1"
port = 8080
log_level = "INFO"

[model]
# Default model (optional - models load dynamically)
path = "mlx-community/Qwen3-0.6B-4bit"

Use with CLI:

local-ai server start --config config.toml
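
Flags and the config file can likely be combined, with flags taking precedence; this is an assumption based on common CLI conventions (--port is only documented for restart above):

# Assumed behavior: keep config.toml but override the port for this run
local-ai server start --config config.toml --port 9000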

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.11+
  • 8GB+ RAM recommended (16GB+ for larger models)

Use Cases

IDE Integration

local-ai works with any IDE that supports OpenAI-compatible endpoints:

  • VS Code with Continue extension
  • Cursor (set custom API endpoint)
  • Zed editor (configure assistant)
  • JetBrains IDEs with AI plugins

Claude Code / Aider / Other Tools

# Set environment variables
export OPENAI_API_BASE=http://localhost:8080/v1
export OPENAI_API_KEY=not-needed

# Use your favorite AI coding tool
aider --model mlx-community/Qwen3-0.6B-4bit

Python Integration

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="mlx-community/Qwen3-0.6B-4bit",
    messages=[{"role": "user", "content": "Explain Python decorators"}]
)
print(response.choices[0].message.content)
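
Streaming works through the same SDK (reusing the client defined above), assuming the server implements OpenAI's stream parameter; a minimal sketch:

# Print tokens as they arrive (assumes server-sent-event streaming)
stream = client.chat.completions.create(
    model="mlx-community/Qwen3-0.6B-4bit",
    messages=[{"role": "user", "content": "Explain Python decorators"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()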

Development

# Install development dependencies
uv sync

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=local_ai

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

Apache License 2.0 - see LICENSE.md for details.



Download files

Download the file for your platform.

Source Distribution

local_ai_server-0.2.0a0.tar.gz (129.8 kB)


Built Distribution


local_ai_server-0.2.0a0-py3-none-any.whl (90.7 kB)


File details

Details for the file local_ai_server-0.2.0a0.tar.gz.

File metadata

  • Download URL: local_ai_server-0.2.0a0.tar.gz
  • Upload date:
  • Size: 129.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for local_ai_server-0.2.0a0.tar.gz:

  • SHA256: b2eda382c6b53006bb9bce7c2bed6a4315fc17252da56daa966c19235002ecdf
  • MD5: 9739b38f792b13318efd0d96461126bc
  • BLAKE2b-256: fa9b6a1cee47f378cd394f62ac3b56a34b759e297980ab8ae21cfa7501f4e7ae


Provenance

The following attestation bundles were made for local_ai_server-0.2.0a0.tar.gz:

Publisher: release-and-publish.yml on tumma72/local-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file local_ai_server-0.2.0a0-py3-none-any.whl.


File hashes

Hashes for local_ai_server-0.2.0a0-py3-none-any.whl:

  • SHA256: a4ea1678f423ed9dc642f4e727cc0c08b59d6ef179bef9ebfa561fc5d168318d
  • MD5: 6ec4a8b61afb89ee4f14d94663b7d250
  • BLAKE2b-256: a48f27eb213000484286275b6b9905f6739885a7206e8f37c0e1a188e272cf52


Provenance

The following attestation bundles were made for local_ai_server-0.2.0a0-py3-none-any.whl:

Publisher: release-and-publish.yml on tumma72/local-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
