Merlin — a fast local LLM for agentic coding on Apple Silicon

 Merlin — Specialized Agentic Coding Model

A 3B language model built from scratch exclusively for agentic coding — not a general assistant, not a fine-tuned GPT. Every training token, every design decision, every protocol token is optimized for one job: executing code tasks fast, locally, and at scale.

Runs on any Apple Silicon MacBook. No API key. Your code never leaves your machine.

tsuberim.github.io/merlin

The Idea

LLM agents spend most of their tokens on execution, not reasoning — grep a file, run a test, rename a function. Today that all hits a frontier model at $1–10/M tokens.

Merlin is the brute-force execution layer beneath a smarter orchestrator. A frontier model (Claude, GPT-4) plans; Merlin executes — locally, in parallel, at zero marginal cost.

| Mode | Workers | Cost |
| --- | --- | --- |
| Local | 1, on your MacBook | $0, always |
| Hosted | 100–1000 via GPU batching | pay-per-task |
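
The planner/executor split above can be pictured as a fan-out: the orchestrator produces small tasks and a pool of workers runs them in parallel. The following is a minimal sketch only. `run_worker` is a hypothetical stand-in for a call into a Merlin worker, not part of any released API, and the task strings are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(task: str) -> str:
    """Hypothetical stand-in for one Merlin worker executing one task."""
    # A real worker would run the agent loop; here we only echo the task.
    return f"done: {task}"


def fan_out(tasks: list[str], max_workers: int = 4) -> list[str]:
    """Run tasks in parallel and return results in the original task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_worker, tasks))


if __name__ == "__main__":
    # Illustrative plan, as a frontier model might emit it.
    plan = ["grep TODO src/", "run pytest tests/", "rename foo to bar"]
    for result in fan_out(plan):
        print(result)
```

In local mode `max_workers` would effectively be 1; the hosted mode in the table corresponds to a much larger pool behind a batching server.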

Design

| Choice | What | Why |
| --- | --- | --- |
| Pre-trained from scratch | 100B tokens: code, bash, agentic traces, commits | No proprietary distillation; weights are commercially clean |
| Custom tokenizer | 32K BPE + 18 agent protocol special tokens | Tool-call protocol is first-class, not bolted on |
| 6K context window | Sized for one large Python file + agent overhead | Not a general-purpose model |
| RL post-training | GRPO on verifiable bash/filesystem rewards | Ground truth without a judge model |
| MLX int4 inference | ~1.5 GB weights, >500 tok/s on M3 | Fits any M-series Mac |
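
To illustrate "ground truth without a judge model": in GRPO, several rollouts are sampled for the same task, each is scored by a programmatic check, and each rollout's advantage is its reward normalized against the rest of the group. The sketch below assumes a binary filesystem check as the reward and uses the population standard deviation. The function names and the check itself are illustrative assumptions, not the project's training code.

```python
import os
import statistics
import tempfile


def file_exists_reward(workdir: str, expected_path: str) -> float:
    """Verifiable reward: 1.0 if the rollout created the expected file, else 0.0."""
    return 1.0 if os.path.isfile(os.path.join(workdir, expected_path)) else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rewards = []
    for succeeded in (True, False, False, True):  # four simulated rollouts
        with tempfile.TemporaryDirectory() as workdir:
            if succeeded:
                # Simulate a rollout that completed the (hypothetical) task.
                open(os.path.join(workdir, "out.txt"), "w").close()
            rewards.append(file_exists_reward(workdir, "out.txt"))
    print(rewards)
    print(group_relative_advantages(rewards))
```

When every rollout in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.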

Agent Protocol

18 special tokens define the tool-call format — the model learns to emit and parse tool calls natively:

<|task|> Read src/main.py and return the function names.
<|think|> I need to read the file first.<|/think|>
<|tool_call|><|tool_name|>read_file<|tool_args|>{"path": "src/main.py"}<|/tool_call|>
<|tool_result|>def train(): ...\ndef evaluate(): ...<|/tool_result|>
<|answer|> train, evaluate
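
On the orchestrator side, a transcript in this format has to be turned back into structured tool calls. The following is a minimal sketch that works on decoded text with a regular expression. It uses only the tokens visible in the example above; the remaining protocol tokens are not listed here, so they are not handled, and `parse_tool_calls` is a hypothetical helper rather than part of the package.

```python
import json
import re

# Matches one tool call in the format shown in the example transcript.
TOOL_CALL_RE = re.compile(
    r"<\|tool_call\|><\|tool_name\|>(?P<name>.*?)"
    r"<\|tool_args\|>(?P<args>.*?)<\|/tool_call\|>",
    re.DOTALL,
)


def parse_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Return (tool_name, args) pairs for every tool call found in text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        name = match.group("name").strip()
        args = json.loads(match.group("args"))  # args are assumed to be JSON
        calls.append((name, args))
    return calls


if __name__ == "__main__":
    transcript = (
        "<|think|> I need to read the file first.<|/think|>\n"
        '<|tool_call|><|tool_name|>read_file<|tool_args|>{"path": "src/main.py"}<|/tool_call|>'
    )
    print(parse_tool_calls(transcript))
```

A production parser would more likely operate on token IDs, since each protocol marker is a single special token in the tokenizer.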

Architecture

GPT-style decoder-only. RMSNorm, SwiGLU, GQA (n_kv_head=8), no bias, weight tying, pre-norm.

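| Config | Params | n_embd | n_head | n_layer | block_size |
| --- | --- | --- | --- | --- | --- |
| tiny | ~1.6M | 32 | 2 | 2 | 64 |
| medium | ~21M | 256 | 8 | 8 | 512 |
| base (330M) | ~330M | 1024 | 16 | 16 | 2048 |
| 3b | ~3.17B | 3072 | 24 | 20 | 4096 |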
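
A few back-of-envelope numbers follow directly from the `3b` row. The sketch below derives the per-head dimension, the GQA group size, and the raw int4 weight footprint. It assumes query heads are grouped contiguously onto KV heads and ignores quantization scales and other overhead, so the size is a rough estimate only.

```python
# Values for the "3b" config, taken from the table and the GQA note above.
N_EMBD = 3072
N_HEAD = 24
N_KV_HEAD = 8
N_PARAMS = 3.17e9  # approximate, as stated in the table

head_dim = N_EMBD // N_HEAD        # dimension of each attention head
group_size = N_HEAD // N_KV_HEAD   # query heads that share one KV head


def kv_head_for(query_head: int) -> int:
    """Map a query head to its shared KV head (assumes contiguous groups)."""
    return query_head // group_size


# int4 stores 4 bits (0.5 bytes) per weight; scale/zero-point overhead is ignored.
int4_gb = N_PARAMS * 0.5 / 1e9

if __name__ == "__main__":
    print("head_dim:", head_dim)
    print("query heads per KV head:", group_size)
    print("KV head per query head:", [kv_head_for(h) for h in range(N_HEAD)])
    print("approx int4 weights (GB):", int4_gb)
```

The result, roughly 1.6 GB before overhead, is in the same range as the ~1.5 GB figure quoted in the Design table.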

Corpus

~100B tokens across 7 sources. Two-phase curriculum: 80B general mix → 20B upweighted traces + instruction data.

| Source | Share |
| --- | --- |
| Stack v2 (Python) | ~38% |
| Stack v2 (Bash / Markdown) | ~8% |
| Agentic traces (synthetic) | ~15% |
| GitHub commits + issues | ~11% |
| Stack Overflow | ~10% |
| Math + instruction mix | ~12% |
| tldr pages | <1% |
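
The shares translate into approximate token budgets as follows. This is a rough sketch: the percentages are the rounded values from the table, `<1%` is entered as 0.5 purely as a placeholder assumption, and the rounded shares add up to a little under 100%.

```python
TOTAL_TOKENS_B = 100  # approximate corpus size, in billions of tokens

# Rounded shares from the table; 0.5 for "tldr pages" is a placeholder for "<1%".
SHARES_PCT = {
    "Stack v2 (Python)": 38,
    "Stack v2 (Bash / Markdown)": 8,
    "Agentic traces (synthetic)": 15,
    "GitHub commits + issues": 11,
    "Stack Overflow": 10,
    "Math + instruction mix": 12,
    "tldr pages": 0.5,
}


def token_budget(total_b: float, shares_pct: dict[str, float]) -> dict[str, float]:
    """Convert percentage shares into token counts (billions)."""
    return {source: total_b * pct / 100 for source, pct in shares_pct.items()}


if __name__ == "__main__":
    for source, tokens_b in token_budget(TOTAL_TOKENS_B, SHARES_PCT).items():
        print(f"{source}: ~{tokens_b:g}B tokens")
```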

v0 corpus (1.19B tokens): tsuberim/merlin-corpus-v0
Tokenizer: tsuberim/merlin-tokenizer-v0

Status

| Milestone | Status |
| --- | --- |
| Agentic protocol + eval harness (49 tasks, 47% on 3B baseline) | ✅ Done |
| Custom BPE tokenizer (32K vocab, 20 special tokens) | ✅ Done |
| Data pipeline (download → tokenize → pack) | ✅ Done |
| v0 corpus on HuggingFace (1.19B tokens) | ✅ Done |
| E2E training loop, 330M model on H100 | ✅ Done |
| SFT infrastructure | ✅ Done |
| Repo scanning pipeline (clone + pytest → passing repos) | 🔄 In progress |
| Agentic trace generation (target: 200K traces) | ⏸ Planned |
| Full 100B token corpus | ⏸ Planned |
| 3B pre-training run | ⏸ Planned |
| RL post-training (GRPO on verifiable rewards) | ⏸ Planned |
| MLX int4 3B model release | ⏸ Planned |
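
The "tokenize → pack" step of the data pipeline usually means concatenating tokenized documents, separated by an end-of-document token, and slicing the stream into fixed-length training blocks. The sketch below shows that common approach. The project's actual packing scheme is not described here, so the function, the choice to drop the trailing partial block, and the `eos_id` value are all assumptions.

```python
from typing import Iterable


def pack_sequences(
    docs: Iterable[list[int]], block_size: int, eos_id: int
) -> list[list[int]]:
    """Concatenate token lists with an EOS separator and cut into full blocks."""
    stream: list[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)  # mark the document boundary
    n_blocks = len(stream) // block_size  # the trailing partial block is dropped
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]


if __name__ == "__main__":
    # Two toy "documents" with a hypothetical EOS id of 0.
    print(pack_sequences([[1, 2, 3], [4, 5]], block_size=4, eos_id=0))
```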

Stack

| Role | Tool |
| --- | --- |
| Training | PyTorch + CUDA (NVIDIA H100) |
| Inference | MLX (Apple Silicon / Metal) |
| Cloud compute | Modal |
| Tokenizer | HuggingFace tokenizers (BPE, Rust) |
| Trace generation | vLLM + Qwen2.5-Coder-32B |
| Observability | W&B |
| Datasets + models | HuggingFace Hub |

Setup

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

License

MIT
