
Merlin — a fast local LLM for agentic coding on Apple Silicon

Project description

 Merlin — Specialized Agentic Coding Model

A 3B language model built from scratch exclusively for agentic coding — not a general assistant, not a fine-tuned GPT. Every training token, every design decision, every protocol token is optimized for one job: executing code tasks fast, locally, and at scale.

Runs on any MacBook. No API key. Your code never leaves your machine.

tsuberim.github.io/merlin

The Idea

LLM agents spend most of their tokens on execution, not reasoning — grep a file, run a test, rename a function. Today that all hits a frontier model at $1–10/M tokens.

Merlin is the brute-force execution layer beneath a smarter orchestrator. A frontier model (Claude, GPT-4) plans; Merlin executes — locally, in parallel, at zero marginal cost.
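The planner/executor split above can be sketched in a few lines. This is an illustrative toy, not the project's code: `plan_tasks` stands in for a frontier model decomposing a goal, and `execute_task` stands in for a local Merlin worker; both are hypothetical stubs.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_tasks(goal: str) -> list[str]:
    # A frontier model would decompose the goal into subtasks;
    # hard-coded here for illustration.
    return [f"{goal}: step {i}" for i in range(1, 4)]

def execute_task(task: str) -> str:
    # A local model would execute the task (grep, run a test, rename);
    # stubbed here.
    return f"done: {task}"

def run(goal: str) -> list[str]:
    tasks = plan_tasks(goal)
    # Local workers have zero marginal token cost, so fan out freely.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(execute_task, tasks))

results = run("rename function")
```

The point of the structure is that only `plan_tasks` needs a frontier model; everything inside the pool runs locally.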

| Mode | Workers | Cost |
|---|---|---|
| Local | 1, on your MacBook | $0 — always |
| Hosted | 100–1000 via GPU batching | pay-per-task |

Design

| Choice | What | Why |
|---|---|---|
| Pre-trained from scratch | 100B tokens — code, bash, agentic traces, commits | No proprietary distillation; weights are commercially clean |
| Custom tokenizer | 32K BPE + 18 agent protocol special tokens | Tool-call protocol is first-class, not bolted on |
| 6K context window | Sized for one large Python file + agent overhead | Not a general-purpose model |
| RL post-training | GRPO on verifiable bash/filesystem rewards | Ground truth without a judge model |
| MLX int4 inference | ~1.5 GB weights, >500 tok/s on M3 | Fits any M-series Mac |
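The ~1.5 GB figure follows directly from the parameter count at 4 bits per weight. A quick back-of-the-envelope check (ignoring quantization scales and activations, which add a small overhead):

```python
params = 3.17e9          # 3b config parameter count from the table above
bits_per_weight = 4      # int4 quantization
bytes_total = params * bits_per_weight / 8
gib = bytes_total / 2**30
# 3.17e9 params * 0.5 bytes ≈ 1.59 GB (~1.48 GiB), consistent with ~1.5 GB
```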

Agent Protocol

18 special tokens define the tool-call format — the model learns to emit and parse tool calls natively:

<|task|> Read src/main.py and return the function names.
<|think|> I need to read the file first.<|/think|>
<|tool_call|><|tool_name|>read_file<|tool_args|>{"path": "src/main.py"}<|/tool_call|>
<|tool_result|>def train(): ...\ndef evaluate(): ...<|/tool_result|>
<|answer|> train, evaluate
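A harness consuming this protocol has to pull the tool name and JSON arguments out of the token stream. A minimal sketch, assuming the token names from the example above; the regex-based approach is illustrative, not necessarily how Merlin's harness parses output:

```python
import json
import re

# Matches one tool-call span as shown in the protocol example.
TOOL_CALL = re.compile(
    r"<\|tool_call\|><\|tool_name\|>(?P<name>\w+)"
    r"<\|tool_args\|>(?P<args>\{.*?\})<\|/tool_call\|>"
)

def parse_tool_call(text: str):
    """Return (tool_name, args_dict) for the first tool call, or None."""
    m = TOOL_CALL.search(text)
    if m is None:
        return None
    return m.group("name"), json.loads(m.group("args"))

call = parse_tool_call(
    '<|tool_call|><|tool_name|>read_file'
    '<|tool_args|>{"path": "src/main.py"}<|/tool_call|>'
)
```

Because the delimiters are dedicated vocabulary entries rather than multi-token strings, the model cannot accidentally produce a partial delimiter, which keeps parsing like this reliable.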

Architecture

GPT-style decoder-only transformer: RMSNorm, SwiGLU, GQA (n_kv_head=8), no bias terms, weight tying, pre-norm.
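To make the RMSNorm and SwiGLU choices concrete, here is the underlying math in a minimal pure-Python sketch (the real implementation would be batched tensor code; shapes and helper names here are illustrative):

```python
import math

def rms_norm(x, gain=None, eps=1e-6):
    # RMSNorm: divide by the root-mean-square of the vector, no mean
    # subtraction and no bias — applied pre-norm before each sub-block.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    g = gain if gain is not None else [1.0] * len(x)
    return [gi * v / rms for gi, v in zip(g, x)]

def swiglu(x, w_gate, w_up):
    # SwiGLU feed-forward gate: silu(x @ W_gate) * (x @ W_up), elementwise.
    # w_gate / w_up are lists of weight columns (one per hidden unit).
    def silu(v):
        return v / (1.0 + math.exp(-v))
    gate = [silu(sum(a * b for a, b in zip(x, col))) for col in w_gate]
    up = [sum(a * b for a, b in zip(x, col)) for col in w_up]
    return [g * u for g, u in zip(gate, up)]

out = rms_norm([3.0, -4.0])  # normalized so mean of squares is ~1
```

GQA then shares each group of query heads against a smaller set of key/value heads (8 here), shrinking the KV cache relative to full multi-head attention.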

| Config | Params | n_embd | n_head | n_layer | block_size |
|---|---|---|---|---|---|
| tiny | ~1.6M | 32 | 2 | 2 | 64 |
| medium | ~21M | 256 | 8 | 8 | 512 |
| base | ~330M | 1024 | 16 | 16 | 2048 |
| 3b | ~3.17B | 3072 | 24 | 20 | 4096 |

Corpus

~100B tokens across 7 sources. Two-phase curriculum: 80B general mix → 20B upweighted traces + instruction data.

| Source | Share |
|---|---|
| Stack v2 — Python | ~38% |
| Stack v2 — Bash / Markdown | ~8% |
| Agentic traces (synthetic) | ~15% |
| GitHub commits + issues | ~11% |
| Stack Overflow | ~10% |
| Math + instruction mix | ~12% |
| tldr pages | <1% |
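A two-phase curriculum over a mix like this amounts to sampling sources by weight and reweighting in the second phase. A sketch using the shares above, assuming a hypothetical 3x trace boost in phase 2 (the project's actual phase-2 weights are not stated):

```python
import random

# Approximate source shares from the table above.
SHARES = {
    "stack_v2_python": 38, "stack_v2_bash_md": 8, "agentic_traces": 15,
    "commits_issues": 11, "stack_overflow": 10, "math_instruct": 12,
    "tldr": 1,
}

def sample_sources(n, phase=1, trace_boost=3.0, seed=0):
    # Phase 2 upweights agentic traces; the boost factor is an
    # illustrative assumption, not the project's actual mix.
    weights = dict(SHARES)
    if phase == 2:
        weights["agentic_traces"] *= trace_boost
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)

batch = sample_sources(1000, phase=2)
```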

- v0 corpus (1.19B tokens): tsuberim/merlin-corpus-v0
- Tokenizer: tsuberim/merlin-tokenizer-v0

Status

| Milestone | Status |
|---|---|
| Agentic protocol + eval harness (49 tasks, 47% on 3B baseline) | ✅ Done |
| Custom BPE tokenizer (32K vocab, 20 special tokens) | ✅ Done |
| Data pipeline (download → tokenize → pack) | ✅ Done |
| v0 corpus on HuggingFace (1.19B tokens) | ✅ Done |
| E2E training loop, 330M model on H100 | ✅ Done |
| SFT infrastructure | ✅ Done |
| Repo scanning pipeline (clone + pytest → passing repos) | 🔄 In progress |
| Agentic trace generation (target: 200K traces) | ⏸ Planned |
| Full 100B token corpus | ⏸ Planned |
| 3B pre-training run | ⏸ Planned |
| RL post-training (GRPO on verifiable rewards) | ⏸ Planned |
| MLX int4 3B model release | ⏸ Planned |

Stack

| Role | Tool |
|---|---|
| Training | PyTorch + CUDA (NVIDIA H100) |
| Inference | MLX (Apple Silicon / Metal) |
| Cloud compute | Modal |
| Tokenizer | HuggingFace tokenizers (BPE, Rust) |
| Trace generation | vLLM + Qwen2.5-Coder-32B |
| Observability | W&B |
| Datasets + models | HuggingFace Hub |

Setup

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

License

MIT


Download files


Source Distribution

merlin_llm-0.1.10.tar.gz (14.3 kB, source)

Built Distribution


merlin_llm-0.1.10-py3-none-any.whl (13.5 kB, Python 3)

File details

Details for the file merlin_llm-0.1.10.tar.gz.

File metadata

  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for merlin_llm-0.1.10.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | e12d3c8c3d22707be725e12c4ee431a896108875f0f9b21fbf0c71099f094fc9 |
| MD5 | 5eeb0e66934101f308629c4241fc41e9 |
| BLAKE2b-256 | 9edd2ba586180a5c3c74d8921132ce8e4f918b532139e5873c3a3467487ce2da |


File details

Details for the file merlin_llm-0.1.10-py3-none-any.whl.

File metadata

  • Size: 13.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for merlin_llm-0.1.10-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8596d2de1df4f5bdbf5ac1c7ba40e3452b74e2d65ef03d456290e6f02ca8c001 |
| MD5 | 8ac802ad9550be2f1dc63fb020e41309 |
| BLAKE2b-256 | ed1bf6af64da1bf0ac22ff4ab8cdae74ac3d6cd74be3df26db9adc8c60b988e7 |

