
Merlin — a fast local LLM for agentic coding on Apple Silicon


Merlin — Specialized Agentic Coding Model

A 3B language model built from scratch exclusively for agentic coding — not a general assistant, not a fine-tuned GPT. Every training token, every design decision, every protocol token is optimized for one job: executing code tasks fast, locally, and at scale.

Runs on any MacBook. No API key. Your code never leaves your machine.

tsuberim.github.io/merlin

The Idea

LLM agents spend most of their tokens on execution, not reasoning — grep a file, run a test, rename a function. Today that all hits a frontier model at $1–10/M tokens.

Merlin is the brute-force execution layer beneath a smarter orchestrator. A frontier model (Claude, GPT-4) plans; Merlin executes — locally, in parallel, at zero marginal cost.

| Mode   | Workers                   | Cost         |
|--------|---------------------------|--------------|
| Local  | 1, on your MacBook        | $0 — always  |
| Hosted | 100–1000 via GPU batching | pay-per-task |

Design

| Choice                   | What                                             | Why                                                    |
|--------------------------|--------------------------------------------------|--------------------------------------------------------|
| Pre-trained from scratch | 100B tokens — code, bash, agentic traces, commits | No proprietary distillation; weights are commercially clean |
| Custom tokenizer         | 32K BPE + 18 agent protocol special tokens       | Tool-call protocol is first-class, not bolted on       |
| 6K context window        | Sized for one large Python file + agent overhead | Not a general-purpose model                            |
| RL post-training         | GRPO on verifiable bash/filesystem rewards       | Ground truth without a judge model                     |
| MLX int4 inference       | ~1.5 GB weights, >500 tok/s on M3                | Fits any M-series Mac                                  |
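The ~1.5 GB figure follows directly from 4-bit quantization. A quick sanity check (illustrative only; it ignores quantization scales, group metadata, and embedding handling):

```python
# Rough weight-memory estimate for an int4-quantized model.
# Assumes exactly 4 bits per parameter; real int4 formats carry a small
# overhead for per-group scales, so actual files are slightly larger.
def int4_weight_gb(n_params: float) -> float:
    bytes_total = n_params * 4 / 8  # 4 bits = 0.5 bytes per parameter
    return bytes_total / 1024**3

print(f"{int4_weight_gb(3.17e9):.2f} GB")  # ~1.48 GB for the 3.17B config
```

This lines up with the "~1.5 GB weights" claim for the 3b configuration.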

Agent Protocol

18 special tokens define the tool-call format — the model learns to emit and parse tool calls natively:

<|task|> Read src/main.py and return the function names.
<|think|> I need to read the file first.<|/think|>
<|tool_call|><|tool_name|>read_file<|tool_args|>{"path": "src/main.py"}<|/tool_call|>
<|tool_result|>def train(): ...\ndef evaluate(): ...<|/tool_result|>
<|answer|> train, evaluate
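A host process only needs string matching to recover tool calls from a completion in this format. The sketch below is hypothetical: the token names are taken verbatim from the example above, but the regex, function names, and return shape are assumptions, not Merlin's actual decoding loop.

```python
import json
import re

# Match one tool call in the protocol shown above. Non-greedy groups stop
# at the next protocol token; DOTALL lets arguments span multiple lines.
TOOL_CALL = re.compile(
    r"<\|tool_call\|><\|tool_name\|>(?P<name>.+?)"
    r"<\|tool_args\|>(?P<args>.+?)<\|/tool_call\|>",
    re.DOTALL,
)

def parse_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Extract (tool_name, parsed_json_args) pairs from a completion."""
    return [(m["name"], json.loads(m["args"])) for m in TOOL_CALL.finditer(text)]

completion = (
    "<|think|>I need to read the file first.<|/think|>"
    '<|tool_call|><|tool_name|>read_file'
    '<|tool_args|>{"path": "src/main.py"}<|/tool_call|>'
)
print(parse_tool_calls(completion))
# [('read_file', {'path': 'src/main.py'})]
```

Because the delimiters are single tokenizer tokens rather than multi-character strings the model must spell out, malformed or truncated calls are rarer and cheaper to detect.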

Architecture

GPT-style decoder-only. RMSNorm, SwiGLU, GQA (n_kv_head=8), no bias, weight tying, pre-norm.
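Two of these components can be sketched in a few lines of NumPy. This is an illustrative reference implementation, not Merlin's code; the dimensions and the SwiGLU hidden width below are arbitrary.

```python
import numpy as np

def rmsnorm(x, gain, eps=1e-6):
    # RMSNorm: scale by the root-mean-square instead of mean/variance,
    # with a learned gain and no bias term.
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms * gain

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: an elementwise silu gate multiplies the "up"
    # projection before projecting back down to model width.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d, h = 16, 48  # model width and hidden width (illustrative)
x = rng.normal(size=(2, d))
y = swiglu_ffn(rmsnorm(x, np.ones(d)),
               rng.normal(size=(d, h)),
               rng.normal(size=(d, h)),
               rng.normal(size=(h, d)))
print(y.shape)  # (2, 16)
```

In the pre-norm arrangement, `rmsnorm` is applied to the residual stream before each attention and feed-forward block rather than after.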

| Config      | Params | n_embd | n_head | n_layer | block_size |
|-------------|--------|--------|--------|---------|------------|
| tiny        | ~1.6M  | 32     | 2      | 2       | 64         |
| medium      | ~21M   | 256    | 8      | 8       | 512        |
| base (330M) | ~330M  | 1024   | 16     | 16      | 2048       |
| 3b          | ~3.17B | 3072   | 24     | 20      | 4096       |
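GQA pays off at inference time: the KV cache scales with `n_kv_head`, not `n_head`. A back-of-envelope calculation for the 3b config, assuming an fp16 cache and `head_dim = n_embd / n_head = 128` (the fp16 assumption is mine, not from the table):

```python
def kv_cache_mb(n_layer, n_kv_head, head_dim, seq_len, bytes_per=2):
    # Keys and values (factor of 2), one entry per layer, KV head,
    # head dimension, and sequence position.
    return 2 * n_layer * n_kv_head * head_dim * seq_len * bytes_per / 1024**2

# 3b config: n_embd=3072, n_head=24 -> head_dim=128, n_layer=20
full_mha = kv_cache_mb(20, 24, 128, 4096)  # if every query head kept its own KV
gqa      = kv_cache_mb(20,  8, 128, 4096)  # n_kv_head=8 as configured
print(f"MHA: {full_mha:.0f} MB, GQA: {gqa:.0f} MB")  # MHA: 960 MB, GQA: 320 MB
```

Cutting the cache 3x matters when the whole model plus cache has to fit in a MacBook's unified memory.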

Corpus

~100B tokens across 7 sources. Two-phase curriculum: 80B general mix → 20B upweighted traces + instruction data.

| Source                       | Share |
|------------------------------|-------|
| Stack v2 — Python            | ~38%  |
| Stack v2 — Bash / Markdown   | ~8%   |
| Agentic traces (synthetic)   | ~15%  |
| GitHub commits + issues      | ~11%  |
| Stack Overflow               | ~10%  |
| Math + instruction mix       | ~12%  |
| tldr pages                   | <1%   |
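One way to realize a mix like this is weighted sampling over sources. The sketch below is hypothetical (the real pipeline's sampling code is not published): it uses the approximate shares above, normalized since they do not sum exactly to 100.

```python
import random

# Approximate shares from the table above, treated as sampling weights.
# "tldr_pages" is listed as <1%, taken here as 1 for illustration.
SHARES = {
    "stack_v2_python": 38, "stack_v2_bash_md": 8, "agentic_traces": 15,
    "commits_issues": 11, "stack_overflow": 10, "math_instruct": 12,
    "tldr_pages": 1,
}

def sample_source(rng: random.Random) -> str:
    # random.choices normalizes the weights internally.
    sources, weights = zip(*SHARES.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {s: 0 for s in SHARES}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
print(max(counts, key=counts.get))  # stack_v2_python dominates, as expected
```

The two-phase curriculum would then amount to swapping in a second weight table for the final 20B tokens, upweighting traces and instruction data.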

v0 corpus (1.19B tokens): tsuberim/merlin-corpus-v0
Tokenizer: tsuberim/merlin-tokenizer-v0

Status

| Milestone                                                        | Status         |
|------------------------------------------------------------------|----------------|
| Agentic protocol + eval harness (49 tasks, 47% on 3B baseline)   | ✅ Done        |
| Custom BPE tokenizer (32K vocab, 20 special tokens)              | ✅ Done        |
| Data pipeline (download → tokenize → pack)                       | ✅ Done        |
| v0 corpus on HuggingFace (1.19B tokens)                          | ✅ Done        |
| E2E training loop, 330M model on H100                            | ✅ Done        |
| SFT infrastructure                                               | ✅ Done        |
| Repo scanning pipeline (clone + pytest → passing repos)          | 🔄 In progress |
| Agentic trace generation (target: 200K traces)                   | ⏸ Planned      |
| Full 100B token corpus                                           | ⏸ Planned      |
| 3B pre-training run                                              | ⏸ Planned      |
| RL post-training (GRPO on verifiable rewards)                    | ⏸ Planned      |
| MLX int4 3B model release                                        | ⏸ Planned      |

Stack

| Role               | Tool                                |
|--------------------|-------------------------------------|
| Training           | PyTorch + CUDA (NVIDIA H100)        |
| Inference          | MLX (Apple Silicon / Metal)         |
| Cloud compute      | Modal                               |
| Tokenizer          | HuggingFace tokenizers (BPE, Rust)  |
| Trace generation   | vLLM + Qwen2.5-Coder-32B            |
| Observability      | W&B                                 |
| Datasets + models  | HuggingFace Hub                     |

Setup

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

License

MIT
