Merlin — a fast local LLM for agentic coding on Apple Silicon

Project description

Merlin — Specialized Agentic Coding Model

A 3B language model built from scratch exclusively for agentic coding — not a general assistant, not a fine-tuned GPT. Every training token, every design decision, every protocol token is optimized for one job: executing code tasks fast, locally, and at scale.

Runs on any MacBook. No API key. Your code never leaves your machine.

→ tsuberim.github.io/merlin

The Idea

LLM agents spend most of their tokens on execution, not reasoning — grep a file, run a test, rename a function. Today that all hits a frontier model at $1–10/M tokens.

Merlin is the brute-force execution layer beneath a smarter orchestrator. A frontier model (Claude, GPT-4) plans; Merlin executes — locally, in parallel, at zero marginal cost.

Mode	Workers	Cost
Local	1, on your MacBook	$0 — always
Hosted	100–1000 via GPU batching	pay-per-task
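
The planner/executor split described above can be sketched as a simple fan-out. This is only an illustration: `plan`, `execute`, and `run` are hypothetical stand-ins, not part of the Merlin package, and a real worker would run the agent loop instead of returning a string.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(goal: str) -> list[str]:
    """Stand-in for the frontier-model planner (hypothetical): split a goal into subtasks."""
    return [f"{goal}: step {i}" for i in range(1, 4)]


def execute(subtask: str) -> str:
    """Stand-in for one Merlin worker (hypothetical): a real worker would run tools here."""
    return f"done({subtask})"


def run(goal: str, workers: int = 4) -> list[str]:
    """Plan once, then execute all subtasks in parallel on a worker pool."""
    subtasks = plan(goal)
    # pool.map preserves input order, so results line up with the subtasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(execute, subtasks))
```

Calling `run("rename foo to bar")` returns one result per planned subtask, in plan order. In the Hosted mode the pool would presumably be replaced by batched GPU requests, but that part is not described in this README.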

Design

Choice	What	Why
Pre-trained from scratch	100B tokens — code, bash, agentic traces, commits	No proprietary distillation; weights are commercially clean
Custom tokenizer	32K BPE + 18 agent protocol special tokens	Tool-call protocol is first-class, not bolted on
6K context window	Sized for one large Python file + agent overhead	Not a general-purpose model
RL post-training	GRPO on verifiable bash/filesystem rewards	Ground truth without a judge model
MLX int4 inference	~1.5 GB weights, >500 tok/s on M3	Fits any M-series Mac
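
The "GRPO on verifiable bash/filesystem rewards" row can be illustrated with the group-relative advantage step. This is a minimal sketch assuming the commonly described GRPO formulation (each reward minus its group mean, divided by the group standard deviation). Merlin's exact variant is not specified here, and `file_exists_reward` is a hypothetical example of a reward that needs no judge model.

```python
from statistics import mean, pstdev


def file_exists_reward(listing: set[str], expected_path: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the task left the expected file behind."""
    return 1.0 if expected_path in listing else 0.0


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each sampled rollout relative to the other rollouts for the same task."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division finite when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With binary rewards, rollouts that pass the check get a positive advantage and failing ones a negative advantage. If every rollout in a group gets the same reward, all advantages are zero and the group contributes no learning signal.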

Agent Protocol

18 special tokens define the tool-call format — the model learns to emit and parse tool calls natively:

<|task|> Read src/main.py and return the function names.
<|think|> I need to read the file first.<|/think|>
<|tool_call|><|tool_name|>read_file<|tool_args|>{"path": "src/main.py"}<|/tool_call|>
<|tool_result|>def train(): ...\ndef evaluate(): ...<|/tool_result|>
<|answer|> train, evaluate
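
For tooling built around the transcript format above, a tool call can be formatted and parsed from decoded text roughly as follows. This is a sketch under assumptions: it covers only the tokens shown in the example, the helper names are hypothetical, and the model itself works on token ids rather than on regular expressions.

```python
import json
import re

# Matches one complete tool call as it appears in the example transcript.
TOOL_CALL = re.compile(
    r"<\|tool_call\|><\|tool_name\|>(?P<name>.*?)"
    r"<\|tool_args\|>(?P<args>.*?)<\|/tool_call\|>",
    re.DOTALL,
)


def format_tool_call(name: str, args: dict) -> str:
    """Render a tool call in the transcript format (arguments as JSON)."""
    return f"<|tool_call|><|tool_name|>{name}<|tool_args|>{json.dumps(args)}<|/tool_call|>"


def parse_tool_calls(text: str) -> list[tuple[str, dict]]:
    """Extract (tool name, decoded JSON arguments) for every tool call in the text."""
    return [
        (m["name"].strip(), json.loads(m["args"]))
        for m in TOOL_CALL.finditer(text)
    ]
```

A harness would call `parse_tool_calls` on the model output, run the named tool, and append the result wrapped in `<|tool_result|>` and `<|/tool_result|>` before resuming generation.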

Architecture

GPT-style decoder-only transformer: RMSNorm, SwiGLU feed-forward layers, grouped-query attention (GQA, n_kv_head=8), no bias terms, tied input/output embeddings, pre-norm residual blocks.

Config	Params	n_embd	n_head	n_layer	block_size
tiny	~1.6M	32	2	2	64
medium	~21M	256	8	8	512
base (330M)	~330M	1024	16	16	2048
3b	~3.17B	3072	24	20	4096
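
The table can be read as a set of config objects. The sketch below derives the per-head dimension and the GQA grouping from the listed values and checks the int4 weight size against the 3B parameter count. The class and field layout are assumptions, as is the contiguous query-to-KV head mapping and the fallback for configs with fewer than 8 heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    n_embd: int
    n_head: int
    n_layer: int
    block_size: int
    n_kv_head: int = 8  # from the architecture summary

    @property
    def head_dim(self) -> int:
        assert self.n_embd % self.n_head == 0
        return self.n_embd // self.n_head

    @property
    def kv_heads(self) -> int:
        # Assumption: configs with fewer than 8 heads use one KV head per query head.
        return min(self.n_kv_head, self.n_head)

    @property
    def queries_per_kv(self) -> int:
        assert self.n_head % self.kv_heads == 0
        return self.n_head // self.kv_heads

    def kv_head_for(self, query_head: int) -> int:
        """Contiguous grouping: consecutive query heads share one KV head."""
        return query_head // self.queries_per_kv


CONFIGS = {
    "tiny": Config(32, 2, 2, 64),
    "medium": Config(256, 8, 8, 512),
    "base": Config(1024, 16, 16, 2048),
    "3b": Config(3072, 24, 20, 4096),
}


def int4_weight_gb(n_params: float) -> float:
    """4 bits per weight; ignores quantization scales and any unquantized layers."""
    return n_params * 0.5 / 1e9
```

For the 3b config this gives a head dimension of 128 and three query heads per KV head, and `int4_weight_gb(3.17e9)` comes to about 1.585, in line with the "~1.5 GB weights" figure in the Design table.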

Corpus

~100B tokens across 7 sources. Two-phase curriculum: 80B general mix → 20B upweighted traces + instruction data.

Source	Share
Stack v2 — Python	~38%
Stack v2 — Bash / Markdown	~8%
Agentic traces (synthetic)	~15%
GitHub commits + issues	~11%
Stack Overflow	~10%
Math + instruction mix	~12%
tldr pages	<1%
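
As a rough check of what these shares mean in tokens, the sketch below converts them into per-source budgets for the ~100B target. The names are illustrative. The tldr slice is left out because only an upper bound is given, and the listed shares sum to 94%, so the remainder is not broken down here.

```python
TOTAL_TOKENS_B = 100  # approximate corpus size, in billions of tokens

# Shares in percent, copied from the table (tldr pages omitted: listed only as <1%).
SHARES_PCT = {
    "Stack v2: Python": 38,
    "Stack v2: Bash / Markdown": 8,
    "Agentic traces (synthetic)": 15,
    "GitHub commits + issues": 11,
    "Stack Overflow": 10,
    "Math + instruction mix": 12,
}

# Two-phase curriculum sizes from the text, in billions of tokens.
PHASES_B = {"general mix": 80, "upweighted traces + instruction data": 20}


def token_budget(total_b: float, shares_pct: dict[str, int]) -> dict[str, float]:
    """Billions of tokens per source, given percentage shares of the total."""
    return {name: total_b * pct / 100 for name, pct in shares_pct.items()}


BUDGET_B = token_budget(TOTAL_TOKENS_B, SHARES_PCT)
```

Under these numbers the Python slice is about 38B tokens and the synthetic agentic traces about 15B.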

v0 corpus (1.19B tokens): tsuberim/merlin-corpus-v0
Tokenizer: tsuberim/merlin-tokenizer-v0

Status

Milestone	Status
Agentic protocol + eval harness (49 tasks, 47% on 3B baseline)	✅ Done
Custom BPE tokenizer (32K vocab, 20 special tokens)	✅ Done
Data pipeline (download → tokenize → pack)	✅ Done
v0 corpus on HuggingFace (1.19B tokens)	✅ Done
E2E training loop, 330M model on H100	✅ Done
SFT infrastructure	✅ Done
Repo scanning pipeline (clone + pytest → passing repos)	🔄 In progress
Agentic trace generation (target: 200K traces)	⏸ Planned
Full 100B token corpus	⏸ Planned
3B pre-training run	⏸ Planned
RL post-training (GRPO on verifiable rewards)	⏸ Planned
MLX int4 3B model release	⏸ Planned
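
The "pack" step of the data pipeline row above is not described further. One common approach, shown here purely as an assumed sketch, concatenates tokenized documents with an end-of-sequence separator and cuts the stream into fixed-size training blocks. `eos_id` and the function name are hypothetical.

```python
from typing import Iterable, Iterator


def pack(docs: Iterable[list[int]], block_size: int, eos_id: int) -> Iterator[list[int]]:
    """Concatenate tokenized documents, separated by eos_id, into fixed-size blocks."""
    buffer: list[int] = []
    for tokens in docs:
        buffer.extend(tokens)
        buffer.append(eos_id)  # mark the document boundary
        while len(buffer) >= block_size:
            yield buffer[:block_size]
            buffer = buffer[block_size:]
    # Trailing tokens that do not fill a whole block are dropped in this sketch.
```

Packing avoids padding, so every position in a block carries a real token. The cost is that a block can span document boundaries.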

Stack

Role	Tool
Training	PyTorch + CUDA (NVIDIA H100)
Inference	MLX (Apple Silicon / Metal)
Cloud compute	Modal
Tokenizer	HuggingFace tokenizers (BPE, Rust)
Trace generation	vLLM + Qwen2.5-Coder-32B
Observability	W&B
Datasets + models	HuggingFace Hub

Setup

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

License

MIT

Project details

Release history

0.1.12

Apr 13, 2026

0.1.11

Apr 13, 2026

0.1.10

Apr 13, 2026

0.1.8

Apr 13, 2026

0.1.7

Apr 13, 2026

0.1.6

Apr 13, 2026

0.1.5

Apr 13, 2026

0.1.4

Apr 13, 2026

This version

0.1.3

Apr 13, 2026

0.1.2

Apr 13, 2026

0.1.1

Apr 13, 2026

0.1.0

Apr 13, 2026

Download files

Source Distribution

merlin_llm-0.1.3.tar.gz (13.9 kB)

Uploaded Apr 13, 2026 Source

Built Distribution

merlin_llm-0.1.3-py3-none-any.whl (13.1 kB)

Uploaded Apr 13, 2026 Python 3

File details

Details for the file merlin_llm-0.1.3.tar.gz.

File metadata

Download URL: merlin_llm-0.1.3.tar.gz
Upload date: Apr 13, 2026
Size: 13.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for merlin_llm-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`154851932db06324b5383aafd84aa3c425ed33e32b8ffcd81f30774fa1ac36f5`
MD5	`28499c78f2988ad5e659f61d70533cb8`
BLAKE2b-256	`dc1759b0bba6eaf1ed6108b90e85f6435c3505a1ad6fb75456d283a51589e782`

File details

Details for the file merlin_llm-0.1.3-py3-none-any.whl.

File metadata

Download URL: merlin_llm-0.1.3-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 13.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for merlin_llm-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0837d0285edf0f5546c7f06e663a2bff571ca15ca6a9239578c262853a0c619d`
MD5	`7d61470becfa2526194abf1aa1260af5`
BLAKE2b-256	`150388aa2e867bb4d602b8a14672327aeafabc70654d06a6b567185d0c2fd6db`

merlin-llm 0.1.3
