Robust extraction of structured payloads from LLM output via sentinel blocks (<<<TAG>>> … <<<END>>>).
Sentinel Blocks
Get structured data out of LLM text — reliably, in any language.
Teach the model to wrap each payload between unmistakable markers, then extract it
with one pass that never re-parses content as code. Survives the quotes,
newlines, backticks, and braces that shatter a naive JSON.parse.
🌐 Website & guide → eth-interchained.github.io/sentinel-blocks
<<<JSON>>>
{ "summary": "one object, taken verbatim", "ok": true }
<<<END>>>
<<<FILE src/app.ts>>>
export const greet = (n: string) => `hi ${n}`; // code lives OUTSIDE the JSON
<<<END>>>
The problem
You ask a model for JSON. It mostly complies — until the day a value contains a quote, a newline, a stray brace, or a code snippet:
{ "code": "const re = /\{.*\}/; // oops: unescaped backslashes" }
// ^ JSON.parse throws a SyntaxError: \{ is not a valid JSON escape
Now your pipeline falls back to a mock, retries, or crashes. The root cause is almost always code or free text stuffed into a JSON string.
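The failure is not specific to JavaScript. A minimal Python sketch of the same situation, assuming a model reply that embeds a regex in a JSON string without escaping the backslashes (the `try_parse` helper is hypothetical and only exists for this illustration):

```python
import json

# Hypothetical model reply: a regex pasted into a JSON string with raw backslashes.
raw = r'{ "code": "const re = /\{.*\}/;" }'

# The same payload with the backslashes escaped the way JSON requires.
escaped = r'{ "code": "const re = /\\{.*\\}/;" }'


def try_parse(text):
    """Return (value, None) on success or (None, error message) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


value, error = try_parse(raw)          # fails: \{ is not a valid JSON escape
fixed, fixed_error = try_parse(escaped)  # parses, but only because every backslash was doubled
```

The escaped variant works, but it depends on the model doubling every backslash correctly, every time.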
The idea (the "lingo")
Stop fighting the escaping. Tell the model to put each payload between sentinels:
<<<TAG>>>
...payload, taken VERBATIM...
<<<END>>>
and to give code its own named block instead of burying it in JSON:
<<<FILE path/to/file>>>
...file contents, verbatim...
<<<END>>>
Extraction is a single regex (or a tiny scanner). The bytes between the markers are returned untouched — so quotes, braces, and newlines inside them are harmless. For JSON specifically: keep the JSON to data only, and the #1 cause of parse failures disappears.
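The single-regex extraction can be sketched as follows. This is an illustration, not the library's actual implementation: the function name `extract_tagged_blocks_sketch` is hypothetical, and the whitespace handling (dropping one newline after the opening marker and one before `<<<END>>>`) is an assumption rather than something taken from the spec.

```python
import re
from typing import List, Tuple


def extract_tagged_blocks_sketch(text: str, tag: str) -> List[Tuple[str, str]]:
    """Return (arg, content) for every <<<TAG [arg]>>> ... <<<END>>> block.

    Assumption: one newline after the opening marker and one before the
    closing marker belong to the framing, not to the payload.
    """
    pattern = re.compile(
        r"<<<" + re.escape(tag)
        + r"(?:[ \t]+([^>\n]*?))?[ \t]*>>>[ \t]*\r?\n?"  # opening marker, optional arg
        + r"(.*?)"                                       # payload, captured verbatim
        + r"\r?\n?[ \t]*<<<END>>>",                      # closing marker
        re.DOTALL,
    )
    return [(m.group(1) or "", m.group(2)) for m in pattern.finditer(text)]


reply = (
    '<<<JSON>>>\n{ "ok": true, "note": "quotes \\" and {braces} are fine" }\n<<<END>>>\n'
    "<<<FILE src/app.ts>>>\nexport const greet = (n: string) => `hi ${n}`;\n<<<END>>>\n"
)

files = extract_tagged_blocks_sketch(reply, "FILE")
json_blocks = extract_tagged_blocks_sketch(reply, "JSON")
```

The payload is captured as one opaque group, so nothing inside it is interpreted during extraction.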
Distilled from the surgical-edit parser in KeyStone-Lite, generalized into a tiny format with a formal spec and ports in 10 languages.
Install
npm install sentinel-blocks # Node / TypeScript / Bun / Deno
pip install sentinel-blocks # Python 3.8+
Every other port is dependency-free and small enough to vendor:
| Language | Copy this into your project |
|---|---|
| C | implementations/c/sentinel_blocks.{h,c} |
| C++ (header-only) | implementations/cpp/sentinel_blocks.hpp |
| Go | implementations/go/sentinelblocks.go |
| Rust | implementations/rust (cargo add from git) |
| Java | implementations/java/SentinelBlocks.java |
| Ruby | implementations/ruby/sentinel_blocks.rb |
| PHP | implementations/php/sentinel_blocks.php |
| JS (no build) | implementations/javascript/sentinel-blocks.{mjs,cjs} |
Quickstart
TypeScript / JavaScript
import { writeFileSync } from "node:fs";
import { jsonFromResponse, extractTaggedBlocks } from "sentinel-blocks";
const reply = await llm(prompt); // raw completion text
const meta = jsonFromResponse(reply); // robust: never breaks on code-in-string
for (const { arg, content } of extractTaggedBlocks(reply, "FILE")) {
  writeFileSync(arg, content); // arg = path, content = verbatim file
}
Python
from sentinel_blocks import json_from_response, extract_tagged_blocks
reply = llm(prompt)  # raw completion text
meta = json_from_response(reply)  # dict; raises only if nothing is parseable
for arg, content in extract_tagged_blocks(reply, "FILE"):
    with open(arg, "w", encoding="utf-8") as f:  # arg = path, content = verbatim file
        f.write(content)
Go, Rust, C, C++, Java, Ruby, PHP
Go
import sb "sentinelblocks"
meta, err := sb.JSONFromResponse(reply, "") // map[string]interface{}
files := sb.ExtractTaggedBlocks(reply, "FILE") // []sb.Block{Arg, Content}
Rust
use sentinel_blocks as sb;
let json_text = sb::json_text_from_response(&reply, None); // Option<String> → serde_json
let files = sb::extract_tagged_blocks(&reply, "FILE"); // Vec<Block { arg, content }>
C
char *json = sb_json_text_from_response(reply, NULL); // feed to cJSON/jansson; sb_free(json)
sb_blocks files = sb_extract_blocks(reply, "FILE"); // sb_blocks_free(&files)
C++
auto json = sentinel::jsonTextFromResponse(reply); // std::optional<std::string>
auto files = sentinel::extractTaggedBlocks(reply, "FILE");
Java
var json = SentinelBlocks.jsonTextFromResponse(reply, null); // Optional<String>
var files = SentinelBlocks.extractTaggedBlocks(reply, "FILE");
Ruby
meta = SentinelBlocks.json_from_response(reply) # Hash
files = SentinelBlocks.extract_tagged_blocks(reply, "FILE")
PHP
$meta = SentinelBlocks\json_from_response($reply); // associative array
$files = SentinelBlocks\extract_tagged_blocks($reply, 'FILE');
API at a glance
Same behavior everywhere; names follow each language's conventions.
| Purpose | TS / JS / C++ / Java | Python / Ruby / PHP / Rust / C |
|---|---|---|
| First block's content | extractBlock | extract_block |
| All blocks' content | extractBlocks | extract_blocks |
| Blocks with their arg | extractTaggedBlocks | extract_tagged_blocks |
| Build a block | wrap / wrapNamed | wrap / wrap_named |
| Strip fences + trailing commas | repairJson | repair_json |
| First balanced {…} | firstJsonObject | first_json_object |
| Robust JSON entry point | jsonFromResponse¹ | json_from_response¹ |
¹ Languages with a stdlib JSON parser (JS, TS, Python, Go, Ruby, PHP) return a
parsed value. Languages without one (C, C++, Rust, Java) expose
jsonTextFromResponse / json_text_from_response returning the cleaned JSON
text to hand to your JSON library. See the spec §4.
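To make the "build a block" row concrete, here is a rough Python sketch of what a wrapping helper might look like. The name `wrap_sketch`, its signature, and the decision to reject content that already contains the closing marker are all assumptions made for illustration; check the spec for the behavior of the real `wrap` / `wrap_named`.

```python
from typing import Optional

END_MARKER = "<<<END>>>"


def wrap_sketch(tag: str, content: str, arg: Optional[str] = None) -> str:
    """Build <<<TAG [arg]>>> + content + <<<END>>> with the content left verbatim.

    Assumption: content containing the closing marker is rejected, because it
    would end the block early when the text is extracted again.
    """
    if END_MARKER in content:
        raise ValueError("content contains the closing marker")
    header = f"<<<{tag} {arg}>>>" if arg else f"<<<{tag}>>>"
    return f"{header}\n{content}\n{END_MARKER}"


example = wrap_sketch("FILE", 'print("hi {there}")', arg="hello.py")
```

A helper like this is mostly useful for producing few-shot examples inside prompts, so the model sees exactly the format you intend to extract.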
The strategy: prompt the model to use it
The format only pays off if the model emits it. Two ready-to-paste prompts:
- examples/prompts/json-extraction.md: a single JSON object.
- examples/prompts/codegen-files.md: metadata + multiple files.
The golden rules (full version in docs/PROMPTING.md):
- Reply only in sentinel blocks — nothing before the first or after the last.
- JSON blocks carry data only — never code, never markdown fences.
- Code and long text go in their own <<<FILE …>>> blocks, verbatim.
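The first rule is easy to check mechanically before you trust a reply. A minimal sketch, assuming tags are uppercase identifiers and that the arg contains no `>`; the helper name `only_sentinel_blocks` is hypothetical and not part of the package:

```python
import re

# Assumption: a tag is an uppercase identifier, optionally followed by an arg.
BLOCK = re.compile(
    r"<<<[A-Z][A-Z0-9_]*(?:[ \t]+[^>\n]*)?>>>.*?<<<END>>>",
    re.DOTALL,
)


def only_sentinel_blocks(reply: str) -> bool:
    """True if the reply has at least one block and only whitespace outside blocks.

    Slightly stricter than the rule itself: stray text *between* blocks is
    rejected too, not only text before the first or after the last block.
    """
    if not BLOCK.search(reply):
        return False
    return BLOCK.sub("", reply).strip() == ""


clean = "<<<JSON>>>\n{}\n<<<END>>>\n<<<FILE a.py>>>\nx = 1\n<<<END>>>\n"
chatty = "Sure! Here you go:\n<<<JSON>>>\n{}\n<<<END>>>"
```

A failed check is a reasonable trigger for a retry with a firmer prompt, though extraction itself can still proceed on a chatty reply.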
Runnable end-to-end examples: examples/openai-node.mjs,
examples/anthropic_python.py, and an EJS
prompt-templating demo in examples/ejs/.
Why it works
- Content is never re-parsed as code. Extraction returns the bytes between the markers untouched, so quotes/braces/newlines inside them can't corrupt anything.
- Code leaves the JSON. The most common JSON.parse failure, unescaped code in a string, is designed out, not patched up.
- <<<…>>> is rare in natural text and survives markdown, diffs, and streaming.
- Graceful JSON fallback. jsonFromResponse tries the block, then a light repair, then a balanced-brace slice. If all three fail, it fails loudly; it never fabricates a value.
More detail and failure-mode analysis: docs/WHY-IT-WORKS.md.
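The fallback order described above (block, light repair, balanced-brace slice, loud failure) can be sketched in Python. All three function names are hypothetical and this is not the package's code; in particular, the trailing-comma repair here is deliberately naive and could also alter a `,}` sequence inside a string value.

```python
import json
import re


def first_balanced_object(text):
    """Return the first balanced {...} slice, ignoring braces inside JSON strings."""
    start = text.find("{")
    while start != -1:
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        start = text.find("{", start + 1)  # unbalanced from here, try the next brace
    return None


def light_repair(text):
    """Strip a surrounding markdown fence and drop trailing commas before } or ]."""
    text = text.strip()
    text = re.sub(r"^```[A-Za-z0-9]*[ \t]*\r?\n", "", text)
    text = re.sub(r"\r?\n```$", "", text)
    return re.sub(r",(\s*[}\]])", r"\1", text)  # naive: does not skip string contents


def json_from_response_sketch(reply):
    """Try the JSON block, then a light repair, then a balanced slice; else raise."""
    block = re.search(
        r"<<<JSON>>>[ \t]*\r?\n?(.*?)\r?\n?[ \t]*<<<END>>>", reply, re.DOTALL
    )
    source = block.group(1) if block else reply
    candidates = [source, light_repair(source)]
    sliced = first_balanced_object(source)
    if sliced is not None:
        candidates.append(light_repair(sliced))
    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    raise ValueError("no parseable JSON found in reply")  # fail loudly, never invent data
```

Each step only removes framing or picks a slice of the original text; none of them synthesizes a value, which is what keeps the fallback from fabricating data.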
Supported languages
All ports pass the same 8-invariant conformance suite (spec §7).
| Language | Source | Tests |
|---|---|---|
| TypeScript | src/index.ts | node tests/test_core.ts |
| JavaScript (ESM + CJS) | implementations/javascript | node tests/test_core.mjs · node tests/test_core.cjs |
| Python | python/sentinel_blocks | python3 tests/test_core.py |
| Go | implementations/go | go test ./... |
| Rust | implementations/rust | cargo test |
| C | implementations/c | cc test_sentinel_blocks.c sentinel_blocks.c -o t && ./t |
| C++ | implementations/cpp | c++ -std=c++17 test_sentinel_blocks.cpp -o t && ./t |
| Java | implementations/java | javac *.java && java SentinelBlocksTest |
| Ruby | implementations/ruby | ruby test_sentinel_blocks.rb |
| PHP | implementations/php | php test_sentinel_blocks.php |
Repo layout
sentinel-blocks/
├── SPEC.md                     formal format specification (v1.0)
├── src/index.ts                canonical TypeScript (npm package source)
├── python/sentinel_blocks/     canonical Python (PyPI package source)
├── implementations/            ports: javascript, c, cpp, go, rust, java, ruby, php
├── tests/                      shared invariant suites (TS, JS, Python)
├── examples/                   prompts, OpenAI/Python runners, EJS templating
└── docs/                       PROMPTING.md, WHY-IT-WORKS.md
Contributing
New language ports are very welcome — porting is mechanical and the conformance
suite tells you when you're done. See CONTRIBUTING.md for the
"add a language in 6 steps" guide.
Credits & License
The technique originates in KeyStone-Lite's surgical-edit parser, generalized here into a documented, multi-language format.
Licensed under GPL-3.0-or-later — see LICENSE. Part of the
Interchained ecosystem.
Curious where the Sentinels come from? There's a bit of lore. 🛡️