
Robust extraction of structured payloads from LLM output via sentinel blocks (<<<TAG>>> … <<<END>>>).


Sentinel Blocks

Get structured data out of LLM text — reliably, in any language.

Teach the model to wrap each payload between unmistakable markers, then extract it with one pass that never re-parses content as code. Survives the quotes, newlines, backticks, and braces that shatter a naive JSON.parse.


🌐 Website & guide → eth-interchained.github.io/sentinel-blocks

<<<JSON>>>
{ "summary": "one object, taken verbatim", "ok": true }
<<<END>>>

<<<FILE src/app.ts>>>
export const greet = (n: string) => `hi ${n}`;   // code lives OUTSIDE the JSON
<<<END>>>

The problem

You ask a model for JSON. It mostly complies — until the day a value contains a quote, a newline, a stray brace, or a code snippet:

{ "code": "const re = /\{.*\}/; // oops: \{ and \} are not legal JSON escapes" }
//                                  ^ JSON.parse throws a SyntaxError (bad escaped character)

Now your pipeline falls back to a mock, retries, or crashes. The root cause is almost always code or free text stuffed into a JSON string.
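The failure is easy to reproduce. The snippet below is a minimal sketch using Python's standard json module rather than JSON.parse; the reply string is a made-up example, not real model output.

```python
import json

# Hypothetical model reply: a regex stuffed into a JSON string.
# The backslash-brace sequences are not legal JSON escapes.
reply = r'{ "code": "const re = /\{.*\}/;" }'

try:
    json.loads(reply)
    parsed = True
except json.JSONDecodeError as exc:
    parsed = False
    print(f"parse failed at char {exc.pos}: {exc.msg}")
```

The object is rejected as a whole even though only one string value is malformed, which is why a single code-carrying field can take down the entire reply.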

The idea (the "lingo")

Stop fighting the escaping. Tell the model to put each payload between sentinels:

<<<TAG>>>
...payload, taken VERBATIM...
<<<END>>>

and to give code its own named block instead of burying it in JSON:

<<<FILE path/to/file>>>
...file contents, verbatim...
<<<END>>>

Extraction is a single regex (or a tiny scanner). The bytes between the markers are returned untouched — so quotes, braces, and newlines inside them are harmless. For JSON specifically: keep the JSON to data only, and the #1 cause of parse failures disappears.
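To make the "single regex" claim concrete, here is a minimal sketch of such an extractor. It is not the library's implementation: the pattern, the Block type, and iter_blocks are assumptions made for illustration, and the authoritative grammar is the one in SPEC.md.

```python
import re
from typing import Iterator, NamedTuple, Optional


class Block(NamedTuple):
    tag: str      # e.g. "JSON" or "FILE"
    arg: str      # text after the tag, e.g. a file path ("" if absent)
    content: str  # payload between the markers, unmodified


# Assumed grammar: markers sit alone on their own lines.
_BLOCK_RE = re.compile(
    r"^<<<(?!END>>>)(?P<tag>[A-Z][A-Z0-9_]*)"  # opening marker and tag
    r"(?:[ \t]+(?P<arg>[^\n>]*))?>>>[ \t]*\n"  # optional argument
    r"(?P<content>.*?)"                        # payload, captured verbatim
    r"\n?^<<<END>>>[ \t]*$",                   # closing marker
    re.DOTALL | re.MULTILINE,
)


def iter_blocks(text: str, tag: Optional[str] = None) -> Iterator[Block]:
    """Yield every block in order, optionally filtered by tag."""
    for m in _BLOCK_RE.finditer(text):
        if tag is None or m.group("tag") == tag:
            yield Block(m.group("tag"), (m.group("arg") or "").strip(), m.group("content"))


reply = (
    "<<<JSON>>>\n"
    '{ "summary": "one object", "ok": true }\n'
    "<<<END>>>\n"
    "\n"
    "<<<FILE src/app.ts>>>\n"
    "export const greet = (n: string) => `hi ${n}`;\n"
    "<<<END>>>\n"
)
blocks = list(iter_blocks(reply))
```

The payload is only ever sliced, never unescaped or evaluated, so the backticks and braces in the FILE block pass through unchanged. One limitation of this sketch: a payload that itself contains a line reading <<<END>>> would end the block early.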

Distilled from the surgical-edit parser in KeyStone-Lite, generalized into a tiny format with a formal spec and ports in 10 languages.


Install

npm install sentinel-blocks      # Node / TypeScript / Bun / Deno
pip install sentinel-blocks      # Python 3.8+

Every other port is dependency-free and small enough to vendor (a single file in most languages):

| Language | Copy this into your project |
| --- | --- |
| C | implementations/c/sentinel_blocks.{h,c} |
| C++ (header-only) | implementations/cpp/sentinel_blocks.hpp |
| Go | implementations/go/sentinelblocks.go |
| Rust | implementations/rust (cargo add from git) |
| Java | implementations/java/SentinelBlocks.java |
| Ruby | implementations/ruby/sentinel_blocks.rb |
| PHP | implementations/php/sentinel_blocks.php |
| JS (no build) | implementations/javascript/sentinel-blocks.{mjs,cjs} |

Quickstart

TypeScript / JavaScript

import { writeFileSync } from "node:fs";
import { jsonFromResponse, extractTaggedBlocks } from "sentinel-blocks";

const reply = await llm(prompt);                 // raw completion text (llm = your own client call)

const meta = jsonFromResponse(reply);            // robust: never breaks on code-in-string
for (const { arg, content } of extractTaggedBlocks(reply, "FILE")) {
  writeFileSync(arg, content);                   // arg = path, content = verbatim file
}

Python

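from sentinel_blocks import json_from_response, extract_tagged_blocks

reply = llm(prompt)                              # raw completion text (llm = your own client call)

meta = json_from_response(reply)                 # dict; raises only if nothing is parseable
for arg, content in extract_tagged_blocks(reply, "FILE"):
    with open(arg, "w") as f:                    # arg = path, content = verbatim file
        f.write(content)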

Go, Rust, C, C++, Java, Ruby, PHP

Go

import sb "sentinelblocks"
meta, err := sb.JSONFromResponse(reply, "")          // map[string]interface{}
files := sb.ExtractTaggedBlocks(reply, "FILE")        // []sb.Block{Arg, Content}

Rust

use sentinel_blocks as sb;
let json_text = sb::json_text_from_response(&reply, None);   // Option<String> → serde_json
let files = sb::extract_tagged_blocks(&reply, "FILE");        // Vec<Block { arg, content }>

C

char *json = sb_json_text_from_response(reply, NULL);   // feed to cJSON/jansson; sb_free(json)
sb_blocks files = sb_extract_blocks(reply, "FILE");     // sb_blocks_free(&files)

C++

auto json  = sentinel::jsonTextFromResponse(reply);     // std::optional<std::string>
auto files = sentinel::extractTaggedBlocks(reply, "FILE");

Java

var json  = SentinelBlocks.jsonTextFromResponse(reply, null); // Optional<String>
var files = SentinelBlocks.extractTaggedBlocks(reply, "FILE");

Ruby

meta  = SentinelBlocks.json_from_response(reply)        # Hash
files = SentinelBlocks.extract_tagged_blocks(reply, "FILE")

PHP

$meta  = SentinelBlocks\json_from_response($reply);     // associative array
$files = SentinelBlocks\extract_tagged_blocks($reply, 'FILE');

API at a glance

Same behavior everywhere; names follow each language's conventions.

| Purpose | TS / JS / C++ / Java | Python / Ruby / PHP / Rust / C |
| --- | --- | --- |
| First block's content | extractBlock | extract_block |
| All blocks' content | extractBlocks | extract_blocks |
| Blocks with their arg | extractTaggedBlocks | extract_tagged_blocks |
| Build a block | wrap / wrapNamed | wrap / wrap_named |
| Strip fences + trailing commas | repairJson | repair_json |
| First balanced {…} | firstJsonObject | first_json_object |
| Robust JSON entry point | jsonFromResponse¹ | json_from_response¹ |

¹ Languages with a stdlib JSON parser (JS, TS, Python, Go, Ruby, PHP) return a parsed value. Languages without one (C, C++, Rust, Java) expose jsonTextFromResponse / json_text_from_response returning the cleaned JSON text to hand to your JSON library. See the spec §4.
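To illustrate what the "Build a block" and "Strip fences + trailing commas" rows describe, here is a sketch of local stand-ins for wrap, wrap_named, and repair_json. The signatures and regexes are assumptions made for this example; the packaged functions may take different arguments and handle more cases.

```python
import json
import re

# Three backticks, built indirectly so this snippet nests inside a fenced block.
FENCE = "`" * 3


def wrap(tag: str, content: str) -> str:
    """Stand-in: surround content with <<<TAG>>> and <<<END>>>, untouched."""
    return f"<<<{tag}>>>\n{content}\n<<<END>>>"


def wrap_named(tag: str, arg: str, content: str) -> str:
    """Stand-in: same as wrap, with an argument (e.g. a path) after the tag."""
    return f"<<<{tag} {arg}>>>\n{content}\n<<<END>>>"


_FENCE_RE = re.compile(
    r"^\s*" + re.escape(FENCE) + r"[A-Za-z0-9_-]*[ \t]*\n(.*?)\n?" + re.escape(FENCE) + r"\s*$",
    re.DOTALL,
)
# Naive on purpose: it would also rewrite a ",}" sequence inside a string value.
_TRAILING_COMMA_RE = re.compile(r",(\s*[}\]])")


def repair_json(text: str) -> str:
    """Stand-in: drop one surrounding markdown fence, then trailing commas."""
    m = _FENCE_RE.match(text)
    if m:
        text = m.group(1)
    return _TRAILING_COMMA_RE.sub(r"\1", text).strip()


fenced = FENCE + 'json\n{ "ok": true, "items": [1, 2,], }\n' + FENCE
repaired = json.loads(repair_json(fenced))
block = wrap_named("FILE", "src/app.ts", "x = 1")
```

The repair step is deliberately light: it removes wrapping and trailing commas but never invents missing keys or values.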


The strategy: prompt the model to use it

The format only pays off if the model emits it, so the prompt has to ask for it explicitly. Ready-to-paste prompts are in examples/.

The golden rules (full version in docs/PROMPTING.md):

  1. Reply only in sentinel blocks — nothing before the first or after the last.
  2. JSON blocks carry data only — never code, never markdown fences.
  3. Code and long text go in their own <<<FILE …>>> blocks, verbatim.
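The rules above can be turned into a system prompt plus a cheap compliance check. The sketch below is illustrative only: the prompt wording is not the project's shipped prompt, and SYSTEM_PROMPT, text_outside_blocks, and the block pattern are hypothetical names and assumptions.

```python
import re

# Illustrative wording; adapt it to your model and task.
SYSTEM_PROMPT = """\
Reply ONLY in sentinel blocks. Write nothing before the first block
or after the last one.

Put structured data in exactly one JSON block (data only, no code,
no markdown fences):

<<<JSON>>>
{ "summary": "..." }
<<<END>>>

Put every file or long text in its own block, verbatim:

<<<FILE path/to/file>>>
...contents...
<<<END>>>
"""

# Assumed grammar: markers sit alone on their own lines.
_BLOCK_RE = re.compile(
    r"^<<<(?!END>>>)[A-Z][A-Z0-9_]*(?:[ \t]+[^\n>]*)?>>>[ \t]*\n"
    r".*?\n?^<<<END>>>[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def text_outside_blocks(reply: str) -> str:
    """Return whatever remains once every block is removed (rule 1 check)."""
    return _BLOCK_RE.sub("", reply).strip()


clean = '<<<JSON>>>\n{ "ok": true }\n<<<END>>>\n'
chatty = 'Sure! Here you go:\n<<<JSON>>>\n{ "ok": true }\n<<<END>>>\nHope that helps.'
```

A non-empty result from text_outside_blocks means the model added chatter around the blocks. Extraction still works in that case, but logging it shows how well the prompt is holding.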

Runnable end-to-end examples: examples/openai-node.mjs, examples/anthropic_python.py, and an EJS prompt-templating demo in examples/ejs/.

Why it works

  • Content is never re-parsed as code. Extraction returns the bytes between the markers untouched, so quotes/braces/newlines inside them can't corrupt anything.
  • Code leaves the JSON. The most common JSON.parse failure — unescaped code in a string — is designed out, not patched up.
  • <<<…>>> is rare in natural text and survives markdown, diffs, and streaming.
  • Graceful JSON fallback. jsonFromResponse tries the block, then a light repair, then a balanced-brace slice, and only fails loudly — it never fabricates.
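The fallback order in the last bullet can be sketched as follows. This is an assumption-laden illustration, not the library's code: parse_reply and first_balanced_object are hypothetical names, and what happens when no JSON block is present (here, falling back to the whole reply) is a guess.

```python
import json
import re
from typing import Any, Optional

_JSON_BLOCK_RE = re.compile(
    r"^<<<JSON>>>[ \t]*\n(.*?)\n?^<<<END>>>[ \t]*$", re.DOTALL | re.MULTILINE
)
_TRAILING_COMMA_RE = re.compile(r",(\s*[}\]])")


def first_balanced_object(text: str) -> Optional[str]:
    """Return the first brace-balanced {...} slice, ignoring braces in strings."""
    start = text.find("{")
    while start != -1:
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start : i + 1]
        start = text.find("{", start + 1)  # unbalanced: try the next brace
    return None


def parse_reply(reply: str) -> Any:
    """Try the block, then a light repair, then a balanced slice; else raise."""
    block = _JSON_BLOCK_RE.search(reply)
    candidate = block.group(1) if block else reply
    attempts = [candidate, _TRAILING_COMMA_RE.sub(r"\1", candidate)]
    sliced = first_balanced_object(candidate)
    if sliced is not None:
        attempts.append(_TRAILING_COMMA_RE.sub(r"\1", sliced))
    for text in attempts:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue
    # Fail loudly: never return a made-up default object.
    raise ValueError("no parseable JSON found in reply")


from_block = parse_reply('<<<JSON>>>\n{ "ok": true, }\n<<<END>>>')
from_prose = parse_reply('Here is the result: {"n": 1, "s": "a } b"} Thanks!')
```

Each stage only narrows or lightly cleans text the model actually produced. If all stages fail, the caller gets an exception instead of a fabricated value, which keeps retries and error handling explicit.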

More detail and failure-mode analysis: docs/WHY-IT-WORKS.md.


Supported languages

All ports pass the same 8-invariant conformance suite (spec §7).

| Language | Source | Tests |
| --- | --- | --- |
| TypeScript | src/index.ts | node tests/test_core.ts |
| JavaScript (ESM + CJS) | implementations/javascript | node tests/test_core.mjs · node tests/test_core.cjs |
| Python | python/sentinel_blocks | python3 tests/test_core.py |
| Go | implementations/go | go test ./... |
| Rust | implementations/rust | cargo test |
| C | implementations/c | cc test_sentinel_blocks.c sentinel_blocks.c -o t && ./t |
| C++ | implementations/cpp | c++ -std=c++17 test_sentinel_blocks.cpp -o t && ./t |
| Java | implementations/java | javac *.java && java SentinelBlocksTest |
| Ruby | implementations/ruby | ruby test_sentinel_blocks.rb |
| PHP | implementations/php | php test_sentinel_blocks.php |

Repo layout

sentinel-blocks/
├── SPEC.md                     formal format specification (v1.0)
├── src/index.ts                canonical TypeScript (npm package source)
├── python/sentinel_blocks/     canonical Python (PyPI package source)
├── implementations/            ports: javascript, c, cpp, go, rust, java, ruby, php
├── tests/                      shared invariant suites (TS, JS, Python)
├── examples/                   prompts, OpenAI/Python runners, EJS templating
└── docs/                       PROMPTING.md, WHY-IT-WORKS.md

Contributing

New language ports are very welcome — porting is mechanical and the conformance suite tells you when you're done. See CONTRIBUTING.md for the "add a language in 6 steps" guide.

Credits & License

The technique originates in KeyStone-Lite's surgical-edit parser, generalized here into a documented, multi-language format.

Licensed under GPL-3.0-or-later — see LICENSE. Part of the Interchained ecosystem.

Curious where the Sentinels come from? There's a bit of lore. 🛡️
