Streaming-safe stripping of <think> blocks from reasoning model output

thinkstrip — Streaming Think-Block Stripper for LLM Output

thinkstrip removes <think>...</think> blocks from model output in both batch and streaming mode. It is designed for reasoning models (Qwen3, DeepSeek-R1, and others) that emit internal reasoning before their visible answer.

The streaming case is the reason this package exists: tag boundaries can split across adjacent token yields, so a correct implementation needs a stateful rolling buffer instead of a post-generation regex.


Why this exists

Reasoning models can emit output like:

<think>hidden chain of thought</think>The actual answer.

For fully materialized strings, stripping is easy. For token streams, it is not. Partial tags can arrive across multiple adjacent token yields, for example:

  • <thi then nk>
  • </thi then nk>

A naive .replace() or regex-per-token approach leaks fragments or drops visible output. thinkstrip solves this with a stateful streaming filter.
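To see the failure mode concretely, here is a stdlib-only demonstration (not part of the package): stripping each token in isolation removes nothing, because no single token ever contains a complete tag pair.

```python
import re

tokens = ['<thi', 'nk>', 'hidden', '</thi', 'nk>', 'The answer.']

# Strip think blocks from each token in isolation -- no token contains
# a complete '<think>...</think>' pair, so nothing is removed and the
# tags plus the hidden reasoning all leak into the output.
naive = ''.join(
    re.sub(r'<think>.*?</think>', '', t, flags=re.S) for t in tokens
)
print(naive)  # <think>hidden</think>The answer.
```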


Install

Requirements:

  • Python 3.13+
  • Zero runtime dependencies — installs and runs with nothing beyond the standard library

pip install thinkstrip

Development install:

pip install -e ".[dev]"

Quick start

Streaming

from thinkstrip import ThinkStrip

stripper = ThinkStrip()
chunks   = []

for token in ['<thi', 'nk>', 'hidden', '</thi', 'nk>', 'The answer.']:
    if emitted := stripper.feed(token):
        chunks.append(emitted)

if flushed := stripper.flush():
    chunks.append(flushed)

print(''.join(chunks))
# The answer.

Async streaming

from thinkstrip import AsyncThinkStrip

stripper = AsyncThinkStrip()
chunks   = []

async for token in model_stream:
    if emitted := await stripper.feed(token):
        chunks.append(emitted)

if flushed := await stripper.flush():
    chunks.append(flushed)

Batch

from thinkstrip import strip_think

clean = strip_think('<think>reasoning</think>The actual answer.')
print(clean)
# The actual answer.

Prompt pre-cleaner

Some GGUF chat templates inject <think> at the end of the rendered prompt before the model generates. This breaks the streaming filter because the model never emits its own <think>. Call strip_think_prefill on the rendered prompt to remove it:

from thinkstrip import strip_think_prefill

prompt = strip_think_prefill(prompt)
# trailing '<think>' removed if present; no-op otherwise
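The behavior can be sketched in a few lines of stdlib-only Python (a simplified stand-in, not the package's code), assuming the injected tag sits at the very end of the rendered prompt:

```python
def strip_prefill(prompt: str, open_tag: str = '<think>') -> str:
    # Hedged sketch: drop a single trailing open tag if present,
    # leave the prompt untouched otherwise.
    if prompt.endswith(open_tag):
        return prompt[:-len(open_tag)]
    return prompt

print(strip_prefill('...rendered prompt...<think>'))
# ...rendered prompt...
```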

Public API

from thinkstrip import ThinkStrip, AsyncThinkStrip, strip_think, strip_think_prefill

ThinkStrip

Stateful streaming filter. Create one instance per response stream.

ThinkStrip(
    open_tag:  str  = '<think>',
    close_tag: str  = '</think>',
    capture:   bool = False,
)
Methods and properties:

  • .feed(token: str) -> str — Process one token. Returns the text to emit (empty string when nothing is ready yet).
  • .flush() -> str — Call once at end of stream. Returns any buffered visible text; empty if the stream ended inside a think block.
  • .think_content: str — Accumulated think-block text. Non-empty only when capture=True.
  • .in_think_block: bool — True if the stream ended mid-think-block. Useful for diagnostics.

Constructor parameters:

  • open_tag (str, default '<think>') — opening tag to strip
  • close_tag (str, default '</think>') — closing tag to strip
  • capture (bool, default False) — retain think content in .think_content instead of discarding it

Buffer sizes are derived automatically: len(open_tag) - 1 chars for the opening-tag guard, len(close_tag) - 1 for the closing-tag guard. Custom tags carry no extra cost.
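To illustrate why a guard of len(tag) - 1 characters suffices, here is a stdlib-only sketch of the rolling-buffer idea (a simplified stand-in, not the package's implementation): the filter withholds only the longest buffer suffix that could still be the start of the tag it is looking for.

```python
def stream_strip(tokens, open_tag='<think>', close_tag='</think>'):
    # Minimal rolling-buffer sketch (a stand-in, not the package's code).
    buf, out, in_think = '', [], False
    for tok in tokens:
        buf += tok
        while buf:
            tag = close_tag if in_think else open_tag
            i = buf.find(tag)
            if i != -1:
                if not in_think:
                    out.append(buf[:i])   # visible text before the tag
                buf = buf[i + len(tag):]
                in_think = not in_think
                continue
            # Withhold only the longest suffix that could still be a
            # prefix of the tag: at most len(tag) - 1 characters.
            safe = len(buf)
            for k in range(1, len(tag)):
                if buf.endswith(tag[:k]):
                    safe = len(buf) - k
            if not in_think:
                out.append(buf[:safe])
            buf = buf[safe:]
            break
    if not in_think:      # flush: text inside an unterminated
        out.append(buf)   # think block stays dropped
    return ''.join(out)

print(stream_strip(['<thi', 'nk>', 'hidden', '</thi', 'nk>', 'The answer.']))
# The answer.
```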

AsyncThinkStrip

Async wrapper around ThinkStrip. Delegates to asyncio.to_thread() — no threading primitives required by the caller. Same constructor signature and properties as ThinkStrip.

  • await .feed(token: str) -> str — async variant of ThinkStrip.feed()
  • await .flush() -> str — async variant of ThinkStrip.flush()
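The delegation pattern described above can be sketched in a few lines; SyncFilter and AsyncFilter here are hypothetical stand-ins, not the package's classes.

```python
import asyncio

class SyncFilter:
    # Hypothetical stand-in for a synchronous, stateful filter.
    def feed(self, token: str) -> str:
        return token.upper()  # placeholder for real filtering work

class AsyncFilter:
    # Wrap each synchronous call in asyncio.to_thread so awaiting it
    # never blocks the event loop; the caller needs no threading
    # primitives of its own.
    def __init__(self) -> None:
        self._inner = SyncFilter()

    async def feed(self, token: str) -> str:
        return await asyncio.to_thread(self._inner.feed, token)

print(asyncio.run(AsyncFilter().feed('hi')))  # HI
```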

strip_think

Stateless helper for complete strings. Implemented via ThinkStrip — batch and streaming behavior are identical.

strip_think(
    text:      str,
    open_tag:  str = '<think>',
    close_tag: str = '</think>',
) -> str

strip_think_prefill

Removes a trailing open tag injected by some GGUF chat templates.

strip_think_prefill(
    prompt:   str,
    open_tag: str = '<think>',
) -> str

Capture mode

When capture=True, think content accumulates in .think_content instead of being discarded. Multiple think blocks per response are concatenated. Useful for surfacing the model's reasoning in a separate UI panel or for eval runs.

stripper = ThinkStrip(capture=True)

for token in stream:
    if emitted := stripper.feed(token):
        yield emitted

if flushed := stripper.flush():
    yield flushed

print(stripper.think_content)  # full reasoning text

Limitations

  • Nested tags are not supported. A <think> that arrives while already inside a think block is treated as think content and swallowed. The first </think> closes the block; any subsequent </think> with no matching open tag passes through as visible text. In practice this is rarely an issue: Qwen3 and DeepSeek-R1 emit exactly one think block per response.

Development

git clone https://github.com/informity/thinkstrip.git
cd thinkstrip

python3 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"

make lint
make test
make build

Contributing

See CONTRIBUTING.md.


License

MIT — see LICENSE.

Download files

Source Distribution

thinkstrip-0.1.0.tar.gz (11.8 kB)

Uploaded Source

Built Distribution

thinkstrip-0.1.0-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file thinkstrip-0.1.0.tar.gz.

File metadata

  • Download URL: thinkstrip-0.1.0.tar.gz
  • Upload date:
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thinkstrip-0.1.0.tar.gz:

  • SHA256: b10d6e6accd8592c50c9a7630f0cc35ea70997e0c66b1e2af52bf1ea34940d61
  • MD5: 91fd4bd28e6c9c1cce9a815e36d6b310
  • BLAKE2b-256: 4530802e585ae7cd61b7b9501c58b166c1847d24e4056dbc15d3cc3b39a6119d

Provenance

The following attestation bundles were made for thinkstrip-0.1.0.tar.gz:

Publisher: publish.yml on informity/thinkstrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file thinkstrip-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: thinkstrip-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thinkstrip-0.1.0-py3-none-any.whl:

  • SHA256: b97d6a8b9ac82af60f7ce9032d81dfb6348b8da85790ab8777b030a97d3b48d9
  • MD5: 5ccc1498ff8ce0cf6094e5489f211500
  • BLAKE2b-256: 31a4b2918d7709c6feb370b1d838dd928f69f5e592039adb1ec10d1bee7c0934

Provenance

The following attestation bundles were made for thinkstrip-0.1.0-py3-none-any.whl:

Publisher: publish.yml on informity/thinkstrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
