Skip to main content

Think-block filter for LLM streams

Project description

thinkstrip — Think-block filter for LLM streams

PyPI version Python versions License: MIT CI

thinkstrip removes <think>...</think> blocks from model output in both batch and streaming mode. It is designed for reasoning models (Qwen3, DeepSeek-R1, and others) that emit internal reasoning before their visible answer.

The streaming case is the reason this package exists: tag boundaries can split across adjacent token yields, so a correct implementation needs a stateful rolling buffer instead of a post-generation regex.


Why this exists

Reasoning models can emit output like:

<think>hidden chain of thought</think>The actual answer.

For fully materialized strings, stripping is easy. For token streams, it is not. Partial tags can arrive across multiple adjacent token yields, for example:

  • <thi then nk>
  • </thi then nk>

A naive .replace() or regex-per-token approach leaks fragments or drops visible output. thinkstrip solves this with a stateful streaming filter.


Install

Requirements:

  • Python 3.13+
  • Zero runtime dependencies — installs and runs with nothing beyond the standard library
pip install thinkstrip

Development install:

pip install -e ".[dev]"

Quick start

Streaming

from thinkstrip import ThinkStrip

stripper = ThinkStrip()
chunks   = []

for token in ['<thi', 'nk>', 'hidden', '</thi', 'nk>', 'The answer.']:
    if emitted := stripper.feed(token):
        chunks.append(emitted)

if flushed := stripper.flush():
    chunks.append(flushed)

print(''.join(chunks))
# The answer.

Batch

from thinkstrip import strip_think

clean = strip_think('<think>reasoning</think>The actual answer.')
print(clean)
# The actual answer.

Prompt pre-cleaner

Some GGUF chat templates inject <think> at the end of the rendered prompt before the model generates. This breaks the streaming filter because the model never emits its own <think>. Call strip_think_prefill on the rendered prompt to remove it:

from thinkstrip import strip_think_prefill

prompt = strip_think_prefill(prompt)
# trailing '<think>' removed if present; no-op otherwise

Public API

from thinkstrip import ThinkStrip, strip_think, strip_think_prefill

ThinkStrip

Stateful streaming filter. Create one instance per response stream.

ThinkStrip(
    open_tag:  str  = '<think>',
    close_tag: str  = '</think>',
    capture:   bool = False,
)
Method / property Description
.feed(token: str) -> str Process one token. Returns the text to emit (empty string when nothing ready yet).
.flush() -> str Call once at end-of-stream. Returns any buffered visible text. Empty if stream ended inside a think block.
.reset() -> None Reset to initial state. Use to process a second stream with the same instance.
.think_content: str Accumulated think-block text. Non-empty only when capture=True.
.in_think_block: bool True if the stream ended mid-think-block. Useful for diagnostics.
Constructor parameter Type Default Description
open_tag str <think> Opening tag to strip
close_tag str </think> Closing tag to strip
capture bool False Retain think content in .think_content instead of discarding

Buffer sizes are derived automatically: len(open_tag) - 1 chars for the opening-tag guard, len(close_tag) - 1 for the closing-tag guard. Custom tags carry no extra cost.

strip_think

Stateless helper for complete strings. Implemented via ThinkStrip — batch and streaming behavior are identical.

strip_think(
    text:      str,
    open_tag:  str = '<think>',
    close_tag: str = '</think>',
) -> str

strip_think_prefill

Removes a trailing open tag injected by some GGUF chat templates.

strip_think_prefill(
    prompt:   str,
    open_tag: str = '<think>',
) -> str

Capture mode

When capture=True, think content accumulates in .think_content instead of being discarded. Multiple think blocks per response are concatenated. Useful for surfacing the model's reasoning in a separate UI panel or for eval runs.

stripper = ThinkStrip(capture=True)

for token in stream:
    if emitted := stripper.feed(token):
        yield emitted

if flushed := stripper.flush():
    yield flushed

print(stripper.think_content)  # full reasoning text

Limitations

  • Nested tags are not supported. A <think> that arrives while already inside a think block is treated as think content and swallowed. The first </think> closes the block; any subsequent </think> with no matching open tag passes through as visible text. In practice this does not occur — Qwen3 and DeepSeek-R1 emit exactly one think block per response.

Development

git clone https://github.com/informity/thinkstrip.git
cd thinkstrip

python3 -m venv .venv
source .venv/bin/activate

pip install -e ".[dev]"

make lint
make test
make build

Contributing

See CONTRIBUTING.md.


License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

thinkstrip-0.2.0.tar.gz (11.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

thinkstrip-0.2.0-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file thinkstrip-0.2.0.tar.gz.

File metadata

  • Download URL: thinkstrip-0.2.0.tar.gz
  • Upload date:
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thinkstrip-0.2.0.tar.gz
Algorithm Hash digest
SHA256 83858c85b6eabaa42621c29179d9ff9f9af7ea5497009d5ea8d6c8970b3860ce
MD5 682c0f4aa6b7f6be5c7447c7727a2460
BLAKE2b-256 e6fe77e27e23457807c840c8aa388f36205df2012501d7131eb499513ed79e94

See more details on using hashes here.

Provenance

The following attestation bundles were made for thinkstrip-0.2.0.tar.gz:

Publisher: publish.yml on informity/thinkstrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file thinkstrip-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: thinkstrip-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for thinkstrip-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 34e767ac769854c032aa479e07cf0bcf6e5d9e89b8c2298e249c7872691dcb04
MD5 450a43f9981c7e506f69a6b13a69b239
BLAKE2b-256 f016f4845d84d185496eb1a47dcd1cfdd60bac8a75f5a8341c13316875dcfd08

See more details on using hashes here.

Provenance

The following attestation bundles were made for thinkstrip-0.2.0-py3-none-any.whl:

Publisher: publish.yml on informity/thinkstrip

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page