Think-block filter for LLM streams
Project description
ThinkStrip — Think-block filter for LLM streams
thinkstrip removes <think>...</think> blocks from model output in both batch and streaming
mode. It is designed for reasoning models (Qwen3, DeepSeek-R1, and others) that emit internal
reasoning before their visible answer.
The streaming case is the reason this package exists: tag boundaries can split across adjacent token yields, so a correct implementation needs a stateful rolling buffer instead of a post-generation regex.
Why this exists
Reasoning models can emit output like:
<think>hidden chain of thought</think>The actual answer.
For fully materialized strings, stripping is easy. For token streams, it is not. Partial tags can arrive across multiple adjacent token yields, for example:
<thithennk></thithennk>
A naive .replace() or regex-per-token approach leaks fragments or drops visible output.
thinkstrip solves this with a stateful streaming filter.
Install
Requirements:
- Python 3.13+
- Zero runtime dependencies — installs and runs with nothing beyond the standard library
pip install thinkstrip
Development install:
pip install -e ".[dev]"
Quick start
Streaming
from thinkstrip import ThinkStrip
stripper = ThinkStrip()
chunks = []
for token in ['<thi', 'nk>', 'hidden', '</thi', 'nk>', 'The answer.']:
if emitted := stripper.feed(token):
chunks.append(emitted)
if flushed := stripper.flush():
chunks.append(flushed)
print(''.join(chunks))
# The answer.
Batch
from thinkstrip import strip_think
clean = strip_think('<think>reasoning</think>The actual answer.')
print(clean)
# The actual answer.
Prompt pre-cleaner
Some GGUF chat templates inject <think> at the end of the rendered prompt before
the model generates. This breaks the streaming filter because the model never emits
its own <think>. Call strip_think_prefill on the rendered prompt to remove it:
from thinkstrip import strip_think_prefill
prompt = strip_think_prefill(prompt)
# trailing '<think>' removed if present; no-op otherwise
Public API
from thinkstrip import ThinkStrip, strip_think, strip_think_prefill
ThinkStrip
Stateful streaming filter. Create one instance per response stream.
ThinkStrip(
open_tag: str = '<think>',
close_tag: str = '</think>',
capture: bool = False,
)
| Method / property | Description |
|---|---|
.feed(token: str) -> str |
Process one token. Returns the text to emit (empty string when nothing ready yet). |
.flush() -> str |
Call once at end-of-stream. Returns any buffered visible text. Empty if stream ended inside a think block. |
.reset() -> None |
Reset to initial state. Use to process a second stream with the same instance. |
.think_content: str |
Accumulated think-block text. Non-empty only when capture=True. |
.in_think_block: bool |
True if the stream ended mid-think-block. Useful for diagnostics. |
| Constructor parameter | Type | Default | Description |
|---|---|---|---|
open_tag |
str |
<think> |
Opening tag to strip |
close_tag |
str |
</think> |
Closing tag to strip |
capture |
bool |
False |
Retain think content in .think_content instead of discarding |
Buffer sizes are derived automatically: len(open_tag) - 1 chars for the opening-tag guard,
len(close_tag) - 1 for the closing-tag guard. Custom tags carry no extra cost.
strip_think
Stateless helper for complete strings. Implemented via ThinkStrip — batch and streaming
behavior are identical.
strip_think(
text: str,
open_tag: str = '<think>',
close_tag: str = '</think>',
) -> str
strip_think_prefill
Removes a trailing open tag injected by some GGUF chat templates.
strip_think_prefill(
prompt: str,
open_tag: str = '<think>',
) -> str
Capture mode
When capture=True, think content accumulates in .think_content instead of being
discarded. Multiple think blocks per response are concatenated. Useful for surfacing
the model's reasoning in a separate UI panel or for eval runs.
stripper = ThinkStrip(capture=True)
for token in stream:
if emitted := stripper.feed(token):
yield emitted
if flushed := stripper.flush():
yield flushed
print(stripper.think_content) # full reasoning text
Limitations
- Nested tags are not supported. A
<think>that arrives while already inside a think block is treated as think content and swallowed. The first</think>closes the block; any subsequent</think>with no matching open tag passes through as visible text. In practice this does not occur — Qwen3 and DeepSeek-R1 emit exactly one think block per response.
Development
git clone https://github.com/informity/thinkstrip.git
cd thinkstrip
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
make lint
make test
make build
Contributing
See CONTRIBUTING.md.
License
MIT — see LICENSE.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file thinkstrip-0.2.2.tar.gz.
File metadata
- Download URL: thinkstrip-0.2.2.tar.gz
- Upload date:
- Size: 12.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
21fdd8fcd346297d904acd4fdb9c98eb2577288f44542f4dac98a7b0c14c60fe
|
|
| MD5 |
09908cf3a8ce01294807334e1cba65a6
|
|
| BLAKE2b-256 |
26c2af715f7a8ea01a65befb23f417d2aebe0df873ef1044a54a52d5c8a14dc2
|
Provenance
The following attestation bundles were made for thinkstrip-0.2.2.tar.gz:
Publisher:
publish.yml on informity/thinkstrip
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
thinkstrip-0.2.2.tar.gz -
Subject digest:
21fdd8fcd346297d904acd4fdb9c98eb2577288f44542f4dac98a7b0c14c60fe - Sigstore transparency entry: 1186272633
- Sigstore integration time:
-
Permalink:
informity/thinkstrip@4bf64e64aeb3643ce4d8c1fc673d0d93dc68a9ae -
Branch / Tag:
refs/tags/v0.2.2 - Owner: https://github.com/informity
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@4bf64e64aeb3643ce4d8c1fc673d0d93dc68a9ae -
Trigger Event:
push
-
Statement type:
File details
Details for the file thinkstrip-0.2.2-py3-none-any.whl.
File metadata
- Download URL: thinkstrip-0.2.2-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da584dab011a30d5e5088c9002afa096f5e08f1a815237e8f1054b4bdac8ba51
|
|
| MD5 |
7bcfaa0725f4487376c706762a1f6d1d
|
|
| BLAKE2b-256 |
a8de81fb323c60e7addc4a569ea7979066ec0d954e20ab25553a8533fbcd5f64
|
Provenance
The following attestation bundles were made for thinkstrip-0.2.2-py3-none-any.whl:
Publisher:
publish.yml on informity/thinkstrip
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
thinkstrip-0.2.2-py3-none-any.whl -
Subject digest:
da584dab011a30d5e5088c9002afa096f5e08f1a815237e8f1054b4bdac8ba51 - Sigstore transparency entry: 1186272670
- Sigstore integration time:
-
Permalink:
informity/thinkstrip@4bf64e64aeb3643ce4d8c1fc673d0d93dc68a9ae -
Branch / Tag:
refs/tags/v0.2.2 - Owner: https://github.com/informity
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@4bf64e64aeb3643ce4d8c1fc673d0d93dc68a9ae -
Trigger Event:
push
-
Statement type: