CLI tool to strip ChatGPT-specific markers from text

These details have not been verified by PyPI

Project links

Homepage

Project description

stripgpt

CLI (and tiny library) to scrub ChatGPT / LLM conversation artifacts from text files or streams.

It removes:

Private Use Area span markers used by ChatGPT export (U+E200 / U+E201) and the text inside them
Any remaining private-use characters (Unicode category Co)
Zero‑width & directionality control characters (ZWSP, ZWNJ, ZWJ, LRM, RLM, LRE, RLE, PDF, LRO, RLO, WJ, LRI, RLI, FSI, PDI)
(Optional) "bare" leftover tokens like turn2search5 and line range snippets L10-L42
(Optional) Normalizes whitespace (collapses runs of spaces / tabs, removes trailing space, trims ends)
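
To check whether a piece of text contains any of the invisible characters listed above, a short standard-library sketch can help. It is not part of stripgpt; `find_hidden` is a hypothetical helper name. It flags every character in Unicode categories Co (private use) and Cf (format), which is somewhat broader than the list above because Cf also covers characters such as the soft hyphen and the byte order mark.

```python
import unicodedata


def find_hidden(text):
    """Return (index, code point, name) for each private-use or format character.

    Hypothetical helper for inspection only; it does not modify the text.
    """
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) in {"Co", "Cf"}:
            # Private-use characters have no Unicode name, so supply a default.
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    sample = "a\u200bb\ue200"
    for index, code_point, name in find_hidden(sample):
        print(index, code_point, name)
```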

Why?

Copying / exporting LLM answers often smuggles in hidden marker & control characters that pollute diffs and source control. stripgpt makes cleaning them automatic and scriptable.

Features

Stream or file mode (stdin→stdout or specified files)
In‑place editing with optional backup suffix
Conservative defaults (whitespace normalized unless --no-normalize)
Optional removal of leftover token artifacts
Simple Python API: from stripgpt import clean_text
Tested on Python 3.12 (minimum supported)
CI workflow already configured (GitHub Actions)

Installation

Editable (development) install:

python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Once published to PyPI:

pip install stripgpt

Command Line Usage

Read from stdin / write to stdout:

pbpaste | stripgpt | pbcopy

Clean one or more files (output to stdout):

stripgpt session.md > clean.md
stripgpt file1.txt file2.txt > merged-clean.txt

In place (overwrite):

stripgpt -i session.md

In place with backup:

stripgpt -i --backup-suffix .bak session.md

Remove bare tokens & line ranges too:

stripgpt --kill-bare transcript.txt > scrubbed.txt

Preserve original whitespace:

stripgpt --no-normalize notes.txt > cleaned.txt

Specify encoding (default utf-8):

stripgpt --encoding latin-1 legacy.txt > legacy-clean.txt

Detection only (no modification) – JSON report per input:

stripgpt --detect file1.txt file2.txt
# or
cat text.md | stripgpt --detect

Example output:

{"pua_spans":1,"bare_tokens":2,"zero_width":3,"file":"file1.txt"}
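
The reports can be aggregated in a script. The sketch below assumes that `--detect` prints one JSON object per line with integer counters and a `file` key, as in the example output above; the exact set of keys may differ between versions. `summarize_reports` is a hypothetical helper, not part of stripgpt.

```python
import json


def summarize_reports(lines):
    """Sum the integer counters across JSON report lines, keyed by counter name."""
    totals = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between reports
        report = json.loads(line)
        for key, value in report.items():
            # Skip non-numeric fields such as "file"; bool is excluded explicitly
            # because it is a subclass of int in Python.
            if isinstance(value, int) and not isinstance(value, bool):
                totals[key] = totals.get(key, 0) + value
    return totals


if __name__ == "__main__":
    import sys

    # Example: stripgpt --detect *.md | python summarize.py
    print(summarize_reports(sys.stdin))
```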

Help:

stripgpt -h

Exit Codes

Code	Meaning
0	Success
1	Unhandled / runtime error (message on stderr)
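
When calling the CLI from another program, the exit code is the signal to check. A minimal sketch, assuming the command reads stdin and writes stdout as described above; `run_filter` is a hypothetical wrapper name.

```python
import subprocess


def run_filter(cmd, text):
    """Pipe text through cmd and return its stdout.

    Raises RuntimeError carrying the stderr message if the exit code is non-zero.
    """
    result = subprocess.run(cmd, input=text, capture_output=True, encoding="utf-8")
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout
```

With stripgpt installed and on PATH, a call would look like `run_filter(["stripgpt", "--kill-bare"], raw_text)`.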

Library API

from stripgpt import clean_text

cleaned = clean_text(text, kill_bare=True, normalize=True)

Signature:

clean_text(txt: str, *, kill_bare: bool, normalize: bool) -> str

Parameters:

kill_bare: remove tokens like turn12search5 and ranges L10-L20
normalize: collapse repeated spaces / tabs, strip trailing & leading whitespace

How It Works

Remove any span starting with U+E200 and ending with U+E201 (non-greedy), including enclosed text
Strip any remaining private-use characters (category Co)
Remove zero-width & bidi control characters
Optionally remove bare token artifacts & line ranges
Optionally normalize whitespace

All regexes are compiled at import time; for typical file sizes, performance is I/O-bound.
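
The steps above can be sketched with the standard `re` module. This is an illustration under stated assumptions, not the package's actual source: the pattern names, the bare-token regex, and `clean_sketch` itself are hypothetical, and the real patterns may be stricter or looser.

```python
import re

# Step 1: spans from U+E200 to U+E201, non-greedy, allowed to cross newlines.
PUA_SPAN = re.compile("\ue200.*?\ue201", re.DOTALL)
# Step 2: category Co, i.e. the BMP Private Use Area plus planes 15 and 16.
PRIVATE_USE = re.compile("[\ue000-\uf8ff\U000f0000-\U000ffffd\U00100000-\U0010fffd]")
# Step 3: ZWSP, ZWNJ, ZWJ, LRM, RLM (U+200B-200F); LRE, RLE, PDF, LRO, RLO
# (U+202A-202E); WJ (U+2060); LRI, RLI, FSI, PDI (U+2066-2069).
INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\u2060\u2066-\u2069]")
# Step 4 (assumed shapes): tokens like turn2search5 and ranges like L10-L42.
BARE_TOKEN = re.compile(r"\b(?:turn\d+[a-z]+\d+|L\d+-L\d+)\b")
# Step 5 helpers.
SPACE_RUN = re.compile(r"[ \t]+")
TRAILING_SPACE = re.compile(r" +$", re.MULTILINE)


def clean_sketch(text, *, kill_bare=False, normalize=True):
    """Apply the five cleaning steps in order and return the result."""
    text = PUA_SPAN.sub("", text)
    text = PRIVATE_USE.sub("", text)
    text = INVISIBLE.sub("", text)
    if kill_bare:
        text = BARE_TOKEN.sub("", text)
    if normalize:
        text = SPACE_RUN.sub(" ", text)       # collapse runs of spaces and tabs
        text = TRAILING_SPACE.sub("", text)   # drop trailing space on each line
        text = text.strip()                   # trim both ends
    return text
```

Order matters here: removing the delimited spans first ensures the text inside them goes too, before the delimiters themselves would be stripped as ordinary private-use characters in the second step.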

Development

Requires Python 3.12.

python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
pytest -q

Or via tox:

tox

Publishing (manual)

Requires build and twine (install via pip install build twine).

python -m build
twine check dist/*
twine upload dist/*  # set TWINE_USERNAME=__token__ and TWINE_PASSWORD to your PyPI API token, or enter credentials when prompted

Or use the provided GitHub Actions workflow (add PYPI_API_TOKEN secret).

Run the CLI as a module (an editable install already provides the stripgpt command):

python -m stripgpt --help

Continuous Integration

GitHub Actions workflow (.github/workflows/ci.yml) runs tests on Python 3.12.

Suggested Enhancements

Streaming (line-by-line) processing to reduce memory
Coverage & badge
Pre-commit hook config
Removal statistics / summary report
Additional token pattern detection

Troubleshooting

Issue	Hint
File unchanged	Use `-i` for in-place or redirect stdout to a file
Hidden chars remain	Inspect with `hexdump -C` or a Unicode viewer; open an issue with samples
Encoding errors	Pass `--encoding` matching the source file
"No tests ran" in CI	Ensure `tests/` is present and `pytest.ini` is unchanged

Safety

Use --backup-suffix during first runs for peace of mind.
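
For scripts that use the library API instead of the CLI, the same backup-then-overwrite pattern can be reproduced with the standard library. This is a sketch, not stripgpt's own implementation: `rewrite_in_place` is a hypothetical helper, and naming the backup by appending the suffix to the file name is an assumption about how `--backup-suffix` behaves.

```python
import shutil
from pathlib import Path


def rewrite_in_place(path, transform, *, backup_suffix=None, encoding="utf-8"):
    """Apply transform to a file's text, optionally copying the original first.

    Returns True if the file was rewritten, False if the text was unchanged.
    """
    path = Path(path)
    # newline="" preserves the file's original line endings on read and write.
    with open(path, encoding=encoding, newline="") as handle:
        original = handle.read()
    cleaned = transform(original)
    if cleaned == original:
        return False  # nothing to do, so no backup is created either
    if backup_suffix:
        # e.g. session.md -> session.md.bak (assumed naming scheme)
        shutil.copy2(path, path.with_name(path.name + backup_suffix))
    with open(path, "w", encoding=encoding, newline="") as handle:
        handle.write(cleaned)
    return True
```

With stripgpt installed, `transform` could be something like `lambda s: clean_text(s, kill_bare=False, normalize=True)`.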

License

MIT License. See LICENSE file.

Acknowledgements

Inspired by persistent invisible marker annoyances in exported ChatGPT conversations.

Happy clean diffs!

Project details

This version

0.2.0

Aug 19, 2025

Download files

Download the file for your platform.

Source Distribution

stripgpt-0.2.0.tar.gz (8.5 kB view details)

Uploaded Aug 19, 2025 Source

Built Distribution

stripgpt-0.2.0-py3-none-any.whl (7.8 kB view details)

Uploaded Aug 19, 2025 Python 3

File details

Details for the file stripgpt-0.2.0.tar.gz.

File metadata

Download URL: stripgpt-0.2.0.tar.gz
Upload date: Aug 19, 2025
Size: 8.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for stripgpt-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`3b250fd859e3dd6dff3ddd89a47138fd0730149b052dc6c7dae4e9b8f2c5777c`
MD5	`68783dd1af78660b4b5fea3c4eb0198f`
BLAKE2b-256	`50611cd59a12e9f27e2f1285da376a1ff602054aabb272fc4567f0fb778604d0`

File details

Details for the file stripgpt-0.2.0-py3-none-any.whl.

File metadata

Download URL: stripgpt-0.2.0-py3-none-any.whl
Upload date: Aug 19, 2025
Size: 7.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for stripgpt-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b68c36b404c3ed73f5f9f98b7ec311761fecf058e6650417ce850dbe34a1d86d`
MD5	`525f10cea8faa78ff807535ae2365256`
BLAKE2b-256	`46ad0662a44e9133c96bc500996da64fa95359c63e53d4557e4774c35dc6ec4c`
