Turn arbitrary text into guaranteed GSM-7 deliverable SMS — no UCS-2 fallback, deliverable on feature phones.
smssafe
Turn arbitrary text into guaranteed GSM-7 deliverable SMS — no UCS-2 fallback.
A single non-GSM-7 character (a smart quote, an em-dash, a stray emoji) forces an entire SMS into UCS-2 encoding: 70 characters per segment instead of 160 (67 instead of 153 once the message is split into parts), up to triple the cost, and silent delivery failure on feature/dumb phones that can't render it.
smssafe sanitizes text before you send it — transliterating, replacing, or
dropping every non-GSM-7 character while preserving as much meaning as possible.
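The cost difference comes down to segment arithmetic. The sketch below is not part of smssafe; `CAPACITY` and `segments_needed` are hypothetical names used only for illustration. It assumes the standard per-part limits (160/153 units for GSM-7, 70/67 for UCS-2, where concatenated parts lose room to the User Data Header).

```python
import math

# (single-part capacity, per-part capacity once the message is concatenated).
# Units are septets for GSM-7 and UTF-16 code units for UCS-2.
CAPACITY = {"gsm7": (160, 153), "ucs2": (70, 67)}


def segments_needed(length: int, encoding: str) -> int:
    """Return the number of SMS parts for `length` units in `encoding`."""
    single, multipart = CAPACITY[encoding]
    if length <= single:
        return 1
    return math.ceil(length / multipart)


# Compare both encodings for a few message lengths.
for n in (70, 160, 300):
    print(n, segments_needed(n, "gsm7"), segments_needed(n, "ucs2"))
```

A 160-character message fits in one GSM-7 part but needs three UCS-2 parts, which is where the "triple the cost" figure can come from.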
- ✅ Zero dependencies: pure Python standard library, 3.10+
- ✅ Fully typed: ships py.typed (PEP 561)
- ✅ Homoglyph-aware: Cyrillic/Greek/full-width lookalikes are transliterated, not dropped
- ✅ Accurate segment & cost accounting (extended chars correctly counted as 2)
- ✅ Deterministic, side-effect-free, full audit trail of every change
Install
pip install smssafe
Usage
from smssafe import sanitize
result = sanitize("Hi — “there”, pay ~5,000 now… 🙂")
result.sanitized # 'Hi - "there", pay ~5,000 now...'
result.encoding # 'gsm7'
result.char_count # encoded length (extended chars count as 2)
result.segments # number of SMS parts
result.replacements # list[dict] — audit trail of every substitution
result.remaining_unsafe # list[str] — chars that could not be mapped (dropped)
result.is_clean # True if no changes were needed
What it handles
| Input | Output | Notes |
|---|---|---|
| “ ” ‘ ’ smart quotes | " ' | Word / Google Docs / AI output |
| – — … dash & ellipsis | - ... | |
| • · ‣ bullets | - | |
| Cyrillic аеор, Greek Α, full-width Ａ１ | aeop A A1 | lookalikes transliterated, not dropped |
| ÀÈÌÒÙ uppercase accent traps | AEIOU | not in GSM-7 despite lowercase being valid |
| ™ © ® | TM (c) (R) | |
| ₹ ₽ ₿ | INR RUB BTC | (€ £ $ ¥ are kept; they're valid GSM-7) |
| emoji, math-alphanumerics, non-BMP | (stripped) | recorded in remaining_unsafe |
| zero-width / BOM / exotic spaces | (stripped / normalized) | |
Tilde: ~ is a GSM-7 extended character (escape + 0x3D, costs 2 septets), so it passes through unchanged: ~5,000 stays ~5,000 rather than becoming a misleading -5,000. Non-ASCII tilde lookalikes (˜ ∼ ～) normalise to ~.
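To make the "costs 2 septets" accounting concrete, here is a minimal sketch that counts the encoded length of text that is already GSM-7 clean. It is not smssafe's implementation; `GSM7_EXTENDED` and `septet_count` are hypothetical names, and the set is the GSM 03.38 extension table as I understand it.

```python
# Characters in the GSM 03.38 extension table. Each one is sent as the
# escape septet (0x1B) followed by a second septet, so it costs 2.
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def septet_count(text: str) -> int:
    """Encoded length in septets, assuming `text` is already valid GSM-7."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


print(septet_count("pay ~5,000"))  # 11: ten characters, the tilde counts twice
print(septet_count("pay -5,000"))  # 10
```

Keeping the tilde costs one extra septet, which is usually cheaper than changing what the message says.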
Drop vs. replace unknowns
By default, characters with no safe mapping are dropped. Pass drop_unknown=False
to replace them with ? instead:
sanitize("A中B", drop_unknown=False).sanitized # 'A?B'
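The two modes can be pictured as a single pass over the text. The sketch below is an illustration only, not smssafe's code: `final_scan` is a hypothetical helper, and `str.isascii` stands in for a real GSM-7 membership test.

```python
from typing import Callable


def final_scan(
    text: str,
    is_safe: Callable[[str], bool],
    drop_unknown: bool = True,
) -> tuple[str, list[str]]:
    """Drop or '?'-replace unsafe characters; also return the ones removed."""
    kept: list[str] = []
    unmapped: list[str] = []
    for ch in text:
        if is_safe(ch):
            kept.append(ch)
        else:
            unmapped.append(ch)  # recorded either way
            if not drop_unknown:
                kept.append("?")
    return "".join(kept), unmapped


# str.isascii is only a stand-in predicate for this example.
print(final_scan("A中B", str.isascii))                      # ('AB', ['中'])
print(final_scan("A中B", str.isascii, drop_unknown=False))  # ('A?B', ['中'])
```

Replacing with ? keeps the message length and word boundaries visible to the recipient, while dropping keeps the text shorter.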
How it works
A deterministic 8-step pipeline (smssafe.core), each step independently testable:
- Strip non-BMP / surrogate codepoints (emoji, math-alphanumerics)
- Apply the homoglyph map (Cyrillic/Greek/full-width → Latin)
- Apply the explicit replacement map (quotes, dashes, currency, symbols…)
- NFD-normalize per character and strip diacritics for remaining accented chars
- Normalize whitespace (tabs, exotic/zero-width spaces)
- Collapse artifacts (runs of dashes/spaces)
- Final GSM-7 scan: drop or ?-replace anything left
- Compute encoding, character count (extended = 2), and segment count
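The first six steps above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the code in smssafe.core: `mini_pipeline`, the tiny `HOMOGLYPHS` and `REPLACEMENTS` maps, and the whitespace ranges are all hypothetical, and the final scan and accounting steps are left out.

```python
import re
import unicodedata

# GSM 03.38 basic set (the ESC control septet is omitted) plus extension table.
GSM7 = set(
    "@£$¥èéùìòÇ\nØø\rÅå"
    "\u0394_\u03a6\u0393\u039b\u03a9\u03a0\u03a8\u03a3\u0398\u039e"
    "ÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
) | set("^{}\\[~]|€\f")

# Tiny illustrative maps; a real table would be far larger.
HOMOGLYPHS = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",  # Cyrillic
    "\u0391": "A",                                               # Greek Alpha
}
REPLACEMENTS = {
    "\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'",
    "\u2013": "-", "\u2014": "-", "\u2026": "...", "\u2122": "TM",
}


def mini_pipeline(text: str) -> str:
    out: list[str] = []
    for ch in text:
        if ord(ch) > 0xFFFF:           # step 1: strip non-BMP codepoints
            continue
        ch = HOMOGLYPHS.get(ch, ch)    # step 2: lookalikes to Latin
        ch = REPLACEMENTS.get(ch, ch)  # step 3: may expand to several chars
        if len(ch) == 1 and ch not in GSM7:
            # step 4: decompose, then discard the combining marks
            decomposed = unicodedata.normalize("NFD", ch)
            ch = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(ch)
    joined = "".join(out)
    # step 5: exotic spaces become a plain space, zero-width chars vanish
    joined = re.sub(r"[\t\u00a0\u2000-\u200a\u202f\u3000]", " ", joined)
    joined = re.sub(r"[\u200b-\u200d\ufeff]", "", joined)
    # step 6: collapse runs of spaces left behind by earlier steps
    return re.sub(r" {2,}", " ", joined)


print(mini_pipeline("\u201cHi\u201d \u2014 ok\u2026"))  # "Hi" - ok...
```

Checking GSM-7 membership before the NFD step matters: characters such as é and à are valid GSM-7 and would otherwise lose their accents for no reason.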
Development
git clone https://github.com/BRIQ-BLOCK/smssafe
cd smssafe
pip install -e ".[dev]"
pytest # 689 tests
ruff check .
mypy
License
MIT © Eddie Gulay