Turn arbitrary text into guaranteed GSM-7 deliverable SMS — no UCS-2 fallback, deliverable on feature phones.

smssafe

A single non-GSM-7 character (a smart quote, an em-dash, a stray emoji) forces an entire SMS into UCS-2 encoding: 70 characters per segment instead of 160 (67 instead of 153 once a message is split into parts), up to triple the cost (a full 160-character message becomes three segments), and silent delivery failure on feature phones that can't render it.
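The cost difference is easy to check by hand. The sketch below is not part of smssafe; the function name and the `LIMITS` table are illustrative, and the 153/67 figures are the standard per-part capacities for concatenated messages rather than values taken from the library's source.

```python
import math

# (single-message capacity, per-part capacity once the message is concatenated).
# Parts of a concatenated message are smaller because a header is reserved
# for reassembly on the handset.
LIMITS = {"gsm7": (160, 153), "ucs2": (70, 67)}


def segments(units: int, encoding: str) -> int:
    """Return how many SMS parts `units` code units need in `encoding`.

    `units` is septets for GSM-7 and UTF-16 code units for UCS-2.
    Anything that fits in a single message counts as one part.
    """
    single, per_part = LIMITS[encoding]
    if units <= single:
        return 1
    return math.ceil(units / per_part)


print(segments(160, "gsm7"))  # one part
print(segments(160, "ucs2"))  # three parts for the same length
```

The same 160 units that fit in one GSM-7 message need three UCS-2 parts, which is where the cost multiplier comes from.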

smssafe sanitizes text before you send it — transliterating, replacing, or dropping every non-GSM-7 character while preserving as much meaning as possible.

  • Zero dependencies: pure Python standard library, 3.10+
  • Fully typed: ships py.typed (PEP 561)
  • Homoglyph-aware: Cyrillic/Greek/full-width lookalikes are transliterated, not dropped
  • Accurate segment & cost accounting (extended chars correctly counted as 2)
  • Deterministic, side-effect-free, full audit trail of every change

Install

pip install smssafe

Usage

from smssafe import sanitize

result = sanitize("Hi — “there”, pay ~5,000 now… 🙂")

result.sanitized          # 'Hi - "there", pay ~5,000 now...'
result.encoding           # 'gsm7'
result.char_count         # encoded length (extended chars count as 2)
result.segments           # number of SMS parts
result.replacements       # list[dict] — audit trail of every substitution
result.remaining_unsafe   # list[str] — chars that could not be mapped (dropped)
result.is_clean           # True if no changes were needed

What it handles

| Input | Output | Notes |
|---|---|---|
| “ ” ‘ ’ smart quotes | " ' | Word / Google Docs / AI output |
| – — … dash & ellipsis | - ... | |
| • · ‣ bullets | - | |
| Cyrillic аеор, Greek Α, full-width Ａ１ | aeop A A1 | lookalikes transliterated, not dropped |
| ÀÈÌÒÙ uppercase accent traps | AEIOU | not in GSM-7 despite lowercase being valid |
| ™ © ® | TM (c) (R) | |
| ₹ ₽ ₿ | INR RUB BTC | € £ $ ¥ are kept; they're valid GSM-7 |
| emoji, math-alphanumerics, non-BMP | (stripped) | recorded in remaining_unsafe |
| zero-width / BOM / exotic spaces | (stripped / normalized) | |
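To illustrate the lookalike handling, here is a minimal sketch using `str.translate`. The `HOMOGLYPHS` table and the function name are hypothetical and cover only the characters listed above; the library's real map is presumably much larger and may be built differently.

```python
# Hypothetical, deliberately tiny lookalike table (assumption, not smssafe's map).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0391": "A",  # Greek capital alpha (not in GSM-7, unlike Δ or Ω)
    "\uff21": "A",  # full-width Latin capital A
    "\uff11": "1",  # full-width digit one
})


def transliterate_lookalikes(text: str) -> str:
    """Replace known lookalikes with their Latin equivalents; leave the rest."""
    return text.translate(HOMOGLYPHS)


print(transliterate_lookalikes("\u0440\u0430y \uff21\uff11"))
```

Transliterating instead of dropping keeps words such as a Cyrillic-spelled "pay" readable after sanitizing.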

Tilde: ~ is a GSM-7 extension-table character (encoded as the escape septet 0x1B followed by 0x3D, so it costs 2 septets). It passes through unchanged: ~5,000 stays ~5,000 rather than becoming a misleading -5,000. Non-ASCII tilde lookalikes (˜ ∼ ～) normalise to ~.
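The "costs 2 septets" rule can be sketched as follows. This is an independent illustration, not smssafe's code: the constant and function names are made up here, and the character tables follow the GSM 7-bit default alphabet and its extension table.

```python
# GSM 7-bit default alphabet: one septet per character.
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each character is sent as ESC (0x1B) plus one more septet.
GSM7_EXT = frozenset("^{}\\[~]|€")


def septets(text: str) -> int:
    """Count the septets `text` occupies; raise if a character is not GSM-7."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXT:
            total += 2
        else:
            raise ValueError(f"{ch!r} is not a GSM-7 character")
    return total


print(septets("~5,000"))  # the tilde counts twice
```

Counting septets rather than characters is what keeps segment estimates honest for text containing ~, €, or brackets.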

Drop vs. replace unknowns

By default, characters with no safe mapping are dropped. Pass drop_unknown=False to replace them with ? instead:

sanitize("A中B", drop_unknown=False).sanitized   # 'A?B'
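The difference between the two modes can be pictured with a small stand-alone sketch. `final_scan` and `ALLOWED` are hypothetical names, the allowed set is a stand-in for the full GSM-7 alphabet, and returning the unmapped characters in both modes is an assumption about how such a scan might report them.

```python
import string

# Stand-in for the full GSM-7 character set (assumption, kept short on purpose).
ALLOWED = frozenset(string.ascii_letters + string.digits + " ?")


def final_scan(
    text: str, allowed: frozenset[str], drop_unknown: bool = True
) -> tuple[str, list[str]]:
    """Drop or '?'-replace every character outside `allowed`.

    Returns the cleaned text and the list of characters that had no mapping.
    """
    kept: list[str] = []
    unsafe: list[str] = []
    for ch in text:
        if ch in allowed:
            kept.append(ch)
        else:
            unsafe.append(ch)
            if not drop_unknown:
                kept.append("?")
    return "".join(kept), unsafe


print(final_scan("A中B", ALLOWED))
print(final_scan("A中B", ALLOWED, drop_unknown=False))
```

Replacing with ? preserves the position and length of the original text, which can matter when the recipient needs to see that something was there.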

How it works

A deterministic 8-step pipeline (smssafe.core), each step independently testable:

  1. Strip non-BMP / surrogate codepoints (emoji, math-alphanumerics)
  2. Apply the homoglyph map (Cyrillic/Greek/full-width → Latin)
  3. Apply the explicit replacement map (quotes, dashes, currency, symbols…)
  4. NFD-normalize per character and strip diacritics for remaining accented chars
  5. Normalize whitespace (tabs, exotic/zero-width spaces)
  6. Collapse artifacts (runs of dashes/spaces)
  7. Final GSM-7 scan — drop or ?-replace anything left
  8. Compute encoding, character count (extended = 2), and segment count
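A few of these steps can be sketched with the standard library alone. The function names, the `keep` parameter, and the exact whitespace and dash rules below are assumptions for illustration; they are not taken from smssafe.core.

```python
import re
import unicodedata


def strip_non_bmp(text: str) -> str:
    """Step 1 (sketch): drop codepoints above U+FFFF and lone surrogates."""
    return "".join(
        ch for ch in text
        if ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF
    )


def strip_diacritics(text: str, keep: frozenset[str]) -> str:
    """Step 4 (sketch): NFD-decompose each character and drop combining marks.

    Characters in `keep` (for example accented letters that are valid GSM-7)
    pass through untouched.
    """
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


def normalize_whitespace(text: str) -> str:
    """Step 5 (sketch): remove zero-width characters, map exotic spaces to ' '."""
    text = re.sub(r"[\u200b\u200c\u200d\u2060\ufeff]", "", text)
    return re.sub(r"[\t\u00a0\u2000-\u200a\u202f\u205f\u3000]", " ", text)


def collapse_artifacts(text: str) -> str:
    """Step 6 (sketch): collapse runs of dashes and runs of spaces."""
    text = re.sub(r"-{2,}", "-", text)
    return re.sub(r" {2,}", " ", text)


print(strip_diacritics("ÀÈ è", keep=frozenset("è")))
```

Keeping each step as a pure function of its input is what makes the pipeline deterministic and lets each stage be tested in isolation.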

Development

git clone https://github.com/BRIQ-BLOCK/smssafe
cd smssafe
pip install -e ".[dev]"
pytest            # 689 tests
ruff check .
mypy

License

MIT © Eddie Gulay
