Experimental unicode steganography package

pyUnicodeSteganography

Experimental Python package implementing several methods of Unicode steganography (concealing a message within text by sneaky use of the Unicode character set).

Build/Install

  • Clone this repository

  • Build: python3 -m build or python3 setup.py build

  • Install: python3 -m pip install . or python3 setup.py install

Test

python3 setup.py test

Using

import pyUnicodeSteganography as usteg

text = "this is a completely normal message"
secret_msg = "attack at dawn"
encoded = usteg.encode(text, secret_msg)
decoded = usteg.decode(encoded)

Steganography Methods

Zero Width Characters

The Unicode standard includes several non-printing or zero-width characters, such as '\u200b' (zero width space), which are not visible when rendered by most browsers and editors. These can be used to invisibly embed arbitrary data inside other text. This is the default method this package uses for steganography.

secret_text = usteg.encode("some text", "data")
secret_binary = usteg.encode("some text", b'\x00\x01', binary=True)
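To make the idea concrete, here is a minimal, self-contained sketch of a 2-bit zero-width encoding. It is not the package's implementation: the alphabet ZW_CHARS and the helpers zw_encode_bytes, zw_decode_bytes, and zw_embed are hypothetical names chosen for illustration, and the package's real defaults and delimiter handling may differ.

```python
# Hypothetical 2-bit alphabet; the package's actual defaults may differ.
ZW_CHARS = ["\u200b", "\u200c", "\u200d", "\u2060"]


def zw_encode_bytes(data: bytes) -> str:
    """Turn each byte into four zero-width characters (2 bits each)."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(ZW_CHARS[(byte >> shift) & 0b11])
    return "".join(out)


def zw_decode_bytes(stego: str) -> bytes:
    """Collect the zero-width characters in order and rebuild the bytes."""
    values = [ZW_CHARS.index(ch) for ch in stego if ch in ZW_CHARS]
    usable = len(values) - len(values) % 4  # ignore an incomplete final group
    out = bytearray()
    for i in range(0, usable, 4):
        a, b, c, d = values[i:i + 4]
        out.append((a << 6) | (b << 4) | (c << 2) | d)
    return bytes(out)


def zw_embed(cover: str, data: bytes) -> str:
    """Insert one 4-character group (one byte) after each cover character."""
    hidden = zw_encode_bytes(data)
    groups = [hidden[i:i + 4] for i in range(0, len(hidden), 4)]
    if len(groups) > len(cover):
        raise ValueError("cover text too short for payload")
    pieces = []
    for i, ch in enumerate(cover):
        pieces.append(ch)
        if i < len(groups):
            pieces.append(groups[i])
    return "".join(pieces)


stego = zw_embed("some text", b"data")
recovered = zw_decode_bytes(stego)
```

The sketch assumes the cover text does not already contain any of the four alphabet characters; if it did, the decoder would read them as payload.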

SNOW (steganographic nature of whitespace)

Original project site

An older method that uses trailing whitespace to embed arbitrary data in text. SNOW takes advantage of the fact that many browsers and other programs retain trailing whitespace but do not display it. It works even with plain ASCII text and may hold up better in situations where special Unicode characters are removed.

secret_text = usteg.encode("some text", "data", method="snow")
secret_binary = usteg.encode("some text", b'\x00\x01', method="snow", binary=True)
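The trailing-whitespace idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the package's code: snow_encode and snow_decode are hypothetical helpers, the mapping (space = 0, tab = 1) is an assumed 1-bit alphabet, and no delimiter is used, so the cover text must not itself end in a space or tab.

```python
def snow_encode(cover: str, data: bytes) -> str:
    """Append the payload as trailing whitespace: space = 0 bit, tab = 1 bit."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return cover + "".join("\t" if bit == "1" else " " for bit in bits)


def snow_decode(stego: str) -> bytes:
    """Read the trailing run of spaces and tabs back into bytes."""
    visible = stego.rstrip(" \t")
    tail = stego[len(visible):]
    bits = "".join("1" if ch == "\t" else "0" for ch in tail)
    usable = len(bits) - len(bits) % 8  # ignore an incomplete final byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


stego = snow_encode("some text", b"data")
recovered = snow_decode(stego)
```

Because the payload is appended rather than interleaved, the cover text can be arbitrarily short; the cost is 8 extra whitespace characters per payload byte.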

Unicode Lookalikes

Unicode includes many characters that are confusable with other characters, which can also be used to encode arbitrary data into a string. This method strategically replaces characters in a string with lookalike characters, creating a simple binary encoding (normal char = 0, lookalike = 1).

secret_text = usteg.encode("some text", "a", method="lookalike")
secret_binary = usteg.encode("some text", b'\x00', method="lookalike", binary=True)
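The following sketch shows how such a binary encoding could work. It is an assumption-laden illustration rather than the package's implementation: the table LOOKALIKES (Latin letters mapped to Cyrillic lookalikes) and the helpers lookalike_encode and lookalike_decode are hypothetical.

```python
# Hypothetical substitution table: Latin letter -> Cyrillic lookalike.
LOOKALIKES = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "c": "\u0441", "p": "\u0440"}
REVERSE = {v: k for k, v in LOOKALIKES.items()}


def lookalike_encode(cover: str, data: bytes) -> str:
    """Each substitutable character carries one bit: original = 0, lookalike = 1."""
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    out = []
    pos = 0
    for ch in cover:
        if ch in LOOKALIKES and pos < len(bits):
            out.append(LOOKALIKES[ch] if bits[pos] else ch)
            pos += 1
        else:
            out.append(ch)
    if pos < len(bits):
        raise ValueError("cover text has too few substitutable characters")
    return "".join(out)


def lookalike_decode(stego: str) -> bytes:
    """Read one bit per substitutable position and pack the bits into bytes."""
    bits = []
    for ch in stego:
        if ch in LOOKALIKES:
            bits.append(0)
        elif ch in REVERSE:
            bits.append(1)
    usable = len(bits) - len(bits) % 8  # ignore an incomplete final byte
    out = bytearray()
    for i in range(0, usable, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


stego = lookalike_encode("a cape escapade", b"a")
recovered = lookalike_decode(stego)
```

Note that the decoder cannot tell an unused substitutable character from an encoded 0 bit, which is why leftover capacity decodes as zeros.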

Platform Example Twitter

twitter_encoded = usteg.encode("hello friend this is a perfectly normal conversation", "attack at dawn", replacements="\u200b\u200c\u200d\u2060")
twitter_decoded = usteg.decode(twitter_encoded, replacements="\u200b\u200c\u200d\u2060")

Different platforms have different rules for dealing with zero-width characters and other Unicode nonsense. Twitter, for example, removes some of the characters we use in our defaults, but some experiments show that a different set can be used successfully. When sending messages on a new platform, experiment to see which characters are allowed if the defaults don't work.
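One way to run such an experiment is to post a probe string containing each candidate character, copy the text back from the platform, and check which characters survived. The sketch below is only a suggestion: CANDIDATES is an arbitrary list of candidates, and build_probe and surviving_characters are hypothetical helpers that are not part of this package.

```python
# Candidate invisible characters to test; adjust the list as needed.
CANDIDATES = ["\u200b", "\u200c", "\u200d", "\u2060", "\u2062", "\ufeff"]


def build_probe(candidates):
    """Place each candidate after a visible index marker so the probe is easy to spot."""
    return "".join(f"[{i}]{ch}" for i, ch in enumerate(candidates)) + "[end]"


def surviving_characters(candidates, received):
    """Return the candidates still present in the text copied back from the platform."""
    return [ch for ch in candidates if ch in received]


probe = build_probe(CANDIDATES)

# Simulated platform that strips two of the candidates; in practice, paste
# the text copied back from the real platform here instead.
received = probe.replace("\u200b", "").replace("\ufeff", "")
allowed = surviving_characters(CANDIDATES, received)
```

Any four surviving characters could then be tried as a custom character set for the zero-width method.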

Data Capacity

The SNOW method works by inserting new characters at the end of a string, so it can encode any amount of data at the cost of increasing the size of the text. Zero-width and lookalike steganography require a certain amount of text for the data to be encoded. Many platforms limit the number of consecutive zero-width characters in a text. To evade this, our default zero-width encoding splits the zero-width characters into groups of 4 and inserts them between printable characters. At 2 bits per zero-width character, this gives roughly 1 byte of encoded data per printable character. For lookalikes, the rough formula is 1 byte per 8 substitutable characters.

from pyUnicodeSteganography.lookalikes import capacity 
my_string = "hello I am a string I have nothing to fear because I have nothing to hide"
byte_capacity = capacity(my_string)
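The rough formulas above can be written down directly. This is a back-of-the-envelope sketch, not the package's capacity function: zero_width_capacity and lookalike_capacity are hypothetical helpers, and the substitutable set passed in is an assumption standing in for whatever substitution table is in use.

```python
def zero_width_capacity(text: str) -> int:
    """Roughly one byte (a group of four 2-bit characters) per printable character."""
    return sum(1 for ch in text if ch.isprintable())


def lookalike_capacity(text: str, substitutable) -> int:
    """One bit per substitutable character, so eight of them per byte."""
    return sum(1 for ch in text if ch in substitutable) // 8


# Hypothetical set of characters that have a lookalike.
substitutable = set("aeocp")
cover = "hello I am a string I have nothing to fear because I have nothing to hide"
zw_bytes = zero_width_capacity(cover)
lookalike_bytes = lookalike_capacity(cover, substitutable)
```

Comparing the two numbers for the same cover text shows why the lookalike method needs much longer text for the same payload.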

Data Corruption

Our lookalike encoding method has a couple of limitations. It cannot properly handle Unicode strings that already contain any of the characters it uses as lookalikes. Encoding data into strings that contain these will corrupt the data unpredictably.

Padded Output

The lookalikes method also cannot determine where the encoded portion of a string ends if you encode data into a string with more capacity than you use. The returned bytes/string will instead be null padded up to the total capacity of the string you decode. Keep this in mind when encoding binary data.
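One possible workaround, which is a suggestion and not a feature of this package, is to prefix the payload with its length before encoding and use that length to discard the padding after decoding. The helpers add_length_prefix and strip_padding below are hypothetical names; simply stripping trailing null bytes would be shorter but would also destroy binary payloads that legitimately end in null bytes.

```python
import struct


def add_length_prefix(payload: bytes) -> bytes:
    """Prepend the payload length as a 2-byte big-endian integer."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too long for a 2-byte length prefix")
    return struct.pack(">H", len(payload)) + payload


def strip_padding(decoded: bytes) -> bytes:
    """Use the length prefix to drop the null padding added at decode time."""
    if len(decoded) < 2:
        raise ValueError("decoded data is too short to hold a length prefix")
    (length,) = struct.unpack(">H", decoded[:2])
    if len(decoded) - 2 < length:
        raise ValueError("decoded data is shorter than its length prefix claims")
    return decoded[2:2 + length]


payload = b"\x00\x01\x00"
# Simulate what a decoder would return: the prefixed payload plus null padding.
decoded = add_length_prefix(payload) + b"\x00" * 5
recovered = strip_padding(decoded)
```

The prefix costs 2 bytes of capacity, which for lookalikes means 16 extra substitutable characters in the cover text.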

Using Custom Character Sets

The package includes defaults for each method: a character set and delimiter (zero width and SNOW) or a substitution table (lookalikes). The defaults are generally reasonable, but there are many cases where you may want to change them, for example when certain zero-width characters are stripped or blocked on a website, or when you need lookalikes for a different language. To do so, pass a list of characters as the named argument "replacements" and a delimiter string as "delimiter" for zero width and SNOW, or pass a dictionary of characters and their lookalikes as "replacements" for lookalikes.

character_set = ["\u200B", "\u200C", "\u200E", "\u0000"]
delimiter = "\u2062"
secret_text = usteg.encode("some text", "secret", replacements=character_set, delimiter=delimiter)
secret = usteg.decode(secret_text, replacements=character_set, delimiter=delimiter)
substitution_table = {'A':'\u0391', 'B':'\u0392', 'C':'\u03F9'}
secret_text = usteg.encode("ABC ABC ABC ABC ABC ABC", "a", method="lookalike", replacements=substitution_table)
secret = usteg.decode(secret_text, method="lookalike", replacements=substitution_table)

Limitations

Currently zero width supports only a 2-bit encoding and requires a character set of 4 characters. You may include more, but only the first 4 will be used in encoding. SNOW supports only a 1-bit encoding and requires 2 characters in its character set. For both SNOW and zero width, the delimiter string must differ from the characters used in the character set.
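These rules can be checked before calling encode. The sketch below is a hypothetical pre-flight check and not part of the package: usable_character_set and the REQUIRED_CHARS table are invented for illustration, and the key "zerowidth" is an assumed label for the default method. The rejection of duplicate characters is also an assumption, on the reasoning that two identical symbols could not be told apart when decoding.

```python
# Number of characters each method uses, per the limitations described above.
# "zerowidth" is an assumed label for the default method.
REQUIRED_CHARS = {"zerowidth": 4, "snow": 2}


def usable_character_set(method, replacements, delimiter):
    """Return the characters that would actually be used, or raise ValueError."""
    needed = REQUIRED_CHARS[method]
    used = list(replacements)[:needed]  # extra characters are ignored
    if len(used) < needed:
        raise ValueError(f"{method} needs at least {needed} characters")
    if len(set(used)) != needed:
        raise ValueError("encoding characters must be distinct")
    if delimiter in used:
        raise ValueError("delimiter must differ from the encoding characters")
    return used


character_set = ["\u200b", "\u200c", "\u200e", "\u0000", "\u2061"]
used = usable_character_set("zerowidth", character_set, "\u2062")
```

Here the fifth character is silently dropped, mirroring the behaviour described above where only the first 4 characters are used.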

Encryption

This package does not include any support for encryption but it's easy enough to send encrypted messages using unicode steganography. The following is a simple example of how to do so with a well supported cryptography library.

import pyUnicodeSteganography as usteg
import nacl.secret
import nacl.utils 

# create secret key and initialize secret key encryption method
key = nacl.utils.random(nacl.secret.SecretBox.KEY_SIZE)
box = nacl.secret.SecretBox(key)

# encrypt message and use unicode steganography to hide binary data in text
message = b'encrypted secret'
ciphertext = box.encrypt(message)
encoded_ciphertext = usteg.encode("hello friend", ciphertext, binary=True)

# extract encoded message and decrypt 
ciphertext = usteg.decode(encoded_ciphertext, binary=True)
plaintext = box.decrypt(ciphertext)
