Skip to main content

Python module for detecting password, api keys hashes and any other string that resembles a randomly generated character sequence.

Project description

stringlifier

String-classifier - is a python module for detecting random string and hashes text/code.

Typical usage scenarios include:

  • Sanitizing application or security logs
  • Detecting accidentally exposed credentials (complex passwords or api keys)

Interactive notebook

You can see Stringlifier in action by checking out this interactive notebook hosted on Colaboratory.

Quick start guide

You can quickly use stringlifier via pip-installation:

$ pip install stringlifier

API example:

from stringlifier.api import Stringlifier

stringlifier=Stringlifier()

s = stringlifier("com.docker.hyperkit -A -u -F vms/0/hyperkit.pid -c 8 -m 8192M -b 127.0.0.1 --pass=\"NlcXVpYWRvcg\" -s 0:0,hostbridge -s 31,lpc -s 1:0,virtio-vpnkit,path=vpnkit.eth.sock,uuid=45172425-08d1-41ec-9d13-437481803412 -U c6fb5010-a83e-4f74-9a5a-50d9086b9")

After this, s should be:

'/System/Library/DriverExtensions/AppleUserHIDDrivers.dext/AppleUserHIDDrivers com.apple.driverkit.AppleUserUSBHostHIDDevice0 <RANDOM_STRING>'

You can also choose to see the full tokenization and classification output:

s, tokens = stringlifier("com.docker.hyperkit -A -u -F vms/0/hyperkit.pid -c 8 -m 8192M -b 127.0.0.1 --pass=\"NlcXVpYWRvcg\" -s 0:0,hostbridge -s 31,lpc -s 1:0,virtio-vpnkit,path=vpnkit.eth.sock,uuid=45172425-08d1-41ec-9d13-437481803412 -U c6fb5010-a83e-4f74-9a5a-50d9086b9", return_tokens=True)

s will be the same as before and tokens will contain the following data:

[[('0', 33, 34, '<NUMERIC>'),
   ('8', 51, 52, '<NUMERIC>'),
   ('8192', 56, 60, '<NUMERIC>'),
   ('127.0.0.1', 65, 74, '<IP_ADDR>'),
   ('NlcXVpYWRvcg', 83, 95, '<RANDOM_STRING>'),
   ('0', 100, 101, '<NUMERIC>'),
   ('0', 102, 103, '<NUMERIC>'),
   ('31', 118, 120, '<NUMERIC>'),
   ('1', 128, 129, '<NUMERIC>'),
   ('0', 130, 131, '<NUMERIC>'),
   ('45172425-08d1-41ec-9d13-437481803412', 172, 208, '<UUID>'),
   ('c6fb5010-a83e-4f74-9a5a-50d9086b9', 212, 244, '<UUID>')]]

Building your own classifier

You can also train your own model if you want to detect different types of strings. For this you can use the Command Line Interface for the string classifier:

$ python3 stringlifier/modules/stringc.py --help

Usage: stringc.py [options]

Options:
  -h, --help            show this help message and exit
  --interactive
  --train
  --resume
  --train-file=TRAIN_FILE
  --dev-file=DEV_FILE
  --store=OUTPUT_BASE
  --patience=PATIENCE   (default=20)
  --batch-size=BATCH_SIZE
                        (default=32)
  --device=DEVICE

For instructions on how to generate your training data, use this link.

Important note: This model might not scale if detecting a type of string depends on the surrounding tokens. In this case, you can look at a more advanced tool for sequence processing such as NLP-Cube

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

stringlifier-0.1.0.7.tar.gz (2.8 MB view details)

Uploaded Source

Built Distribution

stringlifier-0.1.0.7-py3-none-any.whl (2.8 MB view details)

Uploaded Python 3

File details

Details for the file stringlifier-0.1.0.7.tar.gz.

File metadata

  • Download URL: stringlifier-0.1.0.7.tar.gz
  • Upload date:
  • Size: 2.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.7

File hashes

Hashes for stringlifier-0.1.0.7.tar.gz
Algorithm Hash digest
SHA256 4f6b82452c5725e35f24cfc0b7597667ee25bfd191b4e5e62accf593712f785f
MD5 3927db0e7101d3ac2edb332b155ad83d
BLAKE2b-256 acd56b0b0ddb7caf5e8cc2f1bbc2dff208e94a261dfde8c3c0799f525a6d3b8f

See more details on using hashes here.

File details

Details for the file stringlifier-0.1.0.7-py3-none-any.whl.

File metadata

  • Download URL: stringlifier-0.1.0.7-py3-none-any.whl
  • Upload date:
  • Size: 2.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.7

File hashes

Hashes for stringlifier-0.1.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 f7d71f76e13b0c02d1deb96dbdeab4db70233c636f751e9afa964476ea4d694d
MD5 57da1268e29abcf3f5d940b80a759ea8
BLAKE2b-256 af8660b0a2021873921bce27fab5c7a22f17f4a05a902538f784b8d48ab9fb9a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page