
Python module for detecting passwords, API keys, hashes and any other string that resembles a randomly generated character sequence.

Project description


String-classifier is a Python module for detecting random strings and hashes in text or code.

Typical usage scenarios include:

  • Sanitizing application or security logs
  • Detecting accidentally exposed credentials (complex passwords or API keys)
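As a rough illustration of what "resembles a randomly generated character sequence" can mean, the sketch below scores alphanumeric runs by Shannon entropy (bits per character) and masks long, high-entropy runs in a log line. This is a swapped-in heuristic, not the trained model stringlifier uses. The helper names `shannon_entropy`, `looks_random` and `sanitize_line`, and the `MIN_LEN`/`THRESHOLD_BITS` values, are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical thresholds, chosen for illustration only (not stringlifier settings).
MIN_LEN = 16
THRESHOLD_BITS = 3.5
PLACEHOLDER = "<RANDOM_STRING>"


def shannon_entropy(token: str) -> float:
    """Bits per character of the token's empirical character distribution."""
    if not token:
        return 0.0
    n = len(token)
    # Sum of p * log2(1/p) over each distinct character, with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in Counter(token).values())


def looks_random(token: str) -> bool:
    # Short runs are skipped: entropy estimated from a few characters is unreliable.
    return len(token) >= MIN_LEN and shannon_entropy(token) >= THRESHOLD_BITS


def sanitize_line(line: str) -> str:
    # Only alphanumeric runs are scored; symbols such as '=' or '/' are kept as-is.
    return re.sub(
        r"[A-Za-z0-9]+",
        lambda m: PLACEHOLDER if looks_random(m.group()) else m.group(),
        line,
    )


print(sanitize_line("user=admin token=9fK2xQ7LmZ4pR8vT1wY6"))
# prints: user=admin token=<RANDOM_STRING>
```

The heuristic has clear weaknesses: short values slip through (with these thresholds the 11-character `0x10000992d` from the API example would not be masked), and long ordinary identifiers can score close to the threshold. That is one reason to prefer a trained classifier.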

Quick start guide

You can quickly install stringlifier with pip:

$ pip install stringlifier

API example:

from stringlifier.api import Stringlifier

stringlifier = Stringlifier()

s = stringlifier('/System/Library/DriverExtensions/AppleUserHIDDrivers.dext/AppleUserHIDDrivers com.apple.driverkit.AppleUserUSBHostHIDDevice0 0x10000992d')

After this, s should be:

'/System/Library/DriverExtensions/AppleUserHIDDrivers.dext/AppleUserHIDDrivers com.apple.driverkit.AppleUserUSBHostHIDDevice0 <RANDOM_STRING>'

You can also choose to see the full tokenization and classification output:

s, tokens = stringlifier('/System/Library/DriverExtensions/AppleUserHIDDrivers.dext/AppleUserHIDDrivers com.apple.driverkit.AppleUserUSBHostHIDDevice0 0x10000992d', return_tokens=True)

Here, s is the same as before, and tokens contains the following data:

[{'token': '/', 'type': 'SYMBOL'},
 {'token': 'System', 'type': 'STRING'},
 {'token': '/', 'type': 'SYMBOL'},
 {'token': 'Library', 'type': 'STRING'},
 {'token': '/', 'type': 'SYMBOL'},
 {'token': 'DriverExtensions', 'type': 'STRING'},
 {'token': '/', 'type': 'SYMBOL'},
 {'token': 'AppleUserHIDDrivers', 'type': 'STRING'},
 {'token': '.', 'type': 'SYMBOL'},
 {'token': 'dext', 'type': 'STRING'},
 {'token': '/', 'type': 'SYMBOL'},
 {'token': 'AppleUserHIDDrivers', 'type': 'STRING'},
 {'token': ' ', 'type': 'SYMBOL'},
 {'token': 'com', 'type': 'STRING'},
 {'token': '.', 'type': 'SYMBOL'},
 {'token': 'apple', 'type': 'STRING'},
 {'token': '.', 'type': 'SYMBOL'},
 {'token': 'driverkit', 'type': 'STRING'},
 {'token': '.', 'type': 'SYMBOL'},
 {'token': 'AppleUserUSBHostHIDDevice0', 'type': 'STRING'},
 {'token': ' ', 'type': 'SYMBOL'},
 {'token': '0x10000992d', 'type': 'HASH'}]
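The link between tokens and s can be sketched in plain Python. First, split the input into alphanumeric runs and single symbol characters; this reproduces the token boundaries listed above. Then label each token and rebuild the string, replacing every token that is neither STRING nor SYMBOL with <RANDOM_STRING>. The helpers `tokenize` and `mask` are hypothetical and not part of stringlifier.api. The hex-literal rule is only a toy stand-in for the trained classifier that assigns the real labels.

```python
import re
from typing import Dict, List

# Alphanumeric runs form one token; any other character is a token by itself.
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z0-9]+)|(?P<symbol>[^A-Za-z0-9])")
# Toy stand-in for the model (an assumption): hexadecimal literals are labelled HASH.
HEX_RE = re.compile(r"0x[0-9a-fA-F]+")


def tokenize(text: str) -> List[Dict[str, str]]:
    tokens = []
    for match in TOKEN_RE.finditer(text):
        token = match.group()
        if match.lastgroup == "symbol":
            kind = "SYMBOL"
        elif HEX_RE.fullmatch(token):
            kind = "HASH"
        else:
            kind = "STRING"
        tokens.append({"token": token, "type": kind})
    return tokens


def mask(tokens: List[Dict[str, str]], placeholder: str = "<RANDOM_STRING>") -> str:
    # Keep ordinary words and symbols; replace any other token type with the placeholder.
    return "".join(
        t["token"] if t["type"] in ("STRING", "SYMBOL") else placeholder for t in tokens
    )


text = ("/System/Library/DriverExtensions/AppleUserHIDDrivers.dext/AppleUserHIDDrivers "
        "com.apple.driverkit.AppleUserUSBHostHIDDevice0 0x10000992d")
tokens = tokenize(text)
print(mask(tokens))
```

Because masking only looks at the type field, the same `mask` step works whatever labels the classifier emits. Here, HASH is folded into the single <RANDOM_STRING> placeholder, matching the value of s shown earlier.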

Building your own classifier

You can also train your own model if you want to detect different types of strings. To do this, use the command-line interface for the string classifier:

$ python3 stringlifier/modules/ --help

Usage: [options]

  -h, --help            show this help message and exit
  --patience=PATIENCE   (default=20)
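The help output above lists only --patience (default 20). In many training scripts, patience controls early stopping: training halts once the score on held-out data has not improved for that many consecutive epochs. Assuming that meaning here, the logic looks roughly like the sketch below. The function `early_stopping` and the list of per-epoch dev scores are hypothetical, and stringlifier's trainer may count or compare scores differently.

```python
from typing import Iterable, Tuple


def early_stopping(dev_scores: Iterable[float], patience: int = 20) -> Tuple[int, float, int]:
    """Return (best_epoch, best_score, epochs_run) for a stream of per-epoch dev scores."""
    best_epoch, best_score = -1, float("-inf")
    epochs_since_best = 0
    epochs_run = 0
    for epoch, score in enumerate(dev_scores):
        epochs_run += 1
        if score > best_score:
            # New best: a real trainer would typically save a checkpoint here.
            best_epoch, best_score = epoch, score
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_score, epochs_run


# Simulated scores: improvement for three epochs, then a long plateau.
scores = [0.50, 0.60, 0.70] + [0.65] * 30
print(early_stopping(scores, patience=20))
# prints (2, 0.7, 23)
```

The scores are consumed lazily, so dev_scores can also be a generator that trains and evaluates one epoch per step. In that case, stopping early genuinely skips the remaining epochs.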

For instructions on how to generate your training data, use this link.

Important note: This model may not scale to cases where detecting a type of string depends on the surrounding tokens. In such cases, consider a more advanced sequence-processing tool such as NLP-Cube.

