Skip to main content

GenRex is a tool that generates regular expressions from strings such as artifacts dynamically generated by samples.

Project description

GenRex 🦖

PyPI

GenRex is a tool that generates regular expressions from strings such as artifacts dynamically generated by samples.

For more information, check out:

Citation

For citing, please use the following entry for the original paper:

@misc{genrex-2023,
  doi = {10.1109/TrustCom60117.2023.00123},
  url = {...},
  author = {Regeciova, Dominika and Kolar, Dusan},
  title = {GenRex: Leveraging Regular Expressions for Dynamic Malware Detection},
  publisher = {IEEE},
  year = {2023}
}

:snake: Minimal supported version of Python is 3.10.

Installation

pip install -U genrex-py

How to Use

import genrex

# pre-process the input from cuckoo_format reports (default: False)
cuckoo_format = False

# When True, GenRex will store the original strings as well,
# when False, GenRex will store the processed strings only
store_original_strings = True

results = genrex.generate(cuckoo_format=cuckoo_format, store_original_strings=store_original_strings, directory="samples")

print("Results:")

for result in results:
    print(result)


"""
Results:
Regex: hello[0-9a-f]                               # regular expression
Ngram: hell                                        # common part of strings in cluster
Unique: 6                                          # how many unique strings are in cluster
Min: 3                                             # minimal number of occurrences in samples
Max: 3                                             # maximal number of occurrences in samples
Average: 3.0                                       # average number of occurrences in samples
Resources: ['hello1', 'hello2', ..., 'helloc']     # list of preprocessed strings from cluster
Originals: []                                      # list of original strings from cluster
                                                   # (if store_original_strings is True)
Input type: ''                                     # input types (if defined)
Hashes: ['source1', 'source2']                     # list of sources

['input_type', 'ngram', 'original_regexes', 'originals', 'regex', 'similar_regex', 'similars', 'unique']
"""    
import genrex

cuckoo_format = True

# When True, GenRex will store the original strings as well,
# when False, GenRex will store the processed strings only
store_original_strings = False

results = genrex.generate(
    cuckoo_format=cuckoo_format,
    store_original_strings=store_original_strings,
    input_type=genrex.InputType.MUTEX,
    source={
        "source1": ["helloa", "hellob", "helloc"],
        "source2": ["hello1", "hello2", "hello3"],
    }
)

print("Results:")

for result in results:
    res = result.return_printable_dict()
    print("Regex:", res["regex"])            # regular expression
    print("Ngram:", res["ngram"])            # common part of strings in cluster
    print("Unique:", res["unique"])          # how many unique strings are in cluster
    print("Min:", res["min"])                # minimal number of occurrences in samples
    print("Max:", res["max"])                # maximal number of occurrences in samples
    print("Avg:", res["average"])            # average number of occurrences in samples
    print("Resources:", res["resources"])    # list of preprocessed strings from cluster
    print("Type:", res["input_type"])        # input types (if defined)
    print("Hashes:", res["hashes"])          # list of sources

"""
Results:
Regex: (^|\\)hello[0-9a-f]$
Ngram: hell
Unique: 6
Min: 3
Max: 3
Avg: 3.0
Type: 'mutex'
Resources: ['helloc', 'hellob', 'hello3', 'hello1', 'helloa', 'hello2']
Hashes: ['source1', 'source2']
"""

How to develop

Install GenRex in development mode with all necessary dependencies.

make setup

Tests

You can run tests with the following command:

make tests

License

Copyright (c) 2023 Avast Software, licensed under the MIT license. See the LICENSE file for more details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

genrex_py-1.0.0.tar.gz (22.4 kB view hashes)

Uploaded Source

Built Distribution

genrex_py-1.0.0-py3-none-any.whl (23.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page