
Static Features Extraction Engine


MPH Static Features Extraction

This project extracts MalwPackHeat-like static features from Windows PE files, following the phases described in "When Static Analysis Fail".

Prerequisites

  • Set up the PE malware directory so that it has the following structure:

        <YOUR_PE_MALWARE_DIR>/
        ├── <FAMILY_0>/
        │   ├── SHA_0_0
        │   ├── SHA_0_1
        │   ├── ...
        │   └──
        ├── <FAMILY_1>/
        │   ├── SHA_1_0
        │   ├── ...
        │   └──
        ├── ...
        └── 
    

    where FAMILY_0, FAMILY_1, ... are directories named after the malware family and SHA_0_0, SHA_0_1, ... are PE files named with their SHA256 hash.

  • Run the pre-feature-selection train/test split, for example with the train-test-splits repository.

  • Make sure Docker is installed and running.
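
The directory layout from the first prerequisite can be checked before starting a long extraction run. The following is a minimal sketch, not part of this project: `check_layout` and `sha256_of` are hypothetical helper names, and the check assumes that sample file names are the bare 64-character hexadecimal SHA256 digest with no extension.

```python
import hashlib
import re
from pathlib import Path

# Assumption: a sample is named with its bare SHA256 hex digest (64 hex characters).
SHA256_NAME = re.compile(r"^[0-9a-fA-F]{64}$")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large samples are never fully loaded in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_layout(malware_dir: str, verify_hashes: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the layout looks as expected."""
    problems = []
    for family in sorted(Path(malware_dir).iterdir()):
        if not family.is_dir():
            problems.append(f"not a family directory: {family.name}")
            continue
        for sample in sorted(family.iterdir()):
            label = f"{family.name}/{sample.name}"
            if not sample.is_file():
                problems.append(f"not a regular file: {label}")
            elif not SHA256_NAME.match(sample.name):
                problems.append(f"name is not a SHA256 digest: {label}")
            elif verify_hashes and sha256_of(sample) != sample.name.lower():
                problems.append(f"name does not match file content: {label}")
    return problems
```

Calling `check_layout("<YOUR_PE_MALWARE_DIR>")` only inspects names; passing `verify_hashes=True` also re-hashes every file, which can take a long time on large datasets.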

Usage

  • Configure the Docker Compose file by setting the following values:
    • MALWARE_DIR_PATH: path to YOUR_PE_MALWARE_DIR
    • SPLITTED_DATASET_PATH: path to the directory containing the pre-feature-selection train/test split
    • FINAL_DATASET_DIR: directory in which to store the output vectorized dataset
    • N_PROCESSES: number of processors to use
  • Start the extraction process:
    docker compose up -d
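
Before running `docker compose up -d`, the four values above can be sanity-checked on the host. The sketch below is an illustration only and is not shipped with this project: `preflight` is a hypothetical function, and it assumes that both input paths are directories on the host and that N_PROCESSES should not exceed the number of CPUs the host reports.

```python
import os
from pathlib import Path


def preflight(malware_dir: str, split_dir: str, final_dir: str, n_processes: int) -> list[str]:
    """Return a list of configuration problems; an empty list means no issue was found."""
    problems = []

    # Both inputs must already exist, since the extraction reads from them.
    if not Path(malware_dir).is_dir():
        problems.append(f"MALWARE_DIR_PATH is not a directory: {malware_dir}")
    if not Path(split_dir).is_dir():
        problems.append(f"SPLITTED_DATASET_PATH is not a directory: {split_dir}")

    # The output location may not exist yet, but it must not be a regular file.
    final = Path(final_dir)
    if final.exists() and not final.is_dir():
        problems.append(f"FINAL_DATASET_DIR exists but is not a directory: {final_dir}")

    # Assumption: asking for more processes than CPUs is a likely misconfiguration.
    cpus = os.cpu_count()
    if n_processes < 1:
        problems.append("N_PROCESSES must be at least 1")
    elif cpus is not None and n_processes > cpus:
        problems.append(f"N_PROCESSES ({n_processes}) exceeds available CPUs ({cpus})")

    return problems
```

Printing the returned list and aborting when it is non-empty keeps a misconfigured path from surfacing only after the container has started.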
    

Resource Considerations

This project does not enforce strict hardware requirements. However, users should be aware that PE feature extraction can be highly memory-intensive, especially when working with large datasets.

As a practical reference, processing a PE dataset (MALWARE_DIR_PATH) of approximately 177 GB required a machine with 512 GB of RAM to ensure stable performance and avoid memory pressure. Smaller datasets will generally require less, but hardware should be planned accordingly.
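
To compare a dataset against the reference point above, its on-disk size can be measured first. The sketch below is a rough planning aid under strong assumptions: `dataset_size_gb` and `rough_ram_estimate_gb` are hypothetical helpers, 1 GB is taken as 10^9 bytes, and the RAM figure simply scales the single 177 GB / 512 GB data point linearly, which this project does not guarantee.

```python
import os

# Single reference point reported in this README.
REFERENCE_DATASET_GB = 177
REFERENCE_RAM_GB = 512


def dataset_size_gb(malware_dir: str) -> float:
    """Sum the size of every file under malware_dir, in GB (assumed 10**9 bytes)."""
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(malware_dir):
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return total_bytes / 1e9


def rough_ram_estimate_gb(dataset_gb: float) -> float:
    """Linear extrapolation from the reference point; an assumption, not a measurement."""
    return dataset_gb * REFERENCE_RAM_GB / REFERENCE_DATASET_GB
```

Treat the result as an order-of-magnitude hint only; actual memory use also depends on N_PROCESSES and on the samples themselves.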

Authors

  • Luca Fabri
