Static Features Extraction Engine

MPH Static Features Extraction

This project extracts MalwPackHeat-like static features from Windows PE files, following the phases described in When Static Analysis Fail.

Prerequisites

  • Set up the PE malware directory so that it has the following structure:

        <YOUR_PE_MALWARE_DIR>/
        ├── <FAMILY_0>/
        │   ├── SHA_0_0
        │   ├── SHA_0_1
        │   └── ...
        ├── <FAMILY_1>/
        │   ├── SHA_1_0
        │   └── ...
        └── ...
    

    where FAMILY_0, FAMILY_1, ... are directories named after the malware families, and SHA_0_0, SHA_0_1, ... are PE files named by their SHA256 hashes.

  • Run the pre-feature-selection train/test split, for example by using the train-test-splits repository.

  • Make sure Docker is installed and running.
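
Before starting a long extraction run, it can help to confirm that the malware directory matches the layout described above. The following is a minimal sketch, not part of this project: `check_layout` and `sha256_of` are hypothetical helper names, and the check assumes that sample files carry no extension and are named with exactly 64 hexadecimal characters.

```python
import hashlib
import re
from pathlib import Path

# Assumption: samples are named with the bare 64-character hex digest, no extension.
SHA256_NAME = re.compile(r"^[0-9a-fA-F]{64}$")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large samples are never loaded into memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_layout(malware_dir: Path, verify_hashes: bool = False) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks fine."""
    problems = []
    for family in sorted(malware_dir.iterdir()):
        if not family.is_dir():
            problems.append(f"{family.name}: expected a family directory at the top level")
            continue
        for sample in sorted(family.iterdir()):
            label = f"{family.name}/{sample.name}"
            if not sample.is_file():
                problems.append(f"{label}: expected a regular file")
            elif not SHA256_NAME.match(sample.name):
                problems.append(f"{label}: name is not a 64-character hex SHA256")
            elif verify_hashes and sha256_of(sample) != sample.name.lower():
                problems.append(f"{label}: name does not match the SHA256 of the content")
    return problems
```

Calling `check_layout(Path("<YOUR_PE_MALWARE_DIR>"))` only inspects names. Passing `verify_hashes=True` also rehashes every file, which can take a long time on a large dataset.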

Usage

  • Configure the Docker Compose file by providing the following information:
    • MALWARE_DIR_PATH: path to YOUR_PE_MALWARE_DIR
    • SPLITTED_DATASET_PATH: directory containing the pre-feature-selection train/test split
    • FINAL_DATASET_DIR: directory in which to store the vectorized output dataset
    • N_PROCESSES: number of processors to use
  • Start the extraction process:
    docker compose up -d
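
The four settings above can be sanity-checked on the host before the containers are started. The sketch below is illustrative only: `preflight` and `docker_available` are hypothetical helpers that are not shipped with this project, and the script makes no assumption about how the Compose file itself consumes these values. It simply takes them as plain arguments.

```python
import os
import shutil
from pathlib import Path


def preflight(malware_dir: str, split_dir: str, final_dir: str, n_processes: int) -> list[str]:
    """Return a list of configuration issues; an empty list means nothing obvious is wrong."""
    issues = []
    if not Path(malware_dir).is_dir():
        issues.append(f"MALWARE_DIR_PATH is not a directory: {malware_dir}")
    if not Path(split_dir).is_dir():
        issues.append(f"SPLITTED_DATASET_PATH is not a directory: {split_dir}")
    final = Path(final_dir)
    # The output directory may not exist yet; only flag a clash with a regular file.
    if final.exists() and not final.is_dir():
        issues.append(f"FINAL_DATASET_DIR exists but is not a directory: {final_dir}")
    cpus = os.cpu_count() or 1
    if n_processes < 1:
        issues.append("N_PROCESSES must be at least 1")
    elif n_processes > cpus:
        issues.append(f"N_PROCESSES={n_processes} exceeds the {cpus} CPUs visible to Python")
    return issues


def docker_available() -> bool:
    """True if a `docker` executable is on PATH (it does not prove the daemon is running)."""
    return shutil.which("docker") is not None
```

An empty list from `preflight(...)` means the paths and process count look plausible. It does not guarantee that the extraction itself will succeed.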
    

Resource Considerations

Feature extraction on large PE datasets is highly memory-intensive.
While requirements depend on dataset size, users should be aware that the process can consume substantial system resources.

As a concrete example, processing a PE dataset (MALWARE_DIR_PATH) of approximately 177 GB required a machine equipped with 512 GB of RAM to complete extraction reliably.
For smaller datasets, proportionally less memory will be needed, but large-scale processing should be expected to require several hundred gigabytes of RAM.

Plan hardware capacity accordingly before launching the extraction process.
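
For a rough first estimate, the single data point above (177 GB of samples, 512 GB of RAM, about 2.9 GB of RAM per GB of data) can be scaled linearly. This is a back-of-the-envelope sketch: strict linear scaling is an assumption rather than a measured property of the extractor, `estimate_ram_gb` and `dataset_size_gb` are hypothetical helpers, and sizes are taken in decimal gigabytes (10^9 bytes).

```python
from pathlib import Path

# Reference point taken from the example above.
REFERENCE_DATASET_GB = 177.0
REFERENCE_RAM_GB = 512.0


def dataset_size_gb(malware_dir: Path) -> float:
    """Total size of all regular files under malware_dir, in decimal gigabytes."""
    total_bytes = sum(p.stat().st_size for p in malware_dir.rglob("*") if p.is_file())
    return total_bytes / 1e9


def estimate_ram_gb(dataset_gb: float) -> float:
    """Scale the reference RAM figure linearly with dataset size (an assumption)."""
    return dataset_gb * REFERENCE_RAM_GB / REFERENCE_DATASET_GB
```

Treat the result as a starting point for capacity planning and leave headroom, since actual usage may also depend on factors such as N_PROCESSES.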

Authors

  • Luca Fabri
