Static Features Extraction Engine
MPH Static Features Extraction
This project allows the user to extract MalwPackHeat-like static features from Windows PE files, following the phases described in When Static Analysis Fail.
Prerequisites
- Set up the PE malware directory so that it has the following structure:
    <YOUR_PE_MALWARE_DIR>/
    ├── <FAMILY_0>/
    │   ├── SHA_0_0
    │   ├── SHA_0_1
    │   └── ...
    ├── <FAMILY_1>/
    │   ├── SHA_1_0
    │   └── ...
    └── ...
  where FAMILY_0, FAMILY_1, ... are the directories named after the malware family and SHA_0_0, SHA_0_1, ... are the PE files named with their SHA256.
- Run the pre-feature selection train/test split, for example by using the train-test-splits repository.
- Make sure Docker is installed and running.
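Before starting an extraction, it can help to confirm that the malware directory really follows the layout above. The following is a minimal sketch, not part of this project: the helper names `sha256_of` and `check_layout` are hypothetical, and it assumes file names are plain hex SHA256 digests with no extension.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_layout(malware_dir: Path) -> list[str]:
    """Return a list of problems found in <malware_dir>/<FAMILY>/<SHA256>."""
    problems = []
    for family in sorted(malware_dir.iterdir()):
        if not family.is_dir():
            problems.append(f"{family}: expected a family directory")
            continue
        for sample in sorted(family.iterdir()):
            if not sample.is_file():
                problems.append(f"{sample}: expected a PE file")
            # Assumption: names are hex digests, compared case-insensitively.
            elif sample.name.lower() != sha256_of(sample):
                problems.append(f"{sample}: name is not the file's SHA256")
    return problems
```

Calling `check_layout(Path("<YOUR_PE_MALWARE_DIR>"))` returns an empty list when every entry matches the expected structure. Hashing every sample reads the whole dataset once, so on large collections this check takes a while.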
Usage
- Configure the Docker Compose file by providing the following information:
  - MALWARE_DIR_PATH: directory of YOUR_PE_MALWARE_DIR
  - SPLITTED_DATASET_PATH: pre-feature selection train/test split directory
  - FINAL_DATASET_DIR: directory where to store the vectorized dataset given as output
  - N_PROCESSES: number of processors to use
- Start the extraction process:
docker compose up -d
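A quick pre-flight check of the four settings can catch a mistyped path before the container starts. The sketch below is an illustration only: `check_settings` is a hypothetical helper, it takes the values as a plain dictionary rather than reading the Compose file, and it assumes FINAL_DATASET_DIR may not exist yet.

```python
import os
from pathlib import Path


def check_settings(settings: dict[str, str]) -> list[str]:
    """Return a list of problems found in the Compose settings."""
    problems = []

    # Input locations must already exist on the host.
    for key in ("MALWARE_DIR_PATH", "SPLITTED_DATASET_PATH"):
        value = settings.get(key, "")
        if not value or not Path(value).is_dir():
            problems.append(f"{key}: not an existing directory: {value!r}")

    # Assumption: the output directory only needs to be set, not to exist.
    if not settings.get("FINAL_DATASET_DIR", ""):
        problems.append("FINAL_DATASET_DIR: not set")

    raw = settings.get("N_PROCESSES", "")
    if not raw.isdigit() or int(raw) < 1:
        problems.append(f"N_PROCESSES: expected a positive integer, got {raw!r}")
    elif int(raw) > (os.cpu_count() or 1):
        # Reported as a problem here, although it may only slow things down.
        problems.append("N_PROCESSES: exceeds the CPUs visible on this host")

    return problems
```

An empty result means the values look plausible; it does not guarantee that the Compose file mounts them correctly.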
Resource Considerations
Feature extraction on large PE datasets is highly memory-intensive.
Exact requirements depend on dataset size, but the process can consume substantial system resources.
As a concrete example, processing a PE dataset (MALWARE_DIR_PATH) of approximately 177 GB required a machine equipped with 512 GB of RAM to complete extraction reliably.
For smaller datasets, proportionally less memory will be needed, but large-scale processing should be expected to require several hundred gigabytes of RAM.
Plan hardware capacity accordingly before launching the extraction process.
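For a rough capacity estimate, the single data point above (177 GB of samples, 512 GB of RAM) can be scaled linearly to another dataset size. This is a sketch under a strong assumption: memory use is taken to grow proportionally with dataset size, which this README suggests but does not measure. The function names are hypothetical, sizes use decimal gigabytes, and `physical_ram_gb` relies on POSIX `os.sysconf`, so it works on Linux but not on Windows.

```python
import os
from pathlib import Path

# The one reference point reported in this README.
REFERENCE_DATASET_GB = 177
REFERENCE_RAM_GB = 512


def dataset_size_gb(malware_dir: Path) -> float:
    """Total size of all files under malware_dir, in decimal gigabytes."""
    total = sum(p.stat().st_size for p in malware_dir.rglob("*") if p.is_file())
    return total / 1e9


def estimated_ram_gb(dataset_gb: float) -> float:
    """Linear extrapolation from the reference point (an assumption)."""
    return dataset_gb * REFERENCE_RAM_GB / REFERENCE_DATASET_GB


def physical_ram_gb() -> float:
    """Physical memory of this host in decimal gigabytes (POSIX only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9
```

Comparing `estimated_ram_gb(dataset_size_gb(Path("<YOUR_PE_MALWARE_DIR>")))` with `physical_ram_gb()` gives a first indication of whether the host is large enough. Treat the result as an order-of-magnitude guide rather than a guarantee.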
Authors
- Luca Fabri