Static Features Extraction Engine
MPH Static Features Extraction
This project lets users extract MalwPackHeat-like static features from Windows PE files, following the phases described in "When Static Analysis Fail".
Prerequisites
- Set up the PE malware directory with the following structure:
<YOUR_PE_MALWARE_DIR>/
├── <FAMILY_0>/
│   ├── SHA_0_0
│   ├── SHA_0_1
│   ├── ...
│   └──
├── <FAMILY_1>/
│   ├── SHA_1_0
│   ├── ...
│   └──
├── ...
└──
where FAMILY_0, FAMILY_1, ... are directories named after the malware families, and SHA_0_0, SHA_0_1, ... are the PE files, each named with its SHA256 hash.
- Run the pre-feature-selection train/test split, for example with the train-test-splits repository.
- Make sure Docker is installed and running.
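Before starting a long extraction run, it can help to confirm that the malware directory actually matches the layout above. The following is a minimal sketch, not part of this package: `check_malware_dir` and `sha256_of` are hypothetical helpers, and accepting either upper- or lowercase hex in file names is an assumption, since the README does not specify the case.

```python
import hashlib
import re
from pathlib import Path

# Assumption: a file name is valid if it is a 64-character hex string (any case).
SHA256_NAME = re.compile(r"^[0-9a-fA-F]{64}$")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_malware_dir(root: str, verify_content: bool = False) -> list[str]:
    """Return a list of problems found in a <YOUR_PE_MALWARE_DIR>-style tree.

    Expected layout: root/<FAMILY>/<SHA256>, i.e. one level of family
    directories, each containing only files named after their SHA256.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return [f"{root}: not a directory"]
    problems = []
    for family in sorted(root_path.iterdir()):
        if not family.is_dir():
            problems.append(f"{family}: expected a family directory")
            continue
        for sample in sorted(family.iterdir()):
            if not sample.is_file():
                problems.append(f"{sample}: expected a PE file, found a non-file entry")
            elif not SHA256_NAME.match(sample.name):
                problems.append(f"{sample}: name is not a SHA256 hex digest")
            elif verify_content and sha256_of(sample) != sample.name.lower():
                # Optional (slow on large datasets): re-hash the file contents.
                problems.append(f"{sample}: content hash does not match file name")
    return problems
```

For example, `for problem in check_malware_dir("/data/pe_malware"): print(problem)` prints nothing when the tree looks correct. Content verification is off by default because re-hashing a dataset of hundreds of gigabytes takes a long time.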
Usage
- Configure the Docker Compose file with the following values:
  - MALWARE_DIR_PATH: path to <YOUR_PE_MALWARE_DIR>
  - SPLITTED_DATASET_PATH: directory containing the pre-feature-selection train/test split
  - FINAL_DATASET_DIR: directory where the vectorized output dataset is stored
  - N_PROCESSES: number of processes to use
- Start the extraction process:
docker compose up -d
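Because `docker compose up -d` runs detached, a misconfigured value may only surface later in the container logs. A small pre-flight check on the four values can catch obvious mistakes first. This is a sketch under stated assumptions: `preflight` is a hypothetical helper, the paths are assumed to be host paths that Compose mounts into the container, and capping N_PROCESSES at `os.cpu_count()` is a heuristic, not a limit documented by this project.

```python
import os
from pathlib import Path


def preflight(malware_dir: str, splitted_dataset_path: str,
              final_dataset_dir: str, n_processes: int) -> list[str]:
    """Return a list of likely configuration problems (empty if none found)."""
    problems = []
    # Both input locations are described in the README as directories.
    for name, value in (("MALWARE_DIR_PATH", malware_dir),
                        ("SPLITTED_DATASET_PATH", splitted_dataset_path)):
        if not Path(value).is_dir():
            problems.append(f"{name}={value!r}: directory does not exist")
    # The output directory may not exist yet, but it must not be a regular file.
    out = Path(final_dataset_dir)
    if out.exists() and not out.is_dir():
        problems.append(f"FINAL_DATASET_DIR={final_dataset_dir!r}: exists but is not a directory")
    # Heuristic upper bound: one process per available CPU.
    cpus = os.cpu_count() or 1
    if not 1 <= n_processes <= cpus:
        problems.append(f"N_PROCESSES={n_processes}: expected a value between 1 and {cpus}")
    return problems
```

Calling `preflight("/data/pe_malware", "/data/splits", "/data/output", 16)` with the same values you put in the Compose file returns an empty list when nothing obvious is wrong.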
Resource Considerations
This project does not enforce strict hardware requirements. However, users should be aware that PE feature extraction can be highly memory-intensive, especially when working with large datasets.
As a practical reference, processing a PE dataset (MALWARE_DIR_PATH) of approximately 177 GB required a machine with 512 GB of RAM to ensure stable performance and avoid memory pressure. Smaller datasets will generally require less, but hardware should be planned accordingly.
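The single reference point above works out to roughly 512 / 177 ≈ 2.9 GB of RAM per GB of PE data. The sketch below measures a dataset on disk and scales that ratio linearly. Linear scaling is an assumption (the README reports only one data point and memory use may not grow proportionally), GB is taken as 10^9 bytes, and both function names are hypothetical.

```python
import os

# The only reference point reported in this README.
REFERENCE_DATASET_GB = 177
REFERENCE_RAM_GB = 512


def dataset_size_gb(root: str) -> float:
    """Total size of all files under root, in GB (10**9 bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total / 10**9


def naive_ram_estimate_gb(size_gb: float) -> float:
    """Linear extrapolation from the reference point (an assumption, not a guarantee)."""
    return size_gb * REFERENCE_RAM_GB / REFERENCE_DATASET_GB
```

For instance, a 50 GB dataset gives `naive_ram_estimate_gb(50)` ≈ 145 GB. Treat the result as a rough planning figure and leave headroom.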
Authors
- Luca Fabri
Download files
Source Distribution
Built Distribution
File details
Details for the file dts_cdd_wdis-1.4.1.tar.gz.
File metadata
- Download URL: dts_cdd_wdis-1.4.1.tar.gz
- Upload date:
- Size: 28.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 648b1ba04b1efb39093838cede07d18030748e88716e709e8865ac35394b4933 |
| MD5 | c469e442349ff80a9703e3aded2a2db0 |
| BLAKE2b-256 | 103e7fb09ab047149675bfd706e50867f70cefc9f49d2e48ec0e13a7518216c9 |
File details
Details for the file dts_cdd_wdis-1.4.1-py3-none-any.whl.
File metadata
- Download URL: dts_cdd_wdis-1.4.1-py3-none-any.whl
- Upload date:
- Size: 37.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c75653b7e85797d58859076957e5ec4614ff54e0c18522167546d5917c8041b5 |
| MD5 | 7b76631bd42e7ec58e9f85892058f8ba |
| BLAKE2b-256 | c7c06256b9925cf75397f1552dd4936c6fd7b21478fbefdad314668cf8f7eb38 |
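To check a manually downloaded file against the SHA256 digests listed in the File hashes tables, a few lines of standard-library Python suffice. `file_sha256` and `verify_download` are illustrative helper names, and the digests in `EXPECTED_SHA256` are copied from those tables.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables for version 1.4.1.
EXPECTED_SHA256 = {
    "dts_cdd_wdis-1.4.1.tar.gz":
        "648b1ba04b1efb39093838cede07d18030748e88716e709e8865ac35394b4933",
    "dts_cdd_wdis-1.4.1-py3-none-any.whl":
        "c75653b7e85797d58859076957e5ec4614ff54e0c18522167546d5917c8041b5",
}


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return file_sha256(path) == expected.lower()
```

For example, `verify_download("dts_cdd_wdis-1.4.1.tar.gz", EXPECTED_SHA256["dts_cdd_wdis-1.4.1.tar.gz"])` should return `True` for an intact download. pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same check at install time, but it then requires hashes for every dependency as well.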