Elastic Malware Benchmark for Empowering Researchers
EMBER Feature Extraction
This repository lets you build a dataset of EMBER features from a collection of PE (Portable Executable) files.
Set up the directory of PE executables, configure the Docker Compose file, and deploy the pipeline. The pipeline produces a single CSV file containing all the extracted features.
If you want to work with the EMBER2017 dataset (features from 1.1 million PE files scanned in or before 2017), the EMBER2018 dataset (features from 1 million PE files scanned in or before 2018), or the EMBER2024 dataset, please refer to the official repository.
Details of the selected features are available here: https://arxiv.org/pdf/2506.05074
Prerequisites
- Make sure Docker, including the Docker Compose plugin, is installed and running.
Usage
- Clone the repository and change into it:
    git clone git@github.com:w-disaster/ember.git && cd ember
- Set up the directory containing the PE files. It should have the following structure:
    your_base_dir/
    ├── malware_family_0/
    │   ├── id_malware_sample_0_0
    │   ├── id_malware_sample_0_1
    ├── malware_family_1/
    │   ├── id_malware_sample_1_0
    └── ...
  Each PE filename is used as the sample index in the final dataset.
  The directory structure stays the same if you want to do malware detection: simply create two directories, benign and malicious, and use them as the malware families.
- Configure docker-compose.yaml:
  - Set the number of processes, N_PROCESSES, used for parallel processing;
  - Change the source of the volume that mounts the base directory containing the PE files (your_base_dir). The default is /home/luca/WD/NortonDataset670/MALWARE/;
  - Set the volume directory where the final dataset will be saved (default: ./dataset/).
- Deploy the pipeline:
    docker compose up
- Check out the dataset, named malware_ember_features.csv, inside the configured directory. All columns are named. Besides the features, the dataset contains a sha256 column and a family column: sha256 is the PE file ID (named after the identifier used in our case), and family is the malware family of the corresponding sample. If you use a different PE identifier or do malware detection, consider renaming these columns afterwards.
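The input layout and the output columns described in the steps above can be checked with a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not part of the pipeline. list_samples and rename_columns are hypothetical helper names. list_samples assumes exactly one level of family subdirectories with the PE files placed directly inside them, and the new column names used in the usage example are only illustrative.

```python
# Hypothetical helpers, not part of this repository: a sketch for sanity-checking
# the pipeline's input directory and post-processing its output CSV.
import csv
from pathlib import Path


def list_samples(base_dir):
    """Return (family, sample_id) pairs for the layout base_dir/<family>/<sample>.

    Assumes one level of family subdirectories with regular files directly
    inside them; top-level files and deeper nesting are ignored.
    """
    pairs = []
    for family_dir in sorted(Path(base_dir).iterdir()):
        if not family_dir.is_dir():
            continue  # stray top-level files are not families
        for sample in sorted(family_dir.iterdir()):
            if sample.is_file():
                # The filename doubles as the sample index in the final dataset.
                pairs.append((family_dir.name, sample.name))
    return pairs


def rename_columns(src_csv, dst_csv, renames):
    """Copy src_csv to dst_csv, renaming header columns per the `renames` dict.

    Columns not listed in `renames` keep their original names.
    """
    with open(src_csv, newline="") as fin, open(dst_csv, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader)  # first row holds the column names
        writer.writerow([renames.get(col, col) for col in header])
        writer.writerows(reader)  # feature rows are streamed through unchanged
```

For example, list_samples("your_base_dir") shows which (family, sample ID) pairs the pipeline should pick up. The call rename_columns("dataset/malware_ember_features.csv", "dataset/features_renamed.csv", {"sha256": "sample_id", "family": "label"}) writes a copy with the ID and label columns renamed, assuming the default ./dataset/ output volume. Streaming the rows through csv.reader avoids loading the whole feature file into memory.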
Download files
File details
Details for the file ember_cdd_wdis-1.2.0.tar.gz.
File metadata
- Download URL: ember_cdd_wdis-1.2.0.tar.gz
- Upload date:
- Size: 17.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa1080826365e5eee9414d97ffb26a512d678ed84da98f9b4188d67e1aee97ce |
| MD5 | fe8672d11f1cd1991d912512b7d141b7 |
| BLAKE2b-256 | b6d2fd662c6907f418ade18782d9b7ba0a358bcfa66e982388c181d0eec6fec0 |
File details
Details for the file ember_cdd_wdis-1.2.0-py3-none-any.whl.
File metadata
- Download URL: ember_cdd_wdis-1.2.0-py3-none-any.whl
- Upload date:
- Size: 17.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-1018-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2587ae0b19f473953c6d1e885b440b52fe82dcbf2a9bb96f4d1db92a8149d283 |
| MD5 | 0d47626659da7020e1aab4af18a77617 |
| BLAKE2b-256 | 84b3e729eb8a8446d0c55e8acbd5efb449164bf46dec28790517279555d26e3b |
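You can use the digests in the file-hash tables above to check a downloaded file before installing it. The following is a minimal sketch that uses only Python's hashlib. file_digests and verify are hypothetical helper names, and the sketch assumes that "BLAKE2b-256" means BLAKE2b with a 32-byte (256-bit) digest.

```python
# Hypothetical helpers: verify a downloaded distribution file against published digests.
import hashlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 16):
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file in a single pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumed meaning of BLAKE2b-256: BLAKE2b truncated to a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(Path(path), "rb") as f:
        # Read in fixed-size chunks so large files are never fully loaded in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path, expected):
    """Return the algorithm names whose digest does not match `expected`.

    `expected` maps "SHA256", "MD5" and/or "BLAKE2b-256" to hex digests;
    an empty list means every supplied digest matched.
    """
    actual = file_digests(path)
    return [name for name, digest in expected.items() if actual[name] != digest.lower()]
```

For an intact download, verify("ember_cdd_wdis-1.2.0.tar.gz", {"SHA256": "fa1080826365e5eee9414d97ffb26a512d678ed84da98f9b4188d67e1aee97ce"}) should return an empty list. When installing with pip, its --require-hashes mode performs a comparable check automatically.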