
Elastic Malware Benchmark for Empowering Researchers


EMBER Feature Extraction


This repository lets you build a dataset of EMBER features from a collection of PE (Portable Executable) files.

Set up the directory of PE executables, configure the Docker Compose file and deploy the pipeline. The pipeline writes a single CSV file containing all the features.

If you want to work with the EMBER2017 dataset (features from 1.1 million PE files scanned in or before 2017) or the EMBER2018 dataset (features from 1 million PE files scanned in or before 2018), please refer to the official EMBER repository.

Details of the selected features are available in the EMBER paper: https://arxiv.org/abs/1804.04637

Prerequisites

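  • Make sure Docker is installed and the Docker daemon is running. The pipeline is started with docker compose, so the Docker Compose plugin must be available as well.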

Usage

  1. Clone the repository and change directory:

    git clone git@github.com:w-disaster/ember.git && cd ember
    
  2. Set up the directory containing PE files. It should have the following structure:

    your_base_dir/
    ├── malware_family_0/
    │   ├── id_malware_sample_0_0.exe
    │   ├── id_malware_sample_0_1.exe
    ├── malware_family_1/
    │   ├── id_malware_sample_1_0.exe
    └── ...
    

    Each PE filename will be used as the sample index in the final dataset.

    For malware detection, keep the same structure: create two directories, benign and malicious, and use them in place of the malware families.

  3. Configure docker-compose.yaml:

    1. Set the number of processes N_PROCESSES used for parallel processing;
    2. Change the source path of the volume that mounts the base directory with PE files (your_base_dir). The default is /home/luca/WD/NortonDataset670/MALWARE/;
    3. Set the volume directory where the final dataset will be saved (default: ./dataset/).

  4. Deploy the pipeline:

    docker compose up
    
  5. The dataset is saved as malware_ember_features.csv in the configured output directory, with a header naming every column.

    Besides the features, it contains a sha256 column and a family column. The sha256 column holds the PE file id (it is named after the id used in the authors' own setup), while family holds the malware family of the corresponding sample. If you use a different PE id or do malware detection, consider renaming these columns afterwards.
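Before deploying, the input layout from step 2 can be sanity-checked, and after the run the sha256 and family columns can be renamed as suggested in step 5. The following is a minimal standard-library sketch, not part of this repository: the helpers list_samples and rename_columns and the replacement names sample_id and label are hypothetical, and it assumes the sample id is the filename exactly as it appears on disk (the actual pipeline may handle extensions differently).

```python
import csv
import os


def list_samples(base_dir):
    """Return (sample_id, family) pairs for the layout described in step 2.

    Hypothetical helper: assumes each immediate subdirectory of base_dir is a
    family (or "benign"/"malicious") and each regular file inside it is a PE
    sample whose filename serves as the sample id.
    """
    samples = []
    for family in sorted(os.listdir(base_dir)):
        family_dir = os.path.join(base_dir, family)
        if not os.path.isdir(family_dir):
            continue  # ignore stray files at the top level
        for name in sorted(os.listdir(family_dir)):
            if os.path.isfile(os.path.join(family_dir, name)):
                samples.append((name, family))
    return samples


def rename_columns(csv_in, csv_out, id_name="sample_id", label_name="label"):
    """Copy csv_in to csv_out, renaming the sha256 and family columns.

    Assumes the header row contains columns named exactly "sha256" and
    "family"; every other column and all data rows are copied unchanged.
    """
    mapping = {"sha256": id_name, "family": label_name}
    with open(csv_in, newline="") as src, open(csv_out, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)
        writer.writerow([mapping.get(col, col) for col in header])
        writer.writerows(reader)  # stream feature rows without loading them all
```

For example, calling list_samples on your_base_dir should return one entry per PE file; if the count differs from the number of samples you expect, the layout probably does not match step 2. Streaming the rows through the csv module keeps memory use flat even for large feature files.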



Download files


Source Distribution

ember_cdd_wdis-1.0.1.tar.gz (13.5 kB)

Uploaded Source

Built Distribution


ember_cdd_wdis-1.0.1-py3-none-any.whl (13.8 kB)

Uploaded Python 3

File details

Details for the file ember_cdd_wdis-1.0.1.tar.gz.

File metadata

  • Download URL: ember_cdd_wdis-1.0.1.tar.gz
  • Upload date:
  • Size: 13.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.12.3 Linux/6.11.0-1013-azure

File hashes

Hashes for ember_cdd_wdis-1.0.1.tar.gz
Algorithm Hash digest
SHA256 665112371e92b007e00e55e3583ae5c105570187378bdcc3185c867cdb7d01f8
MD5 f34622dd232318fb83756b3ff7a5e8aa
BLAKE2b-256 eb0090759315d199ec327262e518325abf6f1d4e3f96418e1b8323acc1c6dce2


File details

Details for the file ember_cdd_wdis-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: ember_cdd_wdis-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.12.3 Linux/6.11.0-1013-azure

File hashes

Hashes for ember_cdd_wdis-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 909390c8a9bfeaec0348d54686584bfefbe70791ca122ce425db07d156b580cb
MD5 969f8221ae07c3df929b36f25058fb57
BLAKE2b-256 e7b1084424f6102fedd8b0c3de61bdcfb51dc0f4a2348cce37a847ff88b0987c
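To check a downloaded distribution against the SHA256 digests listed above, Python's standard hashlib module is enough. This is a minimal sketch: the names sha256_of, verify and EXPECTED_WHEEL_SHA256 are illustrative and not part of this package, and the expected value is simply the wheel's SHA256 digest copied from the table.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest of ember_cdd_wdis-1.0.1-py3-none-any.whl, copied from the hash table.
EXPECTED_WHEEL_SHA256 = (
    "909390c8a9bfeaec0348d54686584bfefbe70791ca122ce425db07d156b580cb"
)


def verify(path, expected):
    """Return True if the file's SHA-256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For an untampered download, verify("ember_cdd_wdis-1.0.1-py3-none-any.whl", EXPECTED_WHEEL_SHA256) should return True. Reading in chunks keeps memory use constant regardless of file size. pip can also enforce this check itself through its hash-checking mode (--require-hashes with --hash entries in a requirements file).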

