
Elastic Malware Benchmark for Empowering Researchers


EMBER Feature Extraction


This repository lets you build a dataset of EMBER features from a collection of PE files.

Set up the directory of PE executables, configure the Docker Compose file, and deploy the pipeline. The pipeline then writes a single CSV file containing all the features.

If you want to work with the EMBER2017 dataset (containing features from 1.1 million PE files scanned in or before 2017) or the EMBER2018 dataset (containing features from 1 million PE files scanned in or before 2018), please refer to the official repository.

Details of the selected features are available here: https://arxiv.org/abs/1804.04637

Prerequisites

  • Make sure Docker is installed and running.

Usage:

  1. Clone the repository and change directory:

    git clone git@github.com:w-disaster/ember.git && cd ember
    
  2. Set up the directory containing the PE files. The directory should have the following structure:

    your_base_dir/
    ├── malware_family_0/
    │   ├── id_malware_sample_0_0.exe
    │   ├── id_malware_sample_0_1.exe
    ├── malware_family_1/
    │   ├── id_malware_sample_1_0.exe
    └── ...
    

    Each PE filename will be used as the sample index in the final dataset.

    For malware detection the directory structure stays the same: create two directories, benign and malicious, in place of the malware families.

  3. Configure docker-compose.yaml:

    1. Set N_PROCESSES, the number of processes used for parallel processing;
    2. Change the source of the volume that mounts the base directory of PE files (your_base_dir). The default is /home/luca/WD/NortonDataset670/MALWARE/;
    3. Set the volume directory where the final dataset will be saved (default ./dataset/).
  4. Deploy the pipeline:

    docker compose up
    
  5. Find the dataset, named malware_ember_features.csv, inside the configured directory. Every column in the dataset is named.

    Besides the features, it contains a sha256 column and a family column. The sha256 column holds the PE file id used in our case, and the family column holds the malware family of the corresponding sample. If you use a different PE id or do malware detection, consider renaming these columns afterwards.
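
Before deploying, it can help to check that the base directory matches the layout described in step 2. The following is a minimal sketch and not part of the pipeline: the function names list_samples and find_duplicate_ids are hypothetical. It also assumes the sample index is the full filename including the extension, which the description above does not specify. Since each filename becomes the sample index, the sketch also reports filenames that appear under more than one family.

```python
from collections import Counter
from pathlib import Path


def list_samples(base_dir):
    """Return (sample_id, family) pairs for files one level below base_dir."""
    samples = []
    for family_dir in sorted(Path(base_dir).iterdir()):
        if not family_dir.is_dir():
            continue  # stray files at the top level are skipped in this sketch
        for pe_file in sorted(family_dir.iterdir()):
            if pe_file.is_file():
                # Assumption: the full filename (with extension) is the sample id.
                samples.append((pe_file.name, family_dir.name))
    return samples


def find_duplicate_ids(samples):
    """Return the sample ids that occur more than once across all families."""
    counts = Counter(sample_id for sample_id, _ in samples)
    return sorted(sample_id for sample_id, n in counts.items() if n > 1)
```

Calling list_samples("your_base_dir") gives one pair per PE file, and a non-empty result from find_duplicate_ids suggests that some sample indices may collide in the final dataset.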
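
The column renaming suggested in step 5 can be done with the Python standard library alone. This is a sketch under assumptions: the helper rename_columns and the replacement names sample_id and label are illustrative and do not come from the repository. Only the header row is rewritten; data rows are copied through unchanged.

```python
import csv


def rename_columns(src_path, dst_path, renames):
    """Copy a CSV file, renaming header columns according to `renames`."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader)  # raises StopIteration if the file is empty
        # Columns not listed in `renames` keep their original names.
        writer.writerow([renames.get(name, name) for name in header])
        writer.writerows(reader)  # data rows are streamed through unchanged
```

For example, with the default output directory, rename_columns("dataset/malware_ember_features.csv", "dataset/renamed.csv", {"sha256": "sample_id", "family": "label"}) would write a copy whose id and label columns carry the new names. The output filename renamed.csv is also just a placeholder.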
