
Elastic Malware Benchmark for Empowering Researchers

Project description

EMBER Feature Extraction


This repository lets you easily build a dataset of EMBERv3 features starting from a collection of PE files.

If you want to work with the EMBER2017 dataset (features from 1.1 million PE files scanned in or before 2017), the EMBER2018 dataset (features from 1 million PE files scanned in or before 2018), or the EMBER2024 dataset, please refer to the official repository.

Details of the selected features are available here: https://arxiv.org/pdf/2506.05074

Prerequisites

  • A working Docker installation with the Docker daemon running.
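One way to confirm this prerequisite from a POSIX shell is sketched below. `docker_ready` is a hypothetical helper name, not part of this project; it relies only on the fact that `docker info` has to contact the daemon and exits with a non-zero status when it cannot.

```shell
#!/bin/sh
# docker_ready: succeed only if the docker CLI is installed AND the daemon answers.
docker_ready() {
    # `command -v` fails when no docker binary is found on PATH.
    command -v docker >/dev/null 2>&1 || return 1
    # `docker info` queries the daemon; a stopped daemon makes it exit non-zero.
    docker info >/dev/null 2>&1
}
```

For example, `docker_ready || echo "Docker is not installed or not running"` before step 3 avoids a confusing failure of `docker run`.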

Usage

  1. Clone the repository and change directory:

    git clone git@github.com:w-disaster/ember.git && cd ember
    
  2. Set up the directory containing the PE files. It should have the following structure:

    <YOUR_MALWARE_DIR>/
    ├── <FAMILY_0>/
    │   ├── SHA_0_0
    │   ├── SHA_0_1
    │   └── ...
    ├── <FAMILY_1>/
    │   ├── SHA_1_0
    │   └── ...
    └── ...
    

    where FAMILY_0, FAMILY_1, ... are directories named after the malware families, and SHA_0_0, SHA_0_1, ... are the PE files, each named with its SHA256 hash.

    The same structure applies to malware detection: instead of one directory per family, simply create two directories, benign and malicious.

  3. Set the environment variables and run the static feature extraction; the resulting dataset is written to <YOUR_EMBER_OUTPUT_DIR>/<YOUR_PE_DATASET_NAME>.pkl:

    # Use absolute host paths: in -v, a bare name such as "data" is treated as a named volume, not a directory
    MALWARE_DIR_PATH=<YOUR_MALWARE_DIR>
    PE_DATASET_NAME=<YOUR_PE_DATASET_NAME>
    EMBER_DATA_DIR=<YOUR_EMBER_OUTPUT_DIR>
    
    # N_PROCESSES: number of worker processes (set it according to the available CPU cores)
    docker run \
    --name ember-feature-extraction \
    -e MALWARE_DIR_PATH=/usr/input_data/malware/ \
    -e FINAL_DATASET_FILENAME="/usr/app/dataset/$PE_DATASET_NAME.pkl" \
    -e N_PROCESSES=64 \
    -v "$MALWARE_DIR_PATH":/usr/input_data/malware/ \
    -v "$EMBER_DATA_DIR":/usr/app/dataset/ \
    ghcr.io/malware-concept-drift-detection/ember-features-extraction:master
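Step 2 assumes the PE files are already grouped by label and renamed to their SHA256 digests. If a collection is not organized that way yet, the helpers below sketch one way to build and sanity-check the layout before launching the container. `organize_pe_file` and `validate_pe_layout` are hypothetical names, not part of this project, and the sketch assumes a POSIX shell with GNU coreutils `sha256sum`.

```shell
#!/bin/sh
# Hypothetical helpers (not part of this repository) that prepare and check the
# input layout described in step 2. Assumes `sha256sum` from GNU coreutils.

# organize_pe_file SRC LABEL ROOT
#   Copies SRC to ROOT/LABEL/<sha256-of-SRC>. LABEL is a malware family name,
#   or benign/malicious for a detection dataset.
organize_pe_file() {
    src=$1 label=$2 root=$3
    # Reading from stdin makes sha256sum print "<digest>  -"; fails if SRC is unreadable.
    out=$(sha256sum < "$src") || return 1
    digest=${out%% *}   # keep only the hex digest
    mkdir -p "$root/$label" || return 1
    cp "$src" "$root/$label/$digest"
}

# validate_pe_layout ROOT
#   Reports (on stderr) every entry that breaks the expected layout and returns 1
#   if any is found: each top-level entry must be a directory, and each file in it
#   must be named with the SHA256 of its own contents.
validate_pe_layout() {
    status=0
    for dir in "$1"/*; do
        [ -e "$dir" ] || continue   # unmatched glob: ROOT is empty
        if [ ! -d "$dir" ]; then
            echo "not a directory: $dir" >&2
            status=1
            continue
        fi
        for f in "$dir"/*; do
            [ -e "$f" ] || continue   # unmatched glob: empty label directory
            if ! out=$(sha256sum < "$f"); then
                status=1   # unreadable entry, e.g. a nested directory
                continue
            fi
            if [ "${f##*/}" != "${out%% *}" ]; then
                echo "name does not match SHA256: $f" >&2
                status=1
            fi
        done
    done
    return "$status"
}
```

For instance, `organize_pe_file sample.exe malicious "$MALWARE_DIR_PATH"` files one sample, and running `validate_pe_layout "$MALWARE_DIR_PATH"` just before `docker run` catches misnamed files early. Copying instead of moving leaves the original collection untouched.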
    


Download files


Source Distribution

ember_cdd_wdis-1.2.1.tar.gz (17.4 kB)

Built Distribution


ember_cdd_wdis-1.2.1-py3-none-any.whl (17.7 kB)

File details

Details for the file ember_cdd_wdis-1.2.1.tar.gz.

File metadata

  • Download URL: ember_cdd_wdis-1.2.1.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for ember_cdd_wdis-1.2.1.tar.gz
Algorithm     Hash digest
SHA256        37f4d54513c358e9509e0961e25cd91aeeaaea1429170ee26f05159a18231198
MD5           fcd5d2bf54408c9e0efadfb8f0a43ff9
BLAKE2b-256   c5d78998c37d3d7eefa143bf94040cbce9abe7be7b039e5b8a66812319b7aa47


File details

Details for the file ember_cdd_wdis-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: ember_cdd_wdis-1.2.1-py3-none-any.whl
  • Upload date:
  • Size: 17.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.3 Linux/6.11.0-1018-azure

File hashes

Hashes for ember_cdd_wdis-1.2.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        156c23b0eabdefee9eecd4ede75c0fc3e25d192777af823c0ef2a8ccab58d158
MD5           16cdfa2bbfe88cf8302805343d78bf98
BLAKE2b-256   a19a692826f36145e93bd28c367526e7ca44f34e371a406ed133a5b8ca0e3eb7
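A downloaded file can be checked against the SHA256 digests listed above with a small sketch like the following. It assumes GNU coreutils `sha256sum` (the `--status` flag is GNU-specific), and `verify_sha256` is a hypothetical helper name.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_DIGEST
#   Succeeds only if the SHA256 digest of FILE equals EXPECTED_DIGEST.
verify_sha256() {
    # `sha256sum -c -` reads "<digest>  <file>" lines from stdin and checks each one;
    # --status suppresses output so only the exit code reports the result.
    printf '%s  %s\n' "$2" "$1" | sha256sum -c --status -
}
```

For example, `verify_sha256 ember_cdd_wdis-1.2.1.tar.gz 37f4d54513c358e9509e0961e25cd91aeeaaea1429170ee26f05159a18231198 && echo OK` prints OK only if the source distribution matches its published digest.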

