Transcendent adaptation for multiclass problems

Transcendent Code (Fork)

Using conformal evaluation to detect concept drift affecting malware detection.

For more information, you can see the project page: https://s2lab.cs.ucl.ac.uk/projects/transcend/

Notes about this fork

In this fork, several changes were made to enable Transcendent to handle multiclass problems. These changes affect only the ICE (inductive conformal evaluator) solution, as it is the most practical approach.

A Non-Conformity Measure (NCM) based on Random Forest proximities, as presented here, has been implemented. Additionally, the confidence score function was modified to support multiple classes.
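
As a rough illustration of these two pieces, the sketch below computes a proximity-based non-conformity score and then derives credibility and confidence for more than two classes. It is not the fork's actual code: the function names, the `k` parameter, and the specific NCM formula (one minus the mean proximity to the `k` most proximate training points of a class) are assumptions made for this example. The leaf indices are assumed to come from something like scikit-learn's `RandomForestClassifier.apply`, which returns one leaf index per tree for each sample. Credibility and confidence follow the usual conformal definitions: the p-value of the predicted class, and one minus the largest p-value among the remaining classes.

```python
def proximity(leaves_a, leaves_b):
    """Fraction of trees in which two samples fall into the same leaf."""
    same = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return same / len(leaves_a)


def ncm(leaves, label, train_leaves, train_labels, k=2):
    """Illustrative NCM (an assumption, not the fork's formula):
    1 - mean proximity to the k most proximate training points of `label`."""
    prox = sorted(
        (proximity(leaves, t) for t, y in zip(train_leaves, train_labels) if y == label),
        reverse=True,
    )[:k]
    if not prox:  # no training point carries this label
        return 1.0
    return 1.0 - sum(prox) / len(prox)


def p_value(score, calibration_scores):
    """Share of calibration scores at least as non-conforming as `score`."""
    at_least = sum(1 for s in calibration_scores if s >= score)
    return at_least / len(calibration_scores)


def credibility_and_confidence(predicted, p_values):
    """`p_values` maps each class label to the p-value of the test point.

    Credibility is the p-value of the predicted class; confidence is
    1 minus the largest p-value among all other classes."""
    cred = p_values[predicted]
    conf = 1.0 - max(p for label, p in p_values.items() if label != predicted)
    return cred, conf
```

With two classes the confidence reduces to one minus the p-value of the single alternative label, so the multiclass form is a direct generalisation.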

Due to time constraints, the thresholding phase is out of scope for this fork, so rejection thresholds must be derived manually or with a user-defined function.
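
A user-defined thresholding function could look roughly like the sketch below. It is only one possible approach, loosely modelled on the quartile strategy listed under Usage: for each predicted class, it takes a quartile of the scores (for example credibility) of correctly predicted calibration points and rejects test predictions that score below it. The names `quartile_thresholds` and `keep_mask` are hypothetical and not part of the package; `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def quartile_thresholds(cal_scores, cal_true, cal_pred, quartile=1):
    """Per-class threshold: the chosen quartile (1, 2 or 3) of the scores
    of calibration points that were predicted correctly as that class."""
    thresholds = {}
    for label in set(cal_pred):
        correct = [
            s for s, y, p in zip(cal_scores, cal_true, cal_pred)
            if p == label and y == p
        ]
        # statistics.quantiles needs at least two data points.
        if len(correct) >= 2:
            thresholds[label] = statistics.quantiles(correct, n=4)[quartile - 1]
    return thresholds


def keep_mask(scores, preds, thresholds):
    """True where a prediction is kept, False where it is rejected.

    Classes without a threshold fall back to 0.0, i.e. nothing is
    rejected for them (an assumption of this sketch)."""
    return [s >= thresholds.get(p, 0.0) for s, p in zip(scores, preds)]
```

Because thresholds are stored per predicted label, the same function works unchanged for any number of classes.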

What is Transcend and Conformal Evaluation?

Malware evolves rapidly, which makes it hard, if not impossible, to generalize learning models to reflect future, previously unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, quickly growing outdated as malware continues to evolve.

Transcendent is a toolset which, together with a statistical framework called conformal evaluation, aims to identify aging classification models in vivo during deployment, before the machine learning model's performance starts to degrade.

Further details can be found in the paper Transcending TRANSCEND: Revisiting Malware Classification in the Presence of Concept Drift by F. Barbero, F. Pendlebury, F. Pierazzi, and L. Cavallaro (IEEE S&P 2022).

If you end up using Transcendent as part of a project or publication, please include a citation of the S&P paper:

@inproceedings{barbero2022transcendent,
    author = {Federico Barbero and Feargus Pendlebury and Fabio Pierazzi and Lorenzo Cavallaro},
    title = {Transcending Transcend: Revisiting Malware Classification in the Presence of Concept Drift},
    booktitle = {{IEEE} Symposium on Security and Privacy},
    year = {2022},
}

Transcendent is based on Transcend. Further details can be found in the paper Transcend: Detecting Concept Drift in Malware Classification Models by R. Jordaney, K. Sharad, S. K. Dash, Z. Wang, D. Papini, I. Nouretdinov, and L. Cavallaro (USENIX Security 2017). An associated presentation can be found at the USENIX site.

If you end up using Transcendent as part of a project or publication, please include a citation of the original Transcend USENIX paper as well:

@inproceedings {jordaney2017,
    author = {Roberto Jordaney and Kumar Sharad and Santanu K. Dash and Zhi Wang and Davide Papini and Ilia Nouretdinov and Lorenzo Cavallaro},
    title = {Transcend: Detecting Concept Drift in Malware Classification Models},
    booktitle = {26th {USENIX} Security Symposium ({USENIX} Security 17)},
    year = {2017},
    isbn = {978-1-931971-40-9},
    address = {Vancouver, BC},
    pages = {625--642},
    url = {https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/jordaney},
    publisher = {{USENIX} Association},
}

Getting Started

Installation

Transcend requires Python 3 (preferably >= 3.5) as well as the statistical learning stack of NumPy, SciPy, and Scikit-learn.

Package dependencies can be installed by using the listing in requirements.txt.

pip install -r requirements.txt

A full installation can be performed using setup.py:

pip install -r requirements.txt
python setup.py install 

Features to reproduce the Android experiments can be downloaded from this link

Features for Marvin and Drebin can be downloaded from this link

Usage

Conformal evaluation can get a little bit fiddly, so it's advised that you become familiar with a typical testing pipeline such as the example given in ce.py as well as the following functions (which are particularly affected by different configuration settings):

  • utils.parse_args()
  • data.load_features()
  • thresholding.find_quartile_thresholds()
  • thresholding.find_random_search_thresholds()
  • thresholding.sort_by_predicted_label()
  • thresholding.get_performance_with_rejection()

ce.py

An example conformal evaluation pipeline using the Transcend library is given in ce.py. It can be run with a multitude of command line arguments.

Comparing quartiles of correct predictions using credibility only:

python3 ce.py                       \
    --train drebin                  \
    --test marvin_full              \
    -k 10                           \
    -n 10                           \
    --pval-consider full-train      \
    -t quartiles                    \
    --q-consider correct            \
    -c cred

Random search for thresholds maximising F1 above threshold and minimising F1 of rejected predictions while enforcing thresholds for credibility and confidence:

python3 ce.py                           \
    --train drebin                      \
    --test marvin_full                  \
    -k 10                               \
    -n -2                               \
    --pval-consider full-train          \
    -t random-search                    \
    -c cred+conf                        \
    --rs-max f1_k                       \
    --rs-min f1_r                       \
    --rs-limit reject_total_perc:0.25   \
    --rs-samples 500
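
To make the objective of this search more concrete, the sketch below shows one way to score a candidate threshold and run a random search over it. It is a simplified illustration rather than the implementation in ce.py: it uses a single global threshold on one score instead of per-class thresholds on credibility and confidence, handles binary labels only, and all function names are hypothetical. The metric names `f1_k`, `f1_r` and `reject_total_perc` mirror the command-line values above and are read here as the F1 of kept predictions, the F1 of rejected predictions, and the fraction of rejected predictions.

```python
import random


def f1(pairs, positive=1):
    """Binary F1 over (true, predicted) pairs; 0.0 when undefined."""
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def rejection_report(y_true, y_pred, scores, threshold):
    """Split predictions into kept (score >= threshold) and rejected."""
    kept = [(t, p) for t, p, s in zip(y_true, y_pred, scores) if s >= threshold]
    rejected = [(t, p) for t, p, s in zip(y_true, y_pred, scores) if s < threshold]
    return {
        "f1_k": f1(kept),
        "f1_r": f1(rejected),
        "reject_total_perc": len(rejected) / len(y_true),
    }


def random_search(y_true, y_pred, scores, samples=500, limit=0.25, seed=0):
    """Sample thresholds uniformly in [0, 1); keep the one that maximises
    f1_k - f1_r while rejecting at most `limit` of all predictions."""
    rng = random.Random(seed)
    best = None
    for _ in range(samples):
        threshold = rng.random()
        report = rejection_report(y_true, y_pred, scores, threshold)
        if report["reject_total_perc"] > limit:
            continue  # violates the rejection budget
        objective = report["f1_k"] - report["f1_r"]
        if best is None or objective > best[0]:
            best = (objective, threshold, report)
    return best
```

Combining the two objectives as a difference is a choice made for this sketch; any scalarisation that rewards a high `f1_k` and a low `f1_r` would serve the same purpose.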

Random search for thresholds maximising F1 above threshold subject to the total percentage of rejected elements while enforcing credibility thresholds:

python3 ce.py                                       \
    --train drebin                                  \
    --test marvin_half                              \
    -k 10                                           \
    -n -1                                           \
    --pval-consider full-train                      \
    -t constrained-search                           \
    -c cred                                         \
    --cs-max f1_k:0.95                              \
    --cs-con kept_pos_perc:0.76,kept_neg_perc:0.76  \
    --rs-samples 500

Acknowledgements

This research has been partially supported by the UK EPSRC grants EP/K033344/1, EP/L022710/1, EP/K006266/1, and EP/P009301/1 as well as the NVIDIA Corporation, NHS England, and Innovate UK.
