
Transcendent adaptation for multiclass problems


Transcendent Multiclass


This repository enables users to apply Transcendent-like concept drift detection to both binary and multiclass problems.

Only the ICE (Inductive Conformal Evaluator) implementation has been modified; the other evaluators (e.g. TCE, CCE) are out of scope.

This project adapts Transcendent to multiclass problems by implementing two Nonconformity Measures (NCMs) for Random Forest and LightGBM classifiers.

Prerequisites

  • Set up the train/test split directory, which should contain the following files:

    time_split/
    ├── X_train.pkl
    ├── X_test.pkl
    ├── X_proper_train.pkl
    ├── X_cal.pkl
    ├── y_train.pkl
    ├── y_test.pkl
    ├── y_proper_train.pkl
    └── y_cal.pkl
    
  • Make sure Docker is installed and running.
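
Before starting the container, it may help to confirm that the split directory contains every file listed above. The following is a minimal sketch, not part of this package: the helper name `missing_split_files` and the `time_split` path are assumptions made for illustration, and the check looks only at file names, not at what the pickles contain.

```python
from pathlib import Path

# File names taken from the directory listing above.
REQUIRED_FILES = [
    f"{prefix}_{part}.pkl"
    for prefix in ("X", "y")
    for part in ("train", "test", "proper_train", "cal")
]


def missing_split_files(split_dir):
    """Return the required file names that are absent from split_dir.

    Hypothetical helper: it only checks that the files exist.
    """
    split_dir = Path(split_dir)
    return [name for name in REQUIRED_FILES if not (split_dir / name).is_file()]


if __name__ == "__main__":
    # "time_split" is the example directory name used in the listing above.
    missing = missing_split_files("time_split")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All split files are present.")
```

An empty list means the directory matches the expected layout and can be mounted into the container.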

Usage

  1. Clone the repository and change directory:

    git clone git@github.com:w-disaster/transcendent-multiclass.git && cd transcendent-multiclass
    
  2. Configure the environment variables and run the Inductive Conformal Evaluator:

    PE_DATASET_NAME=<YOUR_PE_DATASET_NAME>
    SPLITTED_MPH_DATASET_PATH=<YOUR_PRE_SPLITTED_DATA>
    BEST_HYP_DIR=<YOUR_BEST_HYP_DIR> # Based on format produced by overfitting-analysis
    
    docker run -d \
    --name mph-feature-extraction-$PE_DATASET_NAME \
    -e BASE_DATASET_PATH=/usr/app/dataset/ \
    -e PE_DATASET_TYPE=${PE_DATASET_NAME}_mph \
    -e SPLITTED_MPH_DATASET_PATH=/usr/input_data/splitted_dataset/ \
    -e BEST_HYP_DIR=/usr/input_data/best_hyp/ \
    -e FEATURE_TYPE=dts \
    -v $BEST_HYP_DIR:/usr/input_data/best_hyp/ \
    -v $SPLITTED_MPH_DATASET_PATH:/usr/input_data/splitted_dataset/ \
    -v ./results_multiclass/:/usr/app/models/ \
    ghcr.io/malware-concept-drift-detection/transcendent-multiclass:main
    

    A results_multiclass/ directory will be created in the current working directory, containing the credibility ($p$-values) and confidence scores for both the calibration and test sets.

  3. Post-ICE analysis:

    Check whether samples from novel families in the test set receive lower $p$-values than samples from families seen during training, so that the two groups can be told apart.
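
To make the credibility and confidence scores and the post-ICE check more concrete, here is a minimal NumPy sketch of how an inductive conformal evaluator typically derives them. It is not this package's implementation and it does not read the files written to results_multiclass/, whose format is not described here. It assumes label-conditional $p$-values (the fraction of calibration samples of a class whose nonconformity is at least as high as the test sample's), credibility as the $p$-value of the predicted class, and confidence as one minus the largest $p$-value among the other classes. All function names are hypothetical, and the nonconformity scores are assumed to come from whichever NCM is in use.

```python
import numpy as np


def ice_p_values(cal_ncm, cal_labels, test_ncm):
    """Label-conditional p-values for every test sample and candidate class.

    cal_ncm:    (n_cal,) nonconformity of each calibration sample
                with respect to its true label
    cal_labels: (n_cal,) true labels, assumed encoded as 0..n_classes-1
    test_ncm:   (n_test, n_classes) nonconformity of each test sample
                with respect to each candidate class
    """
    cal_ncm = np.asarray(cal_ncm, dtype=float)
    cal_labels = np.asarray(cal_labels)
    test_ncm = np.asarray(test_ncm, dtype=float)
    n_test, n_classes = test_ncm.shape
    p_values = np.zeros((n_test, n_classes))
    for c in range(n_classes):
        ref = cal_ncm[cal_labels == c]
        if ref.size == 0:
            continue  # no calibration samples for this class: p-value stays 0
        # Fraction of calibration scores at least as nonconforming as the test score.
        p_values[:, c] = (ref[None, :] >= test_ncm[:, [c]]).mean(axis=1)
    return p_values


def credibility_and_confidence(p_values, predicted):
    """Credibility: p-value of the predicted class.
    Confidence: 1 - largest p-value among the remaining classes.
    Assumes at least two classes.
    """
    p_values = np.asarray(p_values, dtype=float)
    predicted = np.asarray(predicted)
    rows = np.arange(p_values.shape[0])
    credibility = p_values[rows, predicted]
    others = p_values.copy()
    others[rows, predicted] = -np.inf  # exclude the predicted class
    confidence = 1.0 - others.max(axis=1)
    return credibility, confidence


def median_credibility_by_group(credibility, is_novel):
    """Median credibility of seen-family versus novel-family test samples."""
    credibility = np.asarray(credibility, dtype=float)
    is_novel = np.asarray(is_novel, dtype=bool)
    return {
        "seen": float(np.median(credibility[~is_novel])),
        "novel": float(np.median(credibility[is_novel])),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    cal_ncm = [0.1, 0.3, 0.2, 0.4, 0.1, 0.5]
    cal_labels = [0, 0, 1, 1, 2, 2]
    test_ncm = np.array([[0.2, 0.9, 0.8], [0.6, 0.7, 0.65]])
    predicted = test_ncm.argmin(axis=1)  # least nonconforming class
    p = ice_p_values(cal_ncm, cal_labels, test_ncm)
    cred, conf = credibility_and_confidence(p, predicted)
    print(median_credibility_by_group(cred, is_novel=[False, True]))
```

If novel families are separable, the "novel" median should sit clearly below the "seen" median. Comparing the full distributions, rather than the medians alone, gives a more reliable picture.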
