Modernized CADE package for concept drift detection, adapted for FIRCE integration.

CADE-FIRCE

Modernized CADE for concept drift detection in reusable Python workflows.

This package adapts the original CADE codebase from the USENIX Security 2021 paper into a library-oriented Python package for integration into other systems. In particular, it provides a runtime detector API that can be used inside streaming pipelines, evaluation frameworks, and drift monitoring components rather than only through the original experimental scripts.

What this fork adds

This fork keeps the core CADE idea intact while updating the codebase for modern Python packaging and programmatic use.

Key changes include:

  • packaging through pyproject.toml
  • modern dependency management with uv
  • a runtime-facing detector class, CadeRuntimeDetector
  • a clearer fit and detect workflow for integration into other projects
  • improved validation and runtime checks for detector configuration and input data shapes

The main entry point for integration is cade.runtime.CadeRuntimeDetector, which exposes a direct library API for training on reference data and then scoring incoming batches for drift.

Background

CADE, short for Contrastive Autoencoder for Drift Detection and Explanation, was introduced in:

Limin Yang, Wenbo Guo, Qingying Hao, Arridhana Ciptadi, Ali Ahmadzadeh, Xinyu Xing, and Gang Wang.
CADE: Detecting and Explaining Concept Drift Samples for Security Applications.
USENIX Security 2021.

The original work targets a specific form of concept drift in security settings, especially cases where new samples no longer align well with previously learned class structure. This fork focuses on making that detector easier to embed in downstream systems.

If you build on this package in a project or publication, please cite the original CADE paper.

@inproceedings{yang2021cade,
  title={{CADE}: Detecting and Explaining Concept Drift Samples for Security Applications},
  author={Yang, Limin and Guo, Wenbo and Hao, Qingying and Ciptadi, Arridhana and Ahmadzadeh, Ali and Xing, Xinyu and Wang, Gang},
  booktitle={30th USENIX Security Symposium (USENIX Security 21)},
  pages={2327--2344},
  year={2021}
}

Installation

This project uses uv for environment and dependency management.

Clone the repository, then sync dependencies:

uv sync

For development dependencies:

uv sync --group dev

For scripting helpers:

uv sync --group scripting

To install all configured dependency groups:

uv sync --all-groups

Development workflow

Common development commands:

uv lock
uv sync
uv run pytest -q
uv run pytest --cov=cade --cov-report=term-missing --cov-report=xml
uv run ruff format .
uv run ruff check .
uv run ruff check . --fix
uv build
uv run twine check dist/*
uv run deptry .

If you use the included Makefile, these commands are wrapped in targets such as make sync, make test, make lint, and make build.

Runtime drift detection

The primary integration surface is CadeRuntimeDetector.

It is designed for the common pattern:

  1. Fit the detector on known reference data
  2. Encode incoming samples into CADE's latent space
  3. Measure distance to learned class centroids
  4. Compute robust anomaly scores from the per-class median and median absolute deviation (MAD) of those distances
  5. Flag row-level drift and summarize chunk-level drift status
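Steps 3 to 5 can be illustrated with plain NumPy. The following is a minimal sketch, not the package's implementation: it assumes Euclidean distance in latent space, a score of the form |d - median| / MAD, and the conventional 1.4826 MAD scaling constant. The helper names fit_class_stats and score_rows are hypothetical, and synthetic 2-D points stand in for encoder output.

```python
import numpy as np

MAD_SCALE = 1.4826  # conventional MAD consistency constant; assumed, not read from this package


def fit_class_stats(z, y):
    """Per-class centroid, median distance, and scaled MAD from latent vectors z."""
    classes = np.unique(y)
    centroids = np.stack([z[y == c].mean(axis=0) for c in classes])
    medians = np.empty(len(classes))
    mads = np.empty(len(classes))
    for i, c in enumerate(classes):
        # Distances of class members to their own centroid.
        d = np.linalg.norm(z[y == c] - centroids[i], axis=1)
        medians[i] = np.median(d)
        mads[i] = MAD_SCALE * np.median(np.abs(d - medians[i]))
    return classes, centroids, medians, mads


def score_rows(z, centroids, medians, mads, mad_threshold=3.5, eps=1e-12):
    """Return (row_flags, min_scores, closest_class_index) for latent rows z."""
    # Distance from every row to every centroid: shape (n_rows, n_classes).
    dists = np.linalg.norm(z[:, None, :] - centroids[None, :, :], axis=2)
    # Robust deviation of each distance from that class's typical distance.
    scores = np.abs(dists - medians) / np.maximum(mads, eps)
    min_scores = scores.min(axis=1)
    closest = dists.argmin(axis=1)
    return min_scores > mad_threshold, min_scores, closest


# Synthetic 2-D "latent" points standing in for encoder output.
rng = np.random.default_rng(0)
z_train = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(10.0, 1.0, (200, 2))])
y_train = np.repeat([0, 1], 200)
classes, centroids, medians, mads = fit_class_stats(z_train, y_train)

# One point inside the first cluster, one far from both clusters.
z_new = np.array([[0.5, -0.5], [100.0, 100.0]])
flags, scores, closest = score_rows(z_new, centroids, medians, mads)
print(flags, closest)
```

A row is flagged only when it is atypical for every learned class, which is why the minimum score across classes is compared with the threshold.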

Detector behavior

After fit, the detector stores:

  • the observed training classes
  • a label-to-index mapping
  • latent centroids for each class
  • per-class median distances
  • per-class MAD-scaled distance statistics
  • the trained encoder model

During detect(x), the detector:

  • validates the input batch
  • encodes each row into latent space
  • computes distance from each encoded row to every class centroid
  • converts those distances into anomaly scores
  • marks a row as drifted if its minimum anomaly score exceeds mad_threshold
  • marks the chunk as drifted if drift count or drift ratio exceeds configured thresholds

This makes the detector useful both for per-row inspection and for higher-level monitoring decisions.
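The chunk-level rule can be sketched in a few lines. This is an illustration under stated assumptions, not the detector's code: it assumes the two thresholds are combined with "or" and compared inclusively (>=), which the description above does not pin down. The function name chunk_decision is hypothetical.

```python
import numpy as np


def chunk_decision(row_flags, min_drift_count=1, min_drift_ratio=0.05):
    """Summarize row-level flags into (chunk_drift, drift_count, drift_ratio)."""
    flags = np.asarray(row_flags, dtype=bool).reshape(-1)
    drift_count = int(flags.sum())
    drift_ratio = float(flags.mean()) if flags.size else 0.0
    # Assumption: either threshold being reached is enough to flag the chunk.
    chunk_drift = flags.size > 0 and (
        drift_count >= min_drift_count or drift_ratio >= min_drift_ratio
    )
    return bool(chunk_drift), drift_count, drift_ratio


# 3 drifted rows out of 10: below the count threshold, above the ratio threshold.
print(chunk_decision([True] * 3 + [False] * 7, min_drift_count=5, min_drift_ratio=0.25))
```

Raising min_drift_count or min_drift_ratio makes the chunk-level signal less sensitive to isolated outlier rows while leaving the per-row flags unchanged.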

Basic example

A minimal runtime example looks like this:

from __future__ import annotations

import numpy as np

from cade.runtime import CadeRuntimeDetector

X_train = np.random.rand(1000, 32).astype(np.float32)
y_train = np.random.randint(0, 3, size=1000)

X_chunk = np.random.rand(128, 32).astype(np.float32)

detector = CadeRuntimeDetector(
    dims=[32, 64, 16],
    margin=10.0,
    mad_threshold=3.5,
    min_drift_ratio=0.05,
    min_drift_count=1,
    batch_size=64,
    epochs=25,
    lr=1e-3,
)

detector.fit(X_train, y_train)
out = detector.detect(X_chunk)

print("Chunk drift:", out.chunk_drift)
print("Drifted rows:", int(out.row_flags.sum()))
print("Scores shape:", out.scores.shape)

The returned object contains:

  • row_flags: boolean drift flags for each row
  • scores: per-row anomaly scores
  • closest_classes: nearest learned class for each row
  • chunk_drift: overall chunk-level drift decision
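As an illustration of how these fields might be consumed downstream, the sketch below ranks drifted rows by score and counts them per nearest class. It uses a hypothetical stand-in class, FakeDetectOutput, so that it runs without the package installed, and it assumes scores holds one value per row. The helper summarize is likewise hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FakeDetectOutput:
    """Hypothetical stand-in exposing the four documented fields."""

    row_flags: np.ndarray
    scores: np.ndarray
    closest_classes: np.ndarray
    chunk_drift: bool


def summarize(out, top_k=3):
    """Rank drifted rows by score and count them per nearest class."""
    flags = np.asarray(out.row_flags, dtype=bool).reshape(-1)
    scores = np.asarray(out.scores, dtype=float).reshape(-1)
    closest = np.asarray(out.closest_classes).reshape(-1)
    drifted_idx = np.flatnonzero(flags)
    # Highest-scoring drifted rows first.
    ranked = drifted_idx[np.argsort(scores[drifted_idx])[::-1]][:top_k]
    classes, counts = np.unique(closest[flags], return_counts=True)
    return {
        "chunk_drift": bool(out.chunk_drift),
        "top_rows": ranked.tolist(),
        "drift_by_closest_class": dict(zip(classes.tolist(), counts.tolist())),
    }


out = FakeDetectOutput(
    row_flags=np.array([False, True, True, False, True]),
    scores=np.array([0.5, 4.0, 9.0, 1.0, 6.0]),
    closest_classes=np.array([0, 1, 1, 2, 0]),
    chunk_drift=True,
)
summary = summarize(out)
print(summary)
```

Grouping drifted rows by closest class can help show which known class the unfamiliar samples most resemble.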

Integration example in a monitoring pipeline

One intended use of this package is wrapping the runtime detector inside a project-specific monitoring interface. For example, a monitoring component can fit CADE on training data and then translate CADE output into a framework-specific drift result object:

from __future__ import annotations

from typing import TYPE_CHECKING

import numpy as np

from cade.runtime import CadeRuntimeDetector

from firce.drift_monitor.base import DriftDetectionResult

from .cade_config import CadeMonitorConfig

if TYPE_CHECKING:
    from firce.utils.config import SimulationConfig
    from firce.utils.perf_stats import PerformanceStats


class CadeDriftMonitor:
    def __init__(self, config: SimulationConfig) -> None:
        cade_cfg = CadeMonitorConfig(**config.monitor_kwargs)

        self._detector = CadeRuntimeDetector(
            dims=cade_cfg.dims,
            margin=cade_cfg.margin,
            mad_threshold=cade_cfg.mad_threshold,
            min_drift_ratio=cade_cfg.min_drift_ratio,
            min_drift_count=cade_cfg.min_drift_count,
            batch_size=cade_cfg.batch_size,
            epochs=cade_cfg.epochs,
            lr=cade_cfg.lr,
            cae_lambda_1=cade_cfg.cae_lambda_1,
            similar_ratio=cade_cfg.similar_ratio,
            display_interval=cade_cfg.display_interval,
            force_retrain=cade_cfg.force_retrain,
            weights_path=cade_cfg.weights_path,
            device=cade_cfg.device,
        )

    def fit(
        self,
        X_train: np.ndarray,
        y_train: np.ndarray,
        perf_stats: PerformanceStats | None = None,
    ) -> None:
        self._detector.fit(X_train, y_train)

    def detect(self, X: np.ndarray) -> DriftDetectionResult:
        out = self._detector.detect(X)
        row_flags = np.asarray(out.row_flags, dtype=bool).reshape(-1)
        scores = np.asarray(out.scores, dtype=float).reshape(-1)

        return DriftDetectionResult(
            row_flags=row_flags,
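            # Flags the chunk as soon as any row drifts. Use bool(out.chunk_drift)
            # instead to honor min_drift_ratio and min_drift_count.
            chunk_drift=bool(row_flags.any()),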
            scores=scores,
            metadata={
                "drift_count": int(row_flags.sum()),
                "chunk_size": int(len(row_flags)),
                "drift_ratio": float(row_flags.mean()) if len(row_flags) else 0.0,
            },
        )

This pattern is useful when CADE is one detector among several, or when a larger framework expects a standard drift-monitor interface.
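The example imports CadeMonitorConfig from a local cade_config module that is not shown. One possible shape for it is sketched below as a plain dataclass whose fields mirror the keyword arguments passed to CadeRuntimeDetector. The layout, the validation rules, and every default are assumptions: the first eight values are copied from the basic example, and the rest are illustrative placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CadeMonitorConfig:
    """Hypothetical config container; field names mirror the detector kwargs."""

    dims: List[int]
    margin: float = 10.0
    mad_threshold: float = 3.5
    min_drift_ratio: float = 0.05
    min_drift_count: int = 1
    batch_size: int = 64
    epochs: int = 25
    lr: float = 1e-3
    cae_lambda_1: float = 0.1  # placeholder value, not a documented default
    similar_ratio: float = 0.25  # placeholder value, not a documented default
    display_interval: int = 10  # placeholder value, not a documented default
    force_retrain: bool = False
    weights_path: Optional[str] = None
    device: str = "/CPU:0"

    def __post_init__(self):
        # Fail early on obviously invalid settings.
        if len(self.dims) < 2:
            raise ValueError("dims needs at least an input and a latent size")
        if not 0.0 <= self.min_drift_ratio <= 1.0:
            raise ValueError("min_drift_ratio must be in [0, 1]")


# Building the config from a kwargs dict, as the monitor's __init__ does.
monitor_kwargs = {"dims": [32, 64, 16], "epochs": 5}
cfg = CadeMonitorConfig(**monitor_kwargs)
print(cfg.dims, cfg.epochs, cfg.margin)
```

Unpacking monitor_kwargs into a dataclass means a misspelled key raises TypeError at construction time instead of being silently ignored.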

API notes

CadeRuntimeDetector(...)

Important configuration parameters include:

  • dims: network dimensions, including input and latent dimensions
  • margin: contrastive margin used during training
  • mad_threshold: row-level anomaly threshold
  • min_drift_ratio: chunk-level ratio threshold
  • min_drift_count: chunk-level count threshold
  • batch_size: training batch size
  • epochs: number of training epochs
  • lr: optimizer learning rate
  • cae_lambda_1: CAE training weight
  • similar_ratio: ratio used for similar-pair construction
  • display_interval: training log interval
  • weights_path: optional saved weights path
  • device: TensorFlow device string such as /CPU:0
  • force_retrain: whether to discard an existing weights file before training
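To give a feel for what margin controls, here is a sketch of the standard margin-based contrastive term on pairs of latent vectors: same-class pairs are pulled together, and different-class pairs are pushed apart until they are at least margin away. This is the textbook form, offered as an assumption about the general idea rather than the package's exact loss. It omits the autoencoder reconstruction term and any cae_lambda_1 weighting, and the function name contrastive_pair_loss is hypothetical.

```python
import numpy as np


def contrastive_pair_loss(z_a, z_b, same_class, margin=10.0):
    """Mean margin-based contrastive loss over pairs of latent vectors."""
    d = np.linalg.norm(np.asarray(z_a) - np.asarray(z_b), axis=1)
    same = np.asarray(same_class, dtype=bool)
    # Same-class pairs: penalize any separation.
    pull = np.where(same, d**2, 0.0)
    # Different-class pairs: penalize only when closer than the margin.
    push = np.where(~same, np.maximum(margin - d, 0.0) ** 2, 0.0)
    return float(np.mean(pull + push))


# Two pairs, both at distance 5: one same-class, one different-class.
z_a = np.array([[0.0, 0.0], [0.0, 0.0]])
z_b = np.array([[3.0, 4.0], [3.0, 4.0]])
loss_wide = contrastive_pair_loss(z_a, z_b, [True, False], margin=10.0)
loss_narrow = contrastive_pair_loss(z_a, z_b, [True, False], margin=2.0)
print(loss_wide, loss_narrow)
```

A larger margin keeps penalizing different-class pairs at greater distances, which tends to spread class clusters further apart in latent space.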

fit(x_train, y_train)

Fits the detector on labeled reference data. Input requirements:

  • x_train must be a 2D array
  • y_train must be a 1D array
  • lengths must match
  • x_train.shape[1] must equal dims[0]
  • at least two classes must be present in training data
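These requirements can be checked before calling fit, for instance in a data-loading layer. The helper below is a hypothetical pre-check that mirrors the list above. It is not part of the package, and using ValueError is an assumption, since the exception type the detector raises is not stated here.

```python
import numpy as np


def validate_fit_inputs(x_train, y_train, dims):
    """Check the documented fit() input requirements and return arrays."""
    x = np.asarray(x_train)
    y = np.asarray(y_train)
    if x.ndim != 2:
        raise ValueError(f"x_train must be 2D, got {x.ndim}D")
    if y.ndim != 1:
        raise ValueError(f"y_train must be 1D, got {y.ndim}D")
    if x.shape[0] != y.shape[0]:
        raise ValueError(f"length mismatch: {x.shape[0]} rows vs {y.shape[0]} labels")
    if x.shape[1] != dims[0]:
        raise ValueError(f"expected {dims[0]} features, got {x.shape[1]}")
    if np.unique(y).size < 2:
        raise ValueError("training data must contain at least two classes")
    return x, y


# Valid input: 100 rows, 32 features, alternating labels 0 and 1.
x_ok = np.zeros((100, 32), dtype=np.float32)
y_ok = np.arange(100) % 2
x_checked, y_checked = validate_fit_inputs(x_ok, y_ok, dims=[32, 64, 16])
print(x_checked.shape, y_checked.shape)
```

The same shape checks on x apply to detect, with the extra condition that the detector has already been fitted.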

detect(x)

Scores a batch for drift. Input requirements:

  • x must be a 2D array
  • x.shape[1] must equal dims[0]
  • the detector must already be fitted

When to use this package

This package is a good fit when you need:

  • a drift detector that can be embedded directly into Python systems
  • row-level drift flags and continuous anomaly scores
  • chunk-level drift decisions based on configurable thresholds
  • a detector that learns class structure in a latent space rather than relying only on raw-feature distances

It is especially useful in workflows where training data represents known classes and incoming data may contain new or shifted patterns that no longer fit those learned latent distributions.

Project status

This package is a maintained downstream adaptation of the original CADE research code. It is intended to make CADE easier to use in modern Python environments and in integration-heavy projects such as evaluation pipelines, security tooling, and drift monitoring frameworks.

It should not be treated as the official upstream release.

Attribution

This package is derived from the original CADE codebase and research work by:

  • Limin Yang
  • Wenbo Guo
  • Qingying Hao
  • Arridhana Ciptadi
  • Ali Ahmadzadeh
  • Xinyu Xing
  • Gang Wang

If you use this fork, please credit both:

  1. the original CADE paper for the research contribution
  2. this package or repository for packaging and runtime integration work, where appropriate

License

This repository retains the original CADE licensing terms.

For ethical considerations, the code and data are covered by a modified BSD 3-Clause style license that restricts use to non-commercial scientific research and non-commercial education. Commercial use is prohibited.

Please review the LICENSE file before redistribution or use.
