Base framework for building malware detectors

malware-detector

A base framework for building malware detectors with modern Python.

Features

  • Pydantic v2 Configuration - Type-safe config with env vars and file support
  • Typer CLI - Extensible command-line interface via factory function
  • Structured Logging - Console and JSON output formats with structlog
  • Customizable Pipeline - Define your own stages or use defaults
  • Type Hints - Full typing support with py.typed marker

Requirements

Tool     Version
Python   >= 3.12

Installation

pip install malware-detector

Or with uv:

uv add malware-detector

Quick Start

Basic Usage

from malware_detector import BaseDetector

class MyDetector(BaseDetector):
    """My custom malware detector."""

    def stage_extract(self):
        self.log.info("extracting_features", input=str(self.config.path.input))
        # Extract features from dataset
        return self.config.folder.feature

    def stage_vectorize(self):
        # Convert features to vectors
        return self.config.folder.vectorize

    def stage_train(self):
        # Train the model
        return self.config.folder.model

    def stage_predict(self):
        # Run predictions
        return self.config.path.output


# Run the detector
detector = MyDetector()
detector.setup()  # Creates directories
results = detector.run()  # Runs all stages
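The example calls setup() to create directories before run(). Which folders it creates is not listed here; the following is a minimal sketch of an idempotent directory-creation step, assuming the configured folders are pathlib.Path values. FolderConfig and setup_folders are hypothetical stand-ins, not the library's API.

```python
import tempfile
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class FolderConfig:
    """Hypothetical stand-in for config.folder; field names are illustrative."""
    dataset: Path
    feature: Path
    vectorize: Path
    model: Path


def setup_folders(folder: FolderConfig) -> list[Path]:
    """Create every configured folder (and parents); safe to call repeatedly."""
    created = []
    for f in fields(folder):
        path = getattr(folder, f.name)
        # exist_ok=True makes a second setup() call a no-op instead of an error.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


root = Path(tempfile.mkdtemp())
cfg = FolderConfig(
    dataset=root / "Dataset",
    feature=root / "Feature",
    vectorize=root / "Vector",
    model=root / "Model",
)
for p in setup_folders(cfg):
    print(p.name, p.is_dir())
```

Using parents=True and exist_ok=True lets the same call run on a fresh checkout and on a rerun without special-casing either.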

Run Specific Stages

# Run only extract and vectorize
results = detector.run(stages=["extract", "vectorize"])

Custom Pipeline

from malware_detector import BaseDetector

class ClusteringDetector(BaseDetector):
    """Detector with custom pipeline stages."""

    default_stages = ["preprocess", "embed", "cluster", "export"]

    def stage_preprocess(self):
        ...

    def stage_embed(self):
        ...

    def stage_cluster(self):
        ...

    def stage_export(self):
        ...

Configuration

Custom Config

from pydantic_settings import SettingsConfigDict
from malware_detector import BaseDetector, BaseDetectorConfig

class MyConfig(BaseDetectorConfig):
    """Custom configuration with additional fields."""

    model_config = SettingsConfigDict(
        env_prefix="MY_DETECTOR_",
    )

    batch_size: int = 32
    model_name: str = "default"
    use_gpu: bool = True


class MyDetector(BaseDetector):
    config_class = MyConfig

    def stage_train(self):
        self.log.info("training", batch_size=self.config.batch_size)
        ...

Environment Variables

export MALWARE_DETECTOR_CLASSIFY=true
export MALWARE_DETECTOR_PATH__INPUT="./my_dataset"
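Here MALWARE_DETECTOR_ is the prefix, and the double underscore in PATH__INPUT appears to separate nesting levels, so the second variable targets path.input. This matches the pydantic-settings convention of env_prefix plus env_nested_delimiter="__". The delimiter is inferred from the variable name, not stated in this README. The following stdlib-only sketch mimics that mapping so you can check which key a variable would set; env_to_nested is hypothetical and not part of the library.

```python
from collections.abc import Mapping
from typing import Any


def env_to_nested(env: Mapping[str, str], prefix: str = "MALWARE_DETECTOR_",
                  delimiter: str = "__") -> dict[str, Any]:
    """Group prefixed environment variables into a nested dict of raw strings.

    Keys are lower-cased to mirror case-insensitive matching; values stay
    strings, since type coercion is the settings model's job.
    """
    result: dict[str, Any] = {}
    for key, value in env.items():
        if not key.upper().startswith(prefix.upper()):
            continue  # not ours, e.g. HOME or PATH
        parts = key[len(prefix):].lower().split(delimiter)
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result


print(env_to_nested({
    "MALWARE_DETECTOR_CLASSIFY": "true",
    "MALWARE_DETECTOR_PATH__INPUT": "./my_dataset",
    "HOME": "/root",
}))
# {'classify': 'true', 'path': {'input': './my_dataset'}}
```

Under pydantic-settings' env_prefix rule, the MyConfig subclass shown earlier would read MY_DETECTOR_BATCH_SIZE rather than a MALWARE_DETECTOR_ variable.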

Config File

Save as config.toml:

classify = false

[path]
input = "./Dataset/program"
output = "./Predict/predict.json"

[folder]
dataset = "./Dataset/"
feature = "./Feature/"

CLI Integration

Create CLI for Your Detector

from malware_detector import create_cli
from my_detector import MyDetector

app = create_cli(MyDetector)

# Add custom commands
@app.command()
def evaluate():
    """Evaluate the trained model."""
    ...

if __name__ == "__main__":
    app()

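CLI Usage

The commands below assume the script above is saved as my_detector/__main__.py, which is what lets python -m my_detector run it.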

# Generate default config
python -m my_detector init --output config.toml

# Run full pipeline
python -m my_detector run --config config.toml

# Run specific stages
python -m my_detector run --stages extract,vectorize

# JSON logging for production
python -m my_detector run --log-format json --log-level DEBUG
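The --stages option takes a single comma-separated value. How the app built by create_cli splits it is not shown. The following sketch shows equivalent parsing using argparse as a stand-in for the Typer app. parse_stages, the "console"/"json" choices, and the INFO default are assumptions for illustration.

```python
import argparse


def parse_stages(raw: str) -> list[str]:
    """Split 'extract, vectorize' into ['extract', 'vectorize'], dropping blanks."""
    return [part.strip() for part in raw.split(",") if part.strip()]


parser = argparse.ArgumentParser(prog="my_detector run")
parser.add_argument("--config", default=None)
parser.add_argument("--stages", type=parse_stages, default=None)  # None = all stages
parser.add_argument("--log-format", choices=["console", "json"], default="console")
parser.add_argument("--log-level", default="INFO")

args = parser.parse_args(["--stages", "extract,vectorize", "--log-format", "json"])
print(args.stages, args.log_format)  # ['extract', 'vectorize'] json
```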

Logging

from malware_detector import configure_logging, get_logger

# Configure at startup
configure_logging(level="INFO", format="console")

# Get a logger
log = get_logger(__name__)
log.info("event_name", key="value", count=42)

Output formats:

# Console (development)
2024-01-19T10:30:00 [info] event_name    key=value count=42

# JSON (production)
{"event": "event_name", "key": "value", "count": 42, "timestamp": "..."}
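In JSON mode each log call becomes one object: the event name under "event", keyword arguments as sibling keys, and a timestamp. The library does this with structlog. The following stdlib-only stand-in illustrates the record shape; JsonFormatter and the "fields" extra are hypothetical and are not structlog's API.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object: event, key/value pairs, timestamp."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"event": record.getMessage()}
        payload.update(getattr(record, "fields", {}))
        payload["timestamp"] = datetime.fromtimestamp(
            record.created, tz=timezone.utc
        ).isoformat()
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("demo")
log.addHandler(handler)
log.setLevel(logging.INFO)

# extra= attaches attributes to the LogRecord; "fields" carries the key/values.
log.info("event_name", extra={"fields": {"key": "value", "count": 42}})
```

One object per line keeps the output easy to ship to log aggregators that parse newline-delimited JSON.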

Migration from v0.1.x

v0.1.x                                         v0.2.0
from malwareDetector.detector import detector  from malware_detector import BaseDetector
class MyDetector(detector)                     class MyDetector(BaseDetector)
def extractFeature(self)                       def stage_extract(self)
def vectorize(self)                            def stage_vectorize(self)
def model(self, training)                      def stage_train(self)
def predict(self)                              def stage_predict(self)
config.json()                                  config.model_dump_json()
Config.parse_raw(data)                         Config.model_validate_json(data)

License

MIT

Download files

Source Distribution

malwaredetector-0.3.0.tar.gz (39.8 kB)

Built Distribution

malwaredetector-0.3.0-py3-none-any.whl (11.1 kB)

File details

Details for the file malwaredetector-0.3.0.tar.gz.

File metadata

  • Download URL: malwaredetector-0.3.0.tar.gz
  • Upload date:
  • Size: 39.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.5

File hashes

Hashes for malwaredetector-0.3.0.tar.gz
Algorithm Hash digest
SHA256 950d1189769a48eed4299f34d5d422fb2665fbbe74f2c4c79c5538b68ec20da0
MD5 5a34269ecc51200190584b2e2da0a2b1
BLAKE2b-256 9aebeb265b6094c8f2be5bf0eaa36662885a1b4f61c3abd018ee05f7bb82693d

File details

Details for the file malwaredetector-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for malwaredetector-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6c26fce377fbe4a852c089700b990fbd613af81bbcb5522b442b73e6abc16b9a
MD5 0dcf5a0ded2dee2c6b79c51b3a2e27e0
BLAKE2b-256 55f3150ae0a01e4e73b0a26494a4f7da060948a83ef9731e12f0bf2a5c3bee32
