malware-detector

A base framework for building malware detectors with modern Python.

Features

  • Pydantic v2 Configuration - Type-safe config with env vars and file support
  • Typer CLI - Extensible command-line interface via factory function
  • Structured Logging - Console and JSON output formats with structlog
  • Customizable Pipeline - Define your own stages or use defaults
  • Type Hints - Full typing support with py.typed marker

Requirements

Tool    Version
Python  >= 3.12

Installation

pip install malware-detector

Or with uv:

uv add malware-detector

Quick Start

Basic Usage

from malware_detector import BaseDetector, BaseDetectorConfig

class MyDetector(BaseDetector):
    """My custom malware detector."""

    def stage_extract(self):
        self.log.info("extracting_features", input=str(self.config.path.input))
        # Extract features from dataset
        return self.config.folder.feature

    def stage_vectorize(self):
        # Convert features to vectors
        return self.config.folder.vectorize

    def stage_train(self):
        # Train the model
        return self.config.folder.model

    def stage_predict(self):
        # Run predictions
        return self.config.path.output


# Run the detector
detector = MyDetector()
detector.setup()  # Creates directories
results = detector.run()  # Runs all stages

Run Specific Stages

# Run only extract and vectorize
results = detector.run(stages=["extract", "vectorize"])

Custom Pipeline

from malware_detector import BaseDetector


class ClusteringDetector(BaseDetector):
    """Detector with custom pipeline stages."""

    default_stages = ["preprocess", "embed", "cluster", "export"]

    def stage_preprocess(self):
        ...

    def stage_embed(self):
        ...

    def stage_cluster(self):
        ...

    def stage_export(self):
        ...

Configuration

Custom Config

from pydantic_settings import SettingsConfigDict
from malware_detector import BaseDetector, BaseDetectorConfig

class MyConfig(BaseDetectorConfig):
    """Custom configuration with additional fields."""

    model_config = SettingsConfigDict(
        env_prefix="MY_DETECTOR_",
    )

    batch_size: int = 32
    model_name: str = "default"
    use_gpu: bool = True


class MyDetector(BaseDetector):
    config_class = MyConfig

    def stage_train(self):
        self.log.info("training", batch_size=self.config.batch_size)
        ...

Environment Variables

export MALWARE_DETECTOR_CLASSIFY=true
export MALWARE_DETECTOR_PATH__INPUT="./my_dataset"
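
The two variables above suggest a MALWARE_DETECTOR_ prefix and a double underscore separating nested fields, which matches the env_prefix and env_nested_delimiter options of pydantic-settings. The sketch below only illustrates that mapping with the standard library; nested_settings is a hypothetical helper, and the prefix and delimiter defaults are inferred from the example rather than confirmed by the package.

```python
def nested_settings(
    environ: dict[str, str],
    prefix: str = "MALWARE_DETECTOR_",  # assumed default prefix
    delimiter: str = "__",              # assumed nested delimiter
) -> dict:
    """Fold prefixed environment variables into a nested dict (illustrative only)."""
    settings: dict = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue
        # "PATH__INPUT" becomes ["path", "input"]
        parts = key[len(prefix):].lower().split(delimiter)
        node = settings
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return settings


env = {
    "MALWARE_DETECTOR_CLASSIFY": "true",
    "MALWARE_DETECTOR_PATH__INPUT": "./my_dataset",
    "HOME": "/home/user",  # ignored: no matching prefix
}
parsed = nested_settings(env)
```

Values stay as strings in this sketch; a validating settings class would additionally coerce "true" to a boolean and "./my_dataset" to a path.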

Config File

Save as config.toml:

# Top-level keys must appear before the first [table] header
classify = false

[path]
input = "./Dataset/program"
output = "./Predict/predict.json"

[folder]
dataset = "./Dataset/"
feature = "./Feature/"

CLI Integration

Create CLI for Your Detector

from malware_detector import create_cli
from my_detector import MyDetector

app = create_cli(MyDetector)

# Add custom commands
@app.command()
def evaluate():
    """Evaluate the trained model."""
    ...

if __name__ == "__main__":
    app()

CLI Usage

# Generate default config
python -m my_detector init --output config.toml

# Run full pipeline
python -m my_detector run --config config.toml

# Run specific stages
python -m my_detector run --stages extract,vectorize

# JSON logging for production
python -m my_detector run --log-format json --log-level DEBUG

Logging

from malware_detector import configure_logging, get_logger

# Configure at startup
configure_logging(level="INFO", format="console")

# Get a logger
log = get_logger(__name__)
log.info("event_name", key="value", count=42)

Output formats:

# Console (development)
2024-01-19T10:30:00 [info] event_name    key=value count=42

# JSON (production)
{"event": "event_name", "key": "value", "count": 42, "timestamp": "..."}
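
Because the JSON format writes one object per line, log output can be filtered with the standard library alone. The following is a minimal sketch: events_named is a hypothetical helper, and the second sample line is made up for illustration (it reuses the extracting_features event name from the Basic Usage example).

```python
import json


def events_named(lines: list[str], name: str) -> list[dict]:
    """Return the parsed records whose "event" field equals name."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("event") == name:
            records.append(record)
    return records


sample = [
    '{"event": "event_name", "key": "value", "count": 42, "timestamp": "..."}',
    '{"event": "extracting_features", "input": "./my_dataset", "timestamp": "..."}',
]
matches = events_named(sample, "event_name")
```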

Migration from v0.1.x

v0.1.x                                         v0.2.0
from malwareDetector.detector import detector  from malware_detector import BaseDetector
class MyDetector(detector)                     class MyDetector(BaseDetector)
def extractFeature(self)                       def stage_extract(self)
def vectorize(self)                            def stage_vectorize(self)
def model(self, training)                      def stage_train(self)
def predict(self)                              def stage_predict(self)
config.json()                                  config.model_dump_json()
Config.parse_raw(data)                         Config.model_validate_json(data)

License

MIT
