Base framework for building malware detectors

malware-detector

A base framework for building malware detectors with modern Python.

Features

  • Pydantic v2 Configuration - Type-safe settings loaded from environment variables and config files
  • Typer CLI - Extensible command-line interface built by the create_cli factory function
  • Structured Logging - Console and JSON output formats via structlog
  • Customizable Pipeline - Define your own stages or use the defaults
  • Type Hints - Full typing support with a py.typed marker

Requirements

Tool    Version
Python  >= 3.12

Installation

pip install malware-detector

Or with uv:

uv add malware-detector

Quick Start

Basic Usage

from malware_detector import BaseDetector

class MyDetector(BaseDetector):
    """My custom malware detector."""

    def stage_extract(self):
        self.log.info("extracting_features", input=str(self.config.path.input))
        # Extract features from dataset
        return self.config.folder.feature

    def stage_vectorize(self):
        # Convert features to vectors
        return self.config.folder.vectorize

    def stage_train(self):
        # Train the model
        return self.config.folder.model

    def stage_predict(self):
        # Run predictions
        return self.config.path.output


# Run the detector
detector = MyDetector()
detector.setup()  # Creates directories
results = detector.run()  # Runs all stages

Run Specific Stages

# Run only extract and vectorize
results = detector.run(stages=["extract", "vectorize"])

Custom Pipeline

class ClusteringDetector(BaseDetector):
    """Detector with custom pipeline stages."""

    default_stages = ["preprocess", "embed", "cluster", "export"]

    def stage_preprocess(self):
        ...

    def stage_embed(self):
        ...

    def stage_cluster(self):
        ...

    def stage_export(self):
        ...
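Because default_stages is a plain list of names, a subclass can be checked for missing stage methods before anything runs. The helper below is a hypothetical sketch (missing_stages is not part of malware_detector) that works on any class following the stage_<name> convention.

```python
def missing_stages(cls: type) -> list[str]:
    """Return names in cls.default_stages that lack a callable stage_<name> method."""
    return [
        name
        for name in getattr(cls, "default_stages", [])
        if not callable(getattr(cls, f"stage_{name}", None))
    ]


class PartialClusteringDetector:
    """Hypothetical example: stage_export is deliberately missing."""

    default_stages = ["preprocess", "embed", "cluster", "export"]

    def stage_preprocess(self): ...

    def stage_embed(self): ...

    def stage_cluster(self): ...


print(missing_stages(PartialClusteringDetector))  # ['export']
```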

Configuration

Custom Config

from pydantic_settings import SettingsConfigDict
from malware_detector import BaseDetector, BaseDetectorConfig

class MyConfig(BaseDetectorConfig):
    """Custom configuration with additional fields."""

    model_config = SettingsConfigDict(
        env_prefix="MY_DETECTOR_",
    )

    batch_size: int = 32
    model_name: str = "default"
    use_gpu: bool = True


class MyDetector(BaseDetector):
    config_class = MyConfig

    def stage_train(self):
        self.log.info("training", batch_size=self.config.batch_size)
        ...

Environment Variables

export MALWARE_DETECTOR_CLASSIFY=true
export MALWARE_DETECTOR_PATH__INPUT="./my_dataset"
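These names appear to follow the pydantic-settings convention of a prefix (MALWARE_DETECTOR_) plus a nested delimiter (a double underscore), so PATH__INPUT targets path.input. The stdlib sketch below only illustrates that name mapping; env_to_nested is hypothetical, and pydantic-settings itself also converts values (for example "true" into a bool), which this sketch does not do.

```python
from collections.abc import Mapping
from typing import Any


def env_to_nested(
    environ: Mapping[str, str],
    prefix: str = "MALWARE_DETECTOR_",
    delimiter: str = "__",
) -> dict[str, Any]:
    """Group prefixed environment variables into a nested dict of raw strings."""
    result: dict[str, Any] = {}
    for key, value in environ.items():
        if not key.upper().startswith(prefix):
            continue  # ignore unrelated variables such as HOME
        # Strip the prefix, lowercase, then split nested field names on the delimiter.
        parts = key[len(prefix):].lower().split(delimiter)
        node = result
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value  # values stay as strings; no type coercion here
    return result


example = {
    "MALWARE_DETECTOR_CLASSIFY": "true",
    "MALWARE_DETECTOR_PATH__INPUT": "./my_dataset",
    "HOME": "/home/user",
}
print(env_to_nested(example))
# {'classify': 'true', 'path': {'input': './my_dataset'}}
```

With the MyConfig class shown earlier, env_prefix changes to MY_DETECTOR_, so a variable such as MY_DETECTOR_BATCH_SIZE would target batch_size instead.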

Config File

Save as config.toml:

# Top-level keys must come before the first [table] header
classify = false

[path]
input = "./Dataset/program"
output = "./Predict/predict.json"

[folder]
dataset = "./Dataset/"
feature = "./Feature/"

CLI Integration

Create CLI for Your Detector

from malware_detector import create_cli
from my_detector import MyDetector

app = create_cli(MyDetector)

# Add custom commands
@app.command()
def evaluate():
    """Evaluate the trained model."""
    ...

if __name__ == "__main__":
    app()

CLI Usage

# Generate default config
python -m my_detector init --output config.toml

# Run full pipeline
python -m my_detector run --config config.toml

# Run specific stages
python -m my_detector run --stages extract,vectorize

# JSON logging for production
python -m my_detector run --log-format json --log-level DEBUG
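create_cli builds a Typer app, so the real option parsing is Typer's. Purely to illustrate how the documented flags could map onto run(stages=...) and the logging settings, here is a stdlib argparse stand-in; the parser, its defaults, and build_parser itself are assumptions, not the actual CLI definition.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented `init` and `run` options."""
    parser = argparse.ArgumentParser(prog="my_detector")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="write a default config file")
    init.add_argument("--output", default="config.toml")

    run = sub.add_parser("run", help="run the pipeline")
    run.add_argument("--config", default=None, help="path to a TOML config file")
    # Comma-separated list such as "extract,vectorize"; None means all stages.
    run.add_argument(
        "--stages",
        type=lambda s: [name.strip() for name in s.split(",") if name.strip()],
        default=None,
    )
    run.add_argument("--log-format", default="console")  # assumed default
    run.add_argument("--log-level", default="INFO")      # assumed default
    return parser


args = build_parser().parse_args(
    ["run", "--stages", "extract,vectorize", "--log-format", "json", "--log-level", "DEBUG"]
)
print(args.stages, args.log_format, args.log_level)
# ['extract', 'vectorize'] json DEBUG
```

The parsed stages list has the same shape as the stages argument passed to detector.run() in the Python examples above.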

Logging

from malware_detector import configure_logging, get_logger

# Configure at startup
configure_logging(level="INFO", format="console")

# Get a logger
log = get_logger(__name__)
log.info("event_name", key="value", count=42)

Output formats:

# Console (development)
2024-01-19T10:30:00 [info] event_name    key=value count=42

# JSON (production)
{"event": "event_name", "key": "value", "count": 42, "timestamp": "..."}
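The two formats carry the same event data and differ only in rendering. structlog ships renderers for both styles (structlog.dev.ConsoleRenderer and structlog.processors.JSONRenderer); the stdlib function below is only a hypothetical illustration of the difference, not what configure_logging does internally.

```python
import json
from datetime import datetime, timezone
from typing import Any


def render(event: str, fmt: str = "console", **fields: Any) -> str:
    """Render one log event as a console line or a JSON object (illustrative only)."""
    timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    if fmt == "json":
        # Machine-readable: one JSON object per line, easy to ingest and query.
        return json.dumps({"event": event, **fields, "timestamp": timestamp})
    # Human-readable: timestamp, level (fixed to info here), padded event, key=value pairs.
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{timestamp} [info] {event:<13} {pairs}"


print(render("event_name", "console", key="value", count=42))
print(render("event_name", "json", key="value", count=42))
```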

Migration from v0.1.x

v0.1.x                                         v0.2.0
from malwareDetector.detector import detector  from malware_detector import BaseDetector
class MyDetector(detector)                     class MyDetector(BaseDetector)
def extractFeature(self)                       def stage_extract(self)
def vectorize(self)                            def stage_vectorize(self)
def model(self, training)                      def stage_train(self)
def predict(self)                              def stage_predict(self)
config.json()                                  config.model_dump_json()
Config.parse_raw(data)                         Config.model_validate_json(data)
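When porting a v0.1.x detector, it can help to flag methods that still use the old names. The mapping below restates the method rows of the migration table; OLD_TO_NEW and legacy_methods are hypothetical helpers, not part of either release. (The last two rows are the standard Pydantic v1 to v2 renames and are not covered here.)

```python
import inspect

# v0.1.x method name -> v0.2.0 stage method name, from the migration table.
OLD_TO_NEW: dict[str, str] = {
    "extractFeature": "stage_extract",
    "vectorize": "stage_vectorize",
    "model": "stage_train",
    "predict": "stage_predict",
}


def legacy_methods(cls: type) -> dict[str, str]:
    """Map each old-style method defined on cls to the name it should be renamed to."""
    defined = {name for name, _ in inspect.getmembers(cls, inspect.isfunction)}
    return {old: new for old, new in OLD_TO_NEW.items() if old in defined}


class OldDetector:
    """Hypothetical v0.1.x-style detector."""

    def extractFeature(self): ...

    def model(self, training): ...


print(legacy_methods(OldDetector))
# {'extractFeature': 'stage_extract', 'model': 'stage_train'}
```

This is a name-based heuristic, so a helper that legitimately uses one of these names will also be flagged. Note that stage_train takes no training argument, unlike the old model method.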

License

MIT

Download files

Source Distribution

malwaredetector-0.3.1.tar.gz (40.2 kB)

Uploaded Source

Built Distribution

malwaredetector-0.3.1-py3-none-any.whl (11.1 kB)

Uploaded Python 3

File details

Details for the file malwaredetector-0.3.1.tar.gz.

File metadata

  • Download URL: malwaredetector-0.3.1.tar.gz
  • Size: 40.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.5

File hashes

Hashes for malwaredetector-0.3.1.tar.gz
Algorithm Hash digest
SHA256 c94b28e17fe479bd36123d55da7a3b8c77fe87afeb4f9dc7080a20a3a2acffc1
MD5 a18df17c283555f1e385dc26a697c368
BLAKE2b-256 c65df1025f8d972a6b016fce770cf1424dcb3f777f398e533f473a5eb3d5b38d

File details

Details for the file malwaredetector-0.3.1-py3-none-any.whl.

File hashes

Hashes for malwaredetector-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f0cecddaf99f7070f17a27ee1d194d204937624eed0be1f55d8c6f803afbc421
MD5 76f4c2879643f97ecea175169bbbabe3
BLAKE2b-256 9e1fa7c34b98b9f7554553a6a719aedc71aac8a097245bf042aea18b6203c853
