Base framework for building malware detectors
malware-detector
A base framework for building malware detectors with modern Python.
- Source code: https://github.com/louiskyee/malwareDetector.git
- Wiki: https://github.com/louiskyee/malwareDetector/wiki
- PyPI: https://pypi.org/project/malware-detector/
Features
- Pydantic v2 Configuration: type-safe config with environment variable and file support
- Typer CLI: extensible command-line interface via a factory function
- Structured Logging: console and JSON output formats with structlog
- Customizable Pipeline: define your own stages or use the defaults
- Type Hints: full typing support with a py.typed marker
Requirements
| Tool | Version |
|---|---|
| Python | >= 3.12 |
Installation
pip install malware-detector
Or with uv:
uv add malware-detector
Quick Start
Basic Usage
from malware_detector import BaseDetector, BaseDetectorConfig


class MyDetector(BaseDetector):
    """My custom malware detector."""

    def stage_extract(self):
        self.log.info("extracting_features", input=str(self.config.path.input))
        # Extract features from dataset
        return self.config.folder.feature

    def stage_vectorize(self):
        # Convert features to vectors
        return self.config.folder.vectorize

    def stage_train(self):
        # Train the model
        return self.config.folder.model

    def stage_predict(self):
        # Run predictions
        return self.config.path.output


# Run the detector
detector = MyDetector()
detector.setup()  # Creates directories
results = detector.run()  # Runs all stages
Run Specific Stages
# Run only extract and vectorize
results = detector.run(stages=["extract", "vectorize"])
Custom Pipeline
class ClusteringDetector(BaseDetector):
    """Detector with custom pipeline stages."""

    default_stages = ["preprocess", "embed", "cluster", "export"]

    def stage_preprocess(self):
        ...

    def stage_embed(self):
        ...

    def stage_cluster(self):
        ...

    def stage_export(self):
        ...
Configuration
Custom Config
from pydantic_settings import SettingsConfigDict
from malware_detector import BaseDetector, BaseDetectorConfig


class MyConfig(BaseDetectorConfig):
    """Custom configuration with additional fields."""

    model_config = SettingsConfigDict(
        env_prefix="MY_DETECTOR_",
    )

    batch_size: int = 32
    model_name: str = "default"
    use_gpu: bool = True


class MyDetector(BaseDetector):
    config_class = MyConfig

    def stage_train(self):
        self.log.info("training", batch_size=self.config.batch_size)
        ...
Environment Variables
export MALWARE_DETECTOR_CLASSIFY=true
export MALWARE_DETECTOR_PATH__INPUT="./my_dataset"
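The variable names above suggest a MALWARE_DETECTOR_ prefix, with a double underscore separating nested fields (PATH__INPUT for path.input). The standard-library sketch below shows how names of that shape could be folded into a nested dictionary. The env_to_nested helper is hypothetical and only illustrates the naming scheme; the real parsing and type coercion are done by the configuration layer, not by this function.

```python
from typing import Any


def env_to_nested(
    environ: dict[str, str],
    prefix: str = "MALWARE_DETECTOR_",
    delimiter: str = "__",
) -> dict[str, Any]:
    """Fold prefixed environment variables into a nested dict (hypothetical helper)."""
    nested: dict[str, Any] = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated variables
        # "PATH__INPUT" -> ["path", "input"]
        parts = key[len(prefix):].lower().split(delimiter)
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value  # values stay strings in this sketch
    return nested


example_env = {
    "MALWARE_DETECTOR_CLASSIFY": "true",
    "MALWARE_DETECTOR_PATH__INPUT": "./my_dataset",
    "HOME": "/home/user",
}
overrides = env_to_nested(example_env)
```

With a custom env_prefix such as MY_DETECTOR_, the same scheme would presumably apply with that prefix in place of MALWARE_DETECTOR_.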
Config File
Save as config.toml:
classify = false
[path]
input = "./Dataset/program"
output = "./Predict/predict.json"
[folder]
dataset = "./Dataset/"
feature = "./Feature/"
CLI Integration
Create CLI for Your Detector
from malware_detector import create_cli
from my_detector import MyDetector

app = create_cli(MyDetector)


# Add custom commands
@app.command()
def evaluate():
    """Evaluate the trained model."""
    ...


if __name__ == "__main__":
    app()
CLI Usage
# Generate default config
python -m my_detector init --output config.toml
# Run full pipeline
python -m my_detector run --config config.toml
# Run specific stages
python -m my_detector run --stages extract,vectorize
# JSON logging for production
python -m my_detector run --log-format json --log-level DEBUG
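The --stages option takes a comma-separated list. The sketch below shows how such a value could be split and checked against the known stage names before running. parse_stages is a hypothetical helper written for illustration; how the library's CLI actually parses and validates the option is not shown in this README.

```python
def parse_stages(raw: str, known: list[str]) -> list[str]:
    """Split a comma-separated stage list and reject unknown names (hypothetical)."""
    names = [part.strip() for part in raw.split(",") if part.strip()]
    unknown = [name for name in names if name not in known]
    if unknown:
        raise ValueError(f"unknown stages: {', '.join(unknown)}")
    return names


KNOWN_STAGES = ["extract", "vectorize", "train", "predict"]

selected = parse_stages("extract,vectorize", KNOWN_STAGES)
tolerant = parse_stages(" extract , train ", KNOWN_STAGES)  # whitespace is stripped
```

Validating the names up front means a typo fails immediately instead of after earlier stages have already run.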
Logging
from malware_detector import configure_logging, get_logger
# Configure at startup
configure_logging(level="INFO", format="console")
# Get a logger
log = get_logger(__name__)
log.info("event_name", key="value", count=42)
Output formats:
# Console (development)
2024-01-19T10:30:00 [info] event_name key=value count=42
# JSON (production)
{"event": "event_name", "key": "value", "count": 42, "timestamp": "..."}
Migration from v0.1.x
| v0.1.x | v0.2.0 |
|---|---|
| from malwareDetector.detector import detector | from malware_detector import BaseDetector |
| class MyDetector(detector) | class MyDetector(BaseDetector) |
| def extractFeature(self) | def stage_extract(self) |
| def vectorize(self) | def stage_vectorize(self) |
| def model(self, training) | def stage_train(self) |
| def predict(self) | def stage_predict(self) |
| config.json() | config.model_dump_json() |
| Config.parse_raw(data) | Config.model_validate_json(data) |
License
MIT