malwi - AI Python Malware Scanner

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

canvascomputing schirrmacher

These details have not been verified by PyPI

Operating System
- OS Independent
Programming Language

Project description

malwi - AI Python Malware Scanner

malwi specializes in finding malware.

Key Features

🛡️ AI-Powered Python Malware Detection: Leverages advanced AI to identify malicious code in Python projects with high accuracy.
⚡ Lightning-Fast Codebase Scanning: Scans entire repositories in seconds, so you can focus on development—not security worries.
🔒 100% Offline & Private: Your code never leaves your machine. Full control, zero data exposure.
💰 Free & Open-Source: No hidden costs. Built on transparent research and openly available data.
🇪🇺 Developed in the EU: Committed to open-source principles and European data standards.

1) Install

pip install --user malwi

2) Run

malwi scan examples/malicious

3) Evaluate: a recent zero-day detected with high confidence

                  __          __
  .--------.---.-|  .--.--.--|__|
  |        |  _  |  |  |  |  |  |
  |__|__|__|___._|__|________|__|
     AI Python Malware Scanner


- target: examples
- seconds: 1.87
- files: 14
  ├── scanned: 4 (.py)
  ├── skipped: 10 (.cfg, .md, .toml, .txt)
  └── suspicious:
      ├── examples/malicious/discordpydebug-0.0.4/setup.py
      │   └── <module>
      │       ├── archive compression
      │       └── package installation execution
      └── examples/malicious/discordpydebug-0.0.4/src/discordpydebug/__init__.py
          ├── <module>
          │   ├── process management
          │   ├── deserialization
          │   ├── system interaction
          │   └── user io
          ├── run
          │   └── fs linking
          ├── debug
          │   ├── fs linking
          │   └── archive compression
          └── runcommand
              └── process management

=> 👹 malicious 0.98

PyPI Package Scanning

malwi can scan PyPI packages directly, without executing the malicious logic that is typically placed in setup.py or __init__.py files:

malwi pypi requests

                  __          __
  .--------.---.-|  .--.--.--|__|
  |        |  _  |  |  |  |  |  |
  |__|__|__|___._|__|________|__|
     AI Python Malware Scanner


- target: downloads/requests-2.32.4.tar
- seconds: 3.10
- files: 84
  ├── scanned: 34
  └── skipped: 50

=> 🟢 good
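Scanning a package without executing it comes down to reading its files as data instead of installing or importing it. The sketch below shows that idea with the standard library only. It assumes the package has already been downloaded as a source archive, and the helper name read_install_time_sources is hypothetical. This is not malwi's own download or extraction code.

```python
import io
import tarfile

# File names where install-time or import-time malicious logic typically lives.
SUSPECT_NAMES = {"setup.py", "__init__.py"}


def read_install_time_sources(archive_bytes: bytes) -> dict[str, str]:
    """Return {member path: source text} for setup.py and __init__.py files.

    The archive is only read in memory: nothing is extracted to disk,
    installed, imported, or executed.
    """
    sources: dict[str, str] = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            if member.name.rsplit("/", 1)[-1] not in SUSPECT_NAMES:
                continue
            handle = tar.extractfile(member)
            if handle is not None:
                sources[member.name] = handle.read().decode("utf-8", errors="replace")
    return sources
```

The returned text can then be handed to a static analyzer. Because the code is never run, a payload hidden in setup.py cannot trigger during the scan.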

Why malwi?

Malicious actors are increasingly targeting open-source projects, introducing packages designed to compromise security.

Common malicious behaviors include:

Data exfiltration: Theft of sensitive information such as credentials, API keys, or user data.
Backdoors: Unauthorized remote access to systems, enabling attackers to exploit vulnerabilities.
Destructive actions: Deliberate sabotage, including file deletion, database corruption, or application disruption.

How does it work?

malwi is based on the design of "Zero Day Malware Detection with Alpha: Fast DBI with Transformer Models for Real World Application" (2025).

Imagine there is a function like:

def runcommand(value):
    output = subprocess.run(value, shell=True, capture_output=True)
    return [output.stdout, output.stderr]

1. Files are parsed into an Abstract Syntax Tree (AST) with Tree-sitter

module [0, 0] - [3, 0]
  function_definition [0, 0] - [2, 41]
    name: identifier [0, 4] - [0, 14]
    parameters: parameters [0, 14] - [0, 21]
      identifier [0, 15] - [0, 20]
...
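malwi uses Tree-sitter for this step. Python's standard-library ast module is not what malwi uses, but it builds a comparable tree for the example function and lets you explore the idea without extra dependencies. The sketch below is a minimal illustration of that approach, and the helper function_signatures is hypothetical.

```python
import ast

SOURCE = '''\
def runcommand(value):
    output = subprocess.run(value, shell=True, capture_output=True)
    return [output.stdout, output.stderr]
'''


def function_signatures(source: str) -> list[tuple[str, list[str], int, int]]:
    """Return (name, parameter names, first line, last line) per function."""
    tree = ast.parse(source)  # parse only: nothing is imported or executed
    signatures = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            params = [arg.arg for arg in node.args.args]
            # ast line numbers are 1-based; the Tree-sitter rows above are 0-based.
            signatures.append((node.name, params, node.lineno, node.end_lineno))
    return signatures


print(function_signatures(SOURCE))  # [('runcommand', ['value'], 1, 3)]
```

Lines 1 to 3 here correspond to rows 0 to 2 in the Tree-sitter output for function_definition.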

2. The AST is transpiled to dummy bytecode

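The bytecode is enriched with security-related instructions. For example, the subprocess module becomes PROCESS_MANAGEMENT, and the literal True becomes LOAD_CONST BOOLEAN.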

TARGETED_FILE PUSH_NULL LOAD_GLOBAL PROCESS_MANAGEMENT LOAD_ATTR run LOAD_PARAM value LOAD_CONST BOOLEAN LOAD_CONST BOOLEAN KW_NAMES shell capture_output CALL STRING_VERSION STORE_GLOBAL output LOAD_GLOBAL output LOAD_ATTR stdout LOAD_GLOBAL output LOAD_ATTR stderr BUILD_LIST STRING_VERSION RETURN_VALUE
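The core idea is to replace concrete API names with security categories and to abstract literals by type. A toy transpiler over the standard-library ast can show this. It is only a sketch: the CATEGORY_MAP and CONST_TYPES tables and the to_tokens helper are hypothetical, and malwi's real instruction set (which also emits PUSH_NULL, LOAD_PARAM, STORE_GLOBAL, and more) is produced from Tree-sitter, not from this code.

```python
import ast

# Hypothetical category table; malwi's real mapping is larger and not shown here.
CATEGORY_MAP = {
    "subprocess": "PROCESS_MANAGEMENT",
    "pickle": "DESERIALIZATION",
    "zipfile": "ARCHIVE_COMPRESSION",
}

# Literal values are abstracted to their type so the model sees shape, not data.
CONST_TYPES = {bool: "BOOLEAN", int: "INTEGER", float: "FLOAT", str: "STRING"}


def to_tokens(source: str) -> list[str]:
    """Emit a flat, bytecode-like token stream for names, attributes and calls."""
    tokens: list[str] = []

    class Emitter(ast.NodeVisitor):
        def visit_Name(self, node: ast.Name) -> None:
            # Known modules are replaced by their security category.
            tokens.extend(["LOAD_GLOBAL", CATEGORY_MAP.get(node.id, node.id)])

        def visit_Attribute(self, node: ast.Attribute) -> None:
            self.visit(node.value)
            tokens.extend(["LOAD_ATTR", node.attr])

        def visit_Constant(self, node: ast.Constant) -> None:
            tokens.extend(["LOAD_CONST", CONST_TYPES.get(type(node.value), "CONSTANT")])

        def visit_Call(self, node: ast.Call) -> None:
            self.visit(node.func)
            for arg in node.args:
                self.visit(arg)
            for kw in node.keywords:
                self.visit(kw.value)
            names = [kw.arg for kw in node.keywords if kw.arg is not None]
            if names:
                tokens.extend(["KW_NAMES", *names])
            tokens.append("CALL")

    Emitter().visit(ast.parse(source))
    return tokens


print(" ".join(to_tokens("subprocess.run(value, shell=True, capture_output=True)")))
```

For the call inside runcommand this prints LOAD_GLOBAL PROCESS_MANAGEMENT LOAD_ATTR run LOAD_GLOBAL value LOAD_CONST BOOLEAN LOAD_CONST BOOLEAN KW_NAMES shell capture_output CALL. This closely matches the middle of the bytecode line above, and the module name subprocess no longer appears.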

3. The bytecode is fed into a pre-trained DistilBERT model

A DistilBERT model trained on malware samples identifies suspicious code patterns.

=> Maliciousness: 0.98
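DistilBERT accepts at most 512 tokens per input, including special tokens such as [CLS] and [SEP]. The benchmark configuration lists max_length 512 and window_stride 128, and it reports more "windowed" features than original samples. This suggests that long token sequences are split into overlapping windows. The sketch below shows one way to do that, but it rests on assumptions. It treats window_stride as the step between window start positions, whereas Hugging Face tokenizers use stride for the overlap instead. It also ignores special tokens. How malwi combines per-window scores (for example, by taking the maximum) is not stated here.

```python
def sliding_windows(tokens: list[str], max_length: int = 512, stride: int = 128) -> list[list[str]]:
    """Split a token sequence into windows of at most max_length tokens.

    Assumption: `stride` is the step between consecutive window starts, so
    neighbouring windows overlap by max_length - stride tokens.
    """
    if max_length <= 0 or stride <= 0:
        raise ValueError("max_length and stride must be positive")
    if len(tokens) <= max_length:
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # the last window reaches the end of the sequence
        start += stride
    return windows
```

With these defaults, a 700-token sequence yields three windows starting at tokens 0, 128 and 256, and the last one runs to the end.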

Python API

malwi provides a comprehensive Python API for integrating malware detection into your applications.

Quick Start

import malwi

report = malwi.MalwiReport.create(input_path="suspicious_file.py")

for obj in report.malicious_objects:
    print(f"File: {obj.file_path}")

`MalwiReport`

MalwiReport.create(
    input_path,               # str or Path - file/directory to scan
    accepted_extensions=None, # List[str] - file extensions to scan (e.g., ['py', 'js'])
    silent=False,             # bool - suppress progress messages
    malicious_threshold=0.7,  # float - threshold for malicious classification (0.0-1.0)
    on_finding=None           # callable - callback when malicious objects found
) -> MalwiReport              # Returns: MalwiReport instance with scan results

import malwi

report = malwi.MalwiReport.create("suspicious_directory/")

# Properties
report.malicious              # bool: True if malicious objects detected
report.confidence             # float: Overall confidence score (0.0-1.0)
report.duration               # float: Scan duration in seconds
report.all_objects            # List[MalwiObject]: All analyzed code objects
report.malicious_objects      # List[MalwiObject]: Objects exceeding threshold
report.threshold              # float: Maliciousness threshold used (0.0-1.0)
report.all_files              # List[Path]: All files found in input path
report.skipped_files          # List[Path]: Files skipped (wrong extension)
report.processed_files        # int: Number of files successfully processed
report.activities             # List[str]: Suspicious activities detected
report.input_path             # str: Original input path scanned
report.start_time             # str: ISO 8601 timestamp when scan started
report.all_file_types         # List[str]: All file extensions found
report.version                # str: Malwi version with model hash

# Methods
report.to_demo_text()         # str: Human-readable tree summary
report.to_json()              # str: JSON formatted report
report.to_yaml()              # str: YAML formatted report
report.to_markdown()          # str: Markdown formatted report

# Pre-load models to avoid delay on first prediction
malwi.MalwiReport.load_models_into_memory()
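The documented properties are enough to build your own summary, for example a per-file listing similar to the CLI tree. The sketch below relies only on report.malicious_objects and the MalwiObject attributes file_path, name and maliciousness. The helper findings_by_file is hypothetical. The usage below passes stand-in objects with made-up scores, so it runs without loading any models.

```python
from collections import defaultdict
from collections.abc import Iterable
from types import SimpleNamespace


def findings_by_file(objects: Iterable) -> dict[str, list[tuple[str, float]]]:
    """Group (name, score) pairs by source file, highest score first."""
    grouped: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for obj in objects:
        # maliciousness is documented as float|None; treat unscored objects as 0.0.
        score = obj.maliciousness if obj.maliciousness is not None else 0.0
        grouped[obj.file_path].append((obj.name, score))
    return {path: sorted(items, key=lambda item: item[1], reverse=True)
            for path, items in grouped.items()}


# Stand-ins with MalwiObject's attribute names and made-up scores.
# With malwi installed, pass report.malicious_objects instead.
demo = [
    SimpleNamespace(file_path="pkg/setup.py", name="<module>", maliciousness=0.91),
    SimpleNamespace(file_path="pkg/__init__.py", name="run", maliciousness=0.75),
    SimpleNamespace(file_path="pkg/__init__.py", name="debug", maliciousness=0.88),
]
for path, items in findings_by_file(demo).items():
    print(path)
    for name, score in items:
        print(f"  {name}: {score:.2f}")
```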

`MalwiObject`

obj = report.all_objects[0]

# Core properties
obj.name                # str: Function/class/module name
obj.file_path           # str: Path to source file
obj.language            # str: Programming language ('python'/'javascript')
obj.maliciousness       # float|None: ML confidence score (0.0-1.0)
obj.warnings            # List[str]: Compilation warnings/errors

# Source code and AST compilation
obj.file_source_code    # str: Complete content of source file
obj.source_code         # str|None: Extracted source for this specific object
obj.byte_code           # List[Instruction]|None: Compiled AST bytecode
obj.location            # Tuple[int,int]|None: Start and end line numbers
obj.embedding_count     # int: Number of DistilBERT tokens (cached)

# Analysis methods
obj.predict()           # dict: Run ML prediction and update maliciousness
obj.to_tokens()         # List[str]: Extract tokens for analysis
obj.to_token_string()   # str: Space-separated token string
obj.to_string()         # str: Bytecode as readable string
obj.to_hash()           # str: SHA256 hash of bytecode
obj.to_dict()           # dict: Serializable representation
obj.to_yaml()           # str: YAML formatted output
obj.to_json()           # str: JSON formatted output

# Class methods
MalwiObject.all_tokens(language="python")  # List[str]: All possible tokens
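Because to_hash() returns a SHA256 hash of an object's bytecode, identical code objects (for example, the same function vendored into several files) share a hash. This makes deduplication straightforward. In the sketch below, the helper unique_by_hash and the FakeObject stand-in are hypothetical. The stand-in hashes its space-separated token string, which is an assumption that may not match exactly what malwi hashes.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FakeObject:
    """Stand-in exposing file_path, name and to_hash() like MalwiObject."""
    file_path: str
    name: str
    token_string: str

    def to_hash(self) -> str:
        # Assumption: SHA256 over the space-separated token string.
        return hashlib.sha256(self.token_string.encode("utf-8")).hexdigest()


def unique_by_hash(objects):
    """Keep the first object seen for each distinct bytecode hash."""
    seen = set()
    unique = []
    for obj in objects:
        digest = obj.to_hash()
        if digest not in seen:
            seen.add(digest)
            unique.append(obj)
    return unique
```

The same loop works on report.all_objects, since it only calls to_hash() on each object.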

Benchmarks?

training_loss: 0.0110
epochs_completed: 3.0000
original_train_samples: 598540.0000
windowed_train_features: 831865.0000
original_validation_samples: 149636.0000
windowed_validation_features: 204781.0000
benign_samples_used: 734930.0000
malicious_samples_used: 13246.0000
benign_to_malicious_ratio: 60.0000
vocab_size: 30522.0000
max_length: 512.0000
window_stride: 128.0000
batch_size: 16.0000
eval_loss: 0.0107
eval_accuracy: 0.9980
eval_f1: 0.9521
eval_precision: 0.9832
eval_recall: 0.9229
eval_runtime: 115.5982
eval_samples_per_second: 1771.4900
eval_steps_per_second: 110.7200
epoch: 3.0000
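The metrics above are best read together. F1 is the harmonic mean of precision and recall. With 734,930 benign and 13,246 malicious samples, malicious code is only about 1.77% of the data, so a model that labelled everything benign would already reach roughly 98.2% accuracy. That makes F1 and recall more informative than eval_accuracy. The check below recomputes F1, the evaluation throughput and the class share from the listed values. It assumes eval_runtime is in seconds, as in Hugging Face Trainer logs.

```python
# Values copied from the benchmark output above.
precision = 0.9832
recall = 0.9229
reported_f1 = 0.9521

eval_runtime_s = 115.5982      # assumed to be seconds
validation_windows = 204781    # windowed_validation_features
batch_size = 16

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Throughput: evaluated windows per second and batches (steps) per second.
samples_per_second = validation_windows / eval_runtime_s
steps_per_second = samples_per_second / batch_size

# Class balance behind the accuracy figure.
benign, malicious = 734930, 13246
malicious_share = malicious / (benign + malicious)

print(f"F1 recomputed: {f1:.4f} (reported {reported_f1})")
print(f"samples/s: {samples_per_second:.2f}, steps/s: {steps_per_second:.2f}")
print(f"malicious share: {malicious_share:.2%}")
```

The recomputed values match the reported eval_f1, eval_samples_per_second and eval_steps_per_second. The small remaining difference in F1 comes from precision and recall already being rounded to four digits.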

Contributing & Support

Found a bug or have a feature request? Open an issue.
Do you have access to malicious packages in Rust, Go, or other languages? Get in touch via the maintainer's GitHub profile.
Struggling with false-positive findings? Create a pull request.

Research

Prerequisites

Package Manager: Install uv for fast Python dependency management
Training Data: The research CLI will automatically clone malwi-samples when needed

Quick Start

# Install dependencies
uv sync

# Run tests
uv run pytest tests

# Train a model from scratch (full pipeline with automatic data download)
./research download preprocess train

Individual Pipeline Steps

# 1. Download training data (clones malwi-samples + downloads repositories)
./research download

# 2. Data preprocessing only (parallel processing, ~4 min on 32 cores)
./research preprocess --language python

# 3. Model training only (tokenizer + DistilBERT, ~40 minutes on NVIDIA RTX 4090)
./research train

Limitations

The malicious dataset includes some boilerplate functions, such as setup functions, which can also appear in benign code. These cause false positives during scans. The goal is to triage and reduce such false positives to improve malwi's accuracy.

What's next?

The first iteration focuses on detecting malicious Python source code.

Future iterations will cover malware scanning for more languages (JavaScript, Rust, Go) and more formats (binaries, logs).


Release history

0.0.35

Apr 2, 2026

0.0.34

Apr 2, 2026

0.0.33

Apr 1, 2026

0.0.32

Apr 1, 2026

0.0.31

Apr 1, 2026

0.0.30

Mar 16, 2026

0.0.29

Mar 15, 2026

0.0.28

Mar 14, 2026

0.0.27

Mar 12, 2026

0.0.26

Mar 10, 2026

0.0.25

Mar 9, 2026

0.0.24

Mar 4, 2026

This version

0.0.23

Aug 19, 2025

0.0.22

Aug 15, 2025

0.0.21

Aug 14, 2025

0.0.20

Aug 14, 2025

0.0.19

Aug 12, 2025

0.0.18

Jul 2, 2025

0.0.17

Jul 2, 2025

0.0.15

Jun 20, 2025

0.0.14

Jun 16, 2025

0.0.13

May 30, 2025

0.0.12

May 28, 2025

0.0.11

May 26, 2025

0.0.10

May 26, 2025

0.0.9

May 26, 2025

0.0.8

May 26, 2025

0.0.7

May 15, 2025

0.0.6

May 12, 2025

0.0.5

May 12, 2025

0.0.4

May 11, 2025

0.0.3

May 11, 2025

0.0.2

May 11, 2025

0.0.1

May 11, 2025

Download files


Source Distribution

malwi-0.0.23.tar.gz (72.1 kB)

Uploaded Aug 19, 2025 Source

Built Distribution


malwi-0.0.23-py3-none-any.whl (74.5 kB)

Uploaded Aug 19, 2025 Python 3

File details

Details for the file malwi-0.0.23.tar.gz.

File metadata

Download URL: malwi-0.0.23.tar.gz
Upload date: Aug 19, 2025
Size: 72.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.8.12

File hashes

Hashes for malwi-0.0.23.tar.gz
Algorithm	Hash digest
SHA256	`dc5115e507ae4add9a57c75aa4f46d674fe861204df07e6a49e0ef7635c4c5e1`
MD5	`c45de11bb0bf32c7bcdbc27882888ac8`
BLAKE2b-256	`97b133e57209261614961f80d77c8fbe920b570e9b60c74f8bfe79b073cca83d`


File details

Details for the file malwi-0.0.23-py3-none-any.whl.

File metadata

Download URL: malwi-0.0.23-py3-none-any.whl
Upload date: Aug 19, 2025
Size: 74.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.8.12

File hashes

Hashes for malwi-0.0.23-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0d87384d41a57014d4c398c5706a0a2bcf88d6a9dd056f892231797b79713137`
MD5	`62f7ea3d84d9f9fbd1d4802a25d477c1`
BLAKE2b-256	`7ecd95a0e22f1440661e0d0d14e36f41e209808dfe5e0ae85df7cbd237ab552c`

