malwi - AI Python Malware Scanner

Maintainers: canvascomputing, schirrmacher

Detect Python malware fast - no internet, no expensive hardware, no fees.

malwi specializes in detecting zero-day malware: it classifies code as safe or harmful.

Open-source software made in Europe. Based on open research, open code, open data. 🇪🇺🤘🕊️

Install

pip install --user malwi

Run

malwi ./examples

Evaluation: a recent zero-day is detected with high confidence.

- 2 files scanned
- 0 files skipped
- 3 malicious objects

=> 👹 malicious 1.0

Why malwi?

The number of malicious open-source packages is growing. This is a threat not just to your business but also to the open-source community.

Typical malware behaviors include:

- Exfiltration of data: stealing credentials, API keys, or sensitive user data.
- Backdoors: allowing remote attackers to gain unauthorized access to your system.
- Destructive actions: deleting files, corrupting databases, or sabotaging applications.

How does it work?

malwi applies DistilBERT, following the design of "Zero Day Malware Detection with Alpha: Fast DBI with Transformer Models for Real World Application" (2025). The models are trained on the malwi-samples dataset.

1. Compile Python files to bytecode

def runcommand(value):
    output = subprocess.run(value, shell=True, capture_output=True)
    return [output.stdout, output.stderr]

  0           RESUME                   0

  1           LOAD_CONST               0 (<code object runcommand at 0x5b4f60ae7540, file "example.py", line 1>)
              MAKE_FUNCTION
              STORE_NAME               0 (runcommand)
              RETURN_CONST             1 (None)
  ...
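To reproduce step 1 locally, the standard library's compile() and dis are enough. This is a sketch using only the standard library, not necessarily the code path malwi uses internally, and the opcodes you see depend on your Python version, so the output will not match the listing above on every interpreter.

```python
import dis
import types

# The example function from step 1, as the contents of a file named example.py.
# The string starts directly with "def" so the function sits on line 1, as in the listing.
SOURCE = """def runcommand(value):
    output = subprocess.run(value, shell=True, capture_output=True)
    return [output.stdout, output.stderr]
"""

# compile() only parses and compiles; nothing runs, so the missing
# `import subprocess` does not matter at this stage.
module_code = compile(SOURCE, "example.py", "exec")

# Prints the module-level bytecode and, since Python 3.7, recursively
# the nested code object of runcommand as well.
dis.dis(module_code)

# The function body is stored as a nested code object among the module constants.
function_code = next(
    const
    for const in module_code.co_consts
    if isinstance(const, types.CodeType) and const.co_name == "runcommand"
)
```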

2. Map bytecode to tokens

TARGETED_FILE resume load_global subprocess load_attr run load_fast value load_const INTEGER load_const INTEGER kw_names capture_output shell call store_fast output load_fast output load_attr stdout load_fast output load_attr stderr build_list return_value
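The token string above pairs each lowercased opcode name with its argument, and replaces some constants with placeholders such as INTEGER. The sketch below is a hypothetical mapping reverse-engineered from that one example, not malwi's tokenizer: the TARGETED_FILE marker and the real placeholder scheme are not described here. One guess it encodes is that `True` appears as INTEGER because bool is a subclass of int.

```python
import dis


def runcommand(value):
    # Only compiled, never called, so `subprocess` need not be imported.
    output = subprocess.run(value, shell=True, capture_output=True)  # noqa: F821
    return [output.stdout, output.stderr]


# Opcodes whose argument is a constant rather than a name (hypothetical selection).
CONST_OPS = {"LOAD_CONST", "RETURN_CONST", "KW_NAMES"}


def const_token(value):
    """Replace a constant with a coarse placeholder (hypothetical scheme)."""
    if isinstance(value, int):  # also matches True/False, since bool subclasses int
        return "INTEGER"
    if isinstance(value, str):
        return value
    if isinstance(value, tuple):  # e.g. keyword-argument names of a call
        return " ".join(const_token(item) for item in value)
    if value is None:
        return "NONE"
    return type(value).__name__.upper()


def to_tokens(code):
    """Flatten a code object into 'opname argument' tokens."""
    tokens = []
    for instr in dis.get_instructions(code):
        tokens.append(instr.opname.lower())
        if instr.opname in CONST_OPS:
            tokens.append(const_token(instr.argval))
        elif isinstance(instr.argval, str):  # global, attribute and local names
            tokens.append(instr.argval)
    return " ".join(tokens)


print(to_tokens(runcommand.__code__))
```

Keeping names such as subprocess and run while collapsing literal values into placeholders keeps the vocabulary small while preserving the API calls that carry most of the signal.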

3. Feed tokens into pre-trained DistilBERT

=> Maliciousness Score: 0.92
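How the score is derived from the model output is not stated here. A common convention for a sequence-classification head is a softmax over the class logits; the sketch below assumes two classes ordered [benign, malicious], and the logits are made-up inputs, not values produced by malwi.

```python
import math


def maliciousness_score(logits):
    """Softmax probability of the 'malicious' class.

    Assumes a two-class head with logits ordered [benign, malicious]
    (hypothetical ordering).
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


print(round(maliciousness_score([0.0, 2.0]), 4))  # 0.8808
```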

This produces a list of malicious code objects. However, malicious code may be split into chunks and spread across a package, which is why the next layers are needed.

4. Create statistics about malicious activities

Object	DYNAMIC_CODE_EXECUTION	ENCODING_DECODING	FILESYSTEM_ACCESS	...
Object A	0	1	0	...
Object B	1	2	1	...
Object C	0	0	2	...
Package	1	3	3	...
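In the table, each column of the Package row equals the sum of the object rows (0+1+0=1, 1+2+0=3, 0+1+2=3), so the package-level aggregation can be sketched as a plain sum. How malwi assigns activities to each object in the first place is not described here; this sketch starts from the counts in the table.

```python
from collections import Counter

# Per-object activity counts, copied from the table above.
objects = {
    "Object A": {"DYNAMIC_CODE_EXECUTION": 0, "ENCODING_DECODING": 1, "FILESYSTEM_ACCESS": 0},
    "Object B": {"DYNAMIC_CODE_EXECUTION": 1, "ENCODING_DECODING": 2, "FILESYSTEM_ACCESS": 1},
    "Object C": {"DYNAMIC_CODE_EXECUTION": 0, "ENCODING_DECODING": 0, "FILESYSTEM_ACCESS": 2},
}


def package_statistics(per_object):
    """Sum the activity counts of all objects into one package-level row."""
    totals = Counter()
    for counts in per_object.values():
        totals.update(counts)  # update() adds counts and keeps zero entries
    return dict(totals)


print(package_statistics(objects))
# {'DYNAMIC_CODE_EXECUTION': 1, 'ENCODING_DECODING': 3, 'FILESYSTEM_ACCESS': 3}
```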

5. Make the final decision

An SVM layer takes these statistics as input and decides whether the combined findings are malicious.

SVM => Malicious
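The kernel, features and trained parameters of malwi's SVM are not given here. As an illustration only, a linear SVM classifies a feature vector x by the sign of its decision function w·x + b. The weights and bias below are hypothetical placeholders, and the feature order is taken from the column names of the statistics table.

```python
# Feature order, taken from the column names of the statistics table.
FEATURES = ["DYNAMIC_CODE_EXECUTION", "ENCODING_DECODING", "FILESYSTEM_ACCESS"]

# Hypothetical parameters; a trained SVM learns these from labelled packages.
WEIGHTS = [1.5, 0.4, 0.2]
BIAS = -1.0


def decision_function(stats):
    """Signed score w·x + b; positive means the malicious side of the hyperplane."""
    x = [stats.get(name, 0) for name in FEATURES]
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def is_malicious(stats):
    return decision_function(stats) > 0


package = {"DYNAMIC_CODE_EXECUTION": 1, "ENCODING_DECODING": 3, "FILESYSTEM_ACCESS": 3}
print(is_malicious(package))  # True
```

Because the decision uses the whole package vector, findings that look harmless one by one can still push the combined score over the boundary.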

Benchmarks?

DistilBERT

Metric	Value
F1 Score	0.96
Recall	0.95
Precision	0.98
Training time	~4 hours
Hardware	NVIDIA RTX 4090
Epochs	3
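The F1 score is the harmonic mean of precision and recall. Recomputing it from the DistilBERT row is a quick consistency check; since the table values are rounded, the result can only be expected to match to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the DistilBERT table.
print(round(f1_score(0.98, 0.95), 2))  # 0.96
```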

SVM Layer

Metric	Value
F1 Score	0.96
Recall	0.95
Precision	0.95

Limitations

malwi compiles Python to bytecode, which is highly version-dependent, and the AI models are trained on that bytecode. As a result, performance may drop if the installed Python version produces different bytecode instructions. There is no data on this yet.
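To see which instructions your interpreter produces, you can list the opcode names for a small function and compare them across Python versions. For example, Python 3.12 inlines list comprehensions (PEP 709), so the function below compiles to noticeably different instructions on 3.11 and 3.12. This only shows that the bytecode differs; it says nothing about how much malwi's accuracy changes.

```python
import dis
import sys


def sample(items):
    return [item.upper() for item in items]


# Opcode names for the same source; compare this list between interpreters.
opnames = [instr.opname for instr in dis.get_instructions(sample)]
print(sys.version_info[:2], opnames)
```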

The malicious dataset includes some boilerplate functions, such as init functions, which can also appear in benign code. These cause false positives during scans. The goal is to triage and reduce such false positives to improve malwi's accuracy.

What's next?

The first iteration focuses on the maliciousness of Python source code.

Future iterations will cover malware scanning for more languages (JavaScript, Rust, Go) and more formats (binaries, logs).

Support

Do you have access to malicious Rust, Go, or other packages? Contact me.

Develop

Prerequisites:

- uv
- malwi-samples, downloaded into the same parent folder as the malwi repository

# Download and preprocess data for DistilBERT
cmds/download_and_preprocess_distilbert.sh

# Preprocess and train DistilBERT only
cmds/preprocess_and_train_distilbert.sh

# Preprocess and train SVM Layer only
cmds/preprocess_and_train_svm.sh

# Only preprocess data for DistilBERT
cmds/preprocess_distilbert.sh

# Only preprocess data for SVM Layer
cmds/preprocess_svm.sh

# Start DistilBERT training
cmds/train_distilbert.sh

# Start SVM Layer training
cmds/train_svm_layer.sh

Triage

malwi's pipeline can be improved by triaging its results (see src/research/triage.py). For automated triage, you can run open-source models locally with Ollama.

Start LLM

ollama run gemma3

Start Triaging

uv run python -m src.research.triage --triage-ollama --path <FOLDER_WITH_MALWI_YAML_RESULTS>
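The triage flow itself lives in src/research/triage.py and is started with the command above; the sketch below is not that code. It only shows how a Python script could ask the locally running model for a verdict through Ollama's REST API (POST /api/generate on the default port 11434, with streaming disabled so the reply arrives as one JSON object). The prompt and the function names build_triage_request and triage are hypothetical.

```python
import json
import urllib.request

# Default local endpoint of the Ollama REST API.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_triage_request(finding, model="gemma3"):
    """Build a non-streaming generate request asking the model to label a finding."""
    # Hypothetical prompt; src/research/triage.py may phrase this differently.
    prompt = (
        "You are triaging results of a Python malware scanner. "
        "Answer with one word, 'malicious' or 'benign'.\n\n" + finding
    )
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def triage(finding, model="gemma3", timeout=120):
    """Send the request to a running Ollama server and return the model's reply text."""
    request = build_triage_request(finding, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"].strip()
```

With `ollama run gemma3` active, calling `triage(code_snippet)` returns the model's one-word answer; building the request separately keeps the payload easy to inspect without a running server.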

Release history

Version	Release date
0.0.35	Apr 2, 2026
0.0.34	Apr 2, 2026
0.0.33	Apr 1, 2026
0.0.32	Apr 1, 2026
0.0.31	Apr 1, 2026
0.0.30	Mar 16, 2026
0.0.29	Mar 15, 2026
0.0.28	Mar 14, 2026
0.0.27	Mar 12, 2026
0.0.26	Mar 10, 2026
0.0.25	Mar 9, 2026
0.0.24	Mar 4, 2026
0.0.23	Aug 19, 2025
0.0.22	Aug 15, 2025
0.0.21	Aug 14, 2025
0.0.20	Aug 14, 2025
0.0.19	Aug 12, 2025
0.0.18	Jul 2, 2025
0.0.17	Jul 2, 2025
0.0.15	Jun 20, 2025
0.0.14 (this version)	Jun 16, 2025
0.0.13	May 30, 2025
0.0.12	May 28, 2025
0.0.11	May 26, 2025
0.0.10	May 26, 2025
0.0.9	May 26, 2025
0.0.8	May 26, 2025
0.0.7	May 15, 2025
0.0.6	May 12, 2025
0.0.5	May 12, 2025
0.0.4	May 11, 2025
0.0.3	May 11, 2025
0.0.2	May 11, 2025
0.0.1	May 11, 2025

Download files

Source Distribution

malwi-0.0.14.tar.gz (85.7 kB)

Uploaded Jun 16, 2025 Source

Built Distribution

malwi-0.0.14-py3-none-any.whl (74.4 kB)

Uploaded Jun 16, 2025 Python 3

File details

Details for the file malwi-0.0.14.tar.gz.

File metadata

Download URL: malwi-0.0.14.tar.gz
Upload date: Jun 16, 2025
Size: 85.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.7.13

File hashes

Hashes for malwi-0.0.14.tar.gz
Algorithm	Hash digest
SHA256	`75d9f5e3522333b00f726ac3c6270226cebb22c0fac58b3d57bb1e6bf479dfef`
MD5	`db21db64c53b373a2458ffbb996c6f4a`
BLAKE2b-256	`732fe6677a33cac3fbd1ccea7f3a48a54ba0bb3b024b9082bb767dc0cd0542fe`

File details

Details for the file malwi-0.0.14-py3-none-any.whl.

File metadata

Download URL: malwi-0.0.14-py3-none-any.whl
Upload date: Jun 16, 2025
Size: 74.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.7.13

File hashes

Hashes for malwi-0.0.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c664e65f96aab0047870059ab8a22a0b060eb6372384cddae6b3cb73ec6a9f6`
MD5	`f6d7b4e3cbd8c5a40f60517799aaebb7`
BLAKE2b-256	`8013d46959e11d2af9806c1be9bd10d71508528bdb3c1ae0737ee9dbff49ea81`
