malwi - AI Python Malware Scanner

Detect Python malware fast - no internet, no expensive hardware, no fees.

malwi specializes in detecting zero-day vulnerabilities by classifying code as safe or harmful.

Open-source software made in Europe. Based on open research, open code, open data. 🇪🇺🤘🕊️

# Install
pip install --user malwi

# Run
malwi ./examples

| File                 | Name       |   Malicious |
|----------------------|------------|-------------|
| examples/__init__.py | run        |        0.93 |
| examples/__init__.py | debug      |        0.99 |
| examples/__init__.py | runcommand |        1    |
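To act on a report like the one above in a script, the table can be parsed and filtered by score. This is a minimal sketch that assumes the output keeps the pipe-delimited layout shown here; `parse_report`, `flag`, and the 0.95 threshold are hypothetical names and values chosen for illustration, not part of malwi.

```python
REPORT = """\
| File                 | Name       |   Malicious |
|----------------------|------------|-------------|
| examples/__init__.py | run        |        0.93 |
| examples/__init__.py | debug      |        0.99 |
| examples/__init__.py | runcommand |        1    |
"""


def parse_report(text):
    """Parse a pipe-delimited table into (file, name, score) tuples."""
    rows = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        try:
            score = float(cells[2])
        except ValueError:
            continue  # header or separator row, not a data row
        rows.append((cells[0], cells[1], score))
    return rows


def flag(rows, threshold=0.95):
    """Keep only rows whose score is at or above the (assumed) threshold."""
    return [row for row in rows if row[2] >= threshold]


rows = parse_report(REPORT)
flagged = flag(rows)
for file_name, object_name, score in flagged:
    print(f"{file_name}:{object_name} scored {score}")
```

Which threshold is appropriate depends on how many false positives you can tolerate; the value used here is only a placeholder.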

Why malwi?

The number of malicious open-source packages is growing. This is not just a threat to your business but also to the open-source community.

Typical malware behaviors include:

  • Exfiltration of data: Stealing credentials, API keys, or sensitive user data.
  • Backdoors: Allowing remote attackers to gain unauthorized access to your system.
  • Destructive actions: Deleting files, corrupting databases, or sabotaging applications.

Attention: Malicious packages might execute code during installation (e.g., through setup.py). Do NOT download or install malicious packages from the dataset with commands such as uv add, pip install, or poetry add.
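If you need to look at a suspicious source archive, one option is to read its files as plain text without unpacking or installing anything. The sketch below uses only the Python standard library; `read_member_source` and `build_demo_archive` are hypothetical helpers, and the demo archive is built in memory so that no real package is involved.

```python
import io
import tarfile


def read_member_source(archive_bytes, suffix="setup.py"):
    """Return {member_name: text} for matching files.

    Nothing is written to disk and nothing is executed; the files are
    only read as text from the archive.
    """
    sources = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(suffix):
                handle = tar.extractfile(member)
                if handle is not None:
                    sources[member.name] = handle.read().decode(
                        "utf-8", errors="replace"
                    )
    return sources


def build_demo_archive():
    """Create a tiny .tar.gz in memory that stands in for a downloaded sdist."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        data = b"print('this would run at install time')\n"
        info = tarfile.TarInfo(name="demo-0.1/setup.py")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


sources = read_member_source(build_demo_archive())
for name, text in sources.items():
    print(name, len(text), "characters")
```

Reading the text this way does not make a malicious archive safe to handle in general; it only avoids the install-time execution path mentioned above.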

What's next?

The first iteration focuses on detecting malicious Python source code.

Future iterations will cover malware scanning for more languages (JavaScript, Rust, Go) and more formats (binaries, logs).

How does it work?

malwi applies DistilBERT and Support Vector Machines (SVM), based on the design of Zero Day Malware Detection with Alpha: Fast DBI with Transformer Models for Real World Application (2025). In addition, malwi uses Tree-sitter to create abstract syntax trees (ASTs), which are mapped to a unified, security-sensitive syntax used as training input. The Python malware dataset can be found here. After 3 epochs of training, the reported results are: Loss: 0.0986, Accuracy: 0.9669, F1: 0.9666.
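The idea of mapping an AST to a security-sensitive token sequence can be illustrated with a small sketch. Note the assumptions: it uses Python's built-in `ast` module rather than Tree-sitter, and the token names and the `SENSITIVE_CALLS` table are invented for this example. It is not malwi's actual syntax or implementation.

```python
import ast

# Hypothetical mapping from call names to coarse, security-relevant tokens.
SENSITIVE_CALLS = {
    "eval": "DYNAMIC_CODE_EXECUTION",
    "exec": "DYNAMIC_CODE_EXECUTION",
    "system": "PROCESS_MANAGEMENT",
    "Popen": "PROCESS_MANAGEMENT",
    "urlopen": "NETWORKING",
    "b64decode": "ENCODING_DECODING",
}


def call_name(node):
    """Return the simple name of a called function, or None if it has none."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return None


def to_tokens(source):
    """Map each function defined in `source` to a flat list of coarse tokens."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            tokens = ["F_DEF"]
            for child in ast.walk(node):
                if isinstance(child, ast.Call):
                    tokens.append(SENSITIVE_CALLS.get(call_name(child), "CALL"))
                elif isinstance(child, (ast.Import, ast.ImportFrom)):
                    tokens.append("IMPORT")
            result[node.name] = tokens
    return result


SAMPLE = '''
def runcommand(cmd):
    import os
    os.system(cmd)

def add(a, b):
    return a + b
'''

tokens = to_tokens(SAMPLE)
for name, sequence in tokens.items():
    print(name, " ".join(sequence))
```

Reducing code to a small shared vocabulary like this is what lets a text classifier compare functions by behavior rather than by identifier names.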

High-level training pipeline:

  • Create a dataset from malicious and benign repositories and map the code to malwi syntax
  • Remove duplicate code based on hashes
  • Train DistilBERT on the malwi samples to classify code as malicious or benign
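The deduplication step in the pipeline above can be sketched as follows. The choice of SHA-256 and the `dedupe` helper are assumptions made for illustration; the source only states that duplicates are removed based on hashes.

```python
import hashlib


def dedupe(samples):
    """Drop exact duplicates while keeping the first occurrence of each sample."""
    seen = set()
    unique = []
    for sample in samples:
        digest = hashlib.sha256(sample.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


samples = [
    "F_DEF IMPORT PROCESS_MANAGEMENT",
    "F_DEF",
    "F_DEF IMPORT PROCESS_MANAGEMENT",
]
unique_samples = dedupe(samples)
print(len(samples), "samples,", len(unique_samples), "unique")
```

Hashing the mapped samples rather than the raw source would also collapse functions that differ only in details the mapping discards, which helps keep repeated code from dominating the training set.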

Support

Do you have access to malicious packages for Rust, Go, or other languages? Contact me.

Develop

Prerequisites: uv

# Download and process data
cmds/download_and_preprocess.sh

# Only process data
cmds/preprocess.sh

# Preprocess then start training
cmds/preprocess_and_train.sh

# Only start training
cmds/train.sh
