malwi - AI Python Malware Scanner
Detect Python malware fast - no internet, no expensive hardware, no fees.
malwi specializes in detecting zero-day vulnerabilities by classifying code as safe or harmful.
Open-source software made in Europe. Based on open research, open code, open data. 🇪🇺🤘🕊️
# Install
pip install --user malwi
# Run
malwi ./examples
Output:
## examples/__init__.py
- Object: runcommand
- Maliciousness: 0.9620079398155212
def runcommand(value):
    output = subprocess.run(value, shell=True, capture_output=True)
    return [output.stdout, output.stderr]
TARGETED_FILE resume load_global subprocess load_attr run load_fast value load_const INTEGER load_const INTEGER kw_names capture_output shell call store_fast output load_fast output load_attr stdout load_fast output load_attr stderr build_list return_value
...
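The last line of each finding is a flat token sequence derived from the flagged function. As a rough illustration only, and not malwi's actual implementation, Python's standard `dis` module produces a comparable opcode stream. The helper name `opcode_tokens` is hypothetical, and the exact opcodes differ between CPython versions.

```python
import dis
import subprocess


def opcode_tokens(func):
    """Return the lowercase opcode names of a function's bytecode.

    Hypothetical helper for illustration; this is not part of malwi's API.
    """
    return [instr.opname.lower() for instr in dis.get_instructions(func)]


def runcommand(value):
    # Never called in this sketch: the function is only disassembled.
    output = subprocess.run(value, shell=True, capture_output=True)
    return [output.stdout, output.stderr]


if __name__ == "__main__":
    # Prints a space-separated opcode sequence for the function above.
    print(" ".join(opcode_tokens(runcommand)))
```

Unlike the malwi output above, this sketch emits opcode names only and does not include arguments such as `subprocess` or `capture_output`.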
Why malwi?
The number of malicious open-source packages is growing. This is not just a threat to your business but also to the open-source community.
Typical malware behaviors include:
- Exfiltration of data: Stealing credentials, API keys, or sensitive user data.
- Backdoors: Allowing remote attackers to gain unauthorized access to your system.
- Destructive actions: Deleting files, corrupting databases, or sabotaging applications.
Attention: Malicious packages might execute code during installation (e.g. through setup.py). Make sure NOT to download or install malicious packages from the dataset with commands like uv add, pip install, or poetry add.
What's next?
The first iteration focuses on maliciousness of Python source code.
Future iterations will cover malware scanning for more languages (JavaScript, Rust, Go) and more formats (binaries, logs).
How does it work?
malwi applies DistilBert and Support Vector Machines (SVM) based on the design of Zero Day Malware Detection with Alpha: Fast DBI with Transformer Models for Real World Application (2025).
Additionally, malwi uses Tree-sitter to create abstract syntax trees (ASTs), which are mapped to a unified, security-sensitive syntax used as training input. The Python malware dataset can be found here. After 3 epochs of training, the reported results are: Loss: 0.0986, Accuracy: 0.9669, F1: 0.9666.
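For readers unfamiliar with the reported metrics, the sketch below shows how accuracy and F1 are conventionally computed for a binary malicious/benign classifier. The labels and predictions are made-up toy values, not malwi's evaluation data, and the function name `accuracy_and_f1` is hypothetical. The source does not say which F1 averaging was used, so plain binary F1 on the malicious class is assumed here.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and binary F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Toy data: 1 = malicious, 0 = benign (assumed encoding).
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(accuracy_and_f1(y_true, y_pred))
```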
High-level training pipeline:
- Create dataset from malicious/benign repositories and map code to malwi syntax
- Remove code duplications based on hashes
- Train DistilBert on the malwi samples to classify code as malicious or benign
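The deduplication step in the pipeline above can be sketched as follows. This is a minimal illustration assuming SHA-256 over the raw sample text; the source does not state which hash function or normalization malwi uses, and `deduplicate` is a hypothetical name.

```python
import hashlib


def deduplicate(samples):
    """Drop exact duplicate samples, keeping the first occurrence of each.

    Assumption: samples are strings and are compared by the SHA-256
    digest of their UTF-8 encoding.
    """
    seen = set()
    unique = []
    for sample in samples:
        digest = hashlib.sha256(sample.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(sample)
    return unique


if __name__ == "__main__":
    corpus = ["load_global os", "load_global subprocess", "load_global os"]
    print(deduplicate(corpus))
```

Storing digests instead of full samples keeps the memory used for the `seen` set bounded per entry, which matters when individual samples are large.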
Support
Do you have access to malicious packages for Rust, Go, or other languages? Contact me.
Develop
Prerequisites: uv
# Download and process data
cmds/download_and_preprocess.sh
# Only process data
cmds/preprocess.sh
# Preprocess then start training
cmds/preprocess_and_train.sh
# Only start training
cmds/train.sh
File details
Details for the file malwi-0.0.9.tar.gz.
File metadata
- Download URL: malwi-0.0.9.tar.gz
- Upload date:
- Size: 65.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.7.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4bcdf57360114a2f502a1a827f6f16c82513f38cc762e97b4198bb6dbb392ad |
| MD5 | 592d55fc59944dae1c41505ff7d44c01 |
| BLAKE2b-256 | c024b99457cc2516a1db6f437277f931d4b3eb44da910f89e24ee5d55c8ec05f |
File details
Details for the file malwi-0.0.9-py3-none-any.whl.
File metadata
- Download URL: malwi-0.0.9-py3-none-any.whl
- Upload date:
- Size: 56.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.7.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e68eb8bd1bb674fd501f0fb44b9fe4110a8a403965fb26baa2b5c7cabff4122b |
| MD5 | b5ed6262f27a285dfafcb80c82b21c8e |
| BLAKE2b-256 | 702a55c7c75af1e3d45c219c778a62097a861098255760224bcf4303383971ce |