
YARA generator inspired by yarGen


yarobot


yarobot is a high-performance YARA rule generator inspired by the yarGen project. It automatically creates YARA rules from malware samples and reduces false positives by comparing candidate strings against a goodware database.

✨ Features

  • Automated YARA Rule Generation: Create both simple and super rules from malware samples
  • Advanced Scoring System: String scoring with goodware database comparison
  • High-Performance Engine: Rust-based core (stringZZ) for fast file processing
  • Multiple Interfaces: CLI, Python API, and web interface
  • Intelligent Filtering: Automatic exclusion of common goodware strings for your specific dataset
  • Super Rules: Automatic creation of rules that match multiple related samples
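
The scoring and filtering features above can be pictured with a minimal sketch. It assumes the goodware database boils down to a mapping from string to occurrence count; `score_strings`, `select_strings`, and the penalty formula are hypothetical illustrations, not yarobot's actual scoring logic.

```python
from collections import Counter


def score_strings(candidates, goodware_counts, base_score=5, excludegood=False):
    """Assign each candidate string a score (hypothetical formula).

    Strings seen in goodware lose one point per occurrence; with
    excludegood=True they are dropped entirely.
    """
    scored = {}
    for value in candidates:
        seen = goodware_counts.get(value, 0)
        if seen and excludegood:
            continue
        scored[value] = base_score - seen
    return scored


def select_strings(scored, min_score=0, limit=15):
    """Keep strings at or above min_score, best first, capped at limit."""
    kept = [(value, score) for value, score in scored.items() if score >= min_score]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:limit]


# Toy goodware database: string -> number of goodware files containing it.
goodware = Counter({"kernel32.dll": 10, "GetProcAddress": 3})
candidates = ["kernel32.dll", "evil_c2.example/gate.php", "GetProcAddress"]
scored = score_strings(candidates, goodware)
selected = select_strings(scored, min_score=1)
```

In this toy run the string unique to the sample ranks first, while the ubiquitous `kernel32.dll` falls below the threshold and is dropped.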

🏗️ Architecture

flowchart TD
    A[CLI] --> D
    B[Web Upload] --> D
    C[API Call] --> D
    
    D[Token extraction] --> E[Scoring]
    F[Goodware DB] --> E
    
    E --> G[YARA Generator]
    G --> H[Rule file]
    G --> I[Web Display]
    G --> J[API JSON]
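
To make the "Token extraction" stage concrete, here is a rough Python sketch of pulling printable ASCII and UTF-16LE ("wide") strings out of a binary. The real extraction happens in the Rust core; `extract_tokens` and its regular expressions are assumptions made for illustration only.

```python
import re


def extract_tokens(data: bytes, min_size: int = 4, max_size: int = 128):
    """Return (kind, text) pairs for printable runs found in data.

    Hypothetical sketch: ASCII runs are printable bytes, wide runs are
    printable bytes each followed by a NUL byte (UTF-16LE).
    """
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_size)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_size)

    tokens = []
    for match in ascii_re.finditer(data):
        tokens.append(("ascii", match.group().decode("ascii")[:max_size]))
    for match in wide_re.finditer(data):
        tokens.append(("wide", match.group().decode("utf-16-le")[:max_size]))
    return tokens


sample = b"\x01\x02hello world\xff\xfe" + "wide!".encode("utf-16-le") + b"\xff"
tokens = extract_tokens(sample)
```

The sketch treats the two encodings independently, so an ASCII byte directly before a wide string can be absorbed into the wide match; a production extractor would need to handle such boundaries.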

🛠 Installation

1. Install from PyPI

pip install yarobot

2. Install from Source

# Clone repository
git clone https://github.com/ogre2007/yarobot
cd yarobot

# Install in development mode
pip install -e .

# Or install with all dependencies
pip install ".[dev]"

📖 Quick Start

1. First-Time Setup (optional but recommended)

# Create a goodware database
mkdir -p ./dbs
py -m yarobot.database create /path/to/goodware/files --recursive --opcodes

# The database will be saved in ./dbs/

2. Generate Your First Rules

# Basic rule generation
py -m yarobot.generate /path/to/malware/samples \
  --output-rule-file my_rules.yar \
  --author "Your Name" \
  --ref "Case-001"
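
For orientation, the output file holds ordinary YARA rules carrying the author and reference you pass in. The sketch below renders one such rule from plain Python values; `render_rule`, `yara_escape`, and the exact layout are hypothetical and do not reproduce yarobot's real output format.

```python
def yara_escape(value: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def render_rule(name, author, reference, strings, min_matches):
    """Build the text of a simple YARA rule (illustrative layout only)."""
    lines = [
        f"rule {name} {{",
        "    meta:",
        f'        author = "{yara_escape(author)}"',
        f'        reference = "{yara_escape(reference)}"',
        "    strings:",
    ]
    for index, value in enumerate(strings, start=1):
        lines.append(f'        $s{index} = "{yara_escape(value)}" ascii')
    lines += ["    condition:", f"        {min_matches} of them", "}"]
    return "\n".join(lines)


rule_text = render_rule(
    "sample_family",
    "Your Name",
    "Case-001",
    ['cmd.exe /c "del"', "C:\\temp\\x.bin"],
    2,
)
```

The escaping here covers only quotes and backslashes; strings with newlines or non-printable bytes would need hex or escape sequences.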

3. Launch Web Interface

# Start with your database
py -m yarobot.app -g ./dbs

# Access at http://localhost:5000

Then open http://localhost:5000 in a browser, or call the API directly:

curl -X POST -F "files=@tests\\data\\binary" http://localhost:5000/api/analyze -F "min_score=5" -F "get_opcodes=true"
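
The same request can be built from Python with only the standard library. The endpoint and the form fields (`files`, `min_score`, `get_opcodes`) are taken from the curl command above; `build_analyze_request` is a hypothetical helper, and the response format is not documented here, so the sketch stops at constructing the request.

```python
import urllib.request
import uuid

API_URL = "http://localhost:5000/api/analyze"


def build_analyze_request(file_name, file_bytes, min_score=5,
                          get_opcodes=True, url=API_URL):
    """Build a multipart/form-data POST mirroring the curl example."""
    boundary = uuid.uuid4().hex
    fields = {
        "min_score": str(min_score),
        "get_opcodes": str(get_opcodes).lower(),
    }

    body = b""
    # Plain form fields.
    for name, value in fields.items():
        body += (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8")
    # The uploaded sample, sent under the "files" field.
    body += (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="files"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    body += file_bytes + b"\r\n" + f"--{boundary}--\r\n".encode("utf-8")

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_analyze_request("binary", b"MZ\x90\x00")
```

With the web interface running, `urllib.request.urlopen(request)` sends the request; inspect the returned body before relying on any particular structure.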

4. Advanced Configuration

py -m yarobot.generate /malware/samples -g <goodware dbs path> \
  --opcodes \
  --recursive \
  --author "My Security Team" \
  --ref "Internal Investigation 2024" \
  --superrule-overlap 5 \
  --strings-per-rule 15

5. Database Management

# Update existing database with new goodware samples (TODO: not yet implemented)
# py -m yarobot.database update /path/to/new/goodware --identifier corporate

# Create new database from scratch
py -m yarobot.database create /path/to/goodware --opcodes

🔧 Configuration Options

Rule Generation Options

  • --min-size, --max-size: String length boundaries
  • --min-score: Minimum string score threshold
  • --opcodes: Enable opcode feature for additional detection capabilities
  • --superrule-overlap: Minimum overlapping strings for super rule creation
  • --recursive: Scan directories recursively
  • --excludegood: Force exclusion of all goodware strings
  • --oe: Only process files with executable extensions
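
As a rough picture of what `--superrule-overlap` controls, the sketch below pairs up samples that share at least a minimum number of strings. It is pairwise only and entirely hypothetical (`find_superrule_candidates` is not a yarobot function); the actual tool may group more than two samples and weigh strings by score.

```python
from itertools import combinations


def find_superrule_candidates(sample_strings, min_overlap=5):
    """Return ((sample_a, sample_b), shared_strings) for qualifying pairs.

    sample_strings maps a sample name to the set of strings extracted
    from it. A pair qualifies when it shares at least min_overlap strings.
    """
    candidates = []
    for first, second in combinations(sorted(sample_strings), 2):
        shared = sample_strings[first] & sample_strings[second]
        if len(shared) >= min_overlap:
            candidates.append(((first, second), sorted(shared)))
    return candidates


samples = {
    "a.exe": {"s1", "s2", "s3"},
    "b.exe": {"s2", "s3", "s4"},
    "c.exe": {"x"},
}
pairs = find_superrule_candidates(samples, min_overlap=2)
```

Raising the overlap threshold yields fewer, more specific super rules; lowering it groups samples more aggressively at the cost of weaker rules.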

Database Options

  • --identifier: Database identifier for multi-environment support
  • --update: Update existing databases with new samples
  • --only-executable: Only process executable file extensions

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

TODOs

  • Global project refactoring & packaging
  • Token extraction rewritten in Rust
  • Tests & CI/CD pipeline
  • Multiplatform PyPI release
  • HTTP service with web UI
  • Store regex patterns in configuration
  • Wide/ASCII token merging
  • Token deduplication
  • Fix/improve imphash/exports handling
  • Include default databases
  • Rule generation improvements
  • Separate token extraction to stringZZ package
  • Regexp generation
  • LLM Scoring support

📄 License

This project is licensed under the GPLv3 License - see the LICENSE file for details.

🙏 Credits

  • yarGen by Florian Roth (initial idea and implementation)
  • PyO3 for Python-Rust integration
  • goblin for binary parsing
