YARA generator inspired by yarGen

yarobot

yarobot is a high-performance YARA rule generator inspired by the yarGen project. It automatically creates YARA rules from malware samples and reduces false positives by comparing candidate strings against a goodware database.

✨ Features

  • Automated YARA Rule Generation: Create both simple and super rules from malware samples
  • Advanced Scoring System: String scoring with goodware database comparison
  • High-Performance Engine: Rust-based core (stringZZ) for fast file processing
  • Multiple Interfaces: CLI, Python API, and web interface
  • Intelligent Filtering: Automatic exclusion of common goodware strings for your specific dataset
  • Super Rules: Automatic creation of rules that match multiple related samples

🏗️ Architecture

flowchart TD
    A[CLI] --> D
    B[Web Upload] --> D
    C[API Call] --> D
    
    D[Token extraction] --> E[Scoring]
    F[Goodware DB] --> E
    
    E --> G[YARA Generator]
    G --> H[Rule file]
    G --> I[Web Display]
    G --> J[API JSON]
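
The flow above (token extraction, scoring against a goodware database, rule generation) can be illustrated with a toy Python sketch. This is not yarobot's implementation: the function names, the regex-based tokenizer, and the subtract-the-goodware-count scoring are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_strings(data: bytes, min_size: int = 6) -> set[str]:
    """Hypothetical tokenizer: printable ASCII runs of at least min_size bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_size)
    return {m.group().decode("ascii") for m in pattern.finditer(data)}


def score_strings(strings: set[str], goodware: Counter, base_score: int = 10) -> dict[str, int]:
    """Toy scoring (assumption): start at base_score, subtract the goodware count."""
    return {s: base_score - goodware.get(s, 0) for s in strings}


def render_rule(name: str, scored: dict[str, int], min_score: int = 5,
                strings_per_rule: int = 15) -> str:
    """Keep the best-scoring strings and render them as YARA rule text."""
    kept = sorted(
        (s for s, score in scored.items() if score >= min_score),
        key=lambda s: (-scored[s], s),
    )[:strings_per_rule]
    if not kept:
        raise ValueError("no strings passed the score threshold")
    lines = [f"rule {name} {{", "    strings:"]
    for i, s in enumerate(kept, 1):
        # Escape backslashes and quotes so the YARA string literal stays valid.
        escaped = s.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}" ascii')
    lines += ["    condition:", "        all of them", "}"]
    return "\n".join(lines)
```

A string that is frequent in the goodware counter ends up with a low score and is dropped before the rule is rendered, which is the general idea behind the goodware comparison.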

🛠 Installation

1. Install from PyPI

pip install yarobot

2. Install from Source

# Clone repository
git clone https://github.com/ogre2007/yarobot
cd yarobot

# Install in development mode
pip install -e .

# Or install with all dependencies
pip install ".[dev]"

📖 Quick Start

1. First-Time Setup (optional but recommended)

# Create a goodware database
mkdir -p ./dbs
py -m yarobot.database create /path/to/goodware/files --recursive --opcodes

# The database will be saved in ./dbs/
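
Conceptually, a goodware database records how often each string appears across known-clean files. The sketch below shows that idea only; the on-disk format yarobot uses is not described here, and the extension list, regex, and function name are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Hypothetical extension filter, loosely mirroring the --only-executable option.
EXECUTABLE_EXTENSIONS = {".exe", ".dll", ".sys"}
# Hypothetical tokenizer: printable ASCII runs of 6 or more bytes.
STRING_RE = re.compile(rb"[\x20-\x7e]{6,}")


def build_goodware_counts(root, recursive: bool = False,
                          only_executable: bool = False) -> Counter:
    """Count, for every string, the number of files under root that contain it."""
    root = Path(root)
    paths = root.rglob("*") if recursive else root.glob("*")
    counts: Counter = Counter()
    for path in paths:
        if not path.is_file():
            continue
        if only_executable and path.suffix.lower() not in EXECUTABLE_EXTENSIONS:
            continue
        # A set ensures each string is counted once per file.
        found = {m.group().decode("ascii") for m in STRING_RE.finditer(path.read_bytes())}
        counts.update(found)
    return counts
```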

2. Generate Your First Rules

# Basic rule generation
py -m yarobot.generate /path/to/malware/samples \
  --output-rule-file my_rules.yar \
  --author "Your Name" \
  --ref "Case-001"

3. Launch Web Interface

# Start with your database
py -m yarobot.app -g ./dbs

# Access at http://localhost:5000

Open http://localhost:5000 in a browser, or call the API directly:

curl -X POST -F "files=@tests\\data\\binary" http://localhost:5000/api/analyze -F "min_score=5" -F "get_opcodes=true"
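
The same request can be built from Python with only the standard library. This is a minimal sketch that reuses the endpoint and form fields from the curl command above; the helper names are hypothetical, and the response format is not documented here, so the usage example just prints the raw body.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str], files: dict[str, tuple[str, bytes]]):
    """Encode form fields and files as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    for name, (filename, content) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_analyze_request(sample_path, base_url: str = "http://localhost:5000"):
    """Build (but do not send) a POST to /api/analyze for one sample file."""
    path = Path(sample_path)
    body, content_type = build_multipart(
        {"min_score": "5", "get_opcodes": "true"},
        {"files": (path.name, path.read_bytes())},
    )
    return urllib.request.Request(
        f"{base_url}/api/analyze",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

With the web interface running, the request can be sent with `urllib.request.urlopen(build_analyze_request("tests/data/binary"))` and the response body read from the returned object.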

4. Advanced Configuration

py -m yarobot.generate /malware/samples -g <goodware dbs path> \
  --opcodes \
  --recursive \
  --author "My Security Team" \
  --ref "Internal Investigation 2024" \
  --superrule-overlap 5 \
  --strings-per-rule 15
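
For batch jobs it can be convenient to assemble this command line from a script. The sketch below is an assumption about usage, not part of yarobot: the helper names are hypothetical, only flags shown in this README are emitted, and `sys.executable` stands in for the `py` launcher.

```python
import subprocess
import sys


def build_generate_command(samples_dir, output_rule_file=None, goodware_dbs=None,
                           author=None, ref=None, opcodes=False, recursive=False,
                           superrule_overlap=None, strings_per_rule=None):
    """Assemble the argv list for `python -m yarobot.generate`."""
    cmd = [sys.executable, "-m", "yarobot.generate", str(samples_dir)]
    # Options that take a value are added only when a value was supplied.
    valued = [
        ("--output-rule-file", output_rule_file),
        ("-g", goodware_dbs),
        ("--author", author),
        ("--ref", ref),
        ("--superrule-overlap", superrule_overlap),
        ("--strings-per-rule", strings_per_rule),
    ]
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]
    if opcodes:
        cmd.append("--opcodes")
    if recursive:
        cmd.append("--recursive")
    return cmd


def run_generate(**kwargs):
    """Run the generator and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_generate_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with values such as `--author "My Security Team"`.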

5. Database Management

# Update existing database with new goodware samples (TODO: not implemented yet)
# py -m yarobot.database update /path/to/new/goodware --identifier corporate

# Create new database from scratch
py -m yarobot.database create /path/to/goodware --opcodes

🔧 Configuration Options

Rule Generation Options

  • --min-size, --max-size: String length boundaries
  • --min-score: Minimum string score threshold
  • --opcodes: Enable opcode feature for additional detection capabilities
  • --superrule-overlap: Minimum overlapping strings for super rule creation
  • --recursive: Scan directories recursively
  • --excludegood: Force exclusion of all goodware strings
  • --oe: Only process executable file extensions
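
To make the --superrule-overlap option more concrete, the sketch below finds pairs of samples that share at least a given number of strings. It is a simplified illustration, not yarobot's grouping algorithm: the function name is hypothetical, only pairs are considered, and the default of 5 is taken from the example command in this README rather than from the tool's actual default.

```python
from itertools import combinations


def find_superrule_candidates(sample_strings: dict[str, set[str]],
                              min_overlap: int = 5):
    """Return ((sample_a, sample_b), shared_strings) for pairs with enough overlap."""
    candidates = []
    for a, b in combinations(sorted(sample_strings), 2):
        shared = sample_strings[a] & sample_strings[b]
        # A pair qualifies only when it shares at least min_overlap strings.
        if len(shared) >= min_overlap:
            candidates.append(((a, b), shared))
    return candidates
```

Raising the threshold yields fewer, more specific super rules; lowering it groups more samples at the cost of weaker shared evidence.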

Database Options

  • --identifier: Database identifier for multi-environment support
  • --update: Update existing databases with new samples
  • --only-executable: Only process executable file extensions

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

TODOs

  • Global project refactoring & packaging
  • Token extraction rewritten in Rust
  • Tests & CI/CD pipeline
  • Multiplatform PyPI release
  • HTTP service with web UI
  • Store regex patterns in configuration
  • Wide/ASCII token merging
  • Token deduplication
  • Fix/improve imphash/exports handling
  • Include default databases
  • Rule generation improvements
  • Separate token extraction to stringZZ package
  • Regexp generation
  • LLM Scoring support

📄 License

This project is licensed under the GPLv3 License - see the LICENSE file for details.

🙏 Credits

  • yarGen by Florian Roth (initial idea and implementation)
  • PyO3 for Python-Rust integration
  • goblin for binary parsing

📞 Support
