YARA generator inspired by yarGen

yarobot

yarobot is a high-performance YARA rule generator inspired by the yarGen project. It automatically creates YARA rules from malware samples and reduces false positives by comparing candidate strings against a goodware database.

✨ Features

  • Automated YARA Rule Generation: Create both simple and super rules from malware samples
  • Advanced Scoring System: String scoring with goodware database comparison
  • High-Performance Engine: Rust-based core (stringZZ) for fast file processing
  • Multiple Interfaces: CLI, Python API, and web interface
  • Intelligent Filtering: Automatic exclusion of common goodware strings for your specific dataset
  • Super Rules: Automatic creation of rules that match multiple related samples
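To illustrate the idea behind goodware-based scoring and filtering, here is a minimal sketch in plain Python. It is not yarobot's actual scoring code: the function names, the `base_score` and `penalty` weights, and the sample strings are all hypothetical. It only shows the general principle that strings frequently seen in known-good files are scored down and dropped.

```python
from collections import Counter


def score_strings(candidates, goodware_counts, base_score=10, penalty=5):
    """Score each candidate string; every goodware occurrence costs `penalty` points.

    The weights are illustrative assumptions, not yarobot's real values.
    """
    return {s: base_score - penalty * goodware_counts.get(s, 0) for s in candidates}


def select_strings(scores, min_score=5):
    """Keep strings at or above the threshold, best score first, then alphabetically."""
    kept = [s for s, score in scores.items() if score >= min_score]
    return sorted(kept, key=lambda s: (-scores[s], s))


# Hypothetical goodware database: string -> number of goodware files containing it.
goodware_counts = Counter({"kernel32.dll": 120, "GetProcAddress": 95})
candidates = ["kernel32.dll", "evil_c2_beacon_v2", "GetProcAddress", "xmrig --donate-level"]

scores = score_strings(candidates, goodware_counts)
selected = select_strings(scores)
print(selected)
```

Only the two strings that never appear in the hypothetical goodware set survive the threshold; the common Windows API names are filtered out.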

🏗️ Architecture

flowchart TD
    A[CLI] --> D
    B[Web Upload] --> D
    C[API Call] --> D
    
    D[Token extraction] --> E[Scoring]
    F[Goodware DB] --> E
    
    E --> G[YARA Generator]
    G --> H[Rule file]
    G --> I[Web Display]
    G --> J[API JSON]
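The flow in the diagram (token extraction, scoring against a goodware set, rule generation) can be sketched end to end in a few lines of Python. This is a simplified illustration under stated assumptions, not the project's implementation: the real extraction runs in Rust, and the helper names, the 6-character minimum, and the fixed scores below are hypothetical.

```python
import re

# Assumption: a "token" is a run of 6 or more printable ASCII characters.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")


def extract_tokens(data: bytes) -> list[str]:
    """Token extraction step: unique printable runs, sorted for stable output."""
    return sorted({m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)})


def score_tokens(tokens, goodware):
    """Scoring step: tokens known from goodware get 0, everything else 10 (illustrative)."""
    return {t: (0 if t in goodware else 10) for t in tokens}


def render_rule(name, scored, min_score=5):
    """Generator step: emit YARA rule text for tokens at or above min_score."""
    kept = [t for t, s in sorted(scored.items()) if s >= min_score]
    if not kept:
        raise ValueError("no strings left after scoring")
    lines = [f"rule {name} {{", "    strings:"]
    for i, token in enumerate(kept, 1):
        escaped = token.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'        $s{i} = "{escaped}"')
    lines += ["    condition:", "        all of them", "}"]
    return "\n".join(lines)


sample = b"\x00\x01evil_beacon_cfg\x00kernel32.dll\x00ab\x00"
goodware = {"kernel32.dll"}  # hypothetical goodware database contents

rule = render_rule("sample_rule", score_tokens(extract_tokens(sample), goodware))
print(rule)
```

The three functions map onto the Token extraction, Scoring, and YARA Generator nodes; the CLI, web, and API front ends in the diagram would all feed the same pipeline.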

🛠 Installation

1. Install from PyPI

pip install yarobot

2. Install from Source

# Clone repository
git clone https://github.com/ogre2007/yarobot
cd yarobot

# Install in development mode
pip install -e .

# Or install with all dependencies
pip install ".[dev]"

📖 Quick Start

1. First-Time Setup (optional but recommended)

# Create a goodware database
mkdir -p ./dbs
py -m yarobot.database create /path/to/goodware/files --recursive --opcodes

# The database will be saved in ./dbs/

2. Generate Your First Rules

# Basic rule generation
py -m yarobot.generate /path/to/malware/samples \
  --output-rule-file my_rules.yar \
  --author "Your Name" \
  --ref "Case-001"

3. Launch Web Interface

# Start with your database
py -m yarobot.app -g ./dbs

# Access at http://localhost:5000

Then open http://localhost:5000 in a browser, or call the API directly from any client:

curl -X POST -F "files=@tests\\data\\binary" http://localhost:5000/api/analyze -F "min_score=5" -F "get_opcodes=true"
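For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The endpoint and the form field names (`files`, `min_score`, `get_opcodes`) are taken from the curl example above; the helper names are hypothetical, and because the response schema is not documented here, the sketch returns the raw response text rather than assuming a structure.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    fields: {name: str_value}; files: {name: (filename, bytes_content)}.
    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for name, (filename, content) in files.items():
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def analyze(path, url="http://localhost:5000/api/analyze", min_score=5, get_opcodes=True):
    """POST one sample to a running yarobot web service and return the raw response text."""
    with open(path, "rb") as fh:
        content = fh.read()
    body, content_type = build_multipart(
        {"min_score": str(min_score), "get_opcodes": str(get_opcodes).lower()},
        {"files": (os.path.basename(path), content)},
    )
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (requires the web interface to be running):
# print(analyze("tests/data/binary"))
```

Building the multipart body by hand keeps the sketch dependency-free; a third-party HTTP client would shorten it considerably.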

4. Advanced Configuration

py -m yarobot.generate /malware/samples -g <goodware dbs path> \
  --opcodes \
  --recursive \
  --author "My Security Team" \
  --ref "Internal Investigation 2024" \
  --superrule-overlap 5 \
  --strings-per-rule 15

5. Database Management

# TODO (not yet available): update an existing database with new goodware samples
# py -m yarobot.database update /path/to/new/goodware --identifier corporate

# Create new database from scratch
py -m yarobot.database create /path/to/goodware --opcodes

🔧 Configuration Options

Rule Generation Options

  • --min-size, --max-size: String length boundaries
  • --min-score: Minimum string score threshold
  • --opcodes: Enable opcode feature for additional detection capabilities
  • --superrule-overlap: Minimum overlapping strings for super rule creation
  • --recursive: Scan directories recursively
  • --excludegood: Force exclusion of all goodware strings
  • --oe: Only process files with executable extensions
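As an illustration of what --superrule-overlap controls, the sketch below pairs up samples that share at least a minimum number of strings. It is an assumption-laden simplification, not yarobot's algorithm: the function name and data are hypothetical, and the value 5 is borrowed from the example invocation rather than from a documented default.

```python
from itertools import combinations


def find_superrule_candidates(sample_strings, min_overlap=5):
    """Return (sample_a, sample_b, shared_strings) for every pair of samples
    whose string sets share at least `min_overlap` entries."""
    candidates = []
    for (a, strings_a), (b, strings_b) in combinations(sorted(sample_strings.items()), 2):
        shared = strings_a & strings_b
        if len(shared) >= min_overlap:
            candidates.append((a, b, sorted(shared)))
    return candidates


# Hypothetical per-sample string sets after scoring and filtering.
samples = {
    "a.exe": {"s1", "s2", "s3", "s4", "s5", "s6"},
    "b.exe": {"s1", "s2", "s3", "s4", "s5", "x"},
    "c.exe": {"s1", "y", "z"},
}

pairs = find_superrule_candidates(samples, min_overlap=5)
print(pairs)
```

Here only a.exe and b.exe share enough strings to justify a common rule; raising the overlap threshold makes super rules rarer but more specific.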

Database Options

  • --identifier: Database identifier for multi-environment support
  • --update: Update existing databases with new samples
  • --only-executable: Only process executable file extensions

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

TODOs

  • Global project refactoring & packaging
  • Token extraction rewritten in Rust
  • Tests & CI/CD pipeline
  • Multiplatform PyPI release
  • HTTP service with web UI
  • Store regex patterns in configuration
  • Wide/ASCII token merging
  • Token deduplication
  • Fix/improve imphash/exports handling
  • Include default databases
  • Rule generation improvements
  • Separate token extraction to stringZZ package
  • Regexp generation
  • LLM Scoring support

📄 License

This project is licensed under the GPLv3 License - see the LICENSE file for details.

🙏 Credits

  • yarGen by Florian Roth (initial idea and implementation)
  • PyO3 for Python-Rust integration
  • goblin for binary parsing
