Generate YARA rules automatically from positive and negative examples, for PII detection, secret scanning, prompt injection detection, and other pattern-based detection use cases.

YaraMint

YARA rules from examples, not hand-crafting

YaraMint generates YARA rules from labeled data. You provide a set of adversarial samples and a benign control corpus. YaraMint then mines statistically discriminative n-gram patterns, scores them by their false positive rate on the control set, and writes the surviving signatures as a standard .yar file. The full algorithm writeup is linked under Further Reading.
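To make the idea of discriminative n-gram mining concrete, here is a minimal sketch in Python. It is not YaraMint's implementation: the function names, the use of character n-grams, and the coverage and false positive thresholds are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> set[str]:
    """Return the set of distinct character n-grams in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def mine_patterns(
    adversarial: list[str],
    benign: list[str],
    n: int = 6,
    min_coverage: float = 0.5,
    max_fp_rate: float = 0.0,
) -> list[tuple[str, float, float]]:
    """Keep n-grams that are common in adversarial samples and rare in benign ones.

    Returns (ngram, coverage, fp_rate) tuples sorted by coverage, highest first.
    All thresholds are illustrative assumptions, not YaraMint defaults.
    """
    # Document frequency: in how many samples of each class does an n-gram occur?
    adv_df = Counter(g for s in adversarial for g in char_ngrams(s, n))
    ben_df = Counter(g for s in benign for g in char_ngrams(s, n))

    results = []
    for gram, hits in adv_df.items():
        coverage = hits / len(adversarial)
        fp_rate = ben_df.get(gram, 0) / len(benign) if benign else 0.0
        if coverage >= min_coverage and fp_rate <= max_fp_rate:
            results.append((gram, coverage, fp_rate))
    return sorted(results, key=lambda r: (-r[1], r[0]))


adversarial = ["token=sk_live_abc123", "key: sk_live_zzz999", "sk_live_q1w2e3 leaked"]
benign = ["def live_update(task):", "print('skipping live test')"]
for gram, coverage, fp_rate in mine_patterns(adversarial, benign, n=8):
    print(repr(gram), coverage, fp_rate)  # prints: 'sk_live_' 1.0 0.0
```

In this toy corpus the only 8-character pattern shared by the positive samples and absent from the benign ones is the key prefix, which is the kind of signature that would survive into a rule.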

Use Cases

Secret and API key detection — Train on known key formats with benign code as the control set. Get a rule tuned to your specific patterns with minimal false positives.

PII detection in data pipelines — Custom PII formats vary by industry and organization. Generic regex rule sets do not cover internal ID schemes, regional document formats, or domain-specific identifiers. YaraMint learns them from your own examples.

Prompt injection and jailbreak detection — Generate rules from known attack datasets and validate against benign prompt corpora before deploying to your RAG pipeline or agent infrastructure.

Threat hunting and malware analysis — Given samples from an incident, mint hunting rules to scan your fleet for variants. The positive/negative framing maps directly to the analyst workflow.

Supply chain and compliance scanning — Detect license-incompatible snippets, known vulnerable code patterns, or banned dependencies across large codebases in CI.

Installation

Requires Python 3.13 or higher.

pip install yaramint

Using uv (recommended):

uv pip install yaramint

Getting Started

This example generates a rule set for detecting leaked API keys, using a corpus of benign source code as the control set.

Step 1 — Prepare your benign corpus

If your benign dataset is large, prepare it once and reuse it across rule generations:

ymint prepare ./data/source_code_corpus.jsonl \
  --adapter jsonl \
  --output ./data/benign_code.jsonl

Step 2 — Generate rules

Point ymint generate at your positive examples (known API key formats) and the prepared benign control set:

ymint generate ./data/api_keys.jsonl \
  --adversarial-adapter jsonl \
  --benign-dataset ./data/benign_code.jsonl \
  --benign-adapter jsonl \
  --output ./data/api_key_rules.yar
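If your positive examples are not yet in JSONL form, a short script can produce the input file used in the command above. This is a sketch only: the field name "text" is an assumption made for illustration, so check the User Guide for the schema the jsonl adapter actually expects.

```python
import json
from pathlib import Path


def write_jsonl(samples: list[str], path: Path, field: str = "text") -> int:
    """Write one JSON object per line and return the number of records written.

    The default field name "text" is hypothetical; adjust it to match
    whatever the adapter you use expects.
    """
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for sample in samples:
            # json.dumps escapes newlines, so each record stays on one line.
            handle.write(json.dumps({field: sample}) + "\n")
    return len(samples)
```

For example, write_jsonl(["sk_live_abc123", "sk_live_zzz999"], Path("./data/api_keys.jsonl")) creates a two-line file.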

Step 3 — Deploy

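The output is a standard .yar file. Load it into any YARA engine, CI pipeline, pre-commit hook, or SIEM; no additional runtime is required. For example, to scan a directory with the yara command-line tool (add -r to recurse into subdirectories):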

yara ./data/api_key_rules.yar ./target_directory/
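For a CI job or pre-commit hook, the scan usually needs to be turned into a pass or fail result. The sketch below wraps the yara command-line tool and parses its default output, where each match is printed as the rule name followed by the file path. The function names and the exit code policy are assumptions for illustration; the yara binary must be on PATH.

```python
import subprocess
from collections import defaultdict


def parse_yara_output(stdout: str) -> dict[str, list[str]]:
    """Group matched rule names by file from default yara CLI output.

    Each match line is expected to look like: "<rule_name> <file_path>".
    """
    matches: dict[str, list[str]] = defaultdict(list)
    for line in stdout.splitlines():
        rule, sep, path = line.strip().partition(" ")
        if sep and path:
            matches[path].append(rule)
    return dict(matches)


def scan(rules_path: str, target: str) -> dict[str, list[str]]:
    """Run yara recursively over target and return matches per file."""
    result = subprocess.run(
        ["yara", "-r", rules_path, target],
        capture_output=True,
        text=True,
        check=True,  # raise if yara itself fails (bad rule file, missing path)
    )
    return parse_yara_output(result.stdout)


def main() -> int:
    """Print matches and return 1 if anything matched, so the job fails."""
    found = scan("./data/api_key_rules.yar", "./target_directory/")
    for path, rules in sorted(found.items()):
        print(f"{path}: {', '.join(rules)}")
    return 1 if found else 0
```

Calling sys.exit(main()) from a hook script makes any match block the commit or fail the pipeline step.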

Optional — Find the best configuration

Run a grid search to find optimal hyperparameters for your dataset before generating production rules:

ymint optimize ./data/api_keys.jsonl \
  --benign-dataset ./data/benign_code.jsonl \
  --config optimization_config.yaml

The optimizer prints a ready-to-use ymint generate command with the best flags applied.

Commands

ymint prepare

Preprocesses a large benign dataset for efficient reuse. Run once, reference in every subsequent generate call. Accepts local files or Hugging Face datasets:

ymint prepare bigcode/the-stack-smol \
  --adapter huggingface \
  --output ./data/benign_code.jsonl

ymint generate

The main command. Mines discriminative patterns from your adversarial examples, validates them against the benign control set, and writes a YARA rule file:

ymint generate ./data/pii_examples.jsonl \
  --adversarial-adapter jsonl \
  --benign-dataset ./data/benign_text.jsonl \
  --benign-adapter jsonl \
  --engine ngram \
  --output ./data/pii_rules.yar

Tune sensitivity with the --set flag:

ymint generate ./data/pii_examples.jsonl \
  --benign-dataset ./data/benign_text.jsonl \
  --set engine.score_threshold=0.9 \
  --output ./data/pii_rules.yar

Iterating on existing rules? Skip patterns already covered:

ymint generate ./data/new_samples.jsonl \
  --benign-dataset ./data/benign_text.jsonl \
  --existing-rules ./data/baseline.yar \
  --output ./data/updated_rules.yar
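Conceptually, skipping covered patterns means dropping any candidate that an existing rule string would already match. The sketch below illustrates that idea under simplifying assumptions: it only reads plain text strings from the baseline rules, it assumes those rules fire on any single string, and it compares literals without unescaping them. How YaraMint decides coverage internally is not described here.

```python
import re

# Matches plain text string definitions such as: $s0 = "sk_live_"
# Hex strings and regular expressions are deliberately ignored in this sketch.
_STRING_DEF = re.compile(r'\$\w*\s*=\s*"((?:[^"\\]|\\.)*)"')


def existing_literals(rule_source: str) -> set[str]:
    """Collect text string literals defined in YARA rule source."""
    return {m.group(1) for m in _STRING_DEF.finditer(rule_source)}


def drop_covered(candidates: list[str], rule_source: str) -> list[str]:
    """Keep candidates that no existing literal would already match.

    A candidate counts as covered when an existing literal is a substring
    of it, because that rule string fires wherever the candidate appears.
    """
    known = existing_literals(rule_source)
    return [c for c in candidates if not any(k and k in c for k in known)]


baseline = '''
rule api_keys_v1
{
    strings:
        $s0 = "sk_live_"
    condition:
        any of them
}
'''
print(drop_covered(["sk_live_abc", "ghp_token"], baseline))  # prints: ['ghp_token']
```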

ymint optimize

Runs a hyperparameter grid search and outputs the best ymint generate command for your dataset. Use this before generating production rules on a new dataset:

ymint optimize ./data/samples.jsonl \
  --benign-dataset ./data/benign_text.jsonl \
  --config optimization_config.yaml
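A grid search evaluates every combination of candidate hyperparameter values and keeps the best one. The sketch below shows the general technique and how a winning combination could be rendered as --set flags. The parameter engine.ngram_size and the toy scoring function are hypothetical, and whether ymint generate accepts repeated --set flags is an assumption; the real optimizer and its config schema are covered in the User Guide.

```python
from itertools import product
from typing import Callable


def grid_search(
    grid: dict[str, list],
    evaluate: Callable[[dict], float],
) -> tuple[dict, float]:
    """Evaluate every combination in grid and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = {}, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def to_set_flags(params: dict) -> str:
    """Render parameters as --set KEY=VALUE flags."""
    return " ".join(f"--set {key}={value}" for key, value in params.items())


def toy_score(params: dict) -> float:
    # Stand-in for "generate rules, then measure recall and false positives".
    return -abs(params["engine.score_threshold"] - 0.8) - 0.01 * params["engine.ngram_size"]


grid = {
    "engine.score_threshold": [0.7, 0.8, 0.9],
    "engine.ngram_size": [4, 6],  # hypothetical parameter name
}
best, score = grid_search(grid, toy_score)
print(to_set_flags(best))
# prints: --set engine.score_threshold=0.8 --set engine.ngram_size=4
```

The cost grows with the product of the list lengths, so it helps to keep each list short.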

Output and Compatibility

YaraMint produces standard .yar files that:

  • Work with any YARA-compatible engine
  • Integrate natively with VirusTotal, most SIEMs, EDRs, osquery, and Velociraptor
  • Are human-readable, auditable, and version-controllable like any other code
  • Require no proprietary runtime to deploy
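To show what a standard rule file contains, the sketch below renders a list of literal patterns as a YARA rule with a strings section and an "any of them" condition. The rule name, string identifiers, and condition are assumptions chosen for illustration and may differ from the rules YaraMint emits.

```python
def escape_yara_string(text: str) -> str:
    """Escape a literal for use inside a double-quoted YARA text string."""
    out = []
    for ch in text:
        if ch == "\\":
            out.append("\\\\")
        elif ch == '"':
            out.append('\\"')
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        elif 32 <= ord(ch) < 127:
            out.append(ch)
        else:
            # Emit remaining characters as \xNN escapes of their UTF-8 bytes.
            out.extend(f"\\x{b:02x}" for b in ch.encode("utf-8"))
    return "".join(out)


def render_rule(name: str, patterns: list[str]) -> str:
    """Render patterns as one YARA rule that fires when any pattern is present."""
    if not patterns:
        raise ValueError("a rule needs at least one pattern")
    lines = [f"rule {name}", "{", "    strings:"]
    for index, pattern in enumerate(patterns):
        lines.append(f'        $s{index} = "{escape_yara_string(pattern)}"')
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines) + "\n"


print(render_rule("api_key_example", ["sk_live_", 'key="']))
```

Because the result is plain text, it can be reviewed in a pull request and diffed between releases like any other source file.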

Further Reading

  • User Guide — full configuration reference, adapter options, dot-notation overrides, and engine tuning
  • Algorithm and design — how the pattern mining engine works

Download files

Source Distribution

yaramint-0.1.7.tar.gz (189.3 kB)

Built Distribution

yaramint-0.1.7-py3-none-any.whl (53.3 kB)

File details

Details for the file yaramint-0.1.7.tar.gz.

File metadata

  • Download URL: yaramint-0.1.7.tar.gz
  • Size: 189.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yaramint-0.1.7.tar.gz

Algorithm    Hash digest
SHA256       f20581d94c2187caba625dd786cc26ddc4f9ab1f685c6f14165f6ecc03c934df
MD5          ccf9c4697989b6e9b095a7afad220e43
BLAKE2b-256  b779beda2f6170c67ad36ba3aa8ccbf0710c08ecb6973e1df8f0bf9a645e646c

Provenance

The following attestation bundles were made for yaramint-0.1.7.tar.gz:

Publisher: release.yml on deconvolute-labs/yaramint

File details

Details for the file yaramint-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: yaramint-0.1.7-py3-none-any.whl
  • Size: 53.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yaramint-0.1.7-py3-none-any.whl

Algorithm    Hash digest
SHA256       42afdaf61755e2dab04420c579575ccbb01858571b4cc1979300b1274a266a1c
MD5          aa4cd9f2b135f92476c8832bff0e790f
BLAKE2b-256  28c70468ef3bdbd0c0b5b24f9f4da76b7dda7fd411b1f8486793dc7d0f75dee2

Provenance

The following attestation bundles were made for yaramint-0.1.7-py3-none-any.whl:

Publisher: release.yml on deconvolute-labs/yaramint
