Generate YARA rules automatically from positive and negative examples, for PII detection, secret scanning, prompt injection detection, and other pattern-based detection use cases.
YaraMint
YARA rules from examples, not hand-crafting
YaraMint generates YARA rules from labeled data. You provide a set of adversarial (positive) samples and a benign control corpus. YaraMint then mines statistically discriminative n-gram patterns, scores each one by its false positive rate on the control set, and writes the surviving signatures to a standard .yar file. The full algorithm writeup is linked under Further Reading.
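The mining step described above can be illustrated with a minimal sketch. This is not YaraMint's actual engine: the function names, the character-level n-grams, and the `min_support` and `max_fpr` parameters are assumptions chosen only to show the idea of keeping patterns that are common in positives and absent from the control set.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> set[str]:
    """Return the distinct character n-grams of one sample."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def mine_patterns(adversarial, benign, n=6, min_support=0.5, max_fpr=0.0):
    """Keep n-grams that are frequent in positives and rare in the control set.

    Hypothetical helper for illustration; thresholds are assumptions.
    """
    # Count, per n-gram, how many samples contain it at least once
    pos_counts = Counter(g for s in adversarial for g in char_ngrams(s, n))
    neg_counts = Counter(g for s in benign for g in char_ngrams(s, n))

    survivors = []
    for gram, hits in pos_counts.items():
        support = hits / len(adversarial)  # share of positives containing gram
        fpr = neg_counts[gram] / len(benign) if benign else 0.0
        if support >= min_support and fpr <= max_fpr:
            survivors.append((gram, support, fpr))
    # Highest support first, ties broken alphabetically for determinism
    return sorted(survivors, key=lambda item: (-item[1], item[0]))


adversarial = ["key=AKIA1234ABCD", "token AKIA9876WXYZ", "AKIA5555QQQQ leaked"]
benign = ["print('hello world')", "key = load_config()", "return token"]
patterns = mine_patterns(adversarial, benign, n=4, min_support=1.0)
print(patterns)
```

With this toy data only the 4-gram shared by every positive sample and missing from the benign set survives. A real engine would likely vary n-gram length and use a more careful statistical score.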
Use Cases
Secret and API key detection — Train on known key formats with benign code as the control set. Get a rule tuned to your specific patterns with minimal false positives.
PII detection in data pipelines — Custom PII formats vary by industry and organization. Generic regex rule sets do not cover internal ID schemes, regional document formats, or domain-specific identifiers. YaraMint learns them from your own examples.
Prompt injection and jailbreak detection — Generate rules from known attack datasets and validate against benign prompt corpora before deploying to your RAG pipeline or agent infrastructure.
Threat hunting and malware analysis — Given samples from an incident, mint hunting rules to scan your fleet for variants. The positive/negative framing maps directly to the analyst workflow.
Supply chain and compliance scanning — Detect license-incompatible snippets, known vulnerable code patterns, or banned dependencies across large codebases in CI.
Installation
Requires Python 3.13 or higher.
pip install yaramint
Using uv (recommended):
uv pip install yaramint
Getting Started
This example generates a rule set for detecting leaked API keys, using a corpus of benign source code as the control set.
Step 1 — Prepare your benign corpus
If your benign dataset is large, prepare it once and reuse it across rule generations:
ymint prepare ./data/source_code_corpus.jsonl \
--adapter jsonl \
--output ./data/benign_code.jsonl
Step 2 — Generate rules
Point YaraMint at your positive examples (known API key formats) and the prepared benign control set:
ymint generate ./data/api_keys.jsonl \
--adversarial-adapter jsonl \
--benign-dataset ./data/benign_code.jsonl \
--benign-adapter jsonl \
--output ./data/api_key_rules.yar
Step 3 — Deploy
The output is a standard .yar file. Load it into any YARA engine, your CI pipeline, a pre-commit hook, or a SIEM. No additional runtime required:
yara ./data/api_key_rules.yar ./target_directory/
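For a CI job or pre-commit hook, the scan can be wrapped in a small script. The sketch below assumes the `yara` command-line tool is on the PATH and relies on its default output of one `RULE PATH` pair per line; the helper names `parse_yara_output` and `scan` are hypothetical, and `-r` asks yara to recurse into subdirectories.

```python
import subprocess


def parse_yara_output(stdout: str) -> list[tuple[str, str]]:
    """Split each 'RULE PATH' line of yara's default output into a pair."""
    matches = []
    for line in stdout.splitlines():
        rule, sep, path = line.strip().partition(" ")
        if sep:  # skip blank or malformed lines
            matches.append((rule, path))
    return matches


def scan(rules_path: str, target: str) -> list[tuple[str, str]]:
    """Run the yara CLI recursively over target and return (rule, file) pairs."""
    result = subprocess.run(
        ["yara", "-r", rules_path, target],
        capture_output=True,
        text=True,
        check=True,  # raise if yara itself reports an error
    )
    return parse_yara_output(result.stdout)
```

A hook could call `scan("./data/api_key_rules.yar", "./target_directory/")` and fail the build when the returned list is non-empty.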
Optional — Find the best configuration
Run a grid search to find optimal hyperparameters for your dataset before generating production rules:
ymint optimize ./data/api_keys.jsonl \
--benign-dataset ./data/benign_code.jsonl \
--config optimization_config.yaml
The optimizer prints a ready-to-use ymint generate command with the best flags applied.
Commands
ymint prepare
Preprocesses a large benign dataset for efficient reuse. Run once, reference in every subsequent generate call. Accepts local files or Hugging Face datasets:
ymint prepare bigcode/the-stack-smol \
--adapter huggingface \
--output ./data/benign_code.jsonl
ymint generate
The main command. Mines discriminative patterns from your adversarial examples, validates them against the benign control set, and writes a YARA rule file:
ymint generate ./data/pii_examples.jsonl \
--adversarial-adapter jsonl \
--benign-dataset ./data/benign_text.jsonl \
--benign-adapter jsonl \
--engine ngram \
--output ./data/pii_rules.yar
Tune sensitivity with the --set flag:
ymint generate ./data/pii_examples.jsonl \
--benign-dataset ./data/benign_text.jsonl \
--set engine.score_threshold=0.9 \
--output ./data/pii_rules.yar
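To make the dot-notation concrete, here is a minimal sketch of how an override such as `engine.score_threshold=0.9` could map onto a nested configuration. It is an illustration only, not YaraMint's implementation: `apply_override` is a hypothetical helper and the starting value of `0.5` is an assumed placeholder.

```python
import ast


def apply_override(config: dict, assignment: str) -> dict:
    """Apply one 'a.b.c=value' override to a nested dict in place."""
    dotted_key, _, raw_value = assignment.partition("=")
    try:
        # Parse numbers, booleans, and quoted strings
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything else is kept as a plain string
        value = raw_value
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})  # create missing levels on the way down
    node[leaf] = value
    return config


config = {"engine": {"score_threshold": 0.5}}  # 0.5 is an assumed placeholder
apply_override(config, "engine.score_threshold=0.9")
print(config)
```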
Iterating on existing rules? Skip patterns already covered:
ymint generate ./data/new_samples.jsonl \
--benign-dataset ./data/benign_text.jsonl \
--existing-rules ./data/baseline.yar \
--output ./data/updated_rules.yar
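One way to think about skipping covered patterns: if an existing rule already contains a string that is a substring of a new candidate, any text matching the candidate would already trigger the old rule. The sketch below illustrates that idea under simplifying assumptions (plain text strings only, escape sequences left undecoded, an `any of them` condition); how YaraMint actually handles `--existing-rules` is not described here, and the helper names are hypothetical.

```python
import re

# Matches plain text string definitions such as:  $s0 = "AKIA"
STRING_DEF = re.compile(r'\$\w*\s*=\s*"((?:[^"\\]|\\.)*)"')


def existing_strings(yar_text: str) -> list[str]:
    """Extract the quoted text strings defined in a YARA rule file."""
    return STRING_DEF.findall(yar_text)


def drop_covered(candidates: list[str], yar_text: str) -> list[str]:
    """Keep only candidates that no existing string is a substring of."""
    known = existing_strings(yar_text)
    return [c for c in candidates if not any(k in c for k in known)]


baseline = '''
rule api_keys {
    strings:
        $s0 = "AKIA"
    condition:
        any of them
}
'''
fresh = drop_covered(["AKIA1234", "ghp_"], baseline)
print(fresh)
```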
ymint optimize
Runs a hyperparameter grid search and outputs the best ymint generate command for your dataset. Use this before generating production rules on a new dataset:
ymint optimize ./data/samples.jsonl \
--benign-dataset ./data/benign_text.jsonl \
--config optimization_config.yaml
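Conceptually, a grid search evaluates every combination of candidate values and keeps the best one. The following sketch shows that loop in isolation. The parameter name `engine.ngram_size`, the toy objective, and the helper names are assumptions for illustration; the real search space is defined by your optimization_config.yaml, whose schema is covered in the User Guide.

```python
from itertools import product


def grid_search(grid: dict, evaluate):
    """Try every combination in grid and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def to_set_flags(params: dict) -> str:
    """Render parameters as --set flags for a generate command."""
    return " ".join(f"--set {key}={value}" for key, value in params.items())


# Hypothetical search space; only engine.score_threshold appears in this README
grid = {"engine.score_threshold": [0.7, 0.8, 0.9], "engine.ngram_size": [4, 6]}


def toy_objective(params: dict) -> float:
    """Stand-in for a real metric such as detection rate minus false positives."""
    return -abs(params["engine.score_threshold"] - 0.8) - abs(params["engine.ngram_size"] - 6)


best, score = grid_search(grid, toy_objective)
print(to_set_flags(best))
```

In practice the objective would generate rules with each configuration and measure them against held-out adversarial and benign samples.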
Output and Compatibility
YaraMint produces standard .yar files that:
- Work with any YARA-compatible engine
- Integrate natively with VirusTotal, most SIEMs, EDRs, osquery, and Velociraptor
- Are human-readable, auditable, and version-controllable like any other code
- Require no proprietary runtime to deploy
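To show what a standard, human-readable rule file looks like, here is a minimal sketch that renders a list of text patterns as a YARA rule. The rule name, the `$s0`-style identifiers, and the `any of them` condition are assumptions; the rules YaraMint emits may be structured differently. The sketch only escapes backslashes and double quotes, so it assumes patterns contain no newlines or non-printable bytes.

```python
def escape_yara(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def render_rule(name: str, patterns: list[str]) -> str:
    """Render patterns as one YARA rule that fires if any string matches."""
    lines = [f"rule {name}", "{", "    strings:"]
    for index, pattern in enumerate(patterns):
        lines.append(f'        $s{index} = "{escape_yara(pattern)}"')
    lines += ["    condition:", "        any of them", "}"]
    return "\n".join(lines) + "\n"


rule_text = render_rule("leaked_api_key", ["AKIA", 'token="'])
print(rule_text)
```

Because the output is plain text, it can be reviewed in a pull request and diffed between rule generations like any other source file.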
Further Reading
- User Guide — full configuration reference, adapter options, dot-notation overrides, and engine tuning
- Algorithm and design — how the pattern mining engine works
File details
Details for the file yaramint-0.1.7.tar.gz.
File metadata
- Download URL: yaramint-0.1.7.tar.gz
- Upload date:
- Size: 189.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f20581d94c2187caba625dd786cc26ddc4f9ab1f685c6f14165f6ecc03c934df |
| MD5 | ccf9c4697989b6e9b095a7afad220e43 |
| BLAKE2b-256 | b779beda2f6170c67ad36ba3aa8ccbf0710c08ecb6973e1df8f0bf9a645e646c |
Provenance
The following attestation bundles were made for yaramint-0.1.7.tar.gz:
Publisher: release.yml on deconvolute-labs/yaramint
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: yaramint-0.1.7.tar.gz
  - Subject digest: f20581d94c2187caba625dd786cc26ddc4f9ab1f685c6f14165f6ecc03c934df
- Sigstore transparency entry: 1601915673
- Sigstore integration time:
- Permalink: deconvolute-labs/yaramint@8314ee552c29a8b40591a9520ef0ea76780f19c3
- Branch / Tag: refs/heads/main
- Owner: https://github.com/deconvolute-labs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8314ee552c29a8b40591a9520ef0ea76780f19c3
- Trigger Event: workflow_dispatch
File details
Details for the file yaramint-0.1.7-py3-none-any.whl.
File metadata
- Download URL: yaramint-0.1.7-py3-none-any.whl
- Upload date:
- Size: 53.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 42afdaf61755e2dab04420c579575ccbb01858571b4cc1979300b1274a266a1c |
| MD5 | aa4cd9f2b135f92476c8832bff0e790f |
| BLAKE2b-256 | 28c70468ef3bdbd0c0b5b24f9f4da76b7dda7fd411b1f8486793dc7d0f75dee2 |
Provenance
The following attestation bundles were made for yaramint-0.1.7-py3-none-any.whl:
Publisher: release.yml on deconvolute-labs/yaramint
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: yaramint-0.1.7-py3-none-any.whl
  - Subject digest: 42afdaf61755e2dab04420c579575ccbb01858571b4cc1979300b1274a266a1c
- Sigstore transparency entry: 1601915679
- Sigstore integration time:
- Permalink: deconvolute-labs/yaramint@8314ee552c29a8b40591a9520ef0ea76780f19c3
- Branch / Tag: refs/heads/main
- Owner: https://github.com/deconvolute-labs
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@8314ee552c29a8b40591a9520ef0ea76780f19c3
- Trigger Event: workflow_dispatch