Static IOC extraction engine for binaries, text, and logs.
malx‑ioc‑extractor
Static IOC extraction for binaries, text, and artifacts — fast, safe, and open‑source.
malx‑ioc‑extractor is a lightweight, extensible engine for extracting Indicators of Compromise (IOCs) using pure static analysis. No execution. No sandboxing. No risk. Built for DFIR workflows, SOC automation, and large‑scale threat analysis.
It’s designed to be:
- Safe — never executes untrusted code
- Fast — built for automation and pipelines
- Extensible — plug in your own regexes, parsers, and rules
- Developer‑friendly — clean API, CLI, and examples
- Open‑source — the extraction engine is free; enrichment lives in the MalX cloud platform
This project is the foundation of the MalX Labs ecosystem for scalable, modern threat‑analysis tooling.
Why malx‑ioc‑extractor?
malx‑ioc‑extractor is designed for environments where safety, determinism, and automation matter. While many IOC extractors operate only on raw text, malx‑ioc‑extractor includes binary‑aware static analysis and an extensible rule system, making it suitable for DFIR pipelines, CI systems, and high‑volume threat‑intel processing.
Key advantages
- Static‑only design — no execution, no sandboxing, and no risk of running untrusted code
- Binary parsing — extracts indicators from Windows PE files in addition to raw text
- Deterministic behaviour — stable output and predictable performance, ideal for automated workflows
- Extensible rule engine — plug in custom detectors, parsers, and enrichment logic
- Consistent JSON schema — uniform output that integrates cleanly with SIEM, SOAR, and log pipelines
- Low dependency footprint — minimal attack surface and safe for enterprise environments
- Designed for pipelines — fast start‑up, fast throughput, and no heavyweight runtime requirements
Use Cases
malx‑ioc‑extractor fits naturally into DFIR, security automation, and threat‑intelligence workflows. Typical usage patterns include:
SOC & Incident Response
- Extract indicators from suspicious emails, alerts, or analyst clipboard text
- Parse IOCs from incident reports and triage notes into structured JSON
- Safely inspect malware samples statically without executing anything
Threat Intelligence Processing
- Normalize indicators from threat‑intel feeds
- Batch‑process dumps of unstructured text into machine‑readable IOC sets
- Build enrichment pipelines on top of the deterministic output format
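Feeds often publish indicators in defanged form (for example `hxxp://` or `example[.]com`) so that nobody clicks them by accident. The README doesn't say whether iocx refangs input itself, so the following is a minimal normalisation sketch built on that assumption. `refang` and `normalize_feed` are hypothetical helpers, and the rule list covers only a few common conventions.

```python
import re

# (pattern, replacement) pairs for common defanging styles; not exhaustive.
# Bracketed separators are restored first so the scheme rule can see "://".
_REFANG_RULES = [
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"\[\.\]|\(\.\)|\{\.\}"), "."),
    (re.compile(r"\[at\]|\(at\)", re.IGNORECASE), "@"),
    (re.compile(r"\bhxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang(indicator: str) -> str:
    """Undo common defanging and trim whitespace so duplicates compare equal."""
    value = indicator.strip()
    for pattern, replacement in _REFANG_RULES:
        value = pattern.sub(replacement, value)
    return value


def normalize_feed(lines):
    """Refang each non-empty line and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(refang(line) for line in lines if line.strip()))
```

Running feed lines through a step like this before extraction means the same indicator, published in different styles, collapses to a single value.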
CI/CD & DevSecOps
- Scan new binaries for embedded indicators before publishing artifacts
- Integrate IOC extraction into automated security checks
- Detect accidental inclusion of URLs or addresses during build steps
Bulk Automation & Scripting
- Pipe logs, artifacts, or telemetry through malx‑ioc‑extractor to extract actionable indicators
- Use the Python API for batch workflows, ETL pipelines, or custom tooling
- Combine with rule extensions to tailor detection to internal patterns or datasets
v0.2.0 — High‑Reliability IP Detection in Hostile Data
Version 0.2.0 significantly improves IPv4/IPv6 extraction in noisy, malformed, mixed-content environments — the kind often seen in:
- SIEM log lines
- network captures
- DFIR corpus samples
- pasted analyst dumps
Real CLI Output (Chaos Corpus Sample)
$ iocx chaos_corpus.json
{
"file": "examples/samples/structured/chaos_corpus.json",
"type": "text",
"iocs": {
"urls": [
"http://[2001:db8::1]:443"
],
"domains": [],
"ips": [
"2001:db8::1",
"2001:db8::1:443",
"10.0.0.1",
"192.168.1.10",
"fe80::dead:beef%eth0",
"1.2.3.4",
"fe80::1%eth0",
"192.168.1.110",
"fe80::1%eth0fe80",
"::2%eth1",
"2001:db8::"
],
"hashes": [],
"emails": [],
"filepaths": [],
"base64": []
},
"metadata": {}
}
Chaos Corpus: Input → Extracted Output → Explanation
| Input | Extracted Output | Explanation |
|---|---|---|
| fe80::dead:beef%eth0/garbage | fe80::dead:beef%eth0 | Salvaged valid IPv6, junk ignored. |
| xxx192.168.1.10yyy | 192.168.1.10 | IPv4 inside junk text. |
| DROP:client=10.0.0.1;;;ERR | 10.0.0.1 | IPv4 from noisy log field. |
| [2001:db8::1]::::443 | 2001:db8::1, 2001:db8::1:443 | IPv6 and IPv6+port extracted. |
| GET http://[2001:db8::1]:443/index | http://[2001:db8::1]:443 | URL with IPv6 parsed correctly. |
| udp://[fe80::1%eth0]::::53 | fe80::1%eth0 | IPv6 with zone ID salvaged from a malformed URL; junk port ignored. |
| 192.168.1.110.0.0.1 | 192.168.1.110 | Combined IP segment salvaged. |
| fe80::1%eth0fe80::2%eth1 | fe80::1%eth0fe80, ::2%eth1 | Concatenated IPv6 split up. |
| 2001:db8::12001:db8::2 | 2001:db8:: | Longest valid IPv6 prefix found. |
| 256.256.256:256 | — | Invalid indicator ignored. |
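The README shows the results rather than the implementation. A common pattern for this kind of salvage is to let a permissive regex propose candidate tokens and then have a strict parser accept or reject each one. The sketch below applies that pattern using Python's standard `ipaddress` module. It is an illustration, not iocx's code: `extract_ips` is hypothetical, and it does not reproduce the zone‑ID or longest‑prefix handling shown in the table.

```python
import ipaddress
import re

# Deliberately loose candidate patterns: the regex only proposes tokens,
# and the ipaddress module decides whether each one is a real address.
IPV4_CANDIDATE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
IPV6_CANDIDATE = re.compile(r"[0-9A-Fa-f:]{2,}")


def extract_ips(text: str) -> list[str]:
    """Return unique validated IPv4 addresses, then IPv6 addresses, in first-seen order."""
    found = {}  # dict keys keep insertion order and drop duplicates

    for match in IPV4_CANDIDATE.finditer(text):
        try:
            found[str(ipaddress.IPv4Address(match.group()))] = None
        except ipaddress.AddressValueError:
            pass  # right shape, invalid value, e.g. an octet above 255

    for match in IPV6_CANDIDATE.finditer(text):
        candidate = match.group()
        if candidate.count(":") < 2:
            continue  # too few colons for IPv6, e.g. a ":443" port suffix
        try:
            found[str(ipaddress.IPv6Address(candidate))] = None
        except ipaddress.AddressValueError:
            pass  # e.g. "::::443" or a clock time such as "12:34:56"

    return list(found)
```

Keeping validation out of the regex keeps the patterns simple, and it leaves edge cases such as octets above 255 or a repeated `::` to a well‑tested parser.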
Performance Benchmarks (v0.2.0)
Measurements from the v0.2.0 performance suite:
| Sample Type | Time |
|---|---|
| 1 MB mixed‑content sample | 0.0053s |
| Pathological IPv6 blob | 0.0055s |
| 100 KB sample | 0.0006s |
| 300 KB sample | 0.0017s |
| 600 KB sample | 0.0031s |
| 1 MB sample | 0.0055s |
- Throughput: ~200 MB/s
- Worst‑case IPv6 blob: ~5.5 ms
- Linear scaling: almost perfect from 100 KB → 1 MB
Performance Benchmarks (v0.3.0)
All measurements from the latest performance suite:
| Detector | Sample Type | Time |
|---|---|---|
| IP | 1 MB mixed‑content sample | 0.0070s |
| IP | Pathological IPv6 blob | 0.0004s |
| IP | 100 KB sample | 0.0008s |
| IP | 300 KB sample | 0.0021s |
| IP | 600 KB sample | 0.0038s |
| IP | 1 MB sample | 0.0068s |
| Filepath | 1 MB mixed‑content sample | 0.0040s |
| Filepath | Pathological deep unix path | 0.0237s |
| Filepath | 300 KB sample | 0.0011s |
| Filepath | 600 KB sample | 0.0022s |
| Filepath | 1000 KB sample | 0.0038s |
| Filepath | 1500 KB sample | 0.0055s |
| Crypto | 1 MB mixed‑content sample | 0.0021s |
| Crypto | Pathological ETH-like blob | 0.0012s |
| Crypto | 300 KB sample | 0.0006s |
| Crypto | 600 KB sample | 0.0012s |
| Crypto | 1000 KB sample | 0.0020s |
| Crypto | 1500 KB sample | 0.0031s |
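- Throughput (1 MB mixed‑content sample): ~140 MB/s (IP), ~250 MB/s (filepath), ~480 MB/s (crypto)
- Worst‑case IPv6 blob: ~0.4 ms
- Worst‑case filepath blob: ~23 ms
- Worst‑case crypto blob: ~1 ms
- Linear scaling: almost perfect from 100 KB to 1 MB (IP) and from 300 KB to 1500 KB (filepath, crypto)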
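The performance suite itself isn't included here. Figures like these are typically gathered by timing repeated runs with a high‑resolution clock, keeping the fastest run, and dividing input size by elapsed time. Below is a minimal sketch of that approach. It assumes 1 MB means 1,000,000 bytes and uses a bare regex as a stand‑in extractor (not iocx); `measure_throughput` is hypothetical.

```python
import re
import time
from typing import Callable


def measure_throughput(extract: Callable[[str], object], payload: str, repeats: int = 5) -> float:
    """Best-of-N throughput in MB/s, taking 1 MB = 1,000,000 bytes of UTF-8 input."""
    size_mb = len(payload.encode("utf-8")) / 1_000_000
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        extract(payload)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a timer with coarse resolution.
    return size_mb / best if best > 0 else float("inf")


# Stand-in extractor for illustration only: a bare IPv4-shaped regex, not iocx.
IPV4_LIKE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

if __name__ == "__main__":
    sample = "noise 10.0.0.1 junk " * 50_000  # 20 chars x 50,000 = 1,000,000 bytes
    print(f"{measure_throughput(IPV4_LIKE.findall, sample):.0f} MB/s")
```

Passing a different callable measures that extractor instead. Taking the best of several runs reduces noise from other work on the machine.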
Features
IOC Extraction
- Windows PE files (.exe, .dll)
- Raw text
- Extracted strings from binaries
- Caching for increased performance
Detections
- URLs
- Domains
- IPv4 / IPv6 addresses
- File paths
- Hashes (MD5 / SHA1 / SHA256 / SHA512 / Generic Hex)
- Email addresses
- Base64
- Crypto wallets: Ethereum, Bitcoin (new in v0.3.0)
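The README doesn't describe how wallet detection works. Shape‑only patterns for wallet addresses match plenty of random text, so a common refinement for legacy Bitcoin addresses (those beginning with `1` or `3`) is to verify the Base58Check checksum. In that scheme, the last 4 decoded bytes must equal the first 4 bytes of SHA‑256 applied twice to the rest. The sketch below is not iocx's detector, and the function names are hypothetical. Bech32 (`bc1`) addresses are out of scope, as is Ethereum's EIP‑55 checksum, which needs Keccak‑256 (not the same as `hashlib.sha3_256`).

```python
import hashlib
import re

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Legacy P2PKH ("1...") and P2SH ("3...") addresses: 26-35 Base58 characters.
BTC_CANDIDATE = re.compile(r"\b[13][1-9A-HJ-NP-Za-km-z]{25,34}\b")


def b58decode(s: str) -> bytes:
    """Decode a Base58 string; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)  # str.index raises ValueError if missing
    leading_zeros = len(s) - len(s.lstrip("1"))  # each leading '1' encodes a 0x00 byte
    return b"\x00" * leading_zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def has_valid_checksum(address: str) -> bool:
    try:
        raw = b58decode(address)
    except ValueError:
        return False
    if len(raw) != 25:  # 1 version byte + 20-byte hash + 4-byte checksum
        return False
    payload, checksum = raw[:-4], raw[-4:]
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4] == checksum


def find_btc_addresses(text: str) -> list[str]:
    """Return candidate legacy addresses whose Base58Check checksum verifies."""
    return [m.group() for m in BTC_CANDIDATE.finditer(text) if has_valid_checksum(m.group())]
```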
Static PE Parsing
- Imports
- Sections
- Resources
- Metadata
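The README doesn't show iocx's PE parser. To illustrate what static parsing means in practice, this sketch reads section names straight from a byte buffer using the documented PE/COFF layout: `e_lfanew` at offset 0x3C, a 20‑byte COFF header after the `PE\0\0` signature, and 40‑byte section headers. Nothing is loaded or executed. `pe_section_names` is hypothetical, and production tools usually rely on a hardened parser such as the `pefile` library.

```python
import struct


def pe_section_names(data: bytes) -> list[str]:
    """Return section names from a PE image held in memory, without executing it."""
    try:
        if data[:2] != b"MZ":
            raise ValueError("missing MZ signature")
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
        if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
            raise ValueError("missing PE signature")
        coff = pe_offset + 4  # 20-byte COFF file header follows the signature
        (num_sections,) = struct.unpack_from("<H", data, coff + 2)
        (optional_header_size,) = struct.unpack_from("<H", data, coff + 16)
    except struct.error as exc:  # buffer too short for a header field
        raise ValueError("truncated PE header") from exc

    table = coff + 20 + optional_header_size  # section table follows the optional header
    names = []
    for i in range(num_sections):
        entry = data[table + 40 * i : table + 40 * (i + 1)]  # one 40-byte section header
        if len(entry) < 40:
            raise ValueError("truncated section table")
        names.append(entry[:8].rstrip(b"\x00").decode("ascii", errors="replace"))
    return names
```

Bounds checks matter here because hostile samples often carry truncated or inconsistent headers, so every offset read from the file is treated as untrusted.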
Developer‑Friendly
- Clean JSON output
- CLI + Python API
- Modular, extensible rule system
- Minimal dependency footprint
Security‑First
- Zero malware execution
- Safe for untrusted input
- Deterministic behaviour
Why Static Only?
Static analysis ensures safety, determinism, and CI‑friendly operation. No sandboxing, no execution, and no risk of triggering malware behaviour.
Quickstart
Install
pip install iocx
Extract IOCs from a file
iocx suspicious.exe
Extract from text
echo "Visit http://bad.example.com" | iocx -
Extract from a log file
iocx alerts.log
Python API
from iocx import extract
results = extract("suspicious.exe")
print(results)
Example JSON Output
{
"file": "suspicious.exe",
"type": "PE",
"iocs": {
"urls": ["http://malicious.example.com"],
"domains": ["malicious.example.com"],
"ips": ["45.77.12.34"],
"hashes": ["d41d8cd98f00b204e9800998ecf8427e"],
"emails": ["attacker@example.com"],
"filepaths": [
"c:\\windows\\system32\\cmd.exe",
"d:\\temp\\payload.bin"
],
"base64": []
},
"metadata": {
"file_type": "PE",
"imports": [
"KERNEL32.dll",
"msvcrt.dll"
],
"sections": [
".text",
".data",
".rdata",
".pdata",
".xdata",
".bss",
".idata",
".CRT",
".tls",
".reloc"
],
"resource_strings": [
"C:\\Windows\\System32\\cmd.exe",
"\\\\SERVER01\\share\\dropper.exe",
"/home/alice/.config/evil.sh@%APPDATA%\\Microsoft\\Windows\\Start Menu\\Programs\\Startup\\evil.lnk"
]
}
}
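Each report has the same top‑level shape (`file`, `type`, `iocs`, `metadata`), so downstream tooling can flatten it into one record per indicator for SIEM or log ingestion. The minimal sketch below assumes the field names in the example above. `to_rows` and `to_ndjson` are hypothetical helpers, and the output field names are an arbitrary choice.

```python
import json


def to_rows(report: dict) -> list[dict]:
    """Flatten an iocx-style report into one record per indicator."""
    rows = []
    for ioc_type, values in report.get("iocs", {}).items():
        for value in values:
            rows.append({
                "source": report.get("file"),
                "source_type": report.get("type"),
                "ioc_type": ioc_type,
                "value": value,
            })
    return rows


def to_ndjson(report: dict) -> str:
    """Serialise the rows as newline-delimited JSON, a format commonly accepted by log shippers."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in to_rows(report))
```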
Architecture
malx-ioc-extractor/
│
├── examples/      # Sample files + generators
├── tests/         # Unit and integration tests
└── iocx/
    ├── detectors/ # Regex-based IOC detectors
    ├── parsers/   # PE parsing, string extraction
    └── cli/       # Command-line interface
The engine is intentionally modular so components can be extended or replaced easily.
Extending the Engine
You can add custom:
- Regex detectors
- File parsers
- Normalisation logic
Register a custom detector
The second argument is a detector function (a callable that receives the input and returns extracted values):
from iocx.detectors import register_detector
def extract_wallets(data):
    # Custom extraction logic goes here; return a list of extracted values.
    return ["wallet123"]

register_detector("crypto_wallet", extract_wallets)
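For a slightly fuller example of the same contract, here is a hypothetical detector for CVE identifiers (`CVE-YYYY-NNNN`, with four or more digits in the final part). The README doesn't say whether detectors receive `str` or `bytes`, so this sketch accepts both. The registration call is left as a comment because it requires iocx.

```python
import re
from typing import List, Union

# CVE IDs: "CVE-", a four-digit year, "-", then four or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cves(data: Union[str, bytes]) -> List[str]:
    """Return unique CVE IDs in first-seen order, normalised to upper case."""
    if isinstance(data, bytes):
        data = data.decode("utf-8", errors="ignore")  # skip undecodable bytes in binaries
    return list(dict.fromkeys(m.group().upper() for m in CVE_PATTERN.finditer(data)))


# With iocx installed, registration would follow the README's pattern:
# from iocx.detectors import register_detector
# register_detector("cve", extract_cves)
```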
Safe Testing (No Malware Required)
All test samples are:
- Synthetic
- Benign
- Publicly safe (EICAR, GTUBE)
- Designed to avoid accidental malware handling
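The generators under `examples/` aren't shown here. One way to produce samples in the same spirit is to draw indicators only from ranges reserved for documentation: 192.0.2.0/24 per RFC 5737, 2001:db8::/32 per RFC 3849, and `example.com` per RFC 2606. The generator can also return the expected indicators alongside the text, so tests can compare extractor output against a known answer. `make_synthetic_log` is hypothetical.

```python
import ipaddress
import random

# Ranges reserved for documentation and examples.
DOC_V4 = ipaddress.ip_network("192.0.2.0/24")    # TEST-NET-1, RFC 5737
DOC_V6 = ipaddress.ip_network("2001:db8::/32")   # IPv6 documentation prefix, RFC 3849
DOC_DOMAIN = "example.com"                        # reserved by RFC 2606


def make_synthetic_log(seed: int = 0, lines: int = 5) -> tuple[str, dict]:
    """Build noisy log text plus the indicators an extractor is expected to find."""
    rng = random.Random(seed)  # seeded, so the same arguments give the same sample
    expected = {"ips": [], "domains": []}
    out = []
    for i in range(lines):
        v4 = str(DOC_V4[rng.randrange(1, 255)])
        v6 = str(DOC_V6[rng.randrange(1, 2**16)])
        domain = f"host{i}.{DOC_DOMAIN}"
        # Mimic noisy log formats: junk separators and a bracketed IPv6 peer.
        out.append(f"DROP:client={v4};;;peer=[{v6}]:443 dns={domain}")
        expected["ips"] += [v4, v6]
        expected["domains"].append(domain)
    return "\n".join(out), expected
```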
Contributing
We welcome:
- New IOC detectors
- Parser improvements
- Bug reports
- Documentation updates
- Synthetic test samples
See CONTRIBUTING.md for full guidelines.
Security
If you discover a security issue, do not open a GitHub issue. Please follow the instructions in SECURITY.md.
Related Projects (MalX Labs)
- malx-core — foundational primitives
- malx-utils — shared utilities
- malx-sandbox — dynamic analysis environment
- malx-forge — adversarial payload tooling
- malx-archive — research + PoCs
License
Licensed under the MIT License. See LICENSE for details.
Download files
Source Distribution
Built Distribution
File details
Details for the file iocx-0.3.0.tar.gz.
File metadata
- Download URL: iocx-0.3.0.tar.gz
- Upload date:
- Size: 21.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d73200f1cd2b451a8d0db35f76c730e8b214540c0e571b1ca82c2cfc16dda74 |
| MD5 | fc148a3d532030629041d95665281f21 |
| BLAKE2b-256 | 3625f5af46326f26a1ab70e8756ee907d570ccf6e705d7723f50df28f303f91a |
File details
Details for the file iocx-0.3.0-py3-none-any.whl.
File metadata
- Download URL: iocx-0.3.0-py3-none-any.whl
- Upload date:
- Size: 23.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34f5bdd57f752a576a26b3b4c272453843e2fa9ad758764cd5858997f67c0548 |
| MD5 | d7e4516d0c0b9045dae928888c187eb5 |
| BLAKE2b-256 | b7ebb8ab81d8963cdb9df28cc2d34a44efc471b6b96b9f5d930d4aab5aa0638e |