Extract threat indicators (IOCs) from unstructured text and enrich them against threat-intel sources (VirusTotal, AbuseIPDB, abuse.ch). A layered, pip-extras toolkit for the IOC lifecycle.

iocflow

Pull indicators of compromise out of unstructured text — threat-intel reports, advisories, emails, tickets — in one call. iocflow extracts IPs, domains, URLs, filenames, file hashes, CVEs, MITRE ATT&CK technique IDs, threat actors, and malware families, with the false-positive defenses you'd otherwise write by hand: a Public Suffix List domain validator, benign-domain/IP allowlists, hash de-duplication across MD5/SHA1/SHA256, and re-fanging of defanged IOCs.

from iocflow import extract

text = """
APT28 (a.k.a. Fancy Bear) staged Cobalt Strike from evil-domain[.]ru and
185.220.101.5, dropping install.ps1 (MD5 a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4).
Exploited CVE-2021-44228 via T1190. Contact: ops@evil-domain[.]ru.
"""

entities = extract(text)
print(entities.summary())
# 1 IPs, 1 domains, 1 filenames, 1 hashes, 1 CVEs, 1 emails, 1 threat actors, 1 MITRE techniques

for ind in entities.iter_indicators():
    print(ind.kind, ind.value)
# ip 185.220.101.5
# domain evil-domain.ru
# ...

The defanged evil-domain[.]ru and ops@evil-domain[.]ru are re-fanged automatically; 185.220.101.5 is kept while private/benign IPs are dropped.
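
Re-fanging can be pictured as a few substitutions applied before any extractor runs. The sketch below is not iocflow's implementation: `refang_sketch` and its rule list are illustrative assumptions that cover only the most common defanging styles.

```python
import re

# Illustrative rules only: each maps a common defanged form back to the original.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\{\.\}|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[@\]|\[at\]", re.IGNORECASE), "@"),
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"\bhxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang_sketch(text: str) -> str:
    """Undo common defanging so ordinary IOC patterns can match."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


print(refang_sketch("hxxps://evil-domain[.]ru/a and ops[@]evil-domain[.]ru"))
# https://evil-domain.ru/a and ops@evil-domain.ru
```

For real use, prefer the library's own `refang_text`, which is what `extract` relies on.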

Install

pip install iocflow              # core — one dependency (tldextract)
pip install "iocflow[mitre]"     # + a ready-made MITRE ATT&CK malware-name source

What it extracts

extract(text) returns an ExtractedEntities with:

  • ips — public IPv4, excluding private ranges, benign IPs, and version-number-like values
  • domains — validated against the Mozilla Public Suffix List via tldextract
  • urls — both https://… and bare host/path forms (so package-registry paths survive)
  • filenames — suspicious script/executable/macro/archive filenames
  • hashes: {"md5": [...], "sha1": [...], "sha256": [...]}, de-duplicated across lengths
  • cves: CVE-YYYY-NNNN+, normalized to uppercase
  • emails
  • mitre_techniques: T1059, T1059.001, …
  • threat_actors (+ threat_actors_enriched) — APT/UNC/FIN/TA/DEV/STORM designators, a curated well-known list, and the "<Name> ransomware" pattern
  • malware_families — populated when you supply a malware-name source (see below)
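
To make a few of these filters concrete, here is a minimal standard-library sketch of three of them: keeping only public IPv4 addresses, bucketing hashes by length, and normalizing CVE IDs. The function names are hypothetical, and `ipaddress`'s `is_global` only stands in for iocflow's private-range and allowlist logic, which is likely more selective.

```python
import ipaddress
import re

_IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Match whole hex runs only, so a SHA256 is never also reported as an MD5 prefix.
_HEX_RUN = re.compile(r"\b[0-9a-fA-F]{32,64}\b")
_HASH_KINDS = {32: "md5", 40: "sha1", 64: "sha256"}
_CVE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def public_ipv4s(text: str) -> list[str]:
    """Globally routable IPv4 addresses, in order of first appearance."""
    found: list[str] = []
    for candidate in _IPV4.findall(text):
        try:
            addr = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # not a valid address, e.g. an octet above 255
        if addr.is_global and candidate not in found:
            found.append(candidate)
    return found


def bucket_hashes(text: str) -> dict[str, list[str]]:
    """Group hex runs of hash length by algorithm, lowercased and de-duplicated."""
    buckets: dict[str, list[str]] = {"md5": [], "sha1": [], "sha256": []}
    for run in _HEX_RUN.findall(text):
        kind = _HASH_KINDS.get(len(run))
        value = run.lower()
        if kind and value not in buckets[kind]:
            buckets[kind].append(value)
    return buckets


def normalize_cves(text: str) -> list[str]:
    """Uppercased CVE IDs, de-duplicated, in order of first appearance."""
    return list(dict.fromkeys(m.upper() for m in _CVE.findall(text)))
```

Anchoring the hash pattern on word boundaries is one simple way to avoid counting the first 32 characters of a longer digest as a separate MD5.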

Each individual extractor is also importable and composable:

from iocflow import extract_ips, extract_hashes, refang_text
extract_ips(refang_text("c2 at 185[.]220[.]101[.]5"))   # ['185.220.101.5']

Pluggable name sources

The core has no external-data dependency. The two name sources below are optional, so iocflow drops cleanly into any environment: plug in your own feeds, or use the bundled MITRE extra for malware names.

Malware families. Give extract a MalwareNames and it matches families (with alias-to-canonical normalization) behind a three-layer false-positive defense. Build one from your own list, from MITRE-shaped records, or from the optional extra:

from iocflow import extract, MalwareNames

# Your own list:
names = MalwareNames.from_names(["Cobalt Strike", "Emotet", "Qakbot"])
entities = extract(report_text, malware_names=names)

# Or the bundled MITRE ATT&CK source (needs: pip install "iocflow[mitre]"):
from iocflow.mitre import mitre_malware_names
entities = extract(report_text, malware_names=mitre_malware_names())
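
Alias-to-canonical normalization can be sketched as a lookup table plus a word-boundary match. This is an illustration of the idea only, not how `MalwareNames` works internally; the `ALIASES` table and `match_families` are hypothetical, and none of the library's false-positive defenses are reproduced here.

```python
import re

# Hypothetical alias table: lowercase alias -> canonical family name.
ALIASES = {
    "qakbot": "Qakbot",
    "qbot": "Qakbot",
    "emotet": "Emotet",
    "cobalt strike": "Cobalt Strike",
}


def match_families(text: str, aliases: dict[str, str]) -> list[str]:
    """Return canonical family names found in text, in order of first appearance."""
    if not aliases:
        return []
    # Longest aliases first so multi-word names win over shorter overlaps.
    ordered = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in ordered) + r")\b",
        re.IGNORECASE,
    )
    seen: list[str] = []
    for match in pattern.finditer(text):
        canonical = aliases[match.group(0).lower()]
        if canonical not in seen:
            seen.append(canonical)
    return seen


print(match_families("QBot dropped cobalt strike; Qakbot again", ALIASES))
# ['Qakbot', 'Cobalt Strike']
```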

Threat-actor aliases. Give extract an ActorAliases to match a custom name set and enrich actors with common_name / region / all_names. Without it, actors are still found by pattern and curated list:

from iocflow import extract, ActorAliases

aliases = ActorAliases.from_index({
    "apt28": {"common_name": "APT28", "region": "Russia",
              "all_names": ["Fancy Bear", "Sofacy", "Sednit"]},
})
entities = extract(report_text, actor_aliases=aliases)
entities.threat_actors_enriched[0].region        # "Russia"
entities.threat_actors_enriched[0].aliases_display()  # "Fancy Bear, Sofacy, Sednit"

Command line

iocflow "APT28 used 185.220.101.5 and evil[.]example[.]com"
echo "report text…" | iocflow --json
iocflow --mitre "Emotet dropped Cobalt Strike"     # needs iocflow[mitre]

Layer 2 — enrichment

Take the extracted entities and look every indicator up against threat-intel sources, getting back a normalized verdict per indicator. Install the extra and set the API keys you have:

pip install "iocflow[enrich]"
export IOCFLOW_VT_API_KEY=...          # VirusTotal      (free key)
export IOCFLOW_ABUSEIPDB_API_KEY=...   # AbuseIPDB       (free key)
export IOCFLOW_ABUSECH_API_KEY=...     # abuse.ch        (free Auth-Key)

from iocflow import extract
from iocflow.enrich import enrich

entities = extract(report_text)
report = enrich(entities)              # uses every source whose key is set

print(report.summary())
# 5 indicators across 3 sources, 2 malicious, 1 suspicious

for ind in report.malicious:
    print("malicious:", ind.kind, ind.value, "→", report.verdict_for(ind.kind, ind.value).value)

Each indicator is routed only to the sources that handle its kind (VirusTotal: IPs/domains/URLs/hashes; AbuseIPDB: IPs; abuse.ch: IPs/domains/URLs/hashes via ThreatFox/URLhaus/MalwareBazaar). Lookups fan out over a thread pool. A source with no key is skipped, and a failing lookup becomes an error record rather than crashing the batch — so partial coverage still produces a report.
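
The routing and failure behavior described above can be sketched with the standard library alone. Everything here is a stand-in: `LookupResult`, the two fake lookup functions, and `fan_out` are hypothetical and make no network calls; the point is only to show per-kind routing, a thread pool, and errors captured as records.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class LookupResult:
    """Hypothetical stand-in for an enrichment record."""
    source: str
    kind: str
    value: str
    verdict: Optional[str] = None
    error: Optional[str] = None


def fake_virustotal(kind: str, value: str) -> str:
    return "malicious"


def fake_abuseipdb(kind: str, value: str) -> str:
    raise TimeoutError("upstream timed out")


# Source name -> (kinds it handles, lookup function).
SOURCES = {
    "virustotal": ({"ip", "domain", "url", "hash"}, fake_virustotal),
    "abuseipdb": ({"ip"}, fake_abuseipdb),
}


def _safe_lookup(source, lookup, kind, value) -> LookupResult:
    """Run one lookup; turn any exception into an error record."""
    try:
        return LookupResult(source, kind, value, verdict=lookup(kind, value))
    except Exception as exc:
        return LookupResult(source, kind, value, error=f"{type(exc).__name__}: {exc}")


def fan_out(indicators, sources=SOURCES, max_workers=8) -> list[LookupResult]:
    """Send each (kind, value) pair only to sources that support its kind."""
    jobs = [
        (name, lookup, kind, value)
        for kind, value in indicators
        for name, (kinds, lookup) in sources.items()
        if kind in kinds
    ]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda job: _safe_lookup(*job), jobs))


for result in fan_out([("ip", "185.220.101.5"), ("domain", "evil-domain.ru")]):
    print(result)
```

The domain is sent to one source and the IP to two; the failing AbuseIPDB stand-in yields an error record while the other results come back normally.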

Verdicts are normalized to MALICIOUS / SUSPICIOUS / BENIGN / UNKNOWN and aggregated worst-wins across sources. You can also pass enrichers explicitly, restrict to certain kinds, or supply a cache:

from iocflow.enrich import enrich, VirusTotalEnricher, MemoryCache

report = enrich(
    entities,
    [VirusTotalEnricher("my-key")],
    kinds={"ip", "domain"},
    cache=MemoryCache(),
)
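
Worst-wins aggregation amounts to taking the maximum over an ordered severity scale. A minimal sketch follows, assuming UNKNOWN ranks below BENIGN; that ordering, like the `SketchVerdict` and `worst_wins` names, is an assumption and not taken from iocflow.

```python
from enum import IntEnum


class SketchVerdict(IntEnum):
    # Assumed ordering: a higher number means a worse verdict.
    UNKNOWN = 0
    BENIGN = 1
    SUSPICIOUS = 2
    MALICIOUS = 3


def worst_wins(verdicts) -> SketchVerdict:
    """Aggregate per-source verdicts; no verdicts at all means UNKNOWN."""
    return max(verdicts, default=SketchVerdict.UNKNOWN)


print(worst_wins([SketchVerdict.BENIGN, SketchVerdict.MALICIOUS]).name)
# MALICIOUS
```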

Bring your own source by implementing the Enricher protocol (name, supports(kind), enrich(kind, value) -> EnrichmentRecord) — or subclass HTTPEnricher to get session handling, rate-limiting, and error-wrapping for free.
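
As a rough picture of that protocol's shape, here is a self-contained sketch of a custom source backed by a local blocklist. `RecordSketch` and `EnricherSketch` are stand-ins defined here so the example runs without iocflow; the real `EnrichmentRecord` fields and the verdict type are not shown in this README, so treat them as assumptions and check the library before wiring this in.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class RecordSketch:
    """Stand-in for EnrichmentRecord; these fields are assumptions."""
    source: str
    kind: str
    value: str
    verdict: str


@runtime_checkable
class EnricherSketch(Protocol):
    """Mirrors the three members named above: name, supports, enrich."""
    name: str

    def supports(self, kind: str) -> bool: ...

    def enrich(self, kind: str, value: str) -> RecordSketch: ...


class BlocklistEnricher:
    """Flags any IP or domain found in a local in-memory blocklist."""
    name = "local-blocklist"

    def __init__(self, blocked: set[str]) -> None:
        self._blocked = blocked

    def supports(self, kind: str) -> bool:
        return kind in {"ip", "domain"}

    def enrich(self, kind: str, value: str) -> RecordSketch:
        verdict = "malicious" if value in self._blocked else "unknown"
        return RecordSketch(self.name, kind, value, verdict)


source = BlocklistEnricher({"evil-domain.ru"})
print(isinstance(source, EnricherSketch), source.enrich("domain", "evil-domain.ru"))
```

Because the protocol is structural, the class needs no base class; it only has to expose the three members.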

Layer 3 — AI commentary

Turn the enrichment report into an analyst-style assessment with an LLM. Install the extra and point it at any OpenAI-compatible endpoint (OpenAI, Azure, or a local server like vLLM / Ollama / LM Studio):

pip install "iocflow[ai]"
export IOCFLOW_LLM_API_KEY=...                       # omit for keyless local servers
export IOCFLOW_LLM_BASE_URL=http://localhost:11434/v1   # default: OpenAI
export IOCFLOW_LLM_MODEL=gpt-4o-mini

from iocflow import extract
from iocflow.enrich import enrich
from iocflow.ai import comment

entities = extract(report_text)
report = enrich(entities)
note = comment(report, entities=entities, text=report_text)

print(note.severity.value, "—", note.summary)
for finding in note.key_findings:
    print(" •", finding)
for action in note.recommendations:
    print(" →", action)

comment() returns a structured Commentary (severity, assessment, key_findings, recommendations) and is hardened against flaky model output:

  • The model is asked for JSON; if it answers with prose or fenced JSON, the text is parsed best-effort, falling back to using it as the narrative.
  • If no model is configured, or a call fails, comment() returns a deterministic assessment built straight from the report — so it always returns a usable result and never raises. The LLM is the primary path; the fallback guarantees the pipeline keeps working without one.
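
The best-effort parsing in the first bullet could look something like the sketch below: try strict JSON, then a fenced block, then the outermost braces, and finally keep the raw text as the narrative. `parse_model_reply` is hypothetical, and using `assessment` as the fallback key is an assumption based on the field names listed above.

```python
import json
import re

# Build the fence marker at runtime so this example can itself sit inside a fence.
_TICKS = "`" * 3
_FENCED = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL | re.IGNORECASE)


def parse_model_reply(reply: str) -> dict:
    """Return a dict from strict, fenced, or embedded JSON; else wrap the prose."""
    candidates = [reply]
    fenced = _FENCED.search(reply)
    if fenced:
        candidates.append(fenced.group(1))
    start, end = reply.find("{"), reply.rfind("}")
    if 0 <= start < end:
        candidates.append(reply[start:end + 1])
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    # Nothing parsed: keep the model's text as the narrative.
    return {"assessment": reply.strip()}


print(parse_model_reply('Sure. {"severity": "high"} Hope that helps.'))
# {'severity': 'high'}
```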

Bring any model by implementing the CommentaryModel protocol (name + complete(system, user, *, json=False) -> str).
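
For tests, or to see the protocol's shape, a canned model is often enough. The sketch below defines its own `CommentaryModelSketch` from the signature quoted above so it runs without iocflow; `CannedModel` and the JSON keys it emits are illustrative assumptions.

```python
import json as jsonlib  # aliased because the protocol's keyword argument is named "json"
from typing import Protocol


class CommentaryModelSketch(Protocol):
    """Mirrors the signature named above: name + complete(system, user, *, json=False)."""
    name: str

    def complete(self, system: str, user: str, *, json: bool = False) -> str: ...


class CannedModel:
    """Deterministic model for tests: no network, same answer every time."""
    name = "canned"

    def __init__(self, severity: str = "medium") -> None:
        self._severity = severity

    def complete(self, system: str, user: str, *, json: bool = False) -> str:
        summary = f"Reviewed {len(user.splitlines())} line(s) of report context."
        if json:
            return jsonlib.dumps({"severity": self._severity, "assessment": summary})
        return summary


model: CommentaryModelSketch = CannedModel("high")
print(model.complete("You are a SOC analyst.", "line one\nline two", json=True))
```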

Layer 4 — suggested hunts

Turn the indicators into ready-to-run hunt queries for the platforms a SOC actually uses. The deterministic core runs offline — no network, no API keys:

pip install "iocflow[hunt]"   # only the optional LLM path needs the extra

from iocflow import extract
from iocflow.enrich import enrich
from iocflow.hunt import suggest

entities = extract(report_text)
report = enrich(entities)
plan = suggest(report)                 # CrowdStrike CQL, Cortex XQL, Sigma

print(plan.summary())
# 9 hunts across 3 dialects

for hunt in plan.for_dialect("sigma"):
    print(f"# {hunt.title}  [{hunt.severity.value}]")
    print(hunt.query)

For each indicator kind it renders one sweep query per dialect — CrowdStrike CQL (in(RemoteAddressIP4, values=[...])), Cortex XQL (dataset = xdr_data | filter ...), and a complete Sigma rule (with a stable, content-derived id). Values are escaped and de-duplicated; each dialect renders only the indicator kinds it has a real field for, and benign-verdict indicators are skipped by default (include_benign=True to keep them). Restrict output with dialects=["sigma"].
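
Two of those properties, escaped and de-duplicated values and a stable content-derived id, are easy to sketch in isolation. The helpers below are hypothetical and only mimic the CQL shape quoted above; the escaping rules and the UUIDv5 namespace are assumptions, not iocflow's actual choices.

```python
import uuid

# Assumption: any fixed namespace gives stable ids; this one exists only for the sketch.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "hunts.example.invalid")


def dedupe(values) -> list[str]:
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(values))


def cql_quote(value: str) -> str:
    """Double-quote a value, escaping backslashes and embedded quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_ip_sweep(ips) -> str:
    """Render one sweep over all IPs in the in(field, values=[...]) form."""
    quoted = ", ".join(cql_quote(v) for v in dedupe(ips))
    return f"in(RemoteAddressIP4, values=[{quoted}])"


def stable_rule_id(dialect: str, kind: str, values) -> str:
    """Same dialect, kind, and value set always yields the same UUIDv5."""
    content = "|".join([dialect, kind, *sorted(set(values))])
    return str(uuid.uuid5(_NAMESPACE, content))


print(render_ip_sweep(["185.220.101.5", "185.220.101.5", "8.8.4.4"]))
# in(RemoteAddressIP4, values=["185.220.101.5", "8.8.4.4"])
```

Deriving the id from sorted, de-duplicated content means re-running the same report produces the same rule id, which keeps rule diffs quiet.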

With a model configured (the same IOCFLOW_LLM_* env as Layer 3), suggest() also proposes behavioral hunts — TTP- and anomaly-based ideas that go beyond literal IOC matching:

plan = suggest(report, entities=entities, commentary=note)
behavioral = [h for h in plan.hunts if h.source == "llm"]

The LLM is strictly additive: with no model, or on any model error, you still get the full deterministic plan — suggest() never raises. Add a query language by implementing the Dialect protocol (key, label, supports, render).

Where this is going

iocflow grows in independently-useful layers, each behind its own pip extra. Layer 1 (extraction), Layer 2 (enrichment), Layer 3 (AI commentary), and Layer 4 (suggested hunts) ship today; next is optional perimeter blocking. The pipeline is a clean hand-off chain of stable types: ExtractedEntities (L1) → enrich() → EnrichmentReport (L2) → comment() → Commentary (L3) → suggest() → HuntPlan (L4), each serializable for the next layer.

License

MIT
