Extract threat indicators (IOCs) from unstructured text — IPs, domains, URLs, hashes, CVEs, MITRE techniques, threat actors, and malware families. Layer 1 of an IOC-lifecycle toolkit.

Project description

iocflow

Pull indicators of compromise out of unstructured text — threat-intel reports, advisories, emails, tickets — in one call. iocflow extracts IPs, domains, URLs, filenames, file hashes, CVEs, MITRE ATT&CK technique IDs, threat actors, and malware families, with the false-positive defenses you'd otherwise write by hand: a Public Suffix List domain validator, benign-domain/IP allowlists, hash de-duplication across MD5/SHA1/SHA256, and re-fanging of defanged IOCs.

from iocflow import extract

text = """
APT28 (a.k.a. Fancy Bear) staged Cobalt Strike from evil-domain[.]ru and
185.220.101.5, dropping install.ps1 (MD5 a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4).
Exploited CVE-2021-44228 via T1190. Contact: ops@evil-domain[.]ru.
"""

entities = extract(text)
print(entities.summary())
# 1 IPs, 1 domains, 1 filenames, 1 hashes, 1 CVEs, 1 emails, 1 threat actors, 1 MITRE techniques

for ind in entities.iter_indicators():
    print(ind.kind, ind.value)
# ip 185.220.101.5
# domain evil-domain.ru
# ...

The defanged evil-domain[.]ru and ops@evil-domain[.]ru are re-fanged automatically; 185.220.101.5 is kept while private/benign IPs are dropped.
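Re-fanging as described above can be sketched with a few regular-expression substitutions. This is a minimal illustration, not iocflow's implementation: the function name refang_sketch and the specific set of defang notations handled here are assumptions chosen for the example.

```python
import re

# Illustrative subset of common defang notations (an assumption, not
# iocflow's actual rule set). Order matters: bracketed scheme separators
# are restored before the hxxp scheme is rewritten.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\{\.\}|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[@\]|\[at\]", re.IGNORECASE), "@"),
    (re.compile(r"\[://\]|\[:\]//"), "://"),
    (re.compile(r"\bhxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang_sketch(text: str) -> str:
    """Return text with the defang notations above rewritten to live form."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


print(refang_sketch("ops@evil-domain[.]ru"))
print(refang_sketch("hxxps[://]evil[.]example[.]com/a"))
```

Running the substitution before any pattern matching means every downstream extractor only has to recognize the live form of an indicator.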

Install

pip install iocflow              # core — one dependency (tldextract)
pip install "iocflow[mitre]"     # + a ready-made MITRE ATT&CK malware-name source

What it extracts

extract(text) returns an ExtractedEntities with:

  • ips: public IPv4 addresses, excluding private ranges, benign IPs, and version-number-like values
  • domains: validated against the Mozilla Public Suffix List via tldextract
  • urls: both https://… and bare host/path forms (so package-registry paths survive)
  • filenames: suspicious script/executable/macro/archive filenames
  • hashes: {"md5": [...], "sha1": [...], "sha256": [...]}, de-duplicated across lengths
  • cves: CVE-YYYY-NNNN+, normalized to uppercase
  • emails
  • mitre_techniques: T1059, T1059.001, …
  • threat_actors (+ threat_actors_enriched): APT/UNC/FIN/TA/DEV/STORM designators, a curated well-known list, and the "<Name> ransomware" pattern
  • malware_families: populated when you supply a malware-name source (see below)
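Two of the pattern-based fields in the list above, hashes and cves, can be illustrated with the standard library alone. The sketch below is not iocflow's code. It assumes that "de-duplicated across lengths" means each hex token lands in exactly one bucket chosen by its length, and the helper names bucket_hashes and find_cves are hypothetical.

```python
import re

_HEX_TOKEN = re.compile(r"\b[0-9a-fA-F]+\b")
_LENGTH_TO_KIND = {32: "md5", 40: "sha1", 64: "sha256"}
_CVE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def bucket_hashes(text: str) -> dict:
    """Group whole hex tokens by digest length, lowercased, in first-seen order."""
    buckets = {"md5": [], "sha1": [], "sha256": []}
    for token in _HEX_TOKEN.findall(text):
        kind = _LENGTH_TO_KIND.get(len(token))
        if kind is None:
            continue  # not a digest length we recognize
        value = token.lower()
        if value not in buckets[kind]:
            buckets[kind].append(value)
    return buckets


def find_cves(text: str) -> list:
    """Return unique CVE identifiers, normalized to uppercase."""
    found = []
    for match in _CVE.findall(text):
        cve = match.upper()
        if cve not in found:
            found.append(cve)
    return found
```

Because the hex pattern only matches whole tokens, a 64-character SHA256 value is never also reported as an MD5 through one of its 32-character substrings.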

Each individual extractor is also importable and composable:

from iocflow import extract_ips, extract_hashes, refang_text
extract_ips(refang_text("c2 at 185[.]220[.]101[.]5"))   # ['185.220.101.5']
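For readers curious what the filtering inside an IP extractor might involve, here is a rough standard-library sketch. It is not iocflow's implementation: the function name public_ipv4s, the example benign set, and the exact rule for rejecting version-number-like strings are all assumptions.

```python
import ipaddress
import re

# Four dotted groups that are not part of a longer dotted-number run,
# so a five-part version string such as 1.2.3.4.5 is skipped.
_IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\.?\d)")

# Hypothetical allowlist of well-known benign resolvers.
_BENIGN = frozenset({"8.8.8.8", "1.1.1.1"})


def public_ipv4s(text: str, benign=_BENIGN) -> list:
    """Return unique, globally routable IPv4 addresses not on the allowlist."""
    found = []
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            addr = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. an octet above 255
        if not addr.is_global or candidate in benign:
            continue  # private, loopback, reserved, or allowlisted
        if candidate not in found:
            found.append(candidate)
    return found
```

Delegating range checks to ipaddress avoids hand-maintaining lists of private and reserved networks.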

Pluggable name sources

The core has no external-data dependency. Two enrichment sources are optional and supplied by you, so iocflow drops cleanly into any environment — plug in your own feeds, or use the bundled MITRE extra.

Malware families. Give extract a MalwareNames and it matches families (with alias-to-canonical normalization) behind a three-layer false-positive defense. Build one from your own list, from MITRE-shaped records, or from the optional extra:

from iocflow import extract, MalwareNames

# Your own list:
names = MalwareNames.from_names(["Cobalt Strike", "Emotet", "Qakbot"])
entities = extract(report_text, malware_names=names)

# Or the bundled MITRE ATT&CK source (needs: pip install "iocflow[mitre]"):
from iocflow.mitre import mitre_malware_names
entities = extract(report_text, malware_names=mitre_malware_names())
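Alias-to-canonical normalization can be pictured as a lookup table plus a whole-word match. The sketch below shows only that idea. It does not reproduce the three-layer false-positive defense mentioned above, and find_families and its input shape are hypothetical, not part of the iocflow API.

```python
import re


def find_families(text: str, canonical_to_aliases: dict) -> list:
    """Return canonical family names whose name or alias appears as a whole word."""
    alias_to_canonical = {}
    for canonical, aliases in canonical_to_aliases.items():
        for name in (canonical, *aliases):
            alias_to_canonical[name.lower()] = canonical
    if not alias_to_canonical:
        return []
    # Longest names first so a longer name wins over a shorter prefix of it.
    names = sorted(alias_to_canonical, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)

    found = []
    for match in pattern.finditer(text):
        canonical = alias_to_canonical[match.group(0).lower()]
        if canonical not in found:
            found.append(canonical)
    return found
```

The lookarounds reject matches embedded in longer words, which is one simple guard against the false positives that short family names tend to produce.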

Threat-actor aliases. Give extract an ActorAliases to match a custom name set and enrich actors with common_name / region / all_names. Without it, actors are still found by pattern and curated list:

from iocflow import extract, ActorAliases

aliases = ActorAliases.from_index({
    "apt28": {"common_name": "APT28", "region": "Russia",
              "all_names": ["Fancy Bear", "Sofacy", "Sednit"]},
})
entities = extract(report_text, actor_aliases=aliases)
entities.threat_actors_enriched[0].region        # "Russia"
entities.threat_actors_enriched[0].aliases_display()  # "Fancy Bear, Sofacy, Sednit"
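An index shaped like the one passed to ActorAliases.from_index above lends itself to a reverse lookup from any known name to its record. The following is a sketch of that idea using plain dictionaries. The helpers build_name_lookup and resolve_actor are hypothetical and say nothing about how iocflow stores or matches aliases internally.

```python
def build_name_lookup(index: dict) -> dict:
    """Map every key, common name, and alias (case-folded) to its index key."""
    lookup = {}
    for key, record in index.items():
        names = [key, record["common_name"], *record.get("all_names", [])]
        for name in names:
            lookup[name.casefold()] = key
    return lookup


def resolve_actor(name: str, index: dict, lookup: dict):
    """Return the record for a name or alias, or None if it is unknown."""
    key = lookup.get(name.casefold())
    return index[key] if key is not None else None


index = {
    "apt28": {"common_name": "APT28", "region": "Russia",
              "all_names": ["Fancy Bear", "Sofacy", "Sednit"]},
}
lookup = build_name_lookup(index)
print(resolve_actor("fancy bear", index, lookup)["region"])
```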

Command line

iocflow "APT28 used 185.220.101.5 and evil[.]example[.]com"
echo "report text…" | iocflow --json
iocflow --mitre "Emotet dropped Cobalt Strike"     # needs iocflow[mitre]

Where this is going

iocflow is Layer 1 of an IOC-lifecycle toolkit. The plan is to grow it in independently-useful layers, each behind its own pip extra: enrichment (VirusTotal, Recorded Future, AbuseIPDB, Shodan, abuse.ch), AI commentary, suggested hunts, and optional perimeter blocking — each configured by plugging in your own API keys. ExtractedEntities (and its iter_indicators() view) is the stable hand-off type those layers consume.
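To make the hand-off idea concrete, a later layer could dispatch on each indicator's kind. The sketch below is speculative: Indicator is a stand-in dataclass that assumes only the kind and value attributes shown in the earlier loop, route and the handler functions are hypothetical, and the "cve" kind name is a guess.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Indicator:
    """Stand-in for the objects iter_indicators() yields (kind and value only)."""
    kind: str
    value: str


def route(indicators: Iterable[Indicator],
          handlers: dict[str, Callable[[str], str]]) -> list[str]:
    """Send each indicator to the handler registered for its kind, if any."""
    results = []
    for ind in indicators:
        handler = handlers.get(ind.kind)
        if handler is not None:
            results.append(handler(ind.value))
    return results


# Placeholder handlers; a real layer would call an enrichment service here.
handlers = {
    "ip": lambda value: f"lookup-ip:{value}",
    "domain": lambda value: f"lookup-domain:{value}",
}
sample = [
    Indicator("ip", "185.220.101.5"),
    Indicator("domain", "evil-domain.ru"),
    Indicator("cve", "CVE-2021-44228"),  # assumed kind name; no handler, so skipped
]
print(route(sample, handlers))
```

Keying handlers by kind lets each layer register only the indicator types it cares about and ignore the rest.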

License

MIT

Download files

Source Distribution

iocflow-0.1.0.tar.gz (23.9 kB)

Uploaded Source

Built Distribution

iocflow-0.1.0-py3-none-any.whl (24.3 kB)

Uploaded Python 3

File details

Details for the file iocflow-0.1.0.tar.gz.

File metadata

  • Download URL: iocflow-0.1.0.tar.gz
  • Upload date:
  • Size: 23.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for iocflow-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e60fd5c935db9a8f76157da33bd525711d52e07f183c85c10bbcf6f9c8b11a2c
MD5 90d3c6b535d57b5d99e40fdab752e67d
BLAKE2b-256 5971794ac8b62028e79263863262be4880e579e45e88a016903d75a90d093069
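
A downloaded file can be checked against a published digest with Python's hashlib. This is a generic sketch rather than anything specific to this project; the helper names sha256_of and matches_published are made up for the example.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution listed above, the call would be matches_published("iocflow-0.1.0.tar.gz", "e60fd5c935db9a8f76157da33bd525711d52e07f183c85c10bbcf6f9c8b11a2c"), assuming the file sits in the current directory.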

Provenance

The following attestation bundles were made for iocflow-0.1.0.tar.gz:

Publisher: release.yml on vinayvobbili/iocflow

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file iocflow-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: iocflow-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 24.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for iocflow-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3be671f44dbe934fed20349a08b0c2356bd36fb60bf8531649d210f3b594a6e5
MD5 254e3efc1b0035f2855e4bc61d260184
BLAKE2b-256 313d1335efbf03d7d3d57650d5c4ab28acea27128ba7b7ba9c8caa3e2912e299

Provenance

The following attestation bundles were made for iocflow-0.1.0-py3-none-any.whl:

Publisher: release.yml on vinayvobbili/iocflow
