Extract Indicators of Compromise (IoCs) from unstructured text.

text2ioc

text2ioc extracts Indicators of Compromise (IoCs) from unstructured text such as articles, reports, logs, and threat-intelligence notes.

Disclaimer: text2ioc is a deterministic pattern-extraction package, not a threat-intelligence validation engine. It combines regex matching with heuristic post-filtering, so many returned values are best understood as candidate IoC-like patterns rather than strict, confirmed IoCs.

Install from PyPI:

pip install text2ioc

Usage

import json

from text2ioc.ioc import extract_iocs

text = (
    "Download https://dpaste[.]com/9MQEJ6VYR.txt from 77.221.158[.]154, "
    "contact ops[at]example.org, and review T1059.001 linked to TA0002."
)

iocs = extract_iocs(text)
print(json.dumps(iocs, indent=2))

Expected output:

{
  "filepath": [],
  "file": [],
  "url": [
    "https://dpaste[.]com/9MQEJ6VYR.txt"
  ],
  "domain": [],
  "email": [
    "ops[at]example.org"
  ],
  "ipv4": [
    "77.221.158[.]154"
  ],
  "ipv6": [],
  "md5": [],
  "sha1": [],
  "sha256": [],
  "cve": [],
  "expressions": [],
  "attack_technique_id": [
    "T1059.001"
  ],
  "attack_tactic_id": [
    "TA0002"
  ],
  "registry_key": [],
  "cwe": [],
  "ghsa": [],
  "capec": []
}
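
As the expected output shows, defanged values are returned as they appear in the text. If a downstream tool needs plain values, a small post-processing step can refang them. The sketch below is a hypothetical helper, not part of text2ioc; `refang` and `refang_iocs` are illustrative names, and it only handles the `[.]`, `[dot]`, and `[at]` forms mentioned in this README.

```python
import re

# Hypothetical helper (not provided by text2ioc): rewrite the defanged
# separators shown in this README back to their plain form.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[at\]", re.IGNORECASE), "@"),
]


def refang(value: str) -> str:
    """Return value with [.], [dot] and [at] replaced by '.' and '@'."""
    for pattern, replacement in _REFANG_RULES:
        value = pattern.sub(replacement, value)
    return value


def refang_iocs(iocs: dict) -> dict:
    """Apply refang() to every string in a dict shaped like extract_iocs() output."""
    return {field: [refang(v) for v in values] for field, values in iocs.items()}


# A subset of the expected output above, used as sample input.
sample = {
    "url": ["https://dpaste[.]com/9MQEJ6VYR.txt"],
    "email": ["ops[at]example.org"],
    "ipv4": ["77.221.158[.]154"],
}
print(refang_iocs(sample))
```

Refanged values are live indicators again, so this step is best kept inside a controlled pipeline rather than applied to text that people will click on.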

Field Semantics

The extractor is regex-first, then removes false positives with explicit heuristics. It does not resolve domains, validate reachability, or decide whether an indicator is malicious. It only returns strings that match the current parsing rules.

  • filepath: Unix, Windows, UNC, and relative paths. It trims trailing punctuation, keeps quoted paths, and discards bare basenames without extensions, unlikely Linux roots, and slash-prefixed strings that do not appear in a path-like context.
  • file: File names and suspicious extensions, including defanged forms like cmd[dot]exe. It excludes obvious domains, version-like tokens, abbreviations such as e.g. and i.e., and many-segment dotted identifiers that look more like namespaces than files.
  • url: URLs with an explicit scheme, optional port, and optional path. It accepts normal and defanged separators, plus IPv4 hosts. It does not keep malformed schemes, malformed ports, or plain hostnames without a scheme.
  • domain: Plain domains, subdomains, wildcard domains, defanged domains, and .onion addresses. It excludes items with invalid or unsupported TLDs, file extensions, README.md, permission-style names, reverse-domain identifiers, EC2 shapes, Azure namespaces, ANY.RUN, code symbols like EndpointRequest.to(), markup/CMS fragments, and legal-entity strings like Co.LTD when the surrounding context looks organizational rather than web-related.
  • email: Standard and defanged email addresses. The domain part must end in a valid TLD and must not look like a file extension. Domain-like fragments that are only part of an email are intentionally not duplicated under domain.
  • ipv4: Standard and defanged IPv4 addresses. It excludes invalid octets, partial quads, and version-like quads in advisory/product contexts, including cases with leading-zero octets such as 16.03.08.12 or parenthetical build suffixes such as 1.2.0.14(408).
  • ipv6: Standard and compressed IPv6 forms such as ::1 and 2001:db8::1. It excludes malformed addresses with invalid hex groups or invalid double compression.
  • md5: Exactly 32 hexadecimal characters.
  • sha1: Exactly 40 hexadecimal characters.
  • sha256: Exactly 64 hexadecimal characters.
  • cve: Tokens matching CVE-YYYY-NNNN....
  • expressions: Template-like expressions in ${...} form.
  • attack_technique_id: MITRE ATT&CK technique IDs such as T1059 or T1059.001. It does not synthesize IDs from ATT&CK URL paths like /T1059/001/.
  • attack_tactic_id: MITRE ATT&CK tactic IDs such as TA0001.
  • registry_key: Windows registry paths rooted in a known hive such as HKLM, HKCU, or HKEY_LOCAL_MACHINE, with one or more subkeys. It avoids swallowing trailing command arguments.
  • cwe: Tokens matching CWE-N.
  • ghsa: GitHub advisory IDs such as GHSA-v63m-x9r9-8gqp.
  • capec: Tokens matching CAPEC-N.
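
To make the simpler fields above concrete, here is a minimal regex sketch for the fixed-shape identifiers (hashes, CVE, CWE, CAPEC, and ATT&CK IDs). The patterns and the `match_simple_ids` name are assumptions written for illustration; the rules text2ioc actually uses may differ, and none of the heuristic post-filtering is reproduced here.

```python
import re

# Illustrative patterns only; these are NOT text2ioc's own rules.
# The lookarounds enforce "exactly N hex characters" by refusing to
# match inside a longer run of hex digits.
_HEX = r"(?<![0-9A-Fa-f])[0-9A-Fa-f]{%d}(?![0-9A-Fa-f])"

PATTERNS = {
    "md5": re.compile(_HEX % 32),
    "sha1": re.compile(_HEX % 40),
    "sha256": re.compile(_HEX % 64),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "cwe": re.compile(r"\bCWE-\d+\b"),
    "capec": re.compile(r"\bCAPEC-\d+\b"),
    "attack_technique_id": re.compile(r"\bT\d{4}(?:\.\d{3})?\b"),
    "attack_tactic_id": re.compile(r"\bTA\d{4}\b"),
}


def match_simple_ids(text: str) -> dict:
    """Return sorted, deduplicated matches per field for the patterns above."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in PATTERNS.items()
    }


report = (
    "See CVE-2021-44228, CWE-79, T1059.001 and TA0002; "
    "hash d41d8cd98f00b204e9800998ecf8427e"
)
print(match_simple_ids(report))
```

Because the hex patterns reject longer hex runs, a 64-character SHA-256 digest is not also reported as an MD5 or SHA-1 prefix.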

General filtering currently implemented in the code:

  • Results are deduplicated, and shorter matches that are fully contained inside longer ones are dropped.
  • The extractor uses the Public Suffix List for domain and email TLD validation, with a built-in fallback set when the suffix list cannot be fetched or parsed.
  • For domain, if mixed defanged/plain candidates appear together, the code keeps the explicitly domain-like ones and avoids re-emitting noisy fragments.
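
The first rule above (deduplicate, then drop shorter matches contained in longer ones) can be sketched as follows. This is a simplified illustration that treats "contained" as plain substring containment between matched strings; `drop_contained` is a hypothetical name, and the library may instead compare match positions in the source text.

```python
def drop_contained(matches):
    """Deduplicate matches, then drop any that is a substring of a longer one."""
    unique = list(dict.fromkeys(matches))  # dedupe while keeping first-seen order
    return [
        m for m in unique
        if not any(m != other and m in other for other in unique)
    ]


candidates = ["example.org", "ops.example.org", "example.org", "10.0.0.1"]
print(drop_contained(candidates))
```

In this sample, `example.org` is dropped because it is fully contained in `ops.example.org`, while `10.0.0.1` survives because no longer candidate contains it.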

Download files

Source Distribution

  • text2ioc-0.1.5.tar.gz (2.1 MB): Source

Built Distributions

  • text2ioc-0.1.5-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl (981.9 kB): PyPy, manylinux: glibc 2.28+, ARM64
  • text2ioc-0.1.5-pp310-pypy310_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (940.8 kB): PyPy, manylinux: glibc 2.17+, ARMv7l
  • text2ioc-0.1.5-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl (983.7 kB): PyPy, manylinux: glibc 2.28+, ARM64
  • text2ioc-0.1.5-pp39-pypy39_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (942.5 kB): PyPy, manylinux: glibc 2.17+, ARMv7l
  • text2ioc-0.1.5-cp310-abi3-win_amd64.whl (769.2 kB): CPython 3.10+, Windows x86-64
  • text2ioc-0.1.5-cp310-abi3-manylinux_2_28_x86_64.whl (1.0 MB): CPython 3.10+, manylinux: glibc 2.28+, x86-64
  • text2ioc-0.1.5-cp310-abi3-manylinux_2_28_aarch64.whl (981.5 kB): CPython 3.10+, manylinux: glibc 2.28+, ARM64
  • text2ioc-0.1.5-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl (940.4 kB): CPython 3.10+, manylinux: glibc 2.17+, ARMv7l
  • text2ioc-0.1.5-cp310-abi3-macosx_11_0_arm64.whl (864.5 kB): CPython 3.10+, macOS 11.0+, ARM64
  • text2ioc-0.1.5-cp310-abi3-macosx_10_12_x86_64.whl (911.5 kB): CPython 3.10+, macOS 10.12+, x86-64

File details

Details for the file text2ioc-0.1.5.tar.gz.

File metadata

  • Download URL: text2ioc-0.1.5.tar.gz
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for text2ioc-0.1.5.tar.gz
Algorithm Hash digest
SHA256 5182acc94f01b1eebc02d6d6e8e337d7a6c91b53bd8be1c981f2f8836a513b97
MD5 1b54185a4efd9ab173104928d4155bca
BLAKE2b-256 782f46704e3f5f96bec6aa7581af8bd05677d5e555f7a21774f17a01e5459dff
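
A published digest like the SHA256 value above can be checked against a downloaded file with Python's standard `hashlib`. The sketch below is one way to do that; the function names are illustrative, and the local file path in the usage comment is an assumption about where the sdist was saved.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True when the file's SHA-256 equals the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was downloaded into the current directory:
# matches_published_digest(
#     "text2ioc-0.1.5.tar.gz",
#     "5182acc94f01b1eebc02d6d6e8e337d7a6c91b53bd8be1c981f2f8836a513b97",
# )
```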

Provenance

The following attestation bundles were made for text2ioc-0.1.5.tar.gz:

Publisher: wheels.yml on juanmcristobal/text2ioc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file text2ioc-0.1.5-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for text2ioc-0.1.5-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 d7015a979dd78b13038af1ad61b5fdb83dc7692f165eafd38349cf63a3695a11
MD5 d1f304fdb1cd4c8d3331073ff43a258c
BLAKE2b-256 b3f6101a3c238a303139ef395f55cadbd687d2ed03c015f84b72412e506f0d07

File details

Details for the file text2ioc-0.1.5-pp310-pypy310_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for text2ioc-0.1.5-pp310-pypy310_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 412a89913ecb9fd43f23bd82d387f8f9024ff64e220d25923feee8e44e4bced4
MD5 63559b1ed5980764c1ad7f92b83e3bf3
BLAKE2b-256 ad67b88ce34d29a8f01207859f79437284a906b1b4d4f11b4c97a362192fe388

File details

Details for the file text2ioc-0.1.5-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for text2ioc-0.1.5-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 0c10ba45147fc99233335b7c55cd374da5c26729c1231200caeb87145db3e5b4
MD5 7c01bff01e64c9204377551d658f6cbb
BLAKE2b-256 7bf59dab189fa23a0a59767d823df48bb5a9fa67a3285a5c89d9322e2e6b4ac2

File details

Details for the file text2ioc-0.1.5-pp39-pypy39_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for text2ioc-0.1.5-pp39-pypy39_pp73-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 47aedc743fc43fbd38e406dc4aa193d79a888248b6044f1b645c3f85363ae4c3
MD5 3b42f62922f5a35de53d99959b52a8dd
BLAKE2b-256 aed6985f1e01a4f6189d81d4fbbecaf5aa7f0d27398a8bf9ff7d712bf9e683ce

File details

Details for the file text2ioc-0.1.5-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: text2ioc-0.1.5-cp310-abi3-win_amd64.whl
  • Size: 769.2 kB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for text2ioc-0.1.5-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 9e5b73aab42895f84606c694551d6b1a09f8aa25c7f5acb51017ee797f3097b8
MD5 c58f1d73b4485ce0c7eaec04288c8b46
BLAKE2b-256 c71d358df6b92d83c28a58d8b55f0dc246db8dc398d69ded8be756feeab2954d

File details

Details for the file text2ioc-0.1.5-cp310-abi3-manylinux_2_28_x86_64.whl.

File hashes

Hashes for text2ioc-0.1.5-cp310-abi3-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 9a8d1f4dc6861c3d1dce029362f939b6a83b961f6d026ac2311043553fe5ba63
MD5 4be2ca697918a019a1993289bc71c3ff
BLAKE2b-256 c5b461d8a0ebfc24823d244d4c696c4155c07d3f2de5661e2b57f4b01a50082d

File details

Details for the file text2ioc-0.1.5-cp310-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for text2ioc-0.1.5-cp310-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 84ab50f0bef1f06f78a493b32594594fe2fb7086a87fff84e587f69c1b14480d
MD5 a560857032bc1b7b4782d334c24081ee
BLAKE2b-256 0360b383f5232d6b08da6c3810b33b2a33f1ff16e9e9b441d1447f6338a97e03

File details

Details for the file text2ioc-0.1.5-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl.

File hashes

Hashes for text2ioc-0.1.5-cp310-abi3-manylinux_2_17_armv7l.manylinux2014_armv7l.whl
Algorithm Hash digest
SHA256 e052a04e28dd47744d70dce9a37654f5c69d1fa3df44d8f62fe69d281bd0d485
MD5 4e5f4fa85a31e8b314f0dd105776317c
BLAKE2b-256 1fc536638edb097293eee53562223b220787344993cd1277cb200e1c5d8a9e4d

File details

Details for the file text2ioc-0.1.5-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for text2ioc-0.1.5-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 258f11e2f1fde879f223e186eee7c49cda96bc2613970677b4ac1a9d47ac00a6
MD5 ee317259e21c791a22638055079c3c93
BLAKE2b-256 9e3a24baf943654b1c3ad647094f23d5055d44e6e8bf7060a971370173afdff1

File details

Details for the file text2ioc-0.1.5-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for text2ioc-0.1.5-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 4b4bdf649c41fde3807314eb7bd935e67b2f04375b0a3e5be324599c049f42b2
MD5 a2ef4dbba91fee94cdac93f50ea4ab01
BLAKE2b-256 4d36cf7ee29e0e3a2c678b8364cccfec8bf1146a37014fdaa842a982352c9e74
