
cy_ioc_extract is a Python library designed to extract various Indicators of Compromise (IOCs) from raw text using regular expressions (regex) and optional validation mechanisms. It supports extracting IP addresses, domains, URLs, hashes, emails, registry keys, autonomous system numbers, and more.

Project description

cy_ioc_extract


Features

  • Extracts many IOC types, including IP addresses, domains, URLs, hashes, and CVEs.
  • Optionally validates extracted values (e.g., domains are checked against the IANA and OpenNIC TLD lists).
  • Allows selective extraction through the extract_fields parameter.
  • Reduces false positives by discarding invalid matches when validation is enabled.
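To make the "regex plus selective fields" idea concrete, here is a minimal stdlib-only sketch. The PATTERNS table and the extract function are hypothetical, and the simplified regexes are not the ones cy_ioc_extract uses; the sketch only illustrates running a chosen subset of patterns over raw text, which is what extract_fields controls.

```python
import re

# Hypothetical field -> pattern table. These simplified regexes are for
# illustration only; they are NOT the patterns cy_ioc_extract ships with.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract(text, extract_fields=None):
    """Run only the requested patterns; return field -> sorted unique matches."""
    fields = extract_fields or tuple(PATTERNS)
    return {name: sorted(set(PATTERNS[name].findall(text))) for name in fields}


sample = "Host 10.0.0.1 mailed alice_smith@corporate.org about CVE-2023-1234."
print(extract(sample, extract_fields=("IP", "CVE")))
# {'IP': ['10.0.0.1'], 'CVE': ['CVE-2023-1234']}
```

Skipping unrequested patterns is also why selective extraction can be cheaper than extracting everything: each pattern is a separate pass over the text.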

Installation

pip install cy_ioc_extract

Usage

1️⃣ Extracting Specific IOC Types

from cy_ioc_extract import IOCEXtract

txt = """
### IPv4 Addresses:
192.168.1.1
10.0.0.1
8.8.8.8
172.16.32.45

### Domains:
example.com
subdomain.test.org
my-site.net
web.co.uk

### Email Addresses:
john.doe@example.com
alice_smith@corporate.org
user123@test.net
contact@web.co.uk
"""

# Extract only "DOMAIN" and "EMAIL"
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "EMAIL")).extract_ioc()
print(iocs)

Output:

{
    "DOMAIN": ["my-site.net", "subdomain.test.org", "web.co.uk", "example.com"],
    "EMAIL": ["contact@web.co.uk", "john.doe@example.com", "alice_smith@corporate.org", "user123@test.net"]
}
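The order of values within each list is not guaranteed (the same domains and emails appear in a different order in the next example). If you diff or store results, a small post-processing step can make them stable. This sketch works only on the returned dict; normalize is a hypothetical helper, not part of cy_ioc_extract.

```python
import json

# A dict shaped like the extract_ioc() output above (field -> list of matches).
iocs = {
    "DOMAIN": ["my-site.net", "subdomain.test.org", "web.co.uk", "example.com"],
    "EMAIL": ["contact@web.co.uk", "john.doe@example.com",
              "alice_smith@corporate.org", "user123@test.net"],
    "CVE": [],
}


def normalize(result):
    """Sort fields and values, and drop empty fields, so output is reproducible."""
    return {field: sorted(values) for field, values in sorted(result.items()) if values}


print(json.dumps(normalize(iocs), indent=2))
```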

2️⃣ Extracting All IOC Types

iocs = IOCEXtract(txt).extract_ioc()
print(iocs)

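Output (generated from a longer sample text than the snippet above; that text also contained CVEs, URLs, hashes, paths, registry keys, and other IOC types):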

{
    "IP": ["172.16.32.45", "10.0.0.1", "192.168.1.1", "8.8.8.8"],
    "DOMAIN": ["subdomain.test.org", "my-site.net", "example.com", "web.co.uk"],
    "EMAIL": [
        "contact@web.co.uk",
        "user123@test.net",
        "alice_smith@corporate.org",
        "john.doe@example.com",
    ],
    "FIND_EMAIL": [
        "contact@web.co.uk",
        "user123@test.net",
        "alice_smith@corporate.org",
        "john.doe@example.com",
    ],
    "CVE": ["CVE-2022-7654321", "CVE-2021-98765", "CVE-2023-1234", "CVE-2019-45678"],
    "URL": [("https://example.com", ""), ("ftp://192.168.1.1/resource", "")],
    "SHA256": [
        "d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2",
        "d3b6f7c8e5a4d9b2c7e0f8b1a6d0c5e9a2b3d4f7e6c1a9b0f2d8c3a5e7b6d9c4",
        "5d41402abc4b2a76b9719d911017c59216dcd8d1a3f32a5e3a0d867d8e448be5",
        "d4eaa4b4e9c3e5d0b5a3c2a7f6b0e9c3d4f2e5a7b6c9f8e0d5c4e3a2b7f0d6c1",
        "46e95f20ad2a7dcd491ee6b0d56e0b7fd4f5e0c19ff2eb6d6bfa6a4c7a5c7e9b",
        "a9b0d6c3e5a4d9f7b2c1e8b3d0c7e6f9a5b4d8c2e3f0a7c6d1e9b0a2f8b5c7d6",
        "e8a2b7d6c5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0",
        "8a2c7d6b5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0d",
    ],
    "MD5": [
        "e99a18c428cb38d5f260853678922e03",
        "6f4922f45568161a8cdf4ad2299f6d23",
        "9e107d9d372bb6826bd81d3542a419d6",
        "098f6bcd4621d373cade4e832627b4f6",
    ],
    "SHA1": [
        "da39a3ee5e6b4b0d3255bfef95601890afd80709",
        "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12",
        "9c1185a5c5e9fc54612808977ee8f548b2258d31",
        "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
    ],
    "UUID": [
        "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
        "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
        "550e8400-e29b-41d4-a716-446655440000",
        "123e4567-e89b-12d3-a456-426614174000",
    ],
    "IPv4_CIDR": ["192.168.1.0/24", "10.0.0.0/8", "172.16.0.0/16", "8.8.8.0/24"],
    "IPv6": [
        "::1",
        "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
        "2001:db8::ff00:42:8329",
        "fe80::1ff:fe23:4567:890a",
    ],
    "SHA224": [
        "3a7bd3e2360a6c8a1e4c0e5a2b9f7d38c6f7c7c2e3d6a7f0c1b2a5e1",
        "d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac5b3e42f",
        "758c57b4a7c8f3f3955b05bbd5e3c61a2cbf6d8fd98f48a263d7653b",
        "c97c24a8b9ac4a0c6e78a9a31a4b6ff8a2e9d5fffe71c3d6629d1a7a",
    ],
    "SHA384": [
        "ca737f0d0c89f6d1d172875e9d10c7c3350c1096c4bdb49f003ee927b4e6db32b08690b279b6c5abf0dcbd4f9d786c0b",
        "cf83e1357eefb8bd62ec7761d6d529b18b94ff7f3d8b3c1d5281fbbf6e6c077bbd7af5d15fa1c20b9a785e6cf0d630da",
    ],
    "SHA512": [],
    "SSDEEP": [],
    "DIRECTORY": [
        "C:\\Users\\Public\\Documents\\",
        "/etc/systemd/system/",
        "/home/user/docs/",
        "/var/log/nginx/",
    ],
    "FILE_PATH": [
        "/home/user/.bashrc",
        "C:\\Program Files\\MyApp\\config.ini",
        "/var/www/html/index.php",
        "D:\\Games\\Game.exe",
    ],
    "AUTONOMOUS_SYSTEM": ["AS24680", "AS12345", "AS67890", "AS13579"],
    "WINDOWS_REGISTRY_KEY": [
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\nHKLM\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters\nHKEY_USERS\\.DEFAULT\\Control Panel\\International\nHKCU\\SOFTWARE\\Policies\\Microsoft\\Windows\\System\n\n### Autonomous System Numbers"
    ],
    "MAC_ADDRESS": [
        "52:54:00:12:34:56",
        "00:1A:2B:3C:4D:5E",
        "A1:B2:C3:D4:E5:F6",
        "08:00:27:00:55:AA",
    ],
    "VALID_HOST": [],
    "WHIRLPOOL": [],
    "SHA3512": [],
    "SHA3384": [],
    "SHA3256": [],
    "SHA3224": [],
}
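Two entries in the output above may need cleanup before use. URL values are (url, "") tuples rather than plain strings; re.findall returns tuples when a pattern has more than one capturing group, which is a likely (unconfirmed) cause. The single WINDOWS_REGISTRY_KEY match spans several lines and even includes the following "### Autonomous System Numbers" heading. The helpers below are a hypothetical post-processing sketch, and the list of hive prefixes is an assumption.

```python
# Assumption: hive names/abbreviations used to recognise a registry-key line.
REGISTRY_ROOTS = ("HKEY_", "HKLM", "HKCU", "HKCR", "HKU", "HKCC")


def clean_urls(urls):
    """Turn (url, "") tuple matches into plain URL strings; pass strings through."""
    return [u[0] if isinstance(u, tuple) else u for u in urls]


def split_registry_keys(matches):
    """Split multi-line matches and keep only lines that begin with a hive name."""
    keys = []
    for match in matches:
        for line in match.splitlines():
            line = line.strip()
            if line.startswith(REGISTRY_ROOTS):
                keys.append(line)
    return keys


# Values copied from the URL and WINDOWS_REGISTRY_KEY entries above.
iocs = {
    "URL": [("https://example.com", ""), ("ftp://192.168.1.1/resource", "")],
    "WINDOWS_REGISTRY_KEY": [
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\n"
        "HKLM\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters\n"
        "HKEY_USERS\\.DEFAULT\\Control Panel\\International\n"
        "HKCU\\SOFTWARE\\Policies\\Microsoft\\Windows\\System\n"
        "\n### Autonomous System Numbers"
    ],
}
print(clean_urls(iocs["URL"]))
for key in split_registry_keys(iocs["WINDOWS_REGISTRY_KEY"]):
    print(key)
```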

Validation and Custom TLDs

  • When validate_ioc=True, domains are validated against the IANA TLD list and custom OpenNIC TLDs (.bbs, .chan, .cyb, etc.).
  • validate_ioc defaults to True.
  • If validation is disabled, raw regex matches are returned unfiltered, which may include false positives.

# Extract only DOMAIN, without validation
iocs = IOCEXtract(txt, extract_fields=("DOMAIN",), validate_ioc=False).extract_ioc()
print(iocs)

Output (DOMAIN now includes false positives such as "192.168" and "172.16.32.45"):

{
    "DOMAIN": ["192.168", "example.com", "subdomain.test.org", "my-site.net", "web.co.uk", "172.16.32.45"]
}
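The two false positives above are IP fragments, not domains. A minimal sketch of how validation can reject them: anything that parses as an IP address (via the standard ipaddress module) is dropped, and the last label must be a known TLD. The tiny KNOWN_TLDS set is an assumption standing in for the full IANA and OpenNIC lists; this is not the library's actual implementation.

```python
import ipaddress

# Assumption: a tiny hard-coded TLD set standing in for the full IANA and
# OpenNIC lists that cy_ioc_extract validates against.
KNOWN_TLDS = {"com", "org", "net", "uk", "bbs", "chan", "cyb"}


def is_ip(candidate):
    """True if the string parses as a complete IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        return False


def is_valid_domain(candidate):
    """Reject IP-like strings and anything whose last label is not a known TLD."""
    if is_ip(candidate):
        return False
    tld = candidate.rsplit(".", 1)[-1].lower()
    return tld in KNOWN_TLDS


raw = ["192.168", "example.com", "subdomain.test.org", "my-site.net",
       "web.co.uk", "172.16.32.45"]
print([d for d in raw if is_valid_domain(d)])
# ['example.com', 'subdomain.test.org', 'my-site.net', 'web.co.uk']
```

Note that "192.168" is not a valid IP address, so it is the TLD check (168 is not a TLD) that filters it out, while "172.16.32.45" is caught by the IP check.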

Error Handling

Unsupported Extract Field Error

If extract_fields contains a name that is not a supported field, UnsupportedExtractFieldError is raised, and its message lists the supported fields.

# Will raise UnsupportedExtractFieldError
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "INVALID_FIELD")).extract_ioc()

Error:

cy_ioc_extract.exception.UnsupportedExtractFieldError: Invalid fields {'INVALID_FIELD'} given to extract. Supported types are {"AUTONOMOUS_SYSTEM", "IPv6", "URL", "SHA384", "DIRECTORY", "IPv4_CIDR", "SHA512", "DOMAIN", "VALID_HOST", "WINDOWS_REGISTRY_KEY", "SSDEEP", "EMAIL", "SHA256", "MAC_ADDRESS", "CVE", "FIND_EMAIL", "IP", "MD5", "SHA3512", "SHA3256", "UUID", "SHA3384", "SHA1", "SHA224", "FILE_PATH", "SHA3224", "WHIRLPOOL"}.
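If field names come from user input or configuration, you can check them up front instead of relying on the exception. A minimal sketch, with the supported names copied from the message above; unsupported_fields is a hypothetical helper, and it assumes names are matched exactly, case included. In real code you can also catch UnsupportedExtractFieldError from cy_ioc_extract.exception, the module path shown in the error above.

```python
# Supported field names, copied from the error message above.
SUPPORTED_FIELDS = frozenset({
    "AUTONOMOUS_SYSTEM", "IPv6", "URL", "SHA384", "DIRECTORY", "IPv4_CIDR",
    "SHA512", "DOMAIN", "VALID_HOST", "WINDOWS_REGISTRY_KEY", "SSDEEP",
    "EMAIL", "SHA256", "MAC_ADDRESS", "CVE", "FIND_EMAIL", "IP", "MD5",
    "SHA3512", "SHA3256", "UUID", "SHA3384", "SHA1", "SHA224", "FILE_PATH",
    "SHA3224", "WHIRLPOOL",
})


def unsupported_fields(requested):
    """Return requested names missing from SUPPORTED_FIELDS (exact, case-sensitive match)."""
    return set(requested) - SUPPORTED_FIELDS


print(unsupported_fields(("DOMAIN", "INVALID_FIELD")))  # {'INVALID_FIELD'}
```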

Supported Extract Fields

The complete set of supported field names:

{'IPv4_CIDR', 'SHA256', 'MAC_ADDRESS', 'EMAIL', 'MD5', 'SHA3224', 'IPv6', 'SSDEEP', 'DOMAIN', 'DIRECTORY', 'AUTONOMOUS_SYSTEM', 'CVE', 'SHA384', 'VALID_HOST', 'SHA3512', 'SHA3384', 'SHA512', 'SHA1', 'WHIRLPOOL', 'SHA3256', 'URL', 'WINDOWS_REGISTRY_KEY', 'UUID', 'SHA224', 'FIND_EMAIL', 'IP', 'FILE_PATH'}

Field Name            Description
IP                    IPv4 addresses
DOMAIN                Domains and subdomains
EMAIL                 Email addresses
FIND_EMAIL            Email addresses found in free text
CVE                   CVE identifiers (CVE-YYYY-NNNN, four or more digits in the last part)
URL                   HTTP, HTTPS, and FTP URLs
SHA256                SHA-256 hashes
MD5                   MD5 hashes
SHA1                  SHA-1 hashes
UUID                  Universally unique identifiers
IPv4_CIDR             IPv4 CIDR notation (e.g., 192.168.1.0/24)
IPv6                  IPv6 addresses
SHA224                SHA-224 hashes
SHA384                SHA-384 hashes
SHA512                SHA-512 hashes
SHA3224               SHA3-224 hashes
SHA3256               SHA3-256 hashes
SHA3384               SHA3-384 hashes
SHA3512               SHA3-512 hashes
SSDEEP                ssdeep fuzzy hashes
DIRECTORY             File system directory paths
FILE_PATH             File paths (Windows and Unix style)
AUTONOMOUS_SYSTEM     Autonomous system numbers (e.g., AS12345)
WINDOWS_REGISTRY_KEY  Windows registry keys
MAC_ADDRESS           MAC addresses
VALID_HOST            Valid hostnames
WHIRLPOOL             Whirlpool hashes
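Several hash fields in the table share a digest length: SHA224 and SHA3224 are both 56 hex characters, SHA256 and SHA3256 are both 64, SHA384 and SHA3384 are both 96, and SHA512, SHA3512, and WHIRLPOOL are all 128. A hex string's length alone therefore cannot say which of these it is. The sketch below (a hypothetical helper, not library code) lists the candidate fields for a value; how cy_ioc_extract itself resolves such overlaps is not documented here.

```python
import re

# Hex digest length (digest bits / 4) -> hash fields with that length.
HEX_LENGTHS = {
    32: ("MD5",),
    40: ("SHA1",),
    56: ("SHA224", "SHA3224"),
    64: ("SHA256", "SHA3256"),
    96: ("SHA384", "SHA3384"),
    128: ("SHA512", "SHA3512", "WHIRLPOOL"),
}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def candidate_hash_fields(value):
    """List the hash fields a hex string could belong to, judged by length alone."""
    if not HEX_RE.fullmatch(value):
        return []
    return list(HEX_LENGTHS.get(len(value), ()))


print(candidate_hash_fields("da39a3ee5e6b4b0d3255bfef95601890afd80709"))  # ['SHA1']
```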

License

MIT License.


Contributing

Pull requests are welcome! If you find a bug or want to request a new feature, please open an issue in the repository.


Author


Enjoy using cy_ioc_extract for threat intelligence extraction! 🚀



Download files


Source Distribution

cy_ioc_extract-0.1.1.tar.gz (20.3 kB)

Uploaded Source

Built Distribution


cy_ioc_extract-0.1.1-py3-none-any.whl (17.2 kB)

Uploaded Python 3

File details

Details for the file cy_ioc_extract-0.1.1.tar.gz.

File metadata

  • Download URL: cy_ioc_extract-0.1.1.tar.gz
  • Upload date:
  • Size: 20.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.5

File hashes

Hashes for cy_ioc_extract-0.1.1.tar.gz
Algorithm Hash digest
SHA256 04ebc4312db07e3b325ac6cb30649a6eb24daef3e1bf6e9be4be9fd272fcedaf
MD5 3ab2d10fe04054c50518d79a15f8ca81
BLAKE2b-256 b8066c6c2e9e77b6a3bf86deb4a12f6d6d4d60ebf842943625982dc9fdeab05e


File details

Details for the file cy_ioc_extract-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: cy_ioc_extract-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 17.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.5

File hashes

Hashes for cy_ioc_extract-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fbd3a9f21a2827aa23710c8dd36e4eb193c12174148205cba37d5536dc721d6f
MD5 4c604fbae23387fe3e7be4651a97163c
BLAKE2b-256 ebd91b1b3f6ab20714242f4a03eb425459c440dcaf9d7d669a070dec27b76b5f
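The published digests let you check that a downloaded file has not been altered. A minimal sketch using the standard hashlib module; the helper names are hypothetical, and the digests are copied from the tables above.

```python
import hashlib


def sha256_hexdigest(chunks):
    """Feed an iterable of byte chunks through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it is never fully loaded into memory."""
    with open(path, "rb") as fh:
        return sha256_hexdigest(iter(lambda: fh.read(chunk_size), b""))


def digest_matches(actual_hex, expected_hex):
    """Compare hex digests case-insensitively, ignoring surrounding whitespace."""
    return actual_hex.lower() == expected_hex.strip().lower()


# SHA256 digests published above for release 0.1.1.
PUBLISHED_SHA256 = {
    "cy_ioc_extract-0.1.1.tar.gz":
        "04ebc4312db07e3b325ac6cb30649a6eb24daef3e1bf6e9be4be9fd272fcedaf",
    "cy_ioc_extract-0.1.1-py3-none-any.whl":
        "fbd3a9f21a2827aa23710c8dd36e4eb193c12174148205cba37d5536dc721d6f",
}
```

For example, with the wheel in the current directory, digest_matches(sha256_of_file(name), PUBLISHED_SHA256[name]) should return True for name = "cy_ioc_extract-0.1.1-py3-none-any.whl". pip can also enforce this itself through its hash-checking mode (a requirements line with --hash=sha256:..., installed with --require-hashes).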

