cy_ioc_extract
cy_ioc_extract is a Python library designed to extract various Indicators of Compromise (IOCs) from raw text using regular expressions (regex) and optional validation mechanisms. It supports extracting IP addresses, domains, URLs, hashes, emails, registry keys, autonomous system numbers, and more.
Features
- Extracts many IOC types, including IP addresses, domains, URLs, hashes, and CVE identifiers.
- Optionally validates extracted values (for example, domains are checked against IANA and OpenNIC TLDs).
- Supports selective extraction through the `extract_fields` parameter.
- Reduces false positives by filtering out invalid matches when validation is enabled.
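At its core, regex-based IOC extraction runs one pattern per IOC type over the text and then de-duplicates the hits. The sketch below shows that idea in plain Python. The simplified patterns and the `extract_iocs` helper are hypothetical; they are not cy_ioc_extract's own implementation.

```python
import re

# Simplified, hypothetical patterns keyed by the library's field names.
PATTERNS = {
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    # \b on both sides stops a 32-char window being cut out of a longer hash.
    "MD5": re.compile(r"\b[0-9a-fA-F]{32}\b"),
    "MAC_ADDRESS": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
}


def extract_iocs(text):
    """Run every pattern and drop duplicate hits, keeping first-seen order."""
    return {name: list(dict.fromkeys(rx.findall(text))) for name, rx in PATTERNS.items()}


sample = (
    "Host 00:1A:2B:3C:4D:5E beaconed to 8.8.8.8 twice (8.8.8.8), exploiting "
    "CVE-2021-44228; payload MD5 098f6bcd4621d373cade4e832627b4f6."
)
print(extract_iocs(sample))
# {'IP': ['8.8.8.8'], 'CVE': ['CVE-2021-44228'], 'MD5': ['098f6bcd4621d373cade4e832627b4f6'], 'MAC_ADDRESS': ['00:1A:2B:3C:4D:5E']}
```

A pattern this loose also accepts strings like `999.1.1.1` as an IP. Validation exists to filter out that kind of false positive.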
Installation
```shell
pip install cy_ioc_extract
```
Usage
1️⃣ Extracting Specific IOC Types
```python
from cy_ioc_extract import IOCEXtract

txt = """
### IPv4 Addresses:
192.168.1.1
10.0.0.1
8.8.8.8
172.16.32.45
### Domains:
example.com
subdomain.test.org
my-site.net
web.co.uk
### Email Addresses:
john.doe@example.com
alice_smith@corporate.org
user123@test.net
contact@web.co.uk
"""

# Extract only "DOMAIN" and "EMAIL"
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "EMAIL")).extract_ioc()
print(iocs)
```
Output (the order of items within each list may vary):
```
{
    "DOMAIN": ["my-site.net", "subdomain.test.org", "web.co.uk", "example.com"],
    "EMAIL": ["contact@web.co.uk", "john.doe@example.com", "alice_smith@corporate.org", "user123@test.net"]
}
```
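In the output above, DOMAIN contains only standalone domains. `corporate.org` and `test.net` appear only inside email addresses, and they are not reported as domains. The sketch below shows one way to get that behaviour: a lookbehind/lookahead pair that refuses to match next to `@`, plus an `extract_fields`-style selection. The patterns and the `extract` function are illustrative assumptions, not the library's internals.

```python
import re

# Hypothetical patterns. The DOMAIN lookbehind refuses to start right after "@"
# (or mid-token), and (?!@) rejects a dotted user part such as "john.doe@...".
PATTERNS = {
    "DOMAIN": re.compile(r"(?<![@\w.-])(?:[a-z0-9-]+\.)+[a-z]{2,}\b(?!@)", re.IGNORECASE),
    "EMAIL": re.compile(r"\b[\w.+-]+@(?:[\w-]+\.)+[a-z]{2,}\b", re.IGNORECASE),
}


def extract(text, extract_fields=None):
    """Run only the requested patterns (all of them when none are requested)."""
    fields = extract_fields if extract_fields is not None else tuple(PATTERNS)
    # No unsupported-field check here: an unknown name simply raises KeyError.
    return {field: sorted(set(PATTERNS[field].findall(text))) for field in fields}


sample = """
example.com
subdomain.test.org
john.doe@example.com
user123@test.net
"""
print(extract(sample, extract_fields=("DOMAIN",)))
# {'DOMAIN': ['example.com', 'subdomain.test.org']}
```

Sorting each list gives a stable order, which helps when you diff results between runs.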
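2️⃣ Extracting All IOC Types
When `extract_fields` is omitted, every supported IOC type is extracted. The output below was produced from a longer sample text than the one shown above. Besides the IPv4, domain, and email sections, that text also contained CVEs, URLs, hashes, UUIDs, CIDR ranges, IPv6 addresses, paths, registry keys, autonomous system numbers, and MAC addresses.
```python
iocs = IOCEXtract(txt).extract_ioc()
print(iocs)
```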
Output:
```
{
    "IP": ["172.16.32.45", "10.0.0.1", "192.168.1.1", "8.8.8.8"],
    "DOMAIN": ["subdomain.test.org", "my-site.net", "example.com", "web.co.uk"],
    "EMAIL": [
        "contact@web.co.uk",
        "user123@test.net",
        "alice_smith@corporate.org",
        "john.doe@example.com",
    ],
    "FIND_EMAIL": [
        "contact@web.co.uk",
        "user123@test.net",
        "alice_smith@corporate.org",
        "john.doe@example.com",
    ],
    "CVE": ["CVE-2022-7654321", "CVE-2021-98765", "CVE-2023-1234", "CVE-2019-45678"],
    "URL": [("https://example.com", ""), ("ftp://192.168.1.1/resource", "")],
    "SHA256": [
        "d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2",
        "d3b6f7c8e5a4d9b2c7e0f8b1a6d0c5e9a2b3d4f7e6c1a9b0f2d8c3a5e7b6d9c4",
        "5d41402abc4b2a76b9719d911017c59216dcd8d1a3f32a5e3a0d867d8e448be5",
        "d4eaa4b4e9c3e5d0b5a3c2a7f6b0e9c3d4f2e5a7b6c9f8e0d5c4e3a2b7f0d6c1",
        "46e95f20ad2a7dcd491ee6b0d56e0b7fd4f5e0c19ff2eb6d6bfa6a4c7a5c7e9b",
        "a9b0d6c3e5a4d9f7b2c1e8b3d0c7e6f9a5b4d8c2e3f0a7c6d1e9b0a2f8b5c7d6",
        "e8a2b7d6c5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0",
        "8a2c7d6b5f3e4d0a1c2b7e9f0d6c3a5e8b4f7d9c0a2e3b6c5f1d0a7e2c9b8f0d",
    ],
    "MD5": [
        "e99a18c428cb38d5f260853678922e03",
        "6f4922f45568161a8cdf4ad2299f6d23",
        "9e107d9d372bb6826bd81d3542a419d6",
        "098f6bcd4621d373cade4e832627b4f6",
    ],
    "SHA1": [
        "da39a3ee5e6b4b0d3255bfef95601890afd80709",
        "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12",
        "9c1185a5c5e9fc54612808977ee8f548b2258d31",
        "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
    ],
    "UUID": [
        "3f2504e0-4f89-11d3-9a0c-0305e82c3301",
        "6ba7b810-9dad-11d1-80b4-00c04fd430c8",
        "550e8400-e29b-41d4-a716-446655440000",
        "123e4567-e89b-12d3-a456-426614174000",
    ],
    "IPv4_CIDR": ["192.168.1.0/24", "10.0.0.0/8", "172.16.0.0/16", "8.8.8.0/24"],
    "IPv6": [
        "::1",
        "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
        "2001:db8::ff00:42:8329",
        "fe80::1ff:fe23:4567:890a",
    ],
    "SHA224": [
        "3a7bd3e2360a6c8a1e4c0e5a2b9f7d38c6f7c7c2e3d6a7f0c1b2a5e1",
        "d14a028c2a3a2bc9476102bb288234c415a2b01f828ea62ac5b3e42f",
        "758c57b4a7c8f3f3955b05bbd5e3c61a2cbf6d8fd98f48a263d7653b",
        "c97c24a8b9ac4a0c6e78a9a31a4b6ff8a2e9d5fffe71c3d6629d1a7a",
    ],
    "SHA384": [
        "ca737f0d0c89f6d1d172875e9d10c7c3350c1096c4bdb49f003ee927b4e6db32b08690b279b6c5abf0dcbd4f9d786c0b",
        "cf83e1357eefb8bd62ec7761d6d529b18b94ff7f3d8b3c1d5281fbbf6e6c077bbd7af5d15fa1c20b9a785e6cf0d630da",
    ],
    "SHA512": [],
    "SSDEEP": [],
    "DIRECTORY": [
        "C:\\Users\\Public\\Documents\\",
        "/etc/systemd/system/",
        "/home/user/docs/",
        "/var/log/nginx/",
    ],
    "FILE_PATH": [
        "/home/user/.bashrc",
        "C:\\Program Files\\MyApp\\config.ini",
        "/var/www/html/index.php",
        "D:\\Games\\Game.exe",
    ],
    "AUTONOMOUS_SYSTEM": ["AS24680", "AS12345", "AS67890", "AS13579"],
    "WINDOWS_REGISTRY_KEY": [
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\CurrentVersion\nHKLM\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters\nHKEY_USERS\\.DEFAULT\\Control Panel\\International\nHKCU\\SOFTWARE\\Policies\\Microsoft\\Windows\\System\n\n### Autonomous System Numbers"
    ],
    "MAC_ADDRESS": [
        "52:54:00:12:34:56",
        "00:1A:2B:3C:4D:5E",
        "A1:B2:C3:D4:E5:F6",
        "08:00:27:00:55:AA",
    ],
    "VALID_HOST": [],
    "WHIRLPOOL": [],
    "SHA3512": [],
    "SHA3384": [],
    "SHA3256": [],
    "SHA3224": [],
}
```
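In the output above, URL matches appear as tuples such as `("https://example.com", "")` rather than plain strings. Python's `re.findall` behaves this way when a pattern contains more than one capturing group: it returns a tuple of the groups for each match instead of the whole match. That this is the cause in cy_ioc_extract is an assumption.

The sketch below reproduces the effect with a made-up URL pattern (not the library's). It then shows two ways to get plain strings:

- use `re.finditer` and pick the URL group explicitly;
- keep the first non-empty element of each tuple, via a hypothetical `flatten` helper.

```python
import re

# Hypothetical URL pattern: group 1 is the URL, group 2 captures optional
# trailing punctuation so it is not glued onto the URL.
URL_RE = re.compile(r"((?:https?|ftp)://\S+?)([.,;]?)(?=\s|$)")

text = "See https://example.com and ftp://192.168.1.1/resource, then reply."

# Two capturing groups, so findall returns one tuple of groups per match.
print(URL_RE.findall(text))
# [('https://example.com', ''), ('ftp://192.168.1.1/resource', ',')]

# finditer yields match objects, so the URL group can be selected directly.
urls = [m.group(1) for m in URL_RE.finditer(text)]
print(urls)
# ['https://example.com', 'ftp://192.168.1.1/resource']


def flatten(matches):
    """Turn findall-style tuples into strings by keeping the first non-empty group."""
    return [next((g for g in m if g), "") if isinstance(m, tuple) else m for m in matches]


print(flatten([("https://example.com", ""), ("ftp://192.168.1.1/resource", "")]))
# ['https://example.com', 'ftp://192.168.1.1/resource']
```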
Validation and Custom TLDs
- When `validate_ioc=True`, domains are validated against IANA TLDs and custom (OpenNIC) TLDs such as `.bbs`, `.chan`, and `.cyb`.
- The default value of `validate_ioc` is `True`.
- If validation is disabled, raw regex matches are returned unvalidated, which may include false positives.
```python
# Extract without validation
iocs = IOCEXtract(txt, extract_fields=("DOMAIN",), validate_ioc=False).extract_ioc()
print(iocs)
```
Output (includes false positives in DOMAIN):
```
{
    "DOMAIN": ["192.168", "example.com", "subdomain.test.org", "my-site.net", "web.co.uk", "172.16.32.45"]
}
```
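The false positives above (`192.168`, `172.16.32.45`) are typical of an unvalidated domain regex, because dotted numbers look like domain labels. Below is a minimal sketch of TLD-based filtering, assuming a small hard-coded TLD set. The library itself validates against the full IANA list plus OpenNIC TLDs; `KNOWN_TLDS` and `is_valid_domain` are hypothetical names.

```python
# Hypothetical, heavily truncated TLD set; the real IANA list has well over
# a thousand entries, and OpenNIC adds TLDs such as .bbs, .chan and .cyb.
KNOWN_TLDS = {"com", "org", "net", "uk", "bbs", "chan", "cyb"}


def is_valid_domain(candidate: str) -> bool:
    """Accept a candidate only if it has 2+ non-empty labels and a known TLD."""
    labels = candidate.lower().rstrip(".").split(".")
    if len(labels) < 2 or any(not label for label in labels):
        return False
    return labels[-1] in KNOWN_TLDS


raw = ["192.168", "example.com", "subdomain.test.org", "my-site.net", "web.co.uk", "172.16.32.45"]
print([d for d in raw if is_valid_domain(d)])
# ['example.com', 'subdomain.test.org', 'my-site.net', 'web.co.uk']
```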
Error Handling
Unsupported Extract Field Error
Passing an unsupported field name in `extract_fields` raises `UnsupportedExtractFieldError`.
```python
# Will raise UnsupportedExtractFieldError
iocs = IOCEXtract(txt, extract_fields=("DOMAIN", "INVALID_FIELD")).extract_ioc()
```
Error:
```
cy_ioc_extract.exception.UnsupportedExtractFieldError: Invalid fields {'INVALID_FIELD'} given to extract. Supported types are {"AUTONOMOUS_SYSTEM", "IPv6", "URL", "SHA384", "DIRECTORY", "IPv4_CIDR", "SHA512", "DOMAIN", "VALID_HOST", "WINDOWS_REGISTRY_KEY", "SSDEEP", "EMAIL", "SHA256", "MAC_ADDRESS", "CVE", "FIND_EMAIL", "IP", "MD5", "SHA3512", "SHA3256", "UUID", "SHA3384", "SHA1", "SHA224", "FILE_PATH", "SHA3224", "WHIRLPOOL"}.
```
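Checking the requested fields against the supported set before any text is scanned lets a typo fail fast. Below is a minimal sketch of such a check, assuming a set-difference approach.

- `check_extract_fields` and the local `UnsupportedExtractFieldError` class are hypothetical stand-ins, so the sketch runs without the library installed.
- In real code you would catch `cy_ioc_extract.exception.UnsupportedExtractFieldError`, the path shown in the traceback above.
- Deriving the stand-in from `ValueError` is an assumption.

```python
# Field names copied from the supported set listed in this README.
SUPPORTED_FIELDS = frozenset({
    "IP", "DOMAIN", "EMAIL", "FIND_EMAIL", "CVE", "URL", "UUID", "IPv4_CIDR", "IPv6",
    "MD5", "SHA1", "SHA224", "SHA256", "SHA384", "SHA512",
    "SHA3224", "SHA3256", "SHA3384", "SHA3512", "WHIRLPOOL", "SSDEEP",
    "DIRECTORY", "FILE_PATH", "AUTONOMOUS_SYSTEM", "WINDOWS_REGISTRY_KEY",
    "MAC_ADDRESS", "VALID_HOST",
})


class UnsupportedExtractFieldError(ValueError):
    """Hypothetical local stand-in for the library's exception (base class assumed)."""


def check_extract_fields(extract_fields):
    """Raise before any extraction work if a requested field is not supported."""
    invalid = set(extract_fields) - SUPPORTED_FIELDS
    if invalid:
        raise UnsupportedExtractFieldError(
            f"Invalid fields {invalid} given to extract. "
            f"Supported types are {set(SUPPORTED_FIELDS)}."
        )
    return tuple(extract_fields)


try:
    check_extract_fields(("DOMAIN", "INVALID_FIELD"))
except UnsupportedExtractFieldError as exc:
    print(exc)  # set ordering inside the message varies between runs
```

In this sketch the comparison is case-sensitive, so `ipv6` would be rejected while `IPv6` is accepted.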
Supported Extract Fields
The complete set of supported field names:
```
{'IPv4_CIDR', 'SHA256', 'MAC_ADDRESS', 'EMAIL', 'MD5', 'SHA3224', 'IPv6', 'SSDEEP', 'DOMAIN', 'DIRECTORY', 'AUTONOMOUS_SYSTEM', 'CVE', 'SHA384', 'VALID_HOST', 'SHA3512', 'SHA3384', 'SHA512', 'SHA1', 'WHIRLPOOL', 'SHA3256', 'URL', 'WINDOWS_REGISTRY_KEY', 'UUID', 'SHA224', 'FIND_EMAIL', 'IP', 'FILE_PATH'}
```
| Field Name | Description |
|---|---|
| IP | IPv4 addresses |
| DOMAIN | Domains and subdomains |
| EMAIL | Email addresses |
| FIND_EMAIL | Email addresses found in free text |
| CVE | CVE identifiers (CVE-YYYY-NNNN, with four or more sequence digits) |
| URL | HTTP, HTTPS, and FTP URLs |
| SHA256 | SHA-256 hashes |
| MD5 | MD5 hashes |
| SHA1 | SHA-1 hashes |
| UUID | Universally unique identifiers |
| IPv4_CIDR | IPv4 CIDR notation (e.g., 192.168.1.0/24) |
| IPv6 | IPv6 addresses |
| SHA224, SHA384, SHA512, SHA3224, SHA3256, SHA3384, SHA3512 | Other SHA-2 and SHA-3 hashes |
| SSDEEP | ssdeep fuzzy hashes |
| DIRECTORY | File system directory paths |
| FILE_PATH | Full file paths (Windows and Unix) |
| AUTONOMOUS_SYSTEM | Autonomous system numbers (e.g., AS12345) |
| WINDOWS_REGISTRY_KEY | Windows registry keys |
| MAC_ADDRESS | MAC addresses |
| VALID_HOST | Valid hostnames |
| WHIRLPOOL | Whirlpool hashes |
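Most of the hash fields in the table are plain hexadecimal strings that differ only in length, and several algorithms share a length. SHA-224 and SHA3-224 are both 56 hex characters; SHA-512, SHA3-512, and Whirlpool are all 128. A regex alone therefore cannot tell them apart. The sketch below maps a hex digest to every algorithm it could belong to. The `candidate_algorithms` helper is hypothetical, and SSDEEP is left out because fuzzy hashes are not fixed-length hex.

```python
import hashlib
import re

# Hex digest length -> algorithms (using this README's field names) producing it.
HEX_LENGTHS = {
    32: ["MD5"],
    40: ["SHA1"],
    56: ["SHA224", "SHA3224"],
    64: ["SHA256", "SHA3256"],
    96: ["SHA384", "SHA3384"],
    128: ["SHA512", "SHA3512", "WHIRLPOOL"],
}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def candidate_algorithms(digest):
    """Return every algorithm whose hex digest length matches; [] if not plain hex."""
    if not HEX_RE.fullmatch(digest):
        return []
    return HEX_LENGTHS.get(len(digest), [])


# Real digests from hashlib show the ambiguity directly.
for name in ("md5", "sha1", "sha224", "sha3_224", "sha512"):
    digest = hashlib.new(name, b"abc").hexdigest()
    print(f"{name:9} -> {candidate_algorithms(digest)}")
```

Both `sha224` and `sha3_224` map to `['SHA224', 'SHA3224']`. Telling them apart needs context, such as a label next to the hash in the source text.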
License
MIT License.
Contributing
Pull requests are welcome! If you find an issue or want to request a new feature, please open an issue in the repository.
Author
- Deepak Kumar
- deepak.kumar@cyware.com
Enjoy using cy_ioc_extract for threat intelligence extraction! 🚀
Download files
File details
Details for the file cy_ioc_extract-0.1.1.tar.gz.
File metadata
- Download URL: cy_ioc_extract-0.1.1.tar.gz
- Upload date:
- Size: 20.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 04ebc4312db07e3b325ac6cb30649a6eb24daef3e1bf6e9be4be9fd272fcedaf |
| MD5 | 3ab2d10fe04054c50518d79a15f8ca81 |
| BLAKE2b-256 | b8066c6c2e9e77b6a3bf86deb4a12f6d6d4d60ebf842943625982dc9fdeab05e |
File details
Details for the file cy_ioc_extract-0.1.1-py3-none-any.whl.
File metadata
- Download URL: cy_ioc_extract-0.1.1-py3-none-any.whl
- Upload date:
- Size: 17.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbd3a9f21a2827aa23710c8dd36e4eb193c12174148205cba37d5536dc721d6f |
| MD5 | 4c604fbae23387fe3e7be4651a97163c |
| BLAKE2b-256 | ebd91b1b3f6ab20714242f4a03eb425459c440dcaf9d7d669a070dec27b76b5f |
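The digests listed for each file let you check a downloaded archive before installing it. Below is a minimal sketch using only `hashlib`; the `sha256_of` helper name is illustrative. It compares a local copy of the wheel against the SHA256 value in the wheel's hash table and does nothing if the file is not present.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 of cy_ioc_extract-0.1.1-py3-none-any.whl, as listed in its file hashes table.
EXPECTED = "fbd3a9f21a2827aa23710c8dd36e4eb193c12174148205cba37d5536dc721d6f"
wheel = Path("cy_ioc_extract-0.1.1-py3-none-any.whl")
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED else "MISMATCH")
```

pip's hash-checking mode does the same job at install time. Pin `cy_ioc_extract==0.1.1` in a requirements file with one `--hash=sha256:` option per digest listed for the sdist and the wheel, since pip may choose either file. Then install with `pip install --require-hashes -r requirements.txt`.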