Skip to main content

Advanced Indicator of Compromise (IOC) extractor

Project description

Build Status Documentation Status Code Health Test Coverage PyPi Version

Advanced Indicator of Compromise (IOC) extractor.

Overview

This library extracts URLs, IP addresses, MD5/SHA hashes, and YARA rules from text corpora. It includes obfuscated and “defanged” IOCs in the output, and optionally deobfuscates them.

The Problem

It is common practice for malware analysts or endpoint software to “defang” IOCs such as URLs and IP addresses, in order to prevent accidental exposure to live malicious content. Being able to extract and aggregate these IOCs is often valuable for analysts. Unfortunately, existing “IOC extraction” tools often pass right by them, as they are not caught by standard REGEX.

For example, the simple defanging technique of surrounding periods with brackets:

127[.]0[.]0[.]1

Existing tools that use a simple IP address REGEX will ignore this IOC entirely.

The Solution

By combining specially crafted REGEX with some custom postprocessing, we are able to both detect and deobfuscate “defanged” IOCs. This saves time and effort for the analyst, who might otherwise have to manually find and convert IOCs into machine-readable format.

A Simple Use Case

Many Twitter users post C2s or other valuable IOC information with defanged URLs. For example, this tweet from @InQuest:

Recommended reading and great work from @unit42_intel:
https://researchcenter.paloaltonetworks.com/2018/02/unit42-sofacy-attacks-multiple-government-entities/ ...
InQuest customers have had detection for threats delivered from hotfixmsupload[.]com
since 6/3/2017 and cdnverify[.]net since 2/1/18.

If we run this through the extractor, we can easily pull out the URLs:

https://researchcenter.paloaltonetworks.com/2018/02/unit42-sofacy-attacks-multiple-government-entities/
hotfixmsupload[.]com
cdnverify[.]net

Passing in refang=True at extraction time would remove the obfuscation, but since these are real IOCs, let’s leave them defanged in our documentation. :)

Installation

Just get it from pip:

pip install iocextract

Usage

Try extracting some defanged URLS:

>>> content = """
... I really love example[.]com!
... All the bots are on hxxp://example.com/bad/url these days.
... C2: tcp://example[.]com:8989/bad
... """
>>> import iocextract
>>> for url in iocextract.extract_urls(content):
...     print url
...
hxxp://example.com/bad/url
tcp://example[.]com:8989/bad
example[.]com
tcp://example[.]com:8989/bad

Note that some URLs may show up twice if they are caught by multiple REGEXes.

If you want, you can also “refang”, or remove common obfuscation methods from IOCs:

>>> for url in iocextract.extract_urls(content, refang=True):
...     print url
...
http://example.com/bad/url
http://example.com:8989/bad
http://example.com
http://example.com:8989/bad

All extract_* functions in this library return iterators, not lists. The benefit of this behavior is that iocextract can process extremely large inputs, with a very low overhead. However, if for some reason you need to iterate over the IOCs more than once, you will have to save the results as a list:

>>> list(iocextract.extract_urls(content))
['hxxp://example.com/bad/url', 'tcp://example[.]com:8989/bad', 'example[.]com', 'tcp://example[.]com:8989/bad']

A command-line tool is also included:

$ iocextract -h
usage: iocextract [-h] [--input INPUT] [--output OUTPUT] [--extract-ips]
                  [--extract-urls] [--extract-yara-rules] [--extract-hashes]
                  [--refang]

Advanced Indicator of Compromise (IOC) extractor. If no arguments are
specified, the default behavior is to extract all IOCs.

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         default: stdin
  --output OUTPUT       default: stdout
  --extract-ips
  --extract-urls
  --extract-yara-rules
  --extract-hashes
  --refang              default: no

Only URLs and IPv4 addresses can be “refanged”.

More Details

This library currently supports the following IOCs:

  • IP Addresses
    • IPv4 fully supported

    • IPv6 partially supported

  • URLs
    • With protocol specifier: http, https, tcp, udp, ftp, sftp, ftps

    • With [.] anchor, even with no protocol specifier

    • IPv4 and IPv6 (RFC2732) URLs are supported

  • Emails
    • Partially supported, anchoring on @

  • YARA rules

  • Hashes
    • MD5

    • SHA1

    • SHA256

    • SHA512

For IPv4 addresses, the following defang techniques are supported:

Technique

Defanged

Refanged

. -> [.]

1[.]1[.]1[.]1

1.1.1.1

. -> (.)

1(.)1(.)1(.)1

1.1.1.1

Partial

1[.1[.1.]1

1.1.1.1

Any combination

1.)1[.1.)1

1.1.1.1

For URLs, the following defang techniques are supported:

Technique

Defanged

Refanged

. -> [.]

example[.]com/path

http://example.com/path

. -> (.)

example(.)com/path

http://example.com/path

Partial

http://example[.com/path

http://example.com/path

/ -> [/]

http://example.com[/]path

http://example.com/path

Cisco ESA

http:// example .com /path

http://example.com/path

:// -> __

http__example.com/path

http://example.com/path

hxxp

hxxp://example.com/path

http://example.com/path

Any combination

hxxp__ example( .com[/]path

http://example.com/path

Note that the table above is not exhaustive, and other URL/defang patterns may also be extracted correctly. If you notice something missing or not working correctly, feel free to let us know via the GitHub Issues.

Contributing

If you have a defang technique that doesn’t make it through the extractor, or if you find any bugs, PRs and Issues are always welcome. The library is released under a “BSD-New” (aka “BSD 3-Clause”) license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iocextract-1.0.0.tar.gz (8.5 kB view details)

Uploaded Source

Built Distribution

iocextract-1.0.0-py2.py3-none-any.whl (11.2 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file iocextract-1.0.0.tar.gz.

File metadata

  • Download URL: iocextract-1.0.0.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for iocextract-1.0.0.tar.gz
Algorithm Hash digest
SHA256 39519de44ae826958364b79799d2ca0e145c7fea8adb31e12dd6e81a41145581
MD5 1a06bd107491ff5196f512dfee73ce67
BLAKE2b-256 8525cd43002c5af2a80e5ecb8dfdc1bb6104754df477199bdaa9a5e87de34cc7

See more details on using hashes here.

Provenance

File details

Details for the file iocextract-1.0.0-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for iocextract-1.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 7cdcfd8a7655bdc9a29c773734e62f52dc8fa4d55cbae0251da8dbb27c3e21bf
MD5 21f18d97142da4e4e418a0ed3dfb4388
BLAKE2b-256 48364b7e76f923de9cd038c16aa0a4b7a54c07d5a42d6ff93305010b467614cd

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page