A tool for extracting Indicators of Compromise from security reports

IOCParser

A tool for extracting Indicators of Compromise (IOCs) from security reports in HTML, PDF, and plain text formats.

Author: Marc Rivero | @seifreed
Version: 1.0.1

Features

- Extraction of multiple types of IOCs:
  - Hashes (MD5, SHA1, SHA256, SHA512)
  - Domains
  - IP Addresses
  - URLs
  - Bitcoin addresses
  - Email addresses
  - Hosts
  - CVEs
  - Windows Registry entries
  - Filenames
  - Filepaths
  - Yara rules
- Automatic defanging of domains and IPs
- Support for HTML, PDF, and plain text formats
- Support for direct analysis from URLs
- Output in JSON and plain text format
- Checking against MISP warning lists to identify false positives
- Can be used as a command-line tool or as a Python library
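
Defanging rewrites an indicator so that it cannot be clicked or resolved by accident. The sketch below illustrates one common convention (bracketing dots and rewriting `http` as `hxxp`). The function name `defang` is hypothetical, and the exact format IOCParser emits may differ.

```python
import re


def defang(indicator: str) -> str:
    """Return a defanged copy of a domain, IP, or URL (illustrative convention only)."""
    # Neutralize the scheme so the URL is no longer clickable: http -> hxxp, https -> hxxps.
    result = re.sub(r"^http", "hxxp", indicator, flags=re.IGNORECASE)
    # Bracket every dot so the value cannot be resolved or followed by accident.
    return result.replace(".", "[.]")


for sample in ("evil.com", "192.168.1.1", "https://evil.com/payload.exe"):
    print(sample, "->", defang(sample))
```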

Installation

From PyPI (Recommended)

pip install iocparser-tool

From Source

# Clone the repository
git clone https://github.com/seifreed/iocparser.git
cd iocparser

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install as a package with all dependencies
pip install -e .

# Or install just the requirements
pip install -r requirements.txt

Quick Start

# Initialize and download MISP warning lists (do this first)
iocparser --init

# Analyze a PDF file
iocparser -f report.pdf

# Analyze an HTML file
iocparser -f report.html

# Analyze a text file
iocparser -f report.txt

Command Line Usage

File Type Options

# Force specific file type (pdf, html, text)
iocparser -f report -t pdf
iocparser -f report -t html
iocparser -f report -t text
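
When `-t` is omitted the tool detects the file type on its own; how it does so is not described here. One common approach is to inspect the first bytes of the content, sketched below. The function `guess_report_type` is hypothetical and is not part of IOCParser; it only returns the same three labels that `-t` accepts.

```python
def guess_report_type(data: bytes) -> str:
    """Guess 'pdf', 'html', or 'text' from the leading bytes of a report."""
    # Look only at the start of the content, ignoring leading whitespace and case.
    head = data[:1024].lstrip().lower()
    # PDF files start with the "%PDF-" header.
    if head.startswith(b"%pdf-"):
        return "pdf"
    # HTML documents usually open with a doctype or an <html> tag.
    if b"<!doctype html" in head or b"<html" in head:
        return "html"
    # Anything else is treated as plain text.
    return "text"


print(guess_report_type(b"%PDF-1.7\n..."))
print(guess_report_type(b"<!DOCTYPE html><html><body>report</body></html>"))
print(guess_report_type(b"Plain text report body"))
```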

Output Options

# Save the output to a specific file
iocparser -f report.pdf -o results.json
iocparser -f report.pdf -o results.txt

# Print results to screen only
iocparser -f report.pdf -o -

# Use JSON format (default is text)
iocparser -f report.pdf --json

Analyzing from URL

# Analyze a report from a URL
iocparser -u https://example.com/report.html

# Specify content type for a URL
iocparser -u https://example.com/report -t html

Additional Options

--no-defang          Disable automatic defanging of IOCs
--no-check-warnings  Don't check IOCs against MISP warning lists
--force-update       Force update of MISP warning lists
--init               Download and initialize MISP warning lists
-h, --help           Show help message

Using as a Library

You can use IOCParser as a library in your Python projects:

# Example 1: Extract IOCs from a file
from iocparser import extract_iocs_from_file

# Process a file (automatically detects file type)
normal_iocs, warning_iocs = extract_iocs_from_file('path/to/report.pdf')
print(f"Found {len(normal_iocs.get('domains', []))} normal domains")
print(f"Found {len(warning_iocs.get('domains', []))} potential false positive domains")

# With additional options
normal_iocs, warning_iocs = extract_iocs_from_file(
    'path/to/report.html',
    check_warnings=True,      # Check against MISP warning lists
    force_update=False,       # Don't force update MISP lists
    file_type='html',         # Force file type (optional)
    defang=True               # Defang the IOCs
)

# Example 2: Extract IOCs from text content directly
from iocparser import extract_iocs_from_text

text = "This sample malware contacts evil.com with IP 192.168.1.1 and uses hash 5f4dcc3b5aa765d61d8327deb882cf99"
normal_iocs, warning_iocs = extract_iocs_from_text(text)

# Print the extracted IOCs
for ioc_type, iocs_list in normal_iocs.items():
    print(f"{ioc_type}: {iocs_list}")
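
The examples above suggest that both return values are dictionaries mapping an IOC type name to a list of values. Under that assumption, a small helper can report counts for both dictionaries side by side. `summarize_iocs` is a hypothetical helper, and the sample data (including the key names `domains` and `md5`) is hand-written rather than real extraction output.

```python
from typing import Dict, List


def summarize_iocs(
    normal: Dict[str, List[str]], warning: Dict[str, List[str]]
) -> Dict[str, Dict[str, int]]:
    """Count indicators per type in the two result dictionaries."""
    summary = {}
    # Cover every type that appears in either dictionary.
    for ioc_type in sorted(set(normal) | set(warning)):
        summary[ioc_type] = {
            "normal": len(normal.get(ioc_type, [])),
            "warning": len(warning.get(ioc_type, [])),
        }
    return summary


# Hand-written sample data standing in for real extraction results.
sample_normal = {"domains": ["evil[.]com"], "md5": ["5f4dcc3b5aa765d61d8327deb882cf99"]}
sample_warning = {"domains": ["google[.]com"]}

for ioc_type, counts in summarize_iocs(sample_normal, sample_warning).items():
    print(f"{ioc_type}: {counts['normal']} normal, {counts['warning']} possible false positives")
```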

Using the Low-Level Components

If you need more control, you can use the individual components directly:

from iocparser import IOCExtractor, PDFParser, HTMLParser, MISPWarningLists

# Extract text from a PDF or HTML file
parser = PDFParser("path/to/report.pdf")
# or
# parser = HTMLParser("path/to/report.html")
text_content = parser.extract_text()

# Extract IOCs
extractor = IOCExtractor(defang=True)
iocs = extractor.extract_all(text_content)

# Check against warning lists
warning_lists = MISPWarningLists()
normal_iocs, warning_iocs = warning_lists.separate_iocs_by_warnings(iocs)
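
Conceptually, the warning-list step splits the extracted indicators into those that match a list of known-benign values and those that do not. The sketch below shows that idea with exact, case-insensitive matching against a plain list. It is not IOCParser's implementation: `separate_by_allowlist` is a hypothetical name, and real warning lists may also use other matching rules such as network ranges.

```python
from typing import Dict, Iterable, List, Tuple

IOCs = Dict[str, List[str]]


def separate_by_allowlist(iocs: IOCs, known_benign: Iterable[str]) -> Tuple[IOCs, IOCs]:
    """Split IOCs into (normal, flagged) by exact, case-insensitive membership."""
    benign = {value.lower() for value in known_benign}
    normal: IOCs = {}
    flagged: IOCs = {}
    for ioc_type, values in iocs.items():
        for value in values:
            # Values found in the benign set are likely false positives.
            target = flagged if value.lower() in benign else normal
            target.setdefault(ioc_type, []).append(value)
    return normal, flagged


sample = {"domains": ["evil.com", "Google.com"], "ips": ["8.8.8.8"]}
normal, flagged = separate_by_allowlist(sample, ["google.com", "8.8.8.8"])
print("normal:", normal)
print("flagged:", flagged)
```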

Available Extraction Methods

from iocparser import IOCExtractor

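extractor = IOCExtractor(defang=True)
text = "This sample malware contacts evil.com with IP 192.168.1.1"  # Sample input; replace with your report text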

# Extract specific IOC types
md5_hashes = extractor.extract_md5(text)
sha1_hashes = extractor.extract_sha1(text)
sha256_hashes = extractor.extract_sha256(text)
sha512_hashes = extractor.extract_sha512(text)
domains = extractor.extract_domains(text)
ips = extractor.extract_ips(text)
urls = extractor.extract_urls(text)
bitcoin = extractor.extract_bitcoin(text)
yara_rules = extractor.extract_yara_rules(text)
hosts = extractor.extract_hosts(text)
emails = extractor.extract_emails(text)
cves = extractor.extract_cves(text)
registry_keys = extractor.extract_registry(text)
filenames = extractor.extract_filenames(text)
filepaths = extractor.extract_filepaths(text)

# Extract all IOC types at once
all_iocs = extractor.extract_all(text)  # Returns a dictionary with all IOCs
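
To give a feel for what pattern-based extraction involves, the sketch below finds hex digests by length using the standard `re` module. It is an independent illustration, not the patterns IOCParser uses; `extract_hashes` and `HASH_LENGTHS` are hypothetical names.

```python
import re
from typing import Dict, List

# Hex digest lengths for the hash types listed under Features.
HASH_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64, "sha512": 128}


def extract_hashes(text: str) -> Dict[str, List[str]]:
    """Return lowercase, de-duplicated hex digests grouped by hash type."""
    found: Dict[str, List[str]] = {}
    for name, length in HASH_LENGTHS.items():
        # Word boundaries plus an exact repeat count stop a longer digest
        # (for example a SHA256) from also matching as a shorter one (an MD5).
        pattern = re.compile(rf"\b[a-fA-F0-9]{{{length}}}\b")
        matches = sorted({match.lower() for match in pattern.findall(text)})
        if matches:
            found[name] = matches
    return found


sample_text = "The sample uses hash 5f4dcc3b5aa765d61d8327deb882cf99 and contacts evil.com"
print(extract_hashes(sample_text))
```

A length-only check like this cannot tell a real digest from any other hex string of the same length, which is one reason a false-positive check is useful afterwards.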

Examples

Extract IOCs from a local PDF report

iocparser -f reports/APT28_report.pdf

Extract IOCs from a URL and save in JSON format

iocparser -u https://example.com/security-report.pdf --json

Extract IOCs from an HTML file without defanging

iocparser -f report.html --no-defang

Use in a Python script to process multiple files

import os

from iocparser import extract_iocs_from_file

reports_dir = "path/to/reports"
for filename in os.listdir(reports_dir):
    # str.endswith accepts a tuple, so both extensions are checked in one call
    if filename.endswith((".pdf", ".html")):
        file_path = os.path.join(reports_dir, filename)
        print(f"Processing {filename}...")
        normal_iocs, warning_iocs = extract_iocs_from_file(file_path)

        # Do something with the extracted IOCs
        print(f"Found {sum(len(iocs) for iocs in normal_iocs.values())} IOCs")
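
When processing many reports, it is often useful to combine the per-file results into one de-duplicated set of indicators. The sketch below assumes each result is a dictionary mapping an IOC type to a list of strings, as the examples above suggest; `merge_iocs` is a hypothetical helper, not part of IOCParser.

```python
from typing import Dict, Iterable, List, Set


def merge_iocs(results: Iterable[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Merge several IOC dictionaries, dropping duplicates within each type."""
    merged: Dict[str, Set[str]] = {}
    for iocs in results:
        for ioc_type, values in iocs.items():
            # A set per type removes indicators repeated across reports.
            merged.setdefault(ioc_type, set()).update(values)
    # Sort for stable, readable output.
    return {ioc_type: sorted(values) for ioc_type, values in merged.items()}


# Hand-written sample data standing in for results from two reports.
report_a = {"domains": ["b.example", "a.example"]}
report_b = {"domains": ["a.example"], "ips": ["203.0.113.5"]}
print(merge_iocs([report_a, report_b]))
```

In the loop above, each `normal_iocs` dictionary could be appended to a list and passed to this helper once all files have been processed.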

License

This project is available under the MIT License. You are free to use, modify, and distribute it, provided that you include the original copyright notice and attribution to the original author.

Required Attribution:

Original Author: Marc Rivero | @seifreed
Repository: https://github.com/seifreed/iocparser

When using this project in your own work, please include a clear reference to the original author and repository.

Release history

- 6.0.0: May 13, 2026
- 5.0.1: Jan 20, 2026
- 5.0.0: Jan 13, 2026
- 4.0.0: Jan 11, 2026
- 3.0.0: Aug 21, 2025
- 2.0.0: Aug 13, 2025 (this version)
- 1.0.1: Jul 11, 2025
- 1.0.0: Jul 11, 2025

Download files

Source Distribution

iocparser_tool-2.0.0.tar.gz (19.1 MB)

Uploaded Aug 13, 2025 Source

Built Distribution

iocparser_tool-2.0.0-py3-none-any.whl (19.1 MB)

Uploaded Aug 13, 2025 Python 3

File details

Details for the file iocparser_tool-2.0.0.tar.gz.

File metadata

Download URL: iocparser_tool-2.0.0.tar.gz
Upload date: Aug 13, 2025
Size: 19.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for iocparser_tool-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`7f50ae7a2a68572e0d8ca816cc6aaa15f487030a819a6e143630b6a63315bb8d`
MD5	`3ddf3a4acc2642bf96418d695821cd8b`
BLAKE2b-256	`ef26536de1348f7f7d1118a3a1e46e3792eb174e53e3574d275d890199cec818`
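
A downloaded file can be checked against a published digest with Python's standard `hashlib` module. The following is a minimal sketch; the helper names `sha256_of_file` and `matches_published_hash` are illustrative and are not part of IOCParser or PyPI tooling.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading the source distribution, `matches_published_hash("iocparser_tool-2.0.0.tar.gz", "7f50ae7a2a68572e0d8ca816cc6aaa15f487030a819a6e143630b6a63315bb8d")` is expected to return `True` if the file is intact.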

File details

Details for the file iocparser_tool-2.0.0-py3-none-any.whl.

File metadata

Download URL: iocparser_tool-2.0.0-py3-none-any.whl
Upload date: Aug 13, 2025
Size: 19.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for iocparser_tool-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`13c6586015dfd6badc61b5c69c08eacd614a10eaa74fb452ba43629e6419b427`
MD5	`b67c889990d499638c8e03d9f560115f`
BLAKE2b-256	`7e3957c837038b9f9133dc8d466345e08e22f1a570104f9d3ac6b9a1b13948cb`
