A tool for extracting Indicators of Compromise from security reports
IOCParser
Extract Indicators of Compromise from security reports with ease
Overview
IOCParser is a Python tool that extracts Indicators of Compromise (IOCs) from security reports. It reads HTML, PDF, and plain text input, which makes it useful for threat intelligence analysts, security researchers, and incident responders.
Key Features
| Feature | Description |
|---|---|
| Multi-format Support | Parse PDF, HTML, and plain text files |
| URL Analysis | Extract IOCs directly from web URLs |
| MISP Integration | Filter false positives using MISP warning lists |
| Defanging | Automatic defanging of domains and IPs |
| Library Mode | Use as CLI tool or Python library |
| JSON/Text Output | Flexible output formats |
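Defanging rewrites an indicator so that it cannot be clicked or resolved by accident. This README does not state the exact notation IOCParser emits, so the sketch below assumes the widely used `[.]` and `hxxp` conventions. `defang` and `refang` are hypothetical helper names for illustration, not part of the IOCParser API.

```python
def defang(indicator: str) -> str:
    """Return a defanged copy of a domain, IP, or URL (assumed conventions)."""
    scheme, sep, rest = indicator.partition("://")
    # Only rewrite a real http/https scheme, not hostnames that start with "http".
    if sep and scheme.lower() in ("http", "https"):
        indicator = "hxxp" + scheme[4:] + sep + rest
    # Bracket every dot so the value no longer parses as a hostname or address.
    return indicator.replace(".", "[.]")


def refang(indicator: str) -> str:
    """Reverse the transformation applied by defang()."""
    scheme, sep, rest = indicator.partition("://")
    if sep and scheme.lower() in ("hxxp", "hxxps"):
        indicator = "http" + scheme[4:] + sep + rest
    return indicator.replace("[.]", ".")


print(defang("evil.com"))
print(defang("192.168.1.1"))
print(defang("https://evil.com/payload.exe"))
```

Passing `--no-defang` on the command line, or `defang=False` to `IOCExtractor`, keeps indicators in their original form when a downstream tool needs resolvable values.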
Supported IOC Types
| Category | IOC Types |
|---|---|
| Hashes | MD5, SHA1, SHA256, SHA512 |
| Network | Domains, IPs, URLs, Emails |
| Cryptocurrency | Bitcoin addresses |
| Vulnerabilities | CVEs |
| Windows | Registry keys, Filepaths, Filenames |
| Detection | YARA rules |
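Several of these types have a fixed shape that a regular expression can recognize. The sketch below illustrates the general idea for hashes and CVE identifiers only. The patterns and the `extract_simple` helper are assumptions made for this example; IOCParser's own expressions may differ.

```python
import re

# Illustrative patterns only; IOCParser's real expressions may differ.
PATTERNS = {
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "sha512": re.compile(r"\b[a-fA-F0-9]{128}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract_simple(text: str) -> dict[str, list[str]]:
    """Return unique matches per type, in order of first appearance."""
    results = {}
    for name, pattern in PATTERNS.items():
        # dict.fromkeys removes duplicates while preserving order.
        matches = list(dict.fromkeys(pattern.findall(text)))
        if matches:
            results[name] = matches
    return results


sample = "Dropper d41d8cd98f00b204e9800998ecf8427e exploits CVE-2021-44228."
print(extract_simple(sample))
```

The word boundaries (`\b`) keep a 64-character SHA256 digest from also being reported as an MD5 or SHA1 substring.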
Installation
From PyPI (Recommended)
pip install iocparser-tool
From Source
git clone https://github.com/seifreed/iocparser.git
cd iocparser
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -e .
Quick Start
# Initialize MISP warning lists (first time only)
iocparser --init
# Analyze files
iocparser -f report.pdf
iocparser -f report.html
iocparser -u https://example.com/report.html
Usage
Command Line Interface
# Basic analysis
iocparser -f report.pdf
# Save output to file
iocparser -f report.pdf -o results.json
# JSON format output
iocparser -f report.pdf --json
# Analyze from URL
iocparser -u https://example.com/report.html
# Force specific file type
iocparser -f report -t pdf
Available Options
| Option | Description |
|---|---|
| `-f`, `--file` | Input file path |
| `-u`, `--url` | URL to analyze |
| `-o`, `--output` | Output file path |
| `-t`, `--type` | Force file type (pdf, html, text) |
| `--json` | Output in JSON format |
| `--no-defang` | Disable IOC defanging |
| `--no-check-warnings` | Skip MISP warning list check |
| `--force-update` | Force update MISP lists |
| `--init` | Initialize MISP warning lists |
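When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags listed above; `build_iocparser_command` is a hypothetical helper, and whether every flag combination is accepted by the tool is an assumption.

```python
import shlex


def build_iocparser_command(source, *, is_url=False, file_type=None,
                            output=None, json_output=False, defang=True,
                            check_warnings=True):
    """Assemble an argument list from the documented flags (hypothetical helper)."""
    cmd = ["iocparser", "-u" if is_url else "-f", str(source)]
    if file_type is not None:
        if file_type not in ("pdf", "html", "text"):
            raise ValueError(f"unsupported file type: {file_type}")
        cmd += ["-t", file_type]
    if output is not None:
        cmd += ["-o", str(output)]
    if json_output:
        cmd.append("--json")
    if not defang:
        cmd.append("--no-defang")
    if not check_warnings:
        cmd.append("--no-check-warnings")
    return cmd


cmd = build_iocparser_command("report", file_type="pdf", json_output=True,
                              output="results.json")
# Print a shell-safe rendering of the command for logging or review.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, assuming `iocparser` is installed and on the `PATH`.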
Python Library
Basic Usage
from iocparser import extract_iocs_from_file, extract_iocs_from_text
# From file
normal_iocs, warning_iocs = extract_iocs_from_file('report.pdf')
# From text
text = "Malware contacts evil.com at 192.168.1.1"
normal_iocs, warning_iocs = extract_iocs_from_text(text)
# Print results
for ioc_type, iocs in normal_iocs.items():
    print(f"{ioc_type}: {iocs}")
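The two dictionaries returned in library mode can also be written out as JSON for other tools. The sketch below assumes each result maps an IOC type name to a list of values, as the loop above suggests. The `iocs_to_json` helper and the `iocs`/`warnings` layout are hypothetical and are not the schema produced by `iocparser --json`.

```python
import json


def iocs_to_json(normal_iocs, warning_iocs):
    """Serialize both result dictionaries into one JSON document."""
    payload = {"iocs": normal_iocs, "warnings": warning_iocs}
    # default=str falls back to str() for any value json cannot encode natively.
    return json.dumps(payload, indent=2, sort_keys=True, default=str)


# Stand-in data shaped like the library results (assumed type names).
normal_iocs = {"domains": ["evil[.]com"], "ips": ["192[.]168[.]1[.]1"]}
warning_iocs = {}
print(iocs_to_json(normal_iocs, warning_iocs))
```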
Advanced Usage
from iocparser import IOCExtractor, PDFParser, MISPWarningLists
# Extract text from PDF
parser = PDFParser("report.pdf")
text = parser.extract_text()
# Extract IOCs
extractor = IOCExtractor(defang=True)
iocs = extractor.extract_all(text)
# Filter with MISP warning lists
warning_lists = MISPWarningLists()
normal, warnings = warning_lists.separate_iocs_by_warnings(iocs)
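Conceptually, the last step sorts each extracted value into one of two buckets depending on whether it appears on a list of known-benign entries. The sketch below shows that idea with exact-match lookups only. It is not the internals of `MISPWarningLists`: the `BENIGN` table, the `separate_by_warnings` name, and the `domains`/`ips` keys are assumptions, and real MISP warning lists also contain entries such as CIDR ranges that need more than a set lookup.

```python
# Hypothetical stand-in for MISP warning-list entries (well-known benign values).
BENIGN = {
    "domains": {"google.com", "microsoft.com"},
    "ips": {"8.8.8.8", "1.1.1.1"},
}


def separate_by_warnings(iocs):
    """Split {type: [values]} into (normal, warnings) by exact-match lookup."""
    normal, warnings = {}, {}
    for ioc_type, values in iocs.items():
        known = BENIGN.get(ioc_type, set())
        for value in values:
            # Values found on a warning list are likely false positives.
            bucket = warnings if value in known else normal
            bucket.setdefault(ioc_type, []).append(value)
    return normal, warnings


iocs = {"domains": ["evil.com", "google.com"], "ips": ["8.8.8.8"]}
normal, warnings = separate_by_warnings(iocs)
print(normal)
print(warnings)
```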
Individual Extractors
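from iocparser import IOCExtractor
# `text` holds the report contents, e.g. the result of parser.extract_text()
extractor = IOCExtractor(defang=True)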
# Extract specific types
hashes_md5 = extractor.extract_md5(text)
hashes_sha256 = extractor.extract_sha256(text)
domains = extractor.extract_domains(text)
ips = extractor.extract_ips(text)
urls = extractor.extract_urls(text)
emails = extractor.extract_emails(text)
cves = extractor.extract_cves(text)
yara = extractor.extract_yara_rules(text)
registry = extractor.extract_registry(text)
Examples
Process Multiple Reports
from iocparser import extract_iocs_from_file
from pathlib import Path
reports_dir = Path("reports")
for report in reports_dir.glob("*.pdf"):
    normal, warnings = extract_iocs_from_file(report)
    total = sum(len(v) for v in normal.values())
    print(f"{report.name}: {total} IOCs found")
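When several reports describe the same campaign, the per-report results usually overlap. The sketch below merges result dictionaries and drops duplicates while keeping first-seen order. It assumes each result maps a type name to a list of strings, as the loop above implies; `merge_iocs` is a hypothetical helper, not part of IOCParser.

```python
from collections import defaultdict


def merge_iocs(results):
    """Merge {type: [values]} dicts, dropping duplicates but keeping order."""
    merged = defaultdict(dict)
    for iocs in results:
        for ioc_type, values in iocs.items():
            for value in values:
                # Dict keys act as an insertion-ordered set.
                merged[ioc_type][value] = None
    return {ioc_type: list(values) for ioc_type, values in merged.items()}


# Stand-in results for two reports (assumed type names).
report_a = {"domains": ["evil.com"], "md5": ["d41d8cd98f00b204e9800998ecf8427e"]}
report_b = {"domains": ["evil.com", "bad.example"]}
print(merge_iocs([report_a, report_b]))
```

In the loop above, each `normal` dictionary could be appended to a list and passed to `merge_iocs` once all reports are processed.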
Export to JSON
iocparser -f apt_report.pdf --json -o iocs.json
Analyze Threat Intelligence Feed
iocparser -u https://securelist.com/report.html --json
Requirements
- Python 3.10+
- See pyproject.toml for full dependency list
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Attribution Required:
- Author: Marc Rivero | @seifreed
- Repository: github.com/seifreed/iocparser
Made with dedication for the threat intelligence community