A tool for extracting Indicators of Compromise from security reports

IOCParser

Production-grade IOC extraction, enrichment, persistence, and pipeline tooling for threat intelligence workflows


Overview

IOCParser extracts Indicators of Compromise from reports, feeds, URLs, stdin, and directory trees. It supports refanging, MISP warning-list enrichment, structured renderers, persisted run history, IOC search, run diffs, and queue-backed distributed processing.
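
Refanging restores indicators that a report has deliberately neutralized (for example hxxp://evil[.]example). The following is a minimal sketch of the idea using only the standard library. The rule list and the refang function name are illustrative assumptions, not IOCParser's actual rule set or API.

```python
import re

# Illustrative refang rules (assumed conventions, not IOCParser's own list).
# Bracket rules run first so that "hxxp[:]//" becomes "hxxp://" before the
# scheme rule is applied.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"\[at\]|\[@\]", re.IGNORECASE), "@"),
    (re.compile(r"hxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang(text: str) -> str:
    """Return text with common defanged notations restored."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text
```

Applying the rules in a fixed order keeps the behavior predictable; a real rule set would likely cover more notations than the four shown here.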

Key Features

Feature                   Description
Multi-source ingestion    Parse PDF, HTML, text, stdin, URLs, URL lists, multi-file batches, and directories
IOC extraction            Detect hashes, network indicators, Windows artifacts, threat-intel IDs, crypto addresses, YARA, and more
Warning-list enrichment   MISP warning-list matching with normal/warning separation and evidence context
Structured outputs        Render text, summary, JSON, JSONL, CSV, and STIX 2.1
Persistence               Store runs in SQLite or MariaDB-compatible SQLAlchemy backends
Search and diff           Query persisted IOCs, export runs, diff runs, and compare against latest successful source runs
Batch operations          URL retries, backoff, rate limiting, concurrency, and failed-item replay reports
Distributed pipeline      Filesystem, RabbitMQ, SQS, and Celery queue adapters with persisted job lifecycle
Plugin surface            Custom renderers, enrichers, extractors, postprocessors, and IOC types
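
The normal/warning separation listed above can be pictured as a partition step: indicators that match a list of known-benign infrastructure are set aside with the reason for the match. This is a sketch under stated assumptions; the function name, the list contents, and the shape of the returned evidence are hypothetical and do not reflect how IOCParser or the MISP warning lists are actually structured.

```python
import ipaddress


def split_by_warning_list(values, benign_domains, benign_networks):
    """Partition values into (normal, warning); warning items carry a reason."""
    networks = [ipaddress.ip_network(n) for n in benign_networks]
    normal, warning = [], []
    for value in values:
        reason = None
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            addr = None  # not an IP address, treat it as a hostname
        if addr is not None:
            for net in networks:
                if addr.version == net.version and addr in net:
                    reason = f"address in {net}"
                    break
        else:
            host = value.lower().rstrip(".")
            for domain in benign_domains:
                # Match the domain itself or any subdomain of it.
                if host == domain or host.endswith("." + domain):
                    reason = f"matches {domain}"
                    break
        if reason is None:
            normal.append(value)
        else:
            warning.append((value, reason))
    return normal, warning
```

Keeping the warning items instead of discarding them lets an analyst review likely false positives without losing them.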

Supported Outputs

Human reports    text, summary
Data formats     JSON, JSONL, CSV
Threat intel     STIX 2.1 bundles
Persistence      run exports, IOC search pages, structured run diffs
Operations       URL batch reports, pipeline job results, schema artifacts
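
To illustrate how the JSONL and CSV data formats differ, here is a small standard-library sketch that renders the same indicators both ways. The row fields ("type" and "value") and the function names are assumptions made for the example; the field layout IOCParser actually emits may differ.

```python
import csv
import io
import json


def iter_rows(iocs_by_type):
    """Flatten {type: [values]} into one row per indicator, sorted by type."""
    for ioc_type in sorted(iocs_by_type):
        for value in iocs_by_type[ioc_type]:
            yield {"type": ioc_type, "value": value}


def render_jsonl(iocs_by_type):
    """One JSON object per line, convenient for streaming consumers."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in iter_rows(iocs_by_type))


def render_csv(iocs_by_type):
    """Header row plus one line per indicator, convenient for spreadsheets."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["type", "value"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(iter_rows(iocs_by_type))
    return buffer.getvalue()
```

JSONL suits pipelines that process one record at a time, while CSV is easier to open in analyst tooling.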

Supported IOC Families

Hashes          MD5, SHA1, SHA256, SHA512, SSDEEP, IMPHASH
Network         Domains, Hosts, IPv4, IPv6, URLs, Emails, ASNs
Windows         Registry keys, mutexes, named pipes, service names
Artifacts       Filenames, filepaths, certificate serials, JWT, user agents
Threat intel    CVEs, MITRE ATT&CK techniques, YARA rules
Crypto          Bitcoin, Ethereum, Monero
Other           MAC addresses
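
A few of these families can be recognized with simple patterns, which the sketch below demonstrates for MD5/SHA hashes, IPv4 addresses, and CVE identifiers. It is a simplified illustration, not IOCParser's extractor: the patterns and the extract_basic name are assumptions, and hex length alone cannot tell an MD5 from an IMPHASH, since both are 32 hex characters.

```python
import ipaddress
import re

_HEX_RUN = re.compile(r"\b[a-fA-F0-9]{32,128}\b")
_HASH_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}
_IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
_CVE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_basic(text):
    """Return {family: sorted values} for a handful of IOC families."""
    found = {}
    for match in _HEX_RUN.findall(text):
        kind = _HASH_BY_LENGTH.get(len(match))
        if kind:  # ignore hex runs whose length is not a known digest size
            found.setdefault(kind, set()).add(match.lower())
    for candidate in _IPV4_CANDIDATE.findall(text):
        try:
            ipaddress.IPv4Address(candidate)  # rejects octets above 255
        except ValueError:
            continue
        found.setdefault("ipv4", set()).add(candidate)
    for match in _CVE.findall(text):
        found.setdefault("cves", set()).add(match.upper())
    return {family: sorted(values) for family, values in found.items()}
```

Validating regex candidates with the ipaddress module avoids reporting strings such as 999.1.1.1 that merely look like addresses.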

Installation

From PyPI (Recommended)

pip install iocparser-tool

From Source

git clone https://github.com/seifreed/IOCParser.git
cd IOCParser
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e .

Development Extras

pip install -e ".[dev]"

Quick Start

# Initialize warning lists once
iocparser --init

# Extract from file, URL, or stdin
iocparser -f report.pdf
iocparser -u https://example.com/report.html
cat report.txt | iocparser --stdin --json

# Persist and query later
iocparser -f report.txt --persist --db-uri "sqlite:///iocparser.db"
iocparser --list-runs --db-uri "sqlite:///iocparser.db"

Usage

Command Line Interface

# Single inputs
iocparser -f report.pdf
iocparser -u https://example.com/report.html
iocparser --stdin < report.txt

# Batch files and URL feeds
iocparser -m report1.txt report2.txt report3.txt
iocparser -d reports --recursive --glob "*.html"
iocparser --url-file feeds.txt --url-workers 8 --url-retries 2 --batch-report-json batch.json

# Output formats
iocparser -f report.txt --json
iocparser -f report.txt --jsonl
iocparser -f report.txt --csv
iocparser -f report.txt --stix --stix-types domains,urls,ips
iocparser -f report.txt --summary

# Analyst filters
iocparser -f report.txt --only urls,domains --severity medium --with-context
iocparser -f report.txt --exclude yara,registry --only-normal
iocparser -f report.txt --sort-by severity --max-evidence 1
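
When these commands are driven from a script, building the argument list programmatically avoids shell-quoting mistakes. The helper below is a sketch that uses only flags shown above (-f, --json, --only, --exclude, --severity, --with-context); the function names are hypothetical, and nothing here parses the tool's output, whose schema is not described in this document.

```python
import shlex


def build_iocparser_argv(path, *, only=None, exclude=None, severity=None,
                         with_context=False, output_flag="--json"):
    """Assemble an argument list for a single-file extraction."""
    argv = ["iocparser", "-f", path, output_flag]
    if only:
        argv += ["--only", ",".join(only)]
    if exclude:
        argv += ["--exclude", ",".join(exclude)]
    if severity:
        argv += ["--severity", severity]
    if with_context:
        argv.append("--with-context")
    return argv


def as_shell(argv):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(argv)
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True, check=True) on a machine where iocparser is installed.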

Available Options (Main Workflows)

Workflow                      Description
-f, --file                    Parse a single file or - for stdin
-u, --url                     Download and parse one URL
--stdin                       Read IOC text from stdin
-m, --multiple                Parse multiple files and merge results
-d, --directory               Parse files from a directory, with --recursive and --glob
--url-file                    Parse a URL feed with workers, retries, backoff, and rate limiting
--streaming                   Process large files in chunks
--persist                     Save extraction run metadata and IOCs to a database
--list-runs                   List persisted runs
--search-ioc                  Search persisted IOC values with auto, fts, or like backends
--export-run                  Export a persisted run as text, JSON, JSONL, CSV, or STIX
--diff-runs                   Compare two persisted runs
--diff-latest                 Compare a run with the latest successful run from the same source
--retry-failed-from           Replay failed URL items from a previous batch report
--schema-version, --migrate   Inspect or migrate the persistence schema
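
Chunked processing, as offered by --streaming, has one classic pitfall: an indicator can straddle a chunk boundary. A common remedy is to carry a small overlap from one chunk into the next. The sketch below shows that technique; it is an assumption about how such a reader could work, not a description of what --streaming does internally, and the function name and defaults are hypothetical.

```python
def iter_overlapping_chunks(stream, chunk_size=65536, overlap=256):
    """Yield text windows; each window starts with the tail of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tail = ""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        window = tail + block
        yield window
        # Keep the last `overlap` characters so a split indicator is seen whole.
        tail = window[-overlap:] if overlap else ""
```

Because the overlap region is scanned twice, results should be de-duplicated, and the overlap must be at least as long as the longest indicator expected.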

Persistence Examples

# Search persisted IOCs
iocparser --search-ioc evil.example --db-uri "sqlite:///iocparser.db"
iocparser --search-ioc evil.example --ioc-type urls --severity informational --tag warning-list-match

# Export and diff runs
iocparser --export-run 42 --json --db-uri "sqlite:///iocparser.db"
iocparser --diff-runs 40 42 --diff-only added --json --db-uri "sqlite:///iocparser.db"
iocparser --diff-latest 42 --summary --db-uri "sqlite:///iocparser.db"

# Maintenance
iocparser --delete-run 42 --db-uri "sqlite:///iocparser.db"
iocparser --prune-before 2026-01-01T00:00:00 --keep-latest 10 --db-uri "sqlite:///iocparser.db"
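
Conceptually, a run diff is a set comparison between the indicators of two runs. The sketch below shows that idea on plain dictionaries. The diff_runs name, the input shape, and the "added", "removed", and "unchanged" keys are assumptions for illustration; the structure IOCParser returns for --diff-runs is not specified here.

```python
def diff_runs(left, right):
    """Compare two {ioc_type: [values]} mappings as sets of (type, value) pairs."""
    left_items = {(t, v) for t, values in left.items() for v in values}
    right_items = {(t, v) for t, values in right.items() for v in values}
    return {
        "added": sorted(right_items - left_items),      # only in the newer run
        "removed": sorted(left_items - right_items),    # only in the older run
        "unchanged": sorted(left_items & right_items),  # present in both
    }
```

Comparing (type, value) pairs rather than bare values keeps a string that appears under two IOC types from being treated as a single indicator.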

HTTP and Batch Flags

Option                                                     Description
--url-workers                                              Number of concurrent URL workers
--url-retries                                              Per-URL retry attempts
--url-backoff                                              Backoff between URL retries
--rate-limit                                               Delay between URL fetches
--user-agent                                               Custom HTTP user agent
--header, --cookie, --proxy                                HTTP request customization
--allow-redirects, --tls-verify, --tls-cert, --ca-bundle   Redirect and TLS policy
--connect-timeout, --read-timeout                          HTTP timeout policy
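
The interaction between workers, retries, and backoff can be sketched with a thread pool and a retry loop. This example takes the fetch function as a parameter so it runs without network access. It assumes exponential backoff (the delay doubles after each failed attempt); the table above does not say whether IOCParser uses a fixed or growing delay, and all names here are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_with_retries(fetch, url, retries=2, backoff=0.25, sleep=time.sleep):
    """Call fetch(url), retrying up to `retries` times after a failure."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return {"url": url, "ok": True, "body": fetch(url), "attempts": attempt + 1}
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(backoff * (2 ** attempt))  # assumed exponential backoff
    return {"url": url, "ok": False, "error": str(last_error), "attempts": retries + 1}


def fetch_all(fetch, urls, workers=8, **retry_options):
    """Fetch URLs concurrently; results keep the order of the input list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: fetch_with_retries(fetch, u, **retry_options), urls))
```

Items whose "ok" field is False form the natural input for a later replay pass, which is the role --retry-failed-from plays for batch reports.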

Python Library

Extraction API

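from iocparser import extraction

# Each helper returns two collections: regular IOCs and warning-list matches.
normal_iocs, warning_iocs = extraction.extract_iocs_from_file("report.pdf")
normal_iocs, warning_iocs = extraction.extract_iocs_from_text("evil.example 198.51.100.10")

# Restrict extraction to selected IOC types with only/exclude.
normal_iocs, warning_iocs = extraction.extract_iocs_from_url(
    "https://example.com/report.html",
    only="urls,domains",
    exclude="registry",
)

# The result-object variant exposes aggregate helpers such as total_count().
result = extraction.extract_result_from_file("report.pdf")
print(result.total_count())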

Persistence API

from iocparser import persistence

db_uri = "sqlite:///iocparser.db"

runs = persistence.list_persisted_runs(db_uri=db_uri, limit=10)
hits = persistence.search_persisted_iocs(
    db_uri=db_uri,
    value="evil.example",
    ioc_type="urls",
    min_severity="medium",
    tag="network",
)
exported = persistence.export_persisted_run(db_uri=db_uri, run_id=42)
diff = persistence.diff_persisted_runs(db_uri=db_uri, left_run_id=40, right_run_id=42)

Distributed Pipeline API

from iocparser import pipeline

client = pipeline.DistributedPipelineClient(
    db_uri="sqlite:///iocparser.db",
    queue_backend="filesystem",
    queue_path=".iocparser-queue",
)

job = client.submit(
    pipeline.PipelineJobRequest(
        input_kind="text",
        source_value="IOC hxxp://evil.example",
        persist=True,
        db_uri="sqlite:///iocparser.db",
        check_warnings=False,
    ),
    queue_name="ingest",
)

client.process_next(queue_name="ingest")
state = client.get_job(job_id=job.job_id)
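
The submit, process, and inspect lifecycle above can be pictured with a toy filesystem queue in which each job is a JSON file that moves between state directories. This is a conceptual sketch only: the FileQueue class, its directory layout, and its state names are assumptions and do not describe the on-disk format of IOCParser's filesystem adapter.

```python
import json
import os
import uuid
from pathlib import Path

_STATES = ("pending", "running", "done")


class FileQueue:
    """Toy job queue: one JSON file per job, one directory per state."""

    def __init__(self, root):
        self.root = Path(root)
        for state in _STATES:
            (self.root / state).mkdir(parents=True, exist_ok=True)

    def submit(self, payload):
        job_id = uuid.uuid4().hex
        tmp = self.root / f".{job_id}.tmp"
        tmp.write_text(json.dumps(payload))
        # Write first, then move into place, so workers never see a partial file.
        os.replace(tmp, self.root / "pending" / f"{job_id}.json")
        return job_id

    def process_next(self, handler):
        for candidate in sorted((self.root / "pending").glob("*.json")):
            claimed = self.root / "running" / candidate.name
            try:
                os.rename(candidate, claimed)  # the rename acts as the claim
            except FileNotFoundError:
                continue  # another worker claimed this job first
            payload = json.loads(claimed.read_text())
            claimed.write_text(json.dumps({"payload": payload, "result": handler(payload)}))
            os.replace(claimed, self.root / "done" / candidate.name)
            return candidate.stem
        return None

    def state(self, job_id):
        for state in _STATES:
            if (self.root / state / f"{job_id}.json").exists():
                return state
        return "unknown"
```

Using a rename as the claim step means two workers polling the same directory cannot both pick up one job. Jobs here are processed in filename order, not submission order.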

Lower-level Components

from iocparser.infrastructure.extraction import IOCExtractor
from iocparser.infrastructure.file_parser import PDFParser
from iocparser.infrastructure.warninglists import MISPWarningLists

text = PDFParser("report.pdf").extract_text()
raw_iocs = IOCExtractor(defang=True).extract_all(text)
warning_lists = MISPWarningLists()

Configuration

IOCParser resolves configuration in this order:

  1. CLI arguments
  2. Environment variables
  3. INI file

Environment variables:

export IOCPARSER_PERSIST=1
export IOCPARSER_DB_URI="sqlite:///iocparser.db"

INI file:

[database]
persist = true
uri = sqlite:///iocparser.db

[defaults]
only = urls,domains
exclude = yara
output_format = json
with_context = true
severity = medium,high

[network]
url_workers = 8
url_retries = 2
url_backoff = 0.25
rate_limit = 0.10
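
The resolution order can be sketched as a lookup that stops at the first source providing a value. The example assumes the list is ordered from highest to lowest precedence and reuses the IOCPARSER_DB_URI variable and the [database] uri key shown above. The resolve_setting function and its parameters are hypothetical and not part of IOCParser.

```python
import configparser

INI_TEXT = """
[database]
persist = true
uri = sqlite:///iocparser.db
"""


def resolve_setting(cli_value, environ, env_var, ini_text, section, option):
    """Return the first value found: CLI argument, environment, then INI file."""
    if cli_value is not None:
        return cli_value
    if env_var in environ:
        return environ[env_var]
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, option, fallback=None)
```

Passing the environment in as a mapping, for example os.environ, keeps the function easy to test in isolation.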

Included deployment profiles:


Pipeline and Schemas

IOCParser exposes versioned machine-readable contracts for batch reporting and queue-backed processing.

Document                       Scope
docs/PIPELINE_CONTRACT.md      Worker input/result contracts and resource limits
docs/DISTRIBUTED_PIPELINE.md   Queue-backed execution with filesystem, RabbitMQ, SQS, and Celery
docs/WORKER_DEPLOYMENT.md      Worker deployment guidance
docs/SCHEMA_ARTIFACTS.md       JSON schema artifacts and release publication
docs/SECURITY_OPERATIONS.md    Secret handling and operational guidance

Standalone worker:

IOCPARSER_WORKER_QUEUE_BACKEND=filesystem \
IOCPARSER_WORKER_QUEUE_PATH=.iocparser-queue \
IOCPARSER_WORKER_QUEUE_NAME=ingest \
IOCPARSER_WORKER_DB_URI=sqlite:///iocparser.db \
iocparser-worker

Requirements

  • Python 3.13 or 3.14
  • libmagic runtime support for file type detection
  • See pyproject.toml for dependencies and optional pipeline extras

Support the Project

If this project is useful in your workflows, you can support development:

Buy Me A Coffee

License

This project is licensed under the MIT License. See the LICENSE file for details.

Attribution


Built for practical IOC extraction, threat-intelligence automation, and security operations

Download files

Source Distribution

iocparser_tool-7.0.0.tar.gz (522.8 kB view details)

Uploaded Source

Built Distribution

iocparser_tool-7.0.0-py3-none-any.whl (300.6 kB view details)

Uploaded Python 3

File details

Details for the file iocparser_tool-7.0.0.tar.gz.

File metadata

  • Download URL: iocparser_tool-7.0.0.tar.gz
  • Upload date:
  • Size: 522.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for iocparser_tool-7.0.0.tar.gz
Algorithm Hash digest
SHA256 df45831a576108243b9d5bf689bc932dc6906a988c8ac272c1279b94a0277f53
MD5 d7943a924ae3976b60c33f26a8bb4fb6
BLAKE2b-256 0ed71d0ce25881d1bf52a775bfedd6ca5789bb21761bd1db7ff23effacba955e

Provenance

The following attestation bundles were made for iocparser_tool-7.0.0.tar.gz:

Publisher: publish.yml on seifreed/IOCParser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file iocparser_tool-7.0.0-py3-none-any.whl.

File metadata

  • Download URL: iocparser_tool-7.0.0-py3-none-any.whl
  • Upload date:
  • Size: 300.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for iocparser_tool-7.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 7a223896fbbb0aa58c3f35d969c34d8b69f5290fb47b28a4619be9e166960c22
MD5 892ad40666bc9848f03456cd1b82daef
BLAKE2b-256 9307800f3402638ccb951927337b1d66e9a8b5493d08a8383e6b782a4f7a43d3

Provenance

The following attestation bundles were made for iocparser_tool-7.0.0-py3-none-any.whl:

Publisher: publish.yml on seifreed/IOCParser

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
