A tool for extracting Indicators of Compromise from security reports

IOCParser

Production-grade IOC extraction, enrichment, persistence, and pipeline tooling for threat intelligence workflows


Overview

IOCParser extracts Indicators of Compromise from reports, feeds, URLs, stdin, and directory trees. It supports refanging, MISP warning-list enrichment, structured renderers, persisted run history, IOC search, run diffs, and queue-backed distributed processing.
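
To make the refanging step concrete, here is a minimal sketch of what undoing common defanging can look like. It is an illustration only, not IOCParser's implementation: the `refang` function and its rule list are hypothetical, and real reports use more defanging styles than the handful covered here.

```python
import re

# Hypothetical rule list for illustration; bracket rules run first so that
# inputs such as "hxxp[:]//" are normalised before the scheme rule applies.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[@\]|\[at\]", re.IGNORECASE), "@"),
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"hxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang(text: str) -> str:
    """Undo common defanging so indicators match ordinary patterns again."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


print(refang("hxxps://evil[.]example/login"))  # https://evil.example/login
```

Refanging before extraction keeps the downstream patterns simple, because they only need to recognise the ordinary form of each indicator.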

Key Features

Feature                   Description
Multi-source ingestion    Parse PDF, HTML, text, stdin, URLs, URL lists, multi-file batches, and directories
IOC extraction            Detect hashes, network indicators, Windows artifacts, threat-intel IDs, crypto addresses, YARA, and more
Warning-list enrichment   MISP warning-list matching with normal/warning separation and evidence context
Structured outputs        Render text, summary, JSON, JSONL, CSV, and STIX 2.1
Persistence               Store runs in SQLite or MariaDB-compatible SQLAlchemy backends
Search and diff           Query persisted IOCs, export runs, diff runs, and compare against latest successful source runs
Batch operations          URL retries, backoff, rate limiting, concurrency, and failed-item replay reports
Distributed pipeline      Filesystem, RabbitMQ, SQS, and Celery queue adapters with persisted job lifecycle
Plugin surface            Custom renderers, enrichers, extractors, postprocessors, and IOC types

Supported Outputs

Human reports    text, summary
Data formats     JSON, JSONL, CSV
Threat intel     STIX 2.1 bundles
Persistence      run exports, IOC search pages, structured run diffs
Operations       URL batch reports, pipeline job results, schema artifacts
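
For readers unfamiliar with STIX 2.1, the sketch below builds a bundle holding one indicator for a domain, using plain dictionaries and the required indicator properties from the STIX 2.1 specification. It does not show IOCParser's renderer or its exact output: the `domain_indicator_bundle` helper is hypothetical, and it assumes the domain contains no single quotes that would need escaping in the pattern.

```python
import json
import uuid
from datetime import datetime, timezone


def domain_indicator_bundle(domain: str) -> dict:
    """Wrap one domain in a minimal STIX 2.1 indicator inside a bundle."""
    # STIX timestamps are UTC, RFC 3339 formatted, here with millisecond precision.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    indicator = {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "pattern": f"[domain-name:value = '{domain}']",
        "pattern_type": "stix",
        "valid_from": now,
    }
    return {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "objects": [indicator],
    }


print(json.dumps(domain_indicator_bundle("evil.example"), indent=2))
```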

Supported IOC Families

Hashes          MD5, SHA1, SHA256, SHA512, SSDEEP, IMPHASH
Network         Domains, Hosts, IPv4, IPv6, URLs, Emails, ASNs
Windows         Registry keys, mutexes, named pipes, service names
Artifacts       Filenames, filepaths, certificate serials, JWT, user agents
Threat intel    CVEs, MITRE ATT&CK techniques, YARA rules
Crypto          Bitcoin, Ethereum, Monero
Other           MAC addresses
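
As a rough illustration of how the hash family can be told apart, the sketch below classifies hexadecimal tokens by length (MD5 is 32 hex characters, SHA1 is 40, SHA256 is 64, SHA512 is 128). This is an assumption-laden simplification rather than IOCParser's extractor: `find_hashes` is a hypothetical helper, and length alone cannot separate an IMPHASH from an MD5, since both are 32 hex characters.

```python
import re

# Digest lengths in hex characters for the fixed-length hash types.
_HEX_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}
_HEX_TOKEN = re.compile(r"\b[0-9a-fA-F]{32,128}\b")


def find_hashes(text: str) -> dict[str, list[str]]:
    """Group hex tokens by the hash type their length suggests."""
    found: dict[str, list[str]] = {}
    for token in _HEX_TOKEN.findall(text):
        kind = _HEX_LENGTHS.get(len(token))
        if kind:  # tokens of any other length are ignored
            found.setdefault(kind, []).append(token.lower())
    return found


sample = (
    "md5 d41d8cd98f00b204e9800998ecf8427e and "
    "sha1 da39a3ee5e6b4b0d3255bfef95601890afd80709"
)
print(find_hashes(sample))
```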

Installation

From PyPI (Recommended)

pip install iocparser-tool

From Source

git clone https://github.com/seifreed/IOCParser.git
cd IOCParser
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -e .

Development Extras

pip install -e ".[dev]"

Quick Start

# Initialize warning lists once
iocparser --init

# Extract from file, URL, or stdin
iocparser -f report.pdf
iocparser -u https://example.com/report.html
cat report.txt | iocparser --stdin --json

# Persist and query later
iocparser -f report.txt --persist --db-uri "sqlite:///iocparser.db"
iocparser --list-runs --db-uri "sqlite:///iocparser.db"

Usage

Command Line Interface

# Single inputs
iocparser -f report.pdf
iocparser https://example.com/report.html
iocparser --stdin < report.txt

# Batch files and URL feeds
iocparser -m report1.txt report2.txt report3.txt
iocparser -d reports --recursive --glob "*.html"
iocparser --url-file feeds.txt --url-workers 8 --url-retries 2 --batch-report-json batch.json

# Output formats
iocparser -f report.txt --json
iocparser -f report.txt --jsonl
iocparser -f report.txt --csv
iocparser -f report.txt --stix --stix-types domains,urls,ips
iocparser -f report.txt --summary

# Analyst filters
iocparser -f report.txt --only urls,domains --severity medium --with-context
iocparser -f report.txt --exclude yara,registry --only-normal
iocparser -f report.txt --sort-by severity --max-evidence 1

Available Options (Main Workflows)

Workflow                      Description
-f, --file                    Parse a single file or - for stdin
-u, --url                     Download and parse one URL
--stdin                       Read IOC text from stdin
-m, --multiple                Parse multiple files and merge results
-d, --directory               Parse files from a directory, with --recursive and --glob
--url-file                    Parse a URL feed with workers, retries, backoff, and rate limiting
--streaming                   Process large files in chunks
--persist                     Save extraction run metadata and IOCs to a database
--list-runs                   List persisted runs
--search-ioc                  Search persisted IOC values with auto, fts, or like backends
--export-run                  Export a persisted run as text, JSON, JSONL, CSV, or STIX
--diff-runs                   Compare two persisted runs
--diff-latest                 Compare a run with the latest successful run from the same source
--retry-failed-from           Replay failed URL items from a previous batch report
--schema-version, --migrate   Inspect or migrate the persistence schema

Persistence Examples

# Search persisted IOCs
iocparser --search-ioc evil.example --db-uri "sqlite:///iocparser.db"
iocparser --search-ioc evil.example --ioc-type urls --severity informational --tag warning-list-match

# Export and diff runs
iocparser --export-run 42 --json --db-uri "sqlite:///iocparser.db"
iocparser --diff-runs 40 42 --diff-only added --json --db-uri "sqlite:///iocparser.db"
iocparser --diff-latest 42 --summary --db-uri "sqlite:///iocparser.db"

# Maintenance
iocparser --delete-run 42 --db-uri "sqlite:///iocparser.db"
iocparser --prune-before 2026-01-01T00:00:00 --keep-latest 10 --db-uri "sqlite:///iocparser.db"
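
When pruning on a schedule, the cutoff passed to --prune-before usually needs to be computed rather than hard-coded. The sketch below produces a timestamp in the same shape as the example above and assembles the argument list. The `prune_cutoff` helper is hypothetical, and whether IOCParser reads the value as local time or UTC is not stated here, so the naive local timestamp is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def prune_cutoff(days: int, now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 timestamp (seconds precision) `days` before `now`."""
    now = now or datetime.now()  # assumption: naive local time is acceptable
    return (now - timedelta(days=days)).replace(microsecond=0).isoformat()


# Argument list mirroring the maintenance example; pass it to subprocess.run to execute.
args = [
    "iocparser",
    "--prune-before", prune_cutoff(90),
    "--keep-latest", "10",
    "--db-uri", "sqlite:///iocparser.db",
]
print(" ".join(args))
```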

HTTP and Batch Flags

Option                                                     Description
--url-workers                                              Number of concurrent URL workers
--url-retries                                              Per-URL retry attempts
--url-backoff                                              Backoff between URL retries
--rate-limit                                               Delay between URL fetches
--user-agent                                               Custom HTTP user agent
--header, --cookie, --proxy                                HTTP request customization
--allow-redirects, --tls-verify, --tls-cert, --ca-bundle   Redirect and TLS policy
--connect-timeout, --read-timeout                          HTTP timeout policy
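
To show how retries, backoff, and rate limiting interact, here is a small sequential sketch. It is not IOCParser's batch engine: `fetch_all` is a hypothetical function, the backoff is assumed to be a fixed delay (the tool may scale it differently), concurrency is left out, and the `fetch` callable is supplied by the caller.

```python
import time
from typing import Callable, Iterable


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    retries: int = 2,
    backoff: float = 0.25,
    rate_limit: float = 0.10,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[dict[str, str], dict[str, str]]:
    """Fetch URLs one by one; return (results, failures keyed by URL)."""
    results: dict[str, str] = {}
    failed: dict[str, str] = {}
    for index, url in enumerate(urls):
        if index:
            sleep(rate_limit)  # pause between consecutive URLs
        for attempt in range(retries + 1):
            try:
                results[url] = fetch(url)
                break
            except Exception as exc:
                if attempt == retries:
                    failed[url] = str(exc)  # retries exhausted, record for replay
                else:
                    sleep(backoff)  # wait before the next attempt
    return results, failed
```

The `failed` mapping plays the role of a replay list: the URLs that exhausted their retries can be fed into a later run.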

Python Library

Extraction API

from iocparser import extraction

normal_iocs, warning_iocs = extraction.extract_iocs_from_file("report.pdf")
normal_iocs, warning_iocs = extraction.extract_iocs_from_text("evil.example 198.51.100.10")
normal_iocs, warning_iocs = extraction.extract_iocs_from_url(
    "https://example.com/report.html",
    only="urls,domains",
    exclude="registry",
)

result = extraction.extract_result_from_file("report.pdf")
print(result.total_count())

Persistence API

from iocparser import persistence

db_uri = "sqlite:///iocparser.db"

runs = persistence.list_persisted_runs(db_uri=db_uri, limit=10)
hits = persistence.search_persisted_iocs(
    db_uri=db_uri,
    value="evil.example",
    ioc_type="urls",
    min_severity="medium",
    tag="network",
)
exported = persistence.export_persisted_run(db_uri=db_uri, run_id=42)
diff = persistence.diff_persisted_runs(db_uri=db_uri, left_run_id=40, right_run_id=42)
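
Conceptually, a run diff is a set comparison between the IOCs of two runs. The sketch below illustrates that idea with (type, value) pairs. It does not reproduce the structure returned by `diff_persisted_runs`: the `diff_runs` helper and its "added" / "removed" / "unchanged" keys are assumptions chosen for illustration.

```python
from typing import Iterable

IOC = tuple[str, str]  # (ioc_type, value), a shape assumed for this sketch


def diff_runs(left: Iterable[IOC], right: Iterable[IOC]) -> dict[str, list[IOC]]:
    """Compare two runs and report which IOCs were added, removed, or kept."""
    left_set, right_set = set(left), set(right)
    return {
        "added": sorted(right_set - left_set),
        "removed": sorted(left_set - right_set),
        "unchanged": sorted(left_set & right_set),
    }


older = [("domains", "a.example"), ("ips", "198.51.100.10")]
newer = [("domains", "a.example"), ("domains", "b.example")]
print(diff_runs(older, newer)["added"])  # [('domains', 'b.example')]
```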

Distributed Pipeline API

from iocparser import pipeline

client = pipeline.DistributedPipelineClient(
    db_uri="sqlite:///iocparser.db",
    queue_backend="filesystem",
    queue_path=".iocparser-queue",
)

job = client.submit(
    pipeline.PipelineJobRequest(
        input_kind="text",
        source_value="IOC hxxp://evil.example",
        persist=True,
        db_uri="sqlite:///iocparser.db",
        check_warnings=False,
    ),
    queue_name="ingest",
)

client.process_next(queue_name="ingest")
state = client.get_job(job_id=job.job_id)

Lower-level Components

from iocparser.infrastructure.extraction import IOCExtractor
from iocparser.infrastructure.file_parser import PDFParser
from iocparser.infrastructure.warninglists import MISPWarningLists

text = PDFParser("report.pdf").extract_text()
raw_iocs = IOCExtractor(defang=True).extract_all(text)
warning_lists = MISPWarningLists()

Configuration

IOCParser resolves configuration in this order:

  1. CLI arguments
  2. Environment variables
  3. INI file

export IOCPARSER_PERSIST=1
export IOCPARSER_DB_URI="sqlite:///iocparser.db"

[database]
persist = true
uri = sqlite:///iocparser.db

[defaults]
only = urls,domains
exclude = yara
output_format = json
with_context = true
severity = medium,high

[network]
url_workers = 8
url_retries = 2
url_backoff = 0.25
rate_limit = 0.10
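
The resolution order (CLI arguments, then environment variables, then the INI file) can be sketched for a single setting as follows. This is an illustration under stated assumptions, not IOCParser's loader: `resolve_db_uri` is a hypothetical helper, and only the `IOCPARSER_DB_URI` variable and the `[database]` `uri` key shown above are taken from this document.

```python
import configparser
from typing import Mapping, Optional


def resolve_db_uri(
    cli_value: Optional[str],
    environ: Mapping[str, str],
    ini_text: str,
    default: Optional[str] = None,
) -> Optional[str]:
    """Pick the database URI from the highest-priority source that sets it."""
    if cli_value is not None:  # 1. CLI argument wins
        return cli_value
    env_value = environ.get("IOCPARSER_DB_URI")
    if env_value:  # 2. then the environment variable
        return env_value
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)  # 3. finally the INI file
    return parser.get("database", "uri", fallback=default)


ini = "[database]\npersist = true\nuri = sqlite:///iocparser.db\n"
print(resolve_db_uri(None, {}, ini))  # sqlite:///iocparser.db
```

In real use, `environ` would be `os.environ` and `ini_text` the contents of the configuration file.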

Included deployment profiles:


Pipeline and Schemas

IOCParser exposes versioned machine-readable contracts for batch reporting and queue-backed processing.

Document                       Scope
docs/PIPELINE_CONTRACT.md      Worker input/result contracts and resource limits
docs/DISTRIBUTED_PIPELINE.md   Queue-backed execution with filesystem, RabbitMQ, SQS, and Celery
docs/WORKER_DEPLOYMENT.md      Worker deployment guidance
docs/SCHEMA_ARTIFACTS.md       JSON schema artifacts and release publication
docs/SECURITY_OPERATIONS.md    Secret handling and operational guidance

Standalone worker:

IOCPARSER_WORKER_QUEUE_BACKEND=filesystem \
IOCPARSER_WORKER_QUEUE_PATH=.iocparser-queue \
IOCPARSER_WORKER_QUEUE_NAME=ingest \
IOCPARSER_WORKER_DB_URI=sqlite:///iocparser.db \
iocparser-worker

Requirements

  • Python 3.13 or 3.14
  • libmagic runtime support for file type detection
  • See pyproject.toml for dependencies and optional pipeline extras

Support the Project

If this project is useful in your workflows, you can support development:

Buy Me A Coffee

License

This project is licensed under the MIT license. See LICENSE.

Attribution


Built for practical IOC extraction, threat-intelligence automation, and security operations
