A tool for extracting Indicators of Compromise from security reports
IOCParser
Production-grade IOC extraction, enrichment, persistence, and pipeline tooling for threat intelligence workflows
Overview
IOCParser extracts Indicators of Compromise from reports, feeds, URLs, stdin, and directory trees. It supports refanging, MISP warning-list enrichment, structured renderers, persisted run history, IOC search, run diffs, and queue-backed distributed processing.
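Refanging means turning deliberately broken indicators such as `hxxp://evil[.]example` back into their usable form before matching. The sketch below illustrates the idea with a few common substitutions. The rule list and the `refang` function name are assumptions made for illustration and are not taken from IOCParser's code.

```python
import re

# Illustrative substitutions only; not IOCParser's actual rule set.
# Bracketed separators are restored first so that a scheme written as
# "hxxp[:]//" becomes "hxxp://" before the scheme rule runs.
_REFANG_RULES = [
    (re.compile(r"\[\.\]|\(\.\)|\[dot\]", re.IGNORECASE), "."),
    (re.compile(r"\[@\]|\[at\]", re.IGNORECASE), "@"),
    (re.compile(r"\[:\]"), ":"),
    (re.compile(r"hxxp(s?)://", re.IGNORECASE), r"http\1://"),
]


def refang(text: str) -> str:
    """Return text with common defanging patterns reversed."""
    for pattern, replacement in _REFANG_RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    print(refang("hxxp://evil[.]example/payload"))
```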
Key Features
| Feature | Description |
|---|---|
| Multi-source ingestion | Parse PDF, HTML, text, stdin, URLs, URL lists, multi-file batches, and directories |
| IOC extraction | Detect hashes, network indicators, Windows artifacts, threat-intel IDs, crypto addresses, YARA, and more |
| Warning-list enrichment | MISP warning-list matching with normal/warning separation and evidence context |
| Structured outputs | Render text, summary, JSON, JSONL, CSV, and STIX 2.1 |
| Persistence | Store runs in SQLite or MariaDB-compatible SQLAlchemy backends |
| Search and diff | Query persisted IOCs, export runs, diff runs, and compare against latest successful source runs |
| Batch operations | URL retries, backoff, rate limiting, concurrency, and failed-item replay reports |
| Distributed pipeline | Filesystem, RabbitMQ, SQS, and Celery queue adapters with persisted job lifecycle |
| Plugin surface | Custom renderers, enrichers, extractors, postprocessors, and IOC types |
Supported Outputs
| Category | Outputs |
|---|---|
| Human reports | text, summary |
| Data formats | JSON, JSONL, CSV |
| Threat intel | STIX 2.1 bundles |
| Persistence | run exports, IOC search pages, structured run diffs |
| Operations | URL batch reports, pipeline job results, schema artifacts |
Supported IOC Families
| Family | Indicators |
|---|---|
| Hashes | MD5, SHA1, SHA256, SHA512, SSDEEP, IMPHASH |
| Network | Domains, Hosts, IPv4, IPv6, URLs, Emails, ASNs |
| Windows | Registry keys, mutexes, named pipes, service names |
| Artifacts | Filenames, filepaths, certificate serials, JWT, user agents |
| Threat intel | CVEs, MITRE ATT&CK techniques, YARA rules |
| Crypto | Bitcoin, Ethereum, Monero |
| Other | MAC addresses |
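To give a feel for how pattern-based detection of a few of these families can work, here is a minimal sketch using regular expressions from the standard library. The patterns, the `PATTERNS` table, and the `extract` helper are illustrative assumptions. IOCParser's own extractors are likely more thorough and are not reproduced here.

```python
import re

# Simplified patterns for four IOC families (assumed for illustration).
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "ipv4": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract(text: str) -> dict[str, list[str]]:
    """Return de-duplicated, sorted matches per IOC family."""
    return {
        name: sorted(set(pattern.findall(text)))
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    sample = (
        "Dropper b2762d604ae58f833b7c297e18b3ca7a contacted "
        "198.51.100.10 while exploiting CVE-2024-12345."
    )
    for family, values in extract(sample).items():
        print(family, values)
```

The word boundaries keep a 64-character SHA256 digest from also being reported as an MD5.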
Installation
From PyPI (Recommended)
pip install iocparser-tool
From Source
git clone https://github.com/seifreed/IOCParser.git
cd IOCParser
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -e .
Development Extras
pip install -e ".[dev]"
Quick Start
# Initialize warning lists once
iocparser --init
# Extract from file, URL, or stdin
iocparser -f report.pdf
iocparser -u https://example.com/report.html
cat report.txt | iocparser --stdin --json
# Persist and query later
iocparser -f report.txt --persist --db-uri "sqlite:///iocparser.db"
iocparser --list-runs --db-uri "sqlite:///iocparser.db"
Usage
Command Line Interface
# Single inputs
iocparser -f report.pdf
iocparser https://example.com/report.html
iocparser --stdin < report.txt
# Batch files and URL feeds
iocparser -m report1.txt report2.txt report3.txt
iocparser -d reports --recursive --glob "*.html"
iocparser --url-file feeds.txt --url-workers 8 --url-retries 2 --batch-report-json batch.json
# Output formats
iocparser -f report.txt --json
iocparser -f report.txt --jsonl
iocparser -f report.txt --csv
iocparser -f report.txt --stix --stix-types domains,urls,ips
iocparser -f report.txt --summary
# Analyst filters
iocparser -f report.txt --only urls,domains --severity medium --with-context
iocparser -f report.txt --exclude yara,registry --only-normal
iocparser -f report.txt --sort-by severity --max-evidence 1
Available Options (Main Workflows)
| Workflow | Description |
|---|---|
| -f, --file | Parse a single file or - for stdin |
| -u, --url | Download and parse one URL |
| --stdin | Read IOC text from stdin |
| -m, --multiple | Parse multiple files and merge results |
| -d, --directory | Parse files from a directory, with --recursive and --glob |
| --url-file | Parse a URL feed with workers, retries, backoff, and rate limiting |
| --streaming | Process large files in chunks |
| --persist | Save extraction run metadata and IOCs to a database |
| --list-runs | List persisted runs |
| --search-ioc | Search persisted IOC values with auto, fts, or like backends |
| --export-run | Export a persisted run as text, JSON, JSONL, CSV, or STIX |
| --diff-runs | Compare two persisted runs |
| --diff-latest | Compare a run with the latest successful run from the same source |
| --retry-failed-from | Replay failed URL items from a previous batch report |
| --schema-version, --migrate | Inspect or migrate the persistence schema |
Persistence Examples
# Search persisted IOCs
iocparser --search-ioc evil.example --db-uri "sqlite:///iocparser.db"
iocparser --search-ioc evil.example --ioc-type urls --severity informational --tag warning-list-match
# Export and diff runs
iocparser --export-run 42 --json --db-uri "sqlite:///iocparser.db"
iocparser --diff-runs 40 42 --diff-only added --json --db-uri "sqlite:///iocparser.db"
iocparser --diff-latest 42 --summary --db-uri "sqlite:///iocparser.db"
# Maintenance
iocparser --delete-run 42 --db-uri "sqlite:///iocparser.db"
iocparser --prune-before 2026-01-01T00:00:00 --keep-latest 10 --db-uri "sqlite:///iocparser.db"
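Conceptually, diffing two runs is a set comparison of the IOC values each run recorded per type. The sketch below shows that idea only. The `diff_runs` function and the dictionary shape (IOC type mapped to a set of values) are assumptions for illustration and do not reflect the structure IOCParser persists or returns.

```python
def diff_runs(
    left: dict[str, set[str]],
    right: dict[str, set[str]],
) -> dict[str, dict[str, list[str]]]:
    """Report values added to or removed from `right` relative to `left`."""
    result: dict[str, dict[str, list[str]]] = {}
    for ioc_type in sorted(set(left) | set(right)):
        old = left.get(ioc_type, set())
        new = right.get(ioc_type, set())
        added = sorted(new - old)
        removed = sorted(old - new)
        # Types with no changes are omitted from the diff.
        if added or removed:
            result[ioc_type] = {"added": added, "removed": removed}
    return result


if __name__ == "__main__":
    run_40 = {"domains": {"evil.example"}, "ips": {"198.51.100.10"}}
    run_42 = {"domains": {"evil.example", "bad.example"}, "ips": set()}
    print(diff_runs(run_40, run_42))
```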
HTTP and Batch Flags
| Option | Description |
|---|---|
| --url-workers | Number of concurrent URL workers |
| --url-retries | Per-URL retry attempts |
| --url-backoff | Backoff between URL retries |
| --rate-limit | Delay between URL fetches |
| --user-agent | Custom HTTP user agent |
| --header, --cookie, --proxy | HTTP request customization |
| --allow-redirects, --tls-verify, --tls-cert, --ca-bundle | Redirect and TLS policy |
| --connect-timeout, --read-timeout | HTTP timeout policy |
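The retry and backoff flags describe a familiar pattern: retry a failed fetch a bounded number of times, waiting between attempts. The sketch below is one way to express that pattern. Whether IOCParser uses a fixed or a growing delay is not stated here, so the exponential schedule, the `fetch_with_retries` name, and its parameters are all assumptions for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    retries: int = 2,
    backoff: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on failure retry up to `retries` times.

    The wait doubles after each failed attempt (assumed schedule):
    backoff, 2 * backoff, 4 * backoff, ...
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            sleep(backoff * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the function testable without real delays.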
Python Library
Extraction API
from iocparser import extraction
normal_iocs, warning_iocs = extraction.extract_iocs_from_file("report.pdf")
normal_iocs, warning_iocs = extraction.extract_iocs_from_text("evil.example 198.51.100.10")
normal_iocs, warning_iocs = extraction.extract_iocs_from_url(
    "https://example.com/report.html",
    only="urls,domains",
    exclude="registry",
)
result = extraction.extract_result_from_file("report.pdf")
print(result.total_count())
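The extraction functions above return normal and warning IOCs as separate collections. As a rough illustration of that separation, the sketch below partitions values against a set of known-benign entries. The `split_by_warning_list` helper and the flat list and set inputs are assumptions. IOCParser's MISP warning-list matching and its return types are not reproduced here.

```python
from typing import Iterable


def split_by_warning_list(
    values: Iterable[str],
    warning_list: Iterable[str],
) -> tuple[list[str], list[str]]:
    """Split values into (normal, warning) using case-insensitive matching."""
    known = {entry.lower() for entry in warning_list}
    normal: list[str] = []
    warning: list[str] = []
    for value in values:
        # A value found on the warning list is reported separately
        # rather than dropped, so analysts can still review it.
        (warning if value.lower() in known else normal).append(value)
    return normal, warning


if __name__ == "__main__":
    normal, warning = split_by_warning_list(
        ["evil.example", "Google.com"], {"google.com"}
    )
    print("normal:", normal)
    print("warning:", warning)
```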
Persistence API
from iocparser import persistence
db_uri = "sqlite:///iocparser.db"
runs = persistence.list_persisted_runs(db_uri=db_uri, limit=10)
hits = persistence.search_persisted_iocs(
    db_uri=db_uri,
    value="evil.example",
    ioc_type="urls",
    min_severity="medium",
    tag="network",
)
exported = persistence.export_persisted_run(db_uri=db_uri, run_id=42)
diff = persistence.diff_persisted_runs(db_uri=db_uri, left_run_id=40, right_run_id=42)
Distributed Pipeline API
from iocparser import pipeline
client = pipeline.DistributedPipelineClient(
    db_uri="sqlite:///iocparser.db",
    queue_backend="filesystem",
    queue_path=".iocparser-queue",
)
job = client.submit(
    pipeline.PipelineJobRequest(
        input_kind="text",
        source_value="IOC hxxp://evil.example",
        persist=True,
        db_uri="sqlite:///iocparser.db",
        check_warnings=False,
    ),
    queue_name="ingest",
)
client.process_next(queue_name="ingest")
state = client.get_job(job_id=job.job_id)
Lower-level Components
from iocparser.infrastructure.extraction import IOCExtractor
from iocparser.infrastructure.file_parser import PDFParser
from iocparser.infrastructure.warninglists import MISPWarningLists
text = PDFParser("report.pdf").extract_text()
raw_iocs = IOCExtractor(defang=True).extract_all(text)
warning_lists = MISPWarningLists()
Configuration
IOCParser resolves configuration in this order:
- CLI arguments
- Environment variables
- INI file
export IOCPARSER_PERSIST=1
export IOCPARSER_DB_URI="sqlite:///iocparser.db"
[database]
persist = true
uri = sqlite:///iocparser.db
[defaults]
only = urls,domains
exclude = yara
output_format = json
with_context = true
severity = medium,high
[network]
url_workers = 8
url_retries = 2
url_backoff = 0.25
rate_limit = 0.10
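The resolution order listed above can be sketched as a simple lookup chain. The example assumes the first source that provides a value wins, with CLI arguments taking precedence over environment variables and the INI file consulted last. The `resolve_setting` helper and its parameters are hypothetical and only illustrate the layering, not IOCParser's configuration loader.

```python
import configparser
from typing import Mapping, Optional


def resolve_setting(
    cli_value: Optional[str],
    env_key: str,
    ini: configparser.ConfigParser,
    section: str,
    option: str,
    environ: Mapping[str, str],
) -> Optional[str]:
    """Return the first value found: CLI, then environment, then INI."""
    if cli_value is not None:
        return cli_value
    if env_key in environ:
        return environ[env_key]
    if ini.has_option(section, option):
        return ini.get(section, option)
    return None


if __name__ == "__main__":
    ini = configparser.ConfigParser()
    ini.read_string("[database]\nuri = sqlite:///from-ini.db\n")
    env = {"IOCPARSER_DB_URI": "sqlite:///from-env.db"}
    # No CLI value given, so the environment variable is used.
    print(resolve_setting(None, "IOCPARSER_DB_URI", ini, "database", "uri", env))
```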
Included deployment profiles:
- deploy/iocparser.local.example.ini
- deploy/iocparser.scale.example.ini
- deploy/iocparser.production.example.ini
Pipeline and Schemas
IOCParser exposes versioned machine-readable contracts for batch reporting and queue-backed processing.
| Document | Scope |
|---|---|
| docs/PIPELINE_CONTRACT.md | Worker input/result contracts and resource limits |
| docs/DISTRIBUTED_PIPELINE.md | Queue-backed execution with filesystem, RabbitMQ, SQS, and Celery |
| docs/WORKER_DEPLOYMENT.md | Worker deployment guidance |
| docs/SCHEMA_ARTIFACTS.md | JSON schema artifacts and release publication |
| docs/SECURITY_OPERATIONS.md | Secret handling and operational guidance |
Standalone worker:
IOCPARSER_WORKER_QUEUE_BACKEND=filesystem \
IOCPARSER_WORKER_QUEUE_PATH=.iocparser-queue \
IOCPARSER_WORKER_QUEUE_NAME=ingest \
IOCPARSER_WORKER_DB_URI=sqlite:///iocparser.db \
iocparser-worker
Requirements
- Python 3.13 or 3.14
- libmagic runtime support for file type detection
- See pyproject.toml for dependencies and optional pipeline extras
License
This project is licensed under the MIT license. See LICENSE.
Attribution
- Author: Marc Rivero López | @seifreed
- Repository: github.com/seifreed/IOCParser
Built for practical IOC extraction, threat-intelligence automation, and security operations
File details
Details for the file iocparser_tool-6.0.0.tar.gz.
File metadata
- Download URL: iocparser_tool-6.0.0.tar.gz
- Upload date:
- Size: 374.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 151cecebcec18e7d9c8062189f4d4dca7ebbc7c26a4c165cf56d3e3af2f33dc5 |
| MD5 | b2762d604ae58f833b7c297e18b3ca7a |
| BLAKE2b-256 | b7e2acf514d1026ab154f1d50fbda9cd91d3a9eaefc7560818d6780d260eedd5 |
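A downloaded archive can be checked against the published SHA256 digest before installing it. The sketch below uses only the standard library. The helper names `sha256_of_file` and `matches_published_digest` are illustrative assumptions and are not part of IOCParser or of any PyPI tooling.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare the file's SHA256 with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("iocparser_tool-6.0.0.tar.gz", "151cecebcec18e7d9c8062189f4d4dca7ebbc7c26a4c165cf56d3e3af2f33dc5")` should return `True` for an unmodified copy of the source distribution.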
Provenance
The following attestation bundles were made for iocparser_tool-6.0.0.tar.gz:
Publisher: publish.yml on seifreed/IOCParser
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: iocparser_tool-6.0.0.tar.gz
  - Subject digest: 151cecebcec18e7d9c8062189f4d4dca7ebbc7c26a4c165cf56d3e3af2f33dc5
- Sigstore transparency entry: 1523278980
- Sigstore integration time:
- Permalink: seifreed/IOCParser@5694481f0a6d9a51d80b79fd10dc5e9426c70117
- Branch / Tag: refs/tags/v6.0.0
- Owner: https://github.com/seifreed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5694481f0a6d9a51d80b79fd10dc5e9426c70117
- Trigger Event: push
File details
Details for the file iocparser_tool-6.0.0-py3-none-any.whl.
File metadata
- Download URL: iocparser_tool-6.0.0-py3-none-any.whl
- Upload date:
- Size: 235.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34649df0d2a8edcf22aa078e7c8be3063814d92999ae101fb308a17b70df15a6 |
| MD5 | f1055191f1462071397865af79173e93 |
| BLAKE2b-256 | ebb9936d177426287a80a8c2b6f85f26531e7570347514239d87f978036dfcae |
Provenance
The following attestation bundles were made for iocparser_tool-6.0.0-py3-none-any.whl:
Publisher: publish.yml on seifreed/IOCParser
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: iocparser_tool-6.0.0-py3-none-any.whl
  - Subject digest: 34649df0d2a8edcf22aa078e7c8be3063814d92999ae101fb308a17b70df15a6
- Sigstore transparency entry: 1523278992
- Sigstore integration time:
- Permalink: seifreed/IOCParser@5694481f0a6d9a51d80b79fd10dc5e9426c70117
- Branch / Tag: refs/tags/v6.0.0
- Owner: https://github.com/seifreed
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5694481f0a6d9a51d80b79fd10dc5e9426c70117
- Trigger Event: push