Extract IOCs from URLs and files with safe defaults and low-confidence highlighting.
iocscrape
CTI tool to extract IOCs from CTI reports (URLs or files)
IOC extraction is best-effort and may produce false positives - always review before ingestion.
Features
- Extract IOCs from:
  - URLs (CTI articles / reports)
  - Files: txt, html, pdf, docx, xlsx
- Uses trafilatura to convert web pages into clean text (reduces noise from hidden links / menus / assets).
- Groups suspicious/noisy matches into Low-Confidence (Review) using:
  - Public Suffix List (PSL) validation
  - MISP warninglists (vendored snapshot + optional --update)
  - filename-like domain detection (e.g. something.png)
  - static asset URL detection (e.g. .png, .css, .woff2)
- Output formats:
  - Default: TXT (pixhash-like run log style)
  - Optional: JSON
- --update updates both datasets: warninglists + PSL
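The two filename-based heuristics in the list above can be sketched roughly as follows. This is an illustrative approximation only, not iocscrape's actual code: the function names, the extension sets, and the reason strings are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension sets for illustration; the lists iocscrape uses may differ.
STATIC_ASSET_EXTS = (".png", ".css", ".woff2", ".jpg", ".svg")
FILENAME_LIKE_LABELS = {"png", "jpg", "gif", "css", "js", "exe", "dll", "pdf"}


def is_static_asset_url(url: str) -> bool:
    """True if the URL path ends with a known static-asset extension."""
    path = urlparse(url).path.lower()
    return path.endswith(STATIC_ASSET_EXTS)


def is_filename_like_domain(candidate: str) -> bool:
    """True if the last label of a 'domain' looks like a file extension."""
    last_label = candidate.lower().rstrip(".").rsplit(".", 1)[-1]
    return last_label in FILENAME_LIKE_LABELS


def review_reason(value: str) -> Optional[str]:
    """Return a low-confidence reason for a match, or None if neither check fires."""
    if "://" in value:
        return "Static asset URL" if is_static_asset_url(value) else None
    return "Filename-like domain" if is_filename_like_domain(value) else None


if __name__ == "__main__":
    for item in ("https://example.com/static/logo.png", "something.png", "example.com"):
        print(item, "->", review_reason(item))
```

A match that returns a reason would land in the Low-Confidence (Review) group; PSL and warninglist checks are not covered by this sketch.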
Installation
Option 1: pipx (recommended)
python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install iocscrape
Option 2: pip
pip install iocscrape
Usage
Extract from URL
iocscrape --url "https://example.com/report" --out output.txt
Extract from File
iocscrape --file "/path/report.pdf" --out output.txt
JSON Output
iocscrape --url "https://example.com/report" --out output.json --format json
Updating datasets (Warninglists + PSL)
By default, iocscrape ships with a vendored snapshot of:
- MISP warninglists, and
- Public Suffix List (PSL).
To update them:
iocscrape --update
To update + run extraction in one command:
iocscrape --update --url "https://example.com/report" --out output.txt
Cache location:
~/.cache/iocscrape
Supported IOC Types
- URL
- Domain
- IPv4
- IPv6
- MD5
- SHA1
- SHA256
- CVE
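To give a feel for what regex-based extraction of some of these types looks like, here is a minimal sketch. The patterns and the extract_iocs helper are assumptions written for this example and are not taken from iocscrape; URL, domain, and IPv6 matching are left out because they need considerably more care.

```python
import re

# Illustrative patterns only; iocscrape's own expressions may differ.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "IPv4": re.compile(rf"\b(?:{_OCTET}\.){{3}}{_OCTET}\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "SHA1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "SHA256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract_iocs(text):
    """Return {ioc_type: sorted unique matches} for every type found in text."""
    found = {}
    for ioc_type, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[ioc_type] = matches
    return found


if __name__ == "__main__":
    sample = (
        "C2 at 203.0.113.7 exploited CVE-2024-12345; "
        "dropper MD5 f57c07e9b530126b2255af5f70a88fea"
    )
    print(extract_iocs(sample))
```

The word boundaries keep a 64-character SHA256 from also being reported as an MD5 or SHA1, since a shorter hex run inside a longer one has no boundary at its end.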
Output
1. TXT (Default)
The output file is a run log:
- Results section contains "high-confidence" IOCs
- Low-Confidence (Review) section contains items flagged by:
  - Warninglists match
  - PSL invalid suffix
  - Filename-like "domain"
  - Static asset URL
Example structure:
iocscrape Run Log
=================
[#] Target: ...
[#] Date: ...
[#] Time: ...
[#] User-Agent: ...
[#] Output File: ...
-------
Results
-------
[#] URL (..)
...
-----------------------
Low-Confidence (Review)
-----------------------
[#] DOMAIN (..)
value >> reason
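If you want to post-process the TXT log, the "value >> reason" lines in the review section can be pulled out with a few lines of Python. The sketch below assumes the layout matches the example structure above (a "Low-Confidence (Review)" title line, "[#] TYPE (..)" group headers, then one "value >> reason" entry per line); parse_low_confidence is a hypothetical helper, not part of iocscrape.

```python
def parse_low_confidence(log_text):
    """Return (ioc_type, value, reason) tuples from the Low-Confidence section."""
    entries = []
    in_section = False
    current_type = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "Low-Confidence (Review)":
            in_section = True
            continue
        # Skip everything before the section, blank lines, and dashed rules.
        if not in_section or not line or set(line) == {"-"}:
            continue
        if line.startswith("[#]"):
            # Group header such as "[#] DOMAIN (1)" -> "DOMAIN"
            current_type = line[3:].split("(")[0].strip()
        elif " >> " in line:
            value, reason = line.split(" >> ", 1)
            entries.append((current_type, value.strip(), reason.strip()))
    return entries


if __name__ == "__main__":
    with open("output.txt", encoding="utf-8") as handle:
        for ioc_type, value, reason in parse_low_confidence(handle.read()):
            print(f"{ioc_type}: {value} ({reason})")
```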
2. JSON
Contains:
- Counts per IOC type
- IOC by type
- Low-confidence array with reasons
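The exact key names in the JSON output are not documented here, so the following sketch uses hypothetical keys ("counts", "iocs", "low_confidence", "value", "reason") purely to illustrate how the three parts relate. Check a real output file and adjust the names before relying on it.

```python
import json

# Hypothetical document shape; every key name here is an assumption.
SAMPLE = json.loads("""
{
  "counts": {"URL": 1, "IPv4": 2},
  "iocs": {
    "URL": ["https://example.com/payload"],
    "IPv4": ["203.0.113.7", "198.51.100.9"]
  },
  "low_confidence": [
    {"type": "DOMAIN", "value": "something.png", "reason": "Filename-like domain"}
  ]
}
""")


def mismatched_counts(report):
    """Return IOC types whose stated count differs from the number of listed values."""
    return [
        ioc_type
        for ioc_type, count in report["counts"].items()
        if count != len(report["iocs"].get(ioc_type, []))
    ]


def review_items(report):
    """Return (value, reason) pairs that still need manual review."""
    return [(item["value"], item["reason"]) for item in report["low_confidence"]]


if __name__ == "__main__":
    print("count mismatches:", mismatched_counts(SAMPLE))
    for value, reason in review_items(SAMPLE):
        print(f"review: {value} ({reason})")
```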
Notes on False Positives
This tool uses regex-based extraction. It can still pick up:
- File names that look like domains
- Configuration keys
- Benign public infrastructure (flagged via warninglists / PSL into low-confidence)
Always review the output before operational ingestion (SIEM, blocklists, EDR, firewalls, etc.).
License
MIT License. See LICENSE.
Contributing
Issues and PRs are welcome.
File details
Details for the file iocscrape-0.1.1.tar.gz.
File metadata
- Download URL: iocscrape-0.1.1.tar.gz
- Upload date:
- Size: 10.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ba763e34b722b9fb3a49952bef80ba844a65a5f73b7ae08b63cca4abaeee696 |
| MD5 | f57c07e9b530126b2255af5f70a88fea |
| BLAKE2b-256 | 631f39885698dbd1ee429ba43ba54dafac7c2a85a379cd7de7f3c5e4905b4928 |
File details
Details for the file iocscrape-0.1.1-py3-none-any.whl.
File metadata
- Download URL: iocscrape-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be7510347d464018c4e2c45f2f56f51436dc7cc50d764bf1142d58492acbd210 |
| MD5 | d0ebdab9f1abc2365aa28e0d4ea8b754 |
| BLAKE2b-256 | 207cae559e9bdd46dba41ba65ad08df210f406c1e6d9d8d5d875e08e9ade6ec6 |