CTI tool to extract IOCs from CTI reports (URLs or files), and write them to an output file. Low-confidence items are grouped at the end.

iocscrape

CTI tool to extract IOCs from CTI reports (URLs or files)
IOC extraction is best-effort and may produce false positives. Always review the output before ingestion.

Features

  • Extract IOCs from:
    • URLs (CTI articles / reports)
    • Files: txt, html, pdf, docx, xlsx
  • Uses trafilatura to convert web pages into clean text (reduces noise from hidden links / menus / assets).
  • Groups suspicious/noisy matches into Low-Confidence (Review) using:
    • Public Suffix List (PSL) validation
    • MISP warninglists (vendored snapshot + optional --update)
    • filename-like domain detection (e.g. something.png)
    • static asset URL detection (e.g. .png, .css, .woff2)
  • Output formats:
    • Default: TXT (pixhash-like run log style)
    • Optional: JSON
  • --update updates both: warninglists + PSL
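
The two extension-based heuristics above (filename-like domains and static asset URLs) can be sketched as follows. This is a minimal illustration, not iocscrape's actual code: the function names and both extension sets are hypothetical, and the lists the tool really uses may differ.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: illustrative extension sets, not the ones iocscrape ships with.
FILE_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "css", "js", "woff2", "exe", "dll"}
STATIC_ASSET_EXTENSIONS = {"png", "jpg", "jpeg", "gif", "svg", "ico", "css", "woff", "woff2"}


def looks_like_filename(domain: str) -> bool:
    """True when the last label of a candidate domain is a known file extension."""
    last_label = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    return last_label in FILE_EXTENSIONS


def is_static_asset_url(url: str) -> bool:
    """True when the URL path (query string ignored) ends in an asset extension."""
    last_segment = urlparse(url).path.lower().rsplit("/", 1)[-1]
    if "." not in last_segment:
        return False
    return last_segment.rsplit(".", 1)[-1] in STATIC_ASSET_EXTENSIONS


def review_reason(kind: str, value: str) -> Optional[str]:
    """Return a reason for the low-confidence bucket, or None if no heuristic fires."""
    if kind == "domain" and looks_like_filename(value):
        return "filename-like domain"
    if kind == "url" and is_static_asset_url(value):
        return "static asset URL"
    return None
```

A match that yields a reason would go to the review group rather than the main results. PSL and warninglist checks would be additional steps and are not shown here.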

Installation

Option 1: pipx (recommended)

python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install iocscrape

Option 2: pip

pip install iocscrape

Usage

Extract from URL

iocscrape --url "https://example.com/report" --out output.txt

Extract from File

iocscrape --file "/path/report.pdf" --out output.txt

JSON Output

iocscrape --url "https://example.com/report" --out output.json --format json

Updating datasets (Warninglists + PSL)

By default, iocscrape ships with a vendored snapshot of:

  • MISP warninglists, and
  • Public Suffix List (PSL).

To update them:

iocscrape --update

To update + run extraction in one command:

iocscrape --update --url "https://example.com/report" --out output.txt

Cache location: ~/.cache/iocscrape
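
For scripted use, the invocations shown above can be assembled from Python and passed to `subprocess`. The sketch below uses only the flags documented in this README (`--url`, `--file`, `--out`, `--format json`, `--update`); the helper names `build_command` and `run` are hypothetical, and it assumes `iocscrape` is on the `PATH`.

```python
import subprocess
from typing import List, Optional


def build_command(
    *,
    url: Optional[str] = None,
    file: Optional[str] = None,
    out: Optional[str] = None,
    as_json: bool = False,
    update: bool = False,
) -> List[str]:
    """Build an iocscrape argument list from the flags documented in the README."""
    cmd = ["iocscrape"]
    if update:
        cmd.append("--update")
    if url is not None:
        cmd += ["--url", url]
    if file is not None:
        cmd += ["--file", file]
    if out is not None:
        cmd += ["--out", out]
    if as_json:
        cmd += ["--format", "json"]
    if len(cmd) == 1:
        raise ValueError("nothing to do: pass update=True, url=..., or file=...")
    return cmd


def run(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, `run(build_command(update=True, url="https://example.com/report", out="output.txt"))` corresponds to the combined update-and-extract command above.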

Supported IOC Types

  • URL
  • Domain
  • IPv4
  • IPv6
  • Email
  • MD5
  • SHA1
  • SHA256
  • CVE
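
To give a feel for what regex-based extraction of some of these types looks like, here is a minimal sketch covering IPv4, the three hash types, and CVE IDs. The patterns and the `extract` function are illustrative assumptions, not the expressions iocscrape uses internally; URLs, domains, IPv6, and email addresses need more involved patterns and are left out.

```python
import re
from typing import Dict, List

# Assumption: simplified patterns for illustration only.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
PATTERNS = {
    "ipv4": re.compile(rf"\b(?:{OCTET}\.){{3}}{OCTET}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
}


def extract(text: str) -> Dict[str, List[str]]:
    """Return unique matches per IOC type, in first-seen order."""
    found: Dict[str, List[str]] = {}
    for name, pattern in PATTERNS.items():
        # dict.fromkeys deduplicates while preserving insertion order.
        matches = list(dict.fromkeys(m.group(0) for m in pattern.finditer(text)))
        if matches:
            found[name] = matches
    return found
```

The word boundaries keep a 64-character SHA256 from also being reported as an MD5 or SHA1, since no boundary exists inside a continuous hex string.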

Output

1. TXT (Default)

The output file is a run log:

  • Results section contains "high-confidence" IOCs
  • Low-Confidence (Review) section contains items flagged by:
    • Warninglists match
    • PSL invalid suffix
    • Filename-like "domain"
    • Static asset URL

Example structure:

iocscrape Run Log
=================

[#] Target:       ...
[#] Date:         ...
[#] Time:         ...
[#] User-Agent:   ...
[#] Output File:  ...

-------
Results
-------

[#] URL (..)
...

-----------------------
Low-Confidence (Review)
-----------------------

[#] DOMAIN (..)
value >> reason
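
If the review items need to be pulled back out of a TXT run log, a small parser along these lines may help. It assumes the log follows the example structure above exactly (a `Low-Confidence (Review)` heading, `[#] TYPE (..)` group lines, and `value >> reason` entries); real output may differ in detail, and the function name is hypothetical.

```python
from typing import List, Tuple


def parse_low_confidence(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (ioc_type, value, reason) tuples from the Low-Confidence section."""
    items: List[Tuple[str, str, str]] = []
    in_section = False
    ioc_type = ""
    for raw in log_text.splitlines():
        line = raw.strip()
        if line == "Low-Confidence (Review)":
            in_section = True
            continue
        # Skip everything before the section, blank lines, and dashed rules.
        if not in_section or not line or set(line) == {"-"}:
            continue
        if line.startswith("[#] "):
            # "[#] DOMAIN (..)" -> "DOMAIN"
            ioc_type = line[4:].split(" (", 1)[0].strip()
        elif " >> " in line:
            value, reason = line.split(" >> ", 1)
            items.append((ioc_type, value.strip(), reason.strip()))
    return items
```

Each tuple can then be reviewed manually or filtered by reason before anything is promoted to the main IOC set.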

2. JSON

Contains:

  • Counts per IOC type
  • IOCs by type
  • Low-confidence array with reasons

Notes on False Positives

This tool uses regex-based extraction. It can still pick up:

  • File names that look like domains
  • Configuration keys
  • Benign public infrastructure (flagged via warninglists / PSL into low-confidence)

Always review the output before operational ingestion (SIEM, blocklists, EDR, firewalls, etc.).

License

MIT License. See LICENSE.

Contributing

Issues and PRs are welcome.
