Metadata extraction and web scraping for OSINT and pentesting.

Supported by IPRoyal — Proxy services for OSINT and security research.


MetaDetective

About

MetaDetective is a single-file Python 3 tool for metadata extraction and web scraping, built for OSINT and pentesting workflows.

It needs no third-party Python packages; its only external dependency is exiftool. One curl command and you're operational.

What it extracts: authors, software versions, GPS coordinates, creation/modification dates, internal hostnames, serial numbers, hyperlinks, camera models - across documents, images, and email files.
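
As a rough illustration of where such fields come from, exiftool can print its findings as JSON with the `-j` option. The sketch below is not MetaDetective's own code: the function names and the `FIELDS_OF_INTEREST` selection are hypothetical, chosen only to show the general shape of reading exiftool output from Python.

```python
import json
import subprocess

# Example exiftool tag names, picked for illustration only.
FIELDS_OF_INTEREST = ("Author", "Creator", "Producer", "CreateDate", "ModifyDate", "Model")


def parse_exiftool_json(raw: str) -> list[dict]:
    """Parse `exiftool -j` output: a JSON array with one object per file."""
    records = json.loads(raw)
    return [
        {key: rec[key] for key in ("SourceFile", *FIELDS_OF_INTEREST) if key in rec}
        for rec in records
    ]


def run_exiftool(path: str) -> list[dict]:
    """Call the exiftool binary on one file and keep the selected fields."""
    result = subprocess.run(
        ["exiftool", "-j", path], capture_output=True, text=True, check=True
    )
    return parse_exiftool_json(result.stdout)


if __name__ == "__main__":
    # Hand-written sample so the sketch runs without exiftool installed.
    sample = '[{"SourceFile": "report.pdf", "Author": "Alice Martin", "PageCount": 3}]'
    print(parse_exiftool_json(sample))
```

Keeping the parsing step separate from the subprocess call makes it possible to try the logic on saved output before pointing it at real files.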

What it does beyond extraction:

  • Direct web scraping of target sites (no search engine dependency, so no search-engine rate limits or IP blocks)
  • GPS reverse geocoding with OpenStreetMap, map link generation
  • Export to HTML, TXT, or JSON
  • Selective field extraction with --parse-only
  • Deduplication across multiple files

It was built as a replacement for Metagoofil, which dropped native metadata analysis and relied on Google search, with the rate limiting, CAPTCHAs, and proxy overhead that entails.


Installation

Requirements: Python 3, exiftool.

# Debian / Ubuntu / Kali
sudo apt install libimage-exiftool-perl

# macOS
brew install exiftool

# Windows
winget install OliverBetz.ExifTool
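
Before a first run it can help to confirm that both requirements are actually present. The following is a minimal sketch, assuming exiftool is expected on PATH under the name `exiftool`; the helper name `missing_requirements` and its parameters are hypothetical and not part of MetaDetective.

```python
import shutil
import sys


def missing_requirements(min_python=(3, 0), which=shutil.which) -> list[str]:
    """Return human-readable problems; an empty list means ready to run."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python 3 is required")
    # shutil.which returns None when the executable is not on PATH.
    if which("exiftool") is None:
        problems.append("exiftool not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print(problem)
```

The `which` parameter exists only so the check can be exercised without touching the real environment.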

Direct download (recommended for field use)

curl -O https://raw.githubusercontent.com/franckferman/MetaDetective/stable/src/MetaDetective/MetaDetective.py
python3 MetaDetective.py -h

pip

pip install MetaDetective
metadetective -h

git clone

git clone https://github.com/franckferman/MetaDetective.git
cd MetaDetective
python3 src/MetaDetective/MetaDetective.py -h

Docker

docker pull franckferman/metadetective
docker run --rm franckferman/metadetective -h

# Mount a local directory
docker run --rm -v "$(pwd)/loot:/data" franckferman/metadetective -d /data

Usage

File analysis

# Analyze a directory (deduplicated singular view by default)
python3 MetaDetective.py -d ./loot/

# Specific file types, filter noise
python3 MetaDetective.py -d ./loot/ -t pdf docx -i admin anonymous

# Per-file display
python3 MetaDetective.py -d ./loot/ --display all

# Formatted output (singular/default display)
python3 MetaDetective.py -d ./loot/ --format formatted

# Single file
python3 MetaDetective.py -f report.pdf

# Multiple files
python3 MetaDetective.py -f report.pdf photo.heic

Selective parsing

--parse-only limits extraction to specific fields. This is useful for cutting noise or targeting a single data point.

# Extract only Author and Creator fields
python3 MetaDetective.py -d ./loot/ --parse-only Author Creator

# Extract GPS data only from iPhone photos
python3 MetaDetective.py -d ./photos/ -t heic heif --parse-only 'GPS Position' 'Map Link'
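
To make the GPS fields concrete, here is a sketch of how a position in exiftool's default degrees/minutes/seconds notation could be turned into decimal coordinates and an OpenStreetMap marker link. It assumes the input looks like `48 deg 51' 30.00" N, 2 deg 17' 40.00" E`; the function names are hypothetical, and MetaDetective's own conversion and link format may differ.

```python
import re

# Matches one coordinate such as: 48 deg 51' 30.00" N
DMS_PATTERN = re.compile(r"""(\d+) deg (\d+)' ([\d.]+)" ([NSEW])""")


def dms_to_decimal(degrees: float, minutes: float, seconds: float, ref: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if ref in ("S", "W") else value


def parse_gps_position(text: str) -> tuple[float, float]:
    """Return (latitude, longitude) from a 'lat, lon' DMS string."""
    matches = DMS_PATTERN.findall(text)
    if len(matches) != 2:
        raise ValueError(f"unrecognised GPS position: {text!r}")
    lat, lon = (
        dms_to_decimal(float(d), float(m), float(s), ref) for d, m, s, ref in matches
    )
    return lat, lon


def osm_map_link(lat: float, lon: float, zoom: int = 16) -> str:
    """Build an OpenStreetMap URL that drops a marker at the coordinates."""
    return (
        f"https://www.openstreetmap.org/?mlat={lat:.6f}&mlon={lon:.6f}"
        f"#map={zoom}/{lat:.6f}/{lon:.6f}"
    )


if __name__ == "__main__":
    latitude, longitude = parse_gps_position("48 deg 51' 30.00\" N, 2 deg 17' 40.00\" E")
    print(osm_map_link(latitude, longitude))
```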

Export

# HTML report (default)
python3 MetaDetective.py -d ./loot/ -e

# TXT
python3 MetaDetective.py -d ./loot/ -e txt

# JSON - singular (deduplicated values per field)
python3 MetaDetective.py -d ./loot/ -e json

# JSON - per file
python3 MetaDetective.py -d ./loot/ --display all -e json

# Custom filename suffix and output directory
python3 MetaDetective.py -d ./loot/ -e json -c pentest-corp -o ~/results/

JSON singular output structure:

{
  "tool": "MetaDetective",
  "generated": "2026-03-21T...",
  "unique": {
    "Author": ["Alice Martin", "Bob Dupont"],
    "Creator Tool": ["Microsoft Word 16.0"]
  }
}

Pivot with jq:

jq '.unique.Author' MetaDetective_Export-*.json
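
The same pivot can be done in Python when several exports need to be combined. This sketch relies only on the `unique` key shown in the structure above; the helper names are hypothetical, and values are coerced to strings on the assumption that every field holds a flat list.

```python
import json
from pathlib import Path


def load_unique(path: Path) -> dict:
    """Return the "unique" mapping from one JSON export (empty if absent)."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data.get("unique", {})


def merge_unique(mappings) -> dict:
    """Union the values of each field across several "unique" mappings."""
    merged = {}
    for mapping in mappings:
        for field, values in mapping.items():
            merged.setdefault(field, set()).update(str(v) for v in values)
    # Sort fields and values so repeated runs give stable output.
    return {field: sorted(values) for field, values in sorted(merged.items())}


if __name__ == "__main__":
    exports = sorted(Path(".").glob("MetaDetective_Export-*.json"))
    combined = merge_unique(load_unique(p) for p in exports)
    for author in combined.get("Author", []):
        print(author)
```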

Web scraping

# Scan target site, list files found
python3 MetaDetective.py --scraping --scan --url https://target.com/

# Filter by extension
python3 MetaDetective.py --scraping --scan --url https://target.com/ --extensions pdf docx xlsx

# Download files (depth 2, 8 threads)
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --extensions pdf docx --depth 2 --threads 8

# Control request rate (requests/sec)
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --rate 5

# Follow external links
python3 MetaDetective.py --scraping --url https://target.com/ \
  --download-dir ~/loot/ --follow-extern
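
The options above map onto a fairly standard crawl loop: collect links from a page, keep the ones whose extension is wanted, and queue the remaining pages until the depth limit is reached. The sketch below covers only the link-sorting step using the standard library. `LinkCollector`, `classify_links`, and the `target.example` host are hypothetical names for illustration, and this is not how MetaDetective itself is implemented.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page they came from.
                self.links.append(urljoin(self.base_url, value))


def classify_links(html: str, base_url: str, extensions: set, follow_extern: bool = False):
    """Split a page's links into (files to download, pages to crawl next)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    files, pages = [], []
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parsed.netloc != base_host and not follow_extern:
            continue  # stay on the target host unless told otherwise
        suffix = PurePosixPath(parsed.path).suffix.lower().lstrip(".")
        (files if suffix in extensions else pages).append(link)
    return files, pages


if __name__ == "__main__":
    page = '<a href="/docs/report.pdf">Report</a> <a href="about.html">About</a>'
    print(classify_links(page, "https://target.example/index.html", {"pdf", "docx"}))
```

A full crawler would fetch each URL in `pages`, call `classify_links` again, and stop once the configured depth is reached.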

Filtering and display options

Flag                         Description
-t pdf docx                  Restrict to file types
-i admin anonymous           Ignore values matching a pattern (regex supported)
--parse-only Author Creator  Extract only the specified fields
--display all                Show metadata per file
--display singular           Deduplicated view across all files (default)
--format formatted           Decorated output
--format concise             Compact output
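
To show how the ignore patterns and the singular view interact, here is a small sketch that collapses per-file metadata into unique values per field while dropping anything that matches an ignore regex. The function name and the case-insensitive matching are assumptions made for the example, not documented MetaDetective behavior.

```python
import re


def singular_view(per_file: dict, ignore_patterns=()) -> dict:
    """Collapse {filename: {field: value}} into {field: [unique values]}."""
    compiled = [re.compile(p, re.IGNORECASE) for p in ignore_patterns]
    unique = {}
    for metadata in per_file.values():
        for field, value in metadata.items():
            text = str(value).strip()
            # Drop empty values and anything matching an ignore pattern.
            if not text or any(rx.search(text) for rx in compiled):
                continue
            bucket = unique.setdefault(field, [])
            if text not in bucket:  # keep first-seen order, skip repeats
                bucket.append(text)
    return unique


if __name__ == "__main__":
    files = {
        "a.pdf": {"Author": "Alice Martin", "Creator": "admin"},
        "b.docx": {"Author": "Alice Martin"},
    }
    print(singular_view(files, ["admin", "anonymous"]))
```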

Supported formats

Documents: PDF, DOCX, ODT, XLS, XLSX, PPTX, ODP, RTF, CSV, XML
Images: JPEG, PNG, TIFF, BMP, GIF, SVG, PSD, HEIC, HEIF
Email: EML, MSG, PST, OST
Video: MP4, MOV


License

AGPL-3.0. See LICENSE.

MetaDetective is provided for educational and authorized security testing purposes. You are responsible for ensuring compliance with applicable laws.


Download files

Source distribution: metadetective-2.0.3.tar.gz (62.0 kB)
Built distribution: metadetective-2.0.3-py3-none-any.whl (44.8 kB, Python 3)

Both files were uploaded via twine/6.1.0 CPython/3.13.7, without Trusted Publishing.

Hashes for metadetective-2.0.3.tar.gz

Algorithm    Hash digest
SHA256       0550f632da5bed87c6002f2eae8cb483fda9125930c3bdf258d0e80e28975322
MD5          79ece70c5e0c9cc5ebaf85891c9125f0
BLAKE2b-256  e9a0fc505d16c9c97a6767e3287273786764ba6373a810d8e76ce7be7c7d9505

Hashes for metadetective-2.0.3-py3-none-any.whl

Algorithm    Hash digest
SHA256       390bb3a28f6b100f9a937cd2b8e29b17ede3dacf5c924799c0caaf9ed4815356
MD5          824518ba25d8be20d6d23107cb315cfe
BLAKE2b-256  a8eaf47f35c77b2ab370a994cddd64fe6c93ad28e55264ea7b30b701ea0dafdc
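
A downloaded archive can be checked against the published SHA256 digest before it is installed. The sketch below uses only the standard library; the function names are hypothetical, and the expected digest is supplied by the caller, copied from the hash listing for the file in question.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex: str) -> bool:
    """Compare the file's digest with the published value, ignoring case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <file> <expected sha256 hex digest>
    file_path, expected = sys.argv[1], sys.argv[2]
    print("OK" if matches_published(file_path, expected) else "MISMATCH")
```

SHA256 is the digest to prefer here; MD5 is listed for completeness but is not collision resistant.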
