
Web scraper to download and extract file metadata

Project description



Pymeta is a Python3 rewrite of PowerMeta, a PowerShell tool created by dafthack. It uses specially crafted search queries to identify and download files of the following types (pdf, xls, xlsx, csv, doc, docx, ppt, pptx) from a given domain by scraping Google and Bing. Once the files are downloaded, their metadata is extracted with Phil Harvey's exiftool and added to a .csv report. File metadata is a common place for penetration testers to find internal domain names, usernames, and software/version numbers, and it can help identify an organization's naming convention.
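To make the idea of a per-file-type search query concrete, here is a minimal sketch. It assumes the queries combine the `site:` and `filetype:` search operators; the exact query strings Pymeta sends are not shown in this description, and `build_queries` and `example.com` are hypothetical names used only for illustration.

```python
# File types listed in the project description.
FILE_TYPES = ["pdf", "xls", "xlsx", "csv", "doc", "docx", "ppt", "pptx"]


def build_queries(domain):
    """Return one search query per file type.

    Assumption: the queries use the site: and filetype: operators.
    The real tool may format its queries differently.
    """
    return [f"site:{domain} filetype:{ext}" for ext in FILE_TYPES]


if __name__ == "__main__":
    for query in build_queries("example.com"):
        print(query)
```

Each query would then be submitted to a search engine and the result links filtered for matching file extensions; that scraping step is omitted here.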

Using the -dir command line argument, Pymeta can also be pointed at a directory to extract metadata from files that were downloaded manually. See the Usage or All Options section for more information.
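As a rough illustration of what pointing a tool at a directory might involve, the sketch below collects files whose extension matches one of the supported types. Whether Pymeta recurses into subdirectories or filters by extension at all is an assumption here, and `find_documents` is a hypothetical helper, not part of Pymeta.

```python
from pathlib import Path

# Extensions taken from the file types named in the project description.
EXTENSIONS = {".pdf", ".xls", ".xlsx", ".csv", ".doc", ".docx", ".ppt", ".pptx"}


def find_documents(directory):
    """Return sorted paths under `directory` with a known document extension.

    Assumption: the search is recursive and case-insensitive on the suffix.
    """
    root = Path(directory)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in EXTENSIONS
    )


if __name__ == "__main__":
    for path in find_documents("Downloads/"):
        print(path)
```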


Install

  • PyPI (latest release)
pip3 install pymetadata
  • GitHub (latest code)
git clone
cd pymeta
python3 install

Usage


  • Search Google and Bing for files within and extract metadata to a csv report: pymeta -d

  • Extract metadata from files within the given directory and create csv report: pymeta -dir Downloads/
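The extract-and-report step described above can be sketched roughly as follows. This is not Pymeta's own code: it assumes exiftool is on the PATH and is called with its `-json` option, and the column list in `FIELDS` is a hypothetical selection (the columns in Pymeta's actual report are not listed here). `read_metadata` and `write_report` are illustrative names.

```python
import csv
import json
import subprocess

# Hypothetical report columns; Pymeta's real report may use different ones.
FIELDS = ["SourceFile", "Author", "Creator", "Producer"]


def read_metadata(paths):
    """Run exiftool once over `paths` and return one dict per file."""
    if not paths:
        return []
    result = subprocess.run(
        ["exiftool", "-json", *map(str, paths)],
        capture_output=True,
        text=True,
        check=False,  # exiftool may exit non-zero if some files fail
    )
    return json.loads(result.stdout) if result.stdout.strip() else []


def write_report(records, out_file, fields=FIELDS):
    """Write the selected fields of each record as CSV rows."""
    writer = csv.DictWriter(
        out_file, fieldnames=fields, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)


if __name__ == "__main__":
    with open("report.csv", "w", newline="") as handle:
        write_report(read_metadata(["Downloads/example.pdf"]), handle)
```

Tags missing from a file are written as empty cells, and tags outside `FIELDS` are dropped, so every row has the same columns.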

All Options

Target Options:
  -d DOMAIN             Target domain
  -dir FILE_DIR         Pre-existing directory of files

Search Options:
  -s {google,bing,all}  Search engine(s) to scrape (Default: all)
  -m MAX_RESULTS        Max results per file type, per search engine (Default: 50)
  -j JITTER             Seconds between search requests (Default: 2)

Output Options:
  -o OUTPUT_DIR         Path to store PyMeta's download folder (Default: ./)
  -f FILENAME           Custom report path/name.csv (Optional)
  --debug               Show links as they are collected during scraping
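For readers who want to see how an option listing like the one above maps onto code, here is a minimal `argparse` sketch that mirrors the flags and defaults shown. It is an illustration only: the `dest` names, the integer types for -m and -j, and the `build_parser` function are assumptions, not Pymeta's actual source.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (names are assumed)."""
    parser = argparse.ArgumentParser(prog="pymeta")

    target = parser.add_argument_group("Target Options")
    target.add_argument("-d", dest="domain", help="Target domain")
    target.add_argument("-dir", dest="file_dir",
                        help="Pre-existing directory of files")

    search = parser.add_argument_group("Search Options")
    search.add_argument("-s", dest="engine", default="all",
                        choices=["google", "bing", "all"],
                        help="Search engine(s) to scrape")
    search.add_argument("-m", dest="max_results", type=int, default=50,
                        help="Max results per file type, per search engine")
    search.add_argument("-j", dest="jitter", type=int, default=2,
                        help="Seconds between search requests")

    output = parser.add_argument_group("Output Options")
    output.add_argument("-o", dest="output_dir", default="./",
                        help="Path to store the download folder")
    output.add_argument("-f", dest="filename", default=None,
                        help="Custom report path/name.csv")
    output.add_argument("--debug", action="store_true",
                        help="Show links as they are collected")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, `build_parser().parse_args(["-dir", "Downloads/"])` leaves the domain unset and keeps the documented defaults for the search options.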


Download files


Files for pymetadata, version 1.0.4

  • pymetadata-1.0.4.tar.gz (92.7 kB), file type: Source
