
Web scraper to download and extract file metadata

Project description

pymeta

  

Pymeta is a Python 3 rewrite of PowerMeta, a PowerShell tool created by dafthack. It uses specially crafted search queries to identify and download files of the following types (pdf, xls, xlsx, doc, docx, ppt, pptx) from a given domain using Google and Bing. Once downloaded, metadata is extracted from these files using Phil Harvey's exiftool. Penetration testers commonly use this metadata to find internal domain names, usernames, and software/version numbers, and to identify an organization's naming conventions.
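
As a rough illustration of what "specially crafted search queries" can look like, the sketch below builds one query per file type using the `site:` and `filetype:` operators that both Google and Bing accept. The exact query strings pymeta sends are not documented here, so the format, the `FILE_TYPES` constant, and the `build_queries` helper are assumptions made for this example.

```python
# Hypothetical sketch: the query format is an assumption, not pymeta's actual code.
FILE_TYPES = ["pdf", "xls", "xlsx", "doc", "docx", "ppt", "pptx"]


def build_queries(domain, file_types=FILE_TYPES):
    """Return one 'site:<domain> filetype:<ext>' query per file type."""
    return {ext: f"site:{domain} filetype:{ext}" for ext in file_types}


if __name__ == "__main__":
    # Print the query that would be submitted for each file type.
    for ext, query in build_queries("example.com").items():
        print(f"{ext:5} {query}")
```

Each query would then be submitted to the chosen search engine, and the result links ending in the matching extension would be downloaded for metadata extraction.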

Pymeta can also be pointed at a directory of manually downloaded files to extract their metadata, using the '-dir' command line argument. See the 'Usage' and 'All Options' sections for more information.

During metadata extraction, unique 'Author', 'Creator', and 'Producer' fields are written to the terminal. For more verbose output, generate a csv report with the '-csv' command line argument.
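
The two output modes can be sketched as follows. This is a minimal example, assuming the metadata arrives as a list of per-file records shaped like the JSON that `exiftool -json` prints; the sample records and the `unique_fields` and `to_csv` helpers are hypothetical and are not taken from pymeta's source.

```python
import csv
import io
import json

FIELDS = ("Author", "Creator", "Producer")


def unique_fields(records, fields=FIELDS):
    """Collect the distinct non-empty values seen for each field."""
    seen = {name: set() for name in fields}
    for record in records:
        for name in fields:
            value = str(record.get(name) or "").strip()
            if value:
                seen[name].add(value)
    return {name: sorted(values) for name, values in seen.items()}


def to_csv(records):
    """Render every key found in the records as one CSV table."""
    columns = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample data, shaped like `exiftool -json` output.
SAMPLE = json.loads("""[
  {"SourceFile": "a.pdf", "Author": "jsmith", "Producer": "Microsoft Word 2016"},
  {"SourceFile": "b.docx", "Author": "jsmith", "Creator": "adoe"},
  {"SourceFile": "c.pdf", "Producer": "Microsoft Word 2016"}
]""")

if __name__ == "__main__":
    # Terminal mode: only the unique values of the three fields.
    for name, values in unique_fields(SAMPLE).items():
        print(name, values)
    # Report mode: every field of every file.
    print(to_csv(SAMPLE))
```

The terminal view deduplicates values across files, while the CSV keeps one row per file, which is why the report is the more verbose of the two.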

Install

  • PyPI (last release)
pip3 install pymetadata
  • GitHub (latest code)
git clone https://github.com/m8r0wn/pymeta
cd pymeta
python3 setup.py install

Usage

  • Search Google and Bing for files within example.com and extract metadata to terminal:
    pymeta -d example.com

  • Search Google only for files within example.com and extract metadata to a csv report:
    pymeta -d example.com -s google -csv

  • Extract metadata from files within the given directory and create csv report:
    pymeta -dir ../Downloads/ -csv

All Options

-h, --help      Show help message and exit
-d DOMAIN       Target domain
-dir FILE_DIR   Directory of files to extract metadata from
-s ENGINE       Search engine to use: google, bing, all (Default: all)
-m MAX_RESULTS  Max results to collect per file type (Default: 50)
-csv            Write all metadata to CSV (Default: display in terminal)
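
For readers who want to see how these options fit together, here is a hedged sketch of an `argparse` parser that mirrors the table above. It is reconstructed from the documented flags and defaults only; the `build_parser` function and the `dest` names are assumptions, and pymeta's real parser may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="pymeta",
        description="Web scraper to download and extract file metadata",
    )
    parser.add_argument("-d", dest="domain", metavar="DOMAIN",
                        help="Target domain")
    parser.add_argument("-dir", dest="file_dir", metavar="FILE_DIR",
                        help="Directory of files to extract metadata from")
    parser.add_argument("-s", dest="engine", metavar="ENGINE",
                        choices=["google", "bing", "all"], default="all",
                        help="Search engine to use (Default: all)")
    parser.add_argument("-m", dest="max_results", metavar="MAX_RESULTS",
                        type=int, default=50,
                        help="Max results to collect per file type (Default: 50)")
    parser.add_argument("-csv", dest="csv", action="store_true",
                        help="Write all metadata to CSV")
    return parser


if __name__ == "__main__":
    # Equivalent to: pymeta -d example.com -s google -csv
    args = build_parser().parse_args(["-d", "example.com", "-s", "google", "-csv"])
    print(args)
```

Note that `-dir` and `-csv` are single-dash multi-letter flags; `argparse` accepts them because it checks for an exact option match before trying to split grouped short options.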

Credit

Download files


Files for pymetadata, version 1.0.2
  • pymetadata-1.0.2.tar.gz (92.4 kB), file type: Source, Python version: None
