
Web scraper to download and extract file metadata

Project description

PyMeta


PyMeta is a Python 3 rewrite of PowerMeta, a PowerShell tool created by dafthack. It uses specially crafted search queries to find and download files of the following types (pdf, xls, xlsx, csv, doc, docx, ppt, pptx) from a given domain by scraping Google and Bing.
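The exact queries PyMeta sends are not listed here. As a rough sketch, one query per file type could be built from the `site:` and `filetype:` operators that Google and Bing both support. The function name `build_queries` and the query format are assumptions for illustration, not PyMeta's actual code.

```python
# File types listed in the project description.
FILE_TYPES = ["pdf", "xls", "xlsx", "csv", "doc", "docx", "ppt", "pptx"]


def build_queries(domain, file_types=FILE_TYPES):
    """Return one search query string per file type (hypothetical format)."""
    return [f"site:{domain} filetype:{ext}" for ext in file_types]


if __name__ == "__main__":
    for query in build_queries("example.com"):
        print(query)
```

Each string would then be submitted to a search engine and the result links filtered for the matching file extension.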

Once downloaded, metadata is extracted from these files using Phil Harvey's exiftool and added to a .csv report. Alternatively, the -dir command line argument points PyMeta at a directory of manually downloaded files and extracts their metadata. See the Usage or All Options section for more information.
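The extract-and-report step can be approximated in a few lines. This is a minimal sketch, assuming exiftool is on the PATH and using its `-json` output option. The function names and the chosen columns are hypothetical, and PyMeta's real report layout may differ.

```python
import csv
import json
import subprocess
from pathlib import Path

# Illustrative columns only; PyMeta's real report layout may differ.
COLUMNS = ["SourceFile", "Author", "Creator", "Producer"]


def read_metadata(path):
    """Run exiftool on one file and return its tags as a dict."""
    result = subprocess.run(
        ["exiftool", "-json", str(path)],
        capture_output=True, text=True, check=True,
    )
    # exiftool -json prints a JSON array with one object per input file.
    return json.loads(result.stdout)[0]


def write_report(records, report_path, columns=COLUMNS):
    """Write the chosen tags of each record to a CSV file."""
    with open(report_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=columns, extrasaction="ignore", restval=""
        )
        writer.writeheader()
        writer.writerows(records)


def report_directory(directory, report_path):
    """Extract metadata from every regular file in a directory."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    write_report([read_metadata(p) for p in files], report_path)
```

Tags missing from a file are written as empty cells, and tags outside the chosen columns are ignored.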

Why?

Metadata is a common place for penetration testers and red teamers to find domains, user accounts, naming conventions, software/version numbers, and more!

Still not convinced? Check out Hacking Organizations One Document at a Time With Metadata.

Getting Started

Prerequisites

Exiftool is required and can be installed with:

    Ubuntu/Kali - apt-get install exiftool -y

    macOS - brew install exiftool
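To confirm the prerequisite before running anything, a quick check from Python could look like the sketch below. The helper name `find_tool` is made up for this example; it only wraps the standard library's `shutil.which`.

```python
import shutil


def find_tool(name="exiftool"):
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("exiftool not found; install it before running PyMeta")
    else:
        print(f"exiftool found at {path}")
```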

Install:

git clone https://github.com/m8sec/pymeta
cd pymeta
python3 setup.py install

Usage

  • Search Google and Bing for files within example.com and extract metadata to a CSV report:
    pymeta -d example.com

  • Extract metadata from files within the given directory and create a CSV report:
    pymeta -dir Downloads/
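When PyMeta is driven from another script, the two modes above map onto two argument lists. The sketch below is one possible wrapper: `pymeta_command` is a hypothetical helper, and it uses only the `-d` and `-dir` flags shown above.

```python
import shlex


def pymeta_command(domain=None, directory=None):
    """Build the argument list for one of the two documented modes."""
    if (domain is None) == (directory is None):
        raise ValueError("pass exactly one of domain or directory")
    if domain is not None:
        return ["pymeta", "-d", domain]
    return ["pymeta", "-dir", directory]


if __name__ == "__main__":
    # Print the commands instead of running them; hand a list to
    # subprocess.run(..., check=True) to execute it once pymeta is installed.
    print(shlex.join(pymeta_command(domain="example.com")))
    print(shlex.join(pymeta_command(directory="Downloads/")))
```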

All Options

Target Options:
  -d DOMAIN             Target domain
  -dir FILE_DIR         Pre-existing directory of files

Search Options:
  -s {google,bing,all}  Search engine(s) to scrape (Default: all)
  -m MAX_RESULTS        Max results per file type, per search engine (Default: 50)
  -j JITTER             Seconds between search requests (Default: 2)

Output Options:
  -o OUTPUT_DIR         Path to store PyMeta's download folder (Default: ./)
  -f FILENAME           Custom report path/name.csv (Optional)
  --debug               Show links as they are collected during scraping
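For readers who want to see how an option set like this is typically declared, here is an `argparse` sketch that mirrors the flags and defaults listed above. It is not PyMeta's source: the `dest` names and the choice of `int` and `float` for `-m` and `-j` are assumptions.

```python
import argparse


def build_parser():
    """Declare the documented options (illustrative, not PyMeta's own code)."""
    parser = argparse.ArgumentParser(prog="pymeta")

    target = parser.add_argument_group("Target Options")
    target.add_argument("-d", dest="domain", help="Target domain")
    target.add_argument("-dir", dest="file_dir",
                        help="Pre-existing directory of files")

    search = parser.add_argument_group("Search Options")
    search.add_argument("-s", dest="engine", default="all",
                        choices=["google", "bing", "all"],
                        help="Search engine(s) to scrape")
    search.add_argument("-m", dest="max_results", type=int, default=50,
                        help="Max results per file type, per search engine")
    search.add_argument("-j", dest="jitter", type=float, default=2,
                        help="Seconds between search requests")

    output = parser.add_argument_group("Output Options")
    output.add_argument("-o", dest="output_dir", default="./",
                        help="Path to store the download folder")
    output.add_argument("-f", dest="filename", default=None,
                        help="Custom report path/name.csv")
    output.add_argument("--debug", action="store_true",
                        help="Show links as they are collected")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args(["-d", "example.com", "-s", "bing"]))
```

Unset options fall back to the defaults from the table, so `-d example.com` alone searches all engines with up to 50 results per file type.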

Credit
