Web scraper to download and extract file metadata
PyMeta
PyMeta is a Python 3 rewrite of PowerMeta, a PowerShell tool created by dafthack. It uses specially crafted search queries to identify and download the following file types (pdf, xls, xlsx, csv, doc, docx, ppt, pptx) from a given domain using Google and Bing scraping.
Once downloaded, metadata is extracted from these files using Phil Harvey's exiftool and added to a .csv report. Alternatively, PyMeta can be pointed at a directory of manually downloaded files with the -dir command line argument to extract their metadata. See the Usage or All Options section for more information.
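The workflow described above can be sketched in Python. This is a minimal illustration, not PyMeta's actual code: the `site:`/`filetype:` query shape, the helper names `build_queries`, `extract_metadata`, and `write_report`, and the report layout are all assumptions. The only external detail relied on is exiftool's `-j` flag, which prints JSON output.

```python
import csv
import json
import subprocess

# File types listed in the project description.
FILE_TYPES = ["pdf", "xls", "xlsx", "csv", "doc", "docx", "ppt", "pptx"]


def build_queries(domain, file_types=FILE_TYPES):
    """Return one search query per file type (assumed query shape)."""
    return [f"site:{domain} filetype:{ext}" for ext in file_types]


def extract_metadata(file_dir):
    """Run exiftool on a directory and return a list of per-file tag dicts."""
    # -j asks exiftool for JSON output: a list with one object per file.
    result = subprocess.run(
        ["exiftool", "-j", str(file_dir)],
        capture_output=True, text=True, check=False,
    )
    return json.loads(result.stdout) if result.stdout.strip() else []


def write_report(records, report_path):
    """Write tag dicts to a CSV whose header is the union of all tag names."""
    fields = sorted({key for rec in records for key in rec})
    with open(report_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)  # tags missing from a file are left blank
    return fields
```

Because different file types expose different tags, the sketch uses the union of all tag names as the CSV header; PyMeta's own report may select or order columns differently.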
Why?
Metadata is a common place for penetration testers and red teamers to find domains, user accounts, naming conventions, software/version numbers, and more!
Still not convinced? Check out Hacking Organizations One Document at a Time With Metadata.
Getting Started
Prerequisites
Exiftool is required and can be installed with:
Ubuntu/Kali - apt-get install exiftool -y
Mac OS - brew install exiftool
Install:
git clone https://github.com/m8sec/pymeta
cd pymeta
python3 setup.py install
Usage
- Search Google and Bing for files within example.com and extract metadata to a csv report:
pymeta -d example.com
- Extract metadata from files within the given directory and create csv report:
pymeta -dir Downloads/
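Once a report exists, it can be inspected with Python's standard csv module. The sketch below tallies the values found in a single column. The helper name `count_column_values` is hypothetical, and the column names actually present depend on the exiftool tags in the report, which this README does not list, so treat any specific column name as an assumption.

```python
import csv
from collections import Counter


def count_column_values(report_path, column):
    """Count non-empty values in one column of a CSV report."""
    counts = Counter()
    with open(report_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # A missing column or an empty cell is skipped, not counted.
            value = (row.get(column) or "").strip()
            if value:
                counts[value] += 1
    return counts
```

For example, `count_column_values("report.csv", "Author").most_common(10)` would list the ten most frequent values, assuming the report is named `report.csv` and has an `Author` column.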
All Options
Target Options:
-d DOMAIN Target domain
-dir FILE_DIR Pre-existing directory of files
Search Options:
-s {google,bing,all} Search engine(s) to scrape (Default: all)
-m MAX_RESULTS Max results per file type, per search engine (Default: 50)
-j JITTER Seconds between search requests (Default: 2)
Output Options:
-o OUTPUT_DIR Path to store PyMeta's download folder (Default: ./)
-f FILENAME Custom report path/name.csv (Optional)
--debug Show links as they are collected during scraping
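When PyMeta is driven from another script, the options above can be assembled into an argument list. This is a sketch under stated assumptions: the helper `build_pymeta_command` is hypothetical, it uses only the flags documented above, and it assumes `pymeta` is on the PATH when the command is eventually run.

```python
def build_pymeta_command(domain=None, file_dir=None, engine=None,
                         max_results=None, jitter=None,
                         output_dir=None, report=None, debug=False):
    """Build an argv list for pymeta from the documented options."""
    # Exactly one target option (-d or -dir) must be given.
    if (domain is None) == (file_dir is None):
        raise ValueError("provide exactly one of domain or file_dir")

    cmd = ["pymeta"]
    cmd += ["-d", domain] if domain is not None else ["-dir", str(file_dir)]

    # Only pass options that were set, so pymeta's own defaults apply.
    optional = [
        ("-s", engine),
        ("-m", max_results),
        ("-j", jitter),
        ("-o", output_dir),
        ("-f", report),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]

    if debug:
        cmd.append("--debug")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.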
Credit
- Beau Bullock (@dafthack) - https://github.com/dafthack/PowerMeta
- Phil Harvey - https://exiftool.org/