File Scraper
Scrape files for sensitive information and generate an interactive HTML report. Based on Rabin2.
Customize the tool to your liking!
Tested on Kali Linux v2023.4 (64-bit).
Made for educational purposes. I hope it will help!
How to Install
Install Radare2
On Kali Linux, run:
apt-get -y install radare2
On Windows OS, download and unpack radareorg/radare2, then add the bin directory to the Windows PATH environment variable.
On macOS, run:
brew install radare2
Standard Install
pip3 install --upgrade file-scraper
Build and Install From the Source
git clone https://github.com/ivan-sincek/file-scraper && cd file-scraper
python3 -m pip install --upgrade build
python3 -m build
python3 -m pip install dist/file_scraper-3.0-py3-none-any.whl
Build the Template & Run
Prepare a template:
{
    "authorization":{
        "query":"[^\\w\\d\\n]+(?:basic|bearer)\\ .+",
        "ignorecase":true,
        "search":true
    },
    "variable":{
        "query":"(?:access|account|admin|basic|bearer|card|conf|cred|customer|email|history|id|info|jwt|key|kyc|log|otp|pass|pin|priv|refresh|salt|secret|seed|setting|sign|token|transaction|transfer|user)[\\w\\d]*(?:\\\"\\ *\\:|\\ *\\=).+",
        "ignorecase":true,
        "search":true
    },
    "comment":{
        "query":"[^\\w\\d\\n]+(?:bug|comment|fix|issue|note|problem|to(?:\\_|\\ |)do|work)[^\\w\\d\\n]+.+",
        "ignorecase":true,
        "search":true
    },
    "url":{
        "query":"\\w+\\:\\/\\/[\\w\\-\\.\\@\\:\\/\\?\\=\\%\\&\\#]+",
        "unique":true,
        "collect":true
    },
    "ip":{
        "query":"(?:\\b25[0-5]|\\b2[0-4][0-9]|\\b[01]?[0-9][0-9]?)(?:\\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}",
        "unique":true,
        "collect":true
    },
    "base64":{
        "query":"(?:[a-zA-Z0-9\\+\\/]{4})*(?:[a-zA-Z0-9\\+\\/]{4}|[a-zA-Z0-9\\+\\/]{3}\\=|[a-zA-Z0-9\\+\\/]{2}\\=\\=)",
        "minimum":8,
        "decode":"base64",
        "unique":true,
        "collect":true
    },
    "hex":{
        "query":"(?:(?:0x|(?:\\\\)+x)[a-fA-F0-9]{2})+|[a-fA-F0-9]+",
        "minimum":12,
        "decode":"hex",
        "unique":true,
        "collect":true
    },
    "cert":{
        "query":"-----BEGIN (?:CERTIFICATE|PRIVATE KEY)-----[\\s\\S]+?-----END (?:CERTIFICATE|PRIVATE KEY)-----",
        "decode":"cert",
        "unique":true,
        "collect":true
    }
}
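To make the template format concrete, here is a minimal sketch of how such a file could be loaded and applied with Python's re module. It is not the tool's own code: compile_template and scrape_text are hypothetical names, the embedded template keeps only the "comment" and "url" entries from above, and only the "query" and "ignorecase" fields are honoured (the other fields shape the report and are ignored here).

```python
import json
import re

# Hypothetical two-entry excerpt of the template above, embedded as raw JSON text.
TEMPLATE = r'''
{
    "comment":{
        "query":"[^\\w\\d\\n]+(?:bug|comment|fix|issue|note|problem|to(?:\\_|\\ |)do|work)[^\\w\\d\\n]+.+",
        "ignorecase":true,
        "search":true
    },
    "url":{
        "query":"\\w+\\:\\/\\/[\\w\\-\\.\\@\\:\\/\\?\\=\\%\\&\\#]+",
        "unique":true,
        "collect":true
    }
}
'''

def compile_template(text):
    """Compile each entry's "query", applying re.IGNORECASE when "ignorecase" is true."""
    compiled = {}
    for name, entry in json.loads(text).items():
        flags = re.IGNORECASE if entry.get("ignorecase", False) else 0
        compiled[name] = re.compile(entry["query"], flags)
    return compiled

def scrape_text(text, compiled):
    """Return all matches of every entry, keyed by entry name."""
    return {name: regex.findall(text) for name, regex in compiled.items()}

if __name__ == "__main__":
    sample = "// TODO: remove test key\nconst api = 'https://example.com/v1?id=1';\n"
    results = scrape_text(sample, compile_template(TEMPLATE))
    print(results["url"])      # ['https://example.com/v1?id=1']
    print(results["comment"])
```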
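Make sure your regular expressions contain at most one capturing group, so that matches are returned as a flat list, e.g., [1, 2, 3, 4], and not as a list of tuples, e.g., [(1, 2), (3, 4)]. Use non-capturing groups (?:...) for any other grouping, as the template above does.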
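The tuple problem is how Python's re.findall behaves, which is a reasonable guess at why the rule exists (an assumption; the source does not say which call the tool uses). The snippet below shows the three cases side by side.

```python
import re

text = "id=1 id=2 id=3 id=4"

# One capturing group: findall returns a flat list with the group's text.
flat = re.findall(r"id=(\d)", text)
print(flat)      # ['1', '2', '3', '4']

# Two capturing groups: findall returns a list of tuples instead.
tuples = re.findall(r"(id)=(\d)", text)
print(tuples)    # [('id', '1'), ('id', '2'), ('id', '3'), ('id', '4')]

# Only non-capturing groups: findall returns the whole match each time.
whole = re.findall(r"(?:id)=\d", text)
print(whole)     # ['id=1', 'id=2', 'id=3', 'id=4']
```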
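Make sure to properly escape regular-expression-specific symbols in your template file, e.g., escape the dot . as \\. and the forward slash / as \\/, etc. The backslash is doubled because the template is a JSON file, and JSON string parsing consumes one level of escaping before the regular expression engine sees the query.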
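The following standard-library snippet shows what a JSON-decoded query looks like to the regex engine, and why a single backslash such as \b is risky in a template: JSON turns it into a backspace character, not a word boundary. It only illustrates JSON and re semantics; it does not call the tool.

```python
import json
import re

# "\\." in JSON source decodes to the two characters \. (a literal dot in regex).
dot = json.loads(r'"\\."')
print(dot)                            # \.
print(bool(re.fullmatch(dot, ".")))   # True
print(bool(re.fullmatch(dot, "x")))   # False

# A single "\b" is a JSON escape for backspace (U+0008), not the regex word boundary.
print(json.loads(r'"\b"') == "\x08")  # True
print(json.loads(r'"\\b"') == r"\b")  # True

# Escapes that JSON does not define, such as "\.", make the whole file invalid.
try:
    json.loads(r'"\."')
except json.JSONDecodeError as error:
    print("invalid JSON:", error.msg)
```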
Name | Type | Required | Description |
---|---|---|---|
query | text | yes | Regular expression query. |
search | boolean | no | Highlight matches within the output; otherwise, extract them. |
ignorecase | boolean | no | Case-insensitive search. |
minimum | integer | no | Show only matches longer than the specified number of characters. |
maximum | integer | no | Show only matches shorter than the specified number of characters. |
decode | text | no | Decode matches. Available decodings: url, base64, hex, cert. |
unique | boolean | no | Filter out duplicates. |
collect | boolean | no | Collect all matches in one place. |
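As an illustration of the optional fields, here is a hedged sketch of post-processing a list of raw matches. Everything in it is an assumption: decode_match and post_process are hypothetical, length filters are applied to the raw match before decoding, matches that fail to decode are dropped, and "cert" decoding is left out because it needs a certificate parser.

```python
import base64
import re
import urllib.parse

def decode_match(match, decoding):
    """Best-effort decoding; returns None when the match does not decode cleanly."""
    try:
        if decoding == "url":
            return urllib.parse.unquote(match)
        if decoding == "base64":
            return base64.b64decode(match, validate=True).decode("utf-8")
        if decoding == "hex":
            # Strip "0x" and "\x" style prefixes so "0x41" and "\x41" both become "41".
            digits = re.sub(r"0x|\\+x", "", match)
            return bytes.fromhex(digits).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    return match  # "cert" and unknown decodings are passed through in this sketch

def post_process(matches, entry):
    """Apply the optional "minimum", "maximum", "decode" and "unique" fields."""
    minimum = entry.get("minimum")
    maximum = entry.get("maximum")
    results = []
    for match in matches:
        if minimum is not None and len(match) <= minimum:
            continue  # keep only matches longer than "minimum"
        if maximum is not None and len(match) >= maximum:
            continue  # keep only matches shorter than "maximum"
        if "decode" in entry:
            match = decode_match(match, entry["decode"])
            if match is None:
                continue
        results.append(match)
    if entry.get("unique"):
        results = list(dict.fromkeys(results))  # drop duplicates, keep first-seen order
    return results

if __name__ == "__main__":
    entry = {"minimum": 8, "decode": "base64", "unique": True}
    print(post_process(["c2VjcmV0LWtleQ==", "c2VjcmV0LWtleQ==", "QUJD"], entry))  # ['secret-key']
```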
How I run the tool most of the time:
file-scraper -dir directory -o results.html -e default
The default (built-in) excluded file extensions are as follows:
car, css, gif, jpeg, jpg, mp3, mp4, nib, ogg, otf, png, storyboard, strings, svg, ttf, webp, woff, woff2, xib
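The -e and -i options described in the usage text below can be read as a simple extension filter. This is a sketch under stated assumptions, not the tool's code: parse_extensions and should_scrape are hypothetical, matching is case-insensitive against "." plus the extension, and "includes override excludes" is interpreted as "when includes are given, scrape only those".

```python
import os

# Built-in exclude list, copied from the README.
DEFAULT_EXCLUDES = [
    "car", "css", "gif", "jpeg", "jpg", "mp3", "mp4", "nib", "ogg", "otf", "png",
    "storyboard", "strings", "svg", "ttf", "webp", "woff", "woff2", "xib",
]

def parse_extensions(value):
    """Split a comma-separated value; the keyword 'default' expands to the built-in list."""
    extensions = []
    for item in value.split(","):
        item = item.strip().lower().lstrip(".")
        if item == "default":
            extensions.extend(DEFAULT_EXCLUDES)
        elif item:
            extensions.append(item)
    return extensions

def should_scrape(path, excludes=(), includes=()):
    """Return True if the file should be scraped; includes take precedence over excludes."""
    name = os.path.basename(path).lower()
    if includes:
        return any(name.endswith("." + ext) for ext in includes)
    return not any(name.endswith("." + ext) for ext in excludes)
```

For example, with excludes = parse_extensions("default,js"), a file named logo.PNG or main.js is skipped while config.json is scraped.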
Usage
File Scraper v3.0 ( github.com/ivan-sincek/file-scraper )
Usage:   file-scraper -dir directory -o out [-t template] [-e excludes] [-th threads]
Example: file-scraper -dir decoded -o results.html [-t template.json] [-e jpeg,jpg,png] [-th 10]
DESCRIPTION
Scrape files for sensitive information
DIRECTORY
Directory containing files, or a single file to scrape
-dir, --directory = decoded | files | test.exe | etc.
TEMPLATE
Template file with extraction details, or a single RegEx to use
Default: built-in JSON template file
-t, --template = template.json | "secret\: [\w\d]+" | etc.
EXCLUDES
Exclude all files that end with the specified extension
Specify 'default' to load the built-in list
Use comma-separated values
-e, --excludes = mp3 | default,jpeg,jpg,png | etc.
INCLUDES
Include all files that end with the specified extension
Overrides excludes
Use comma-separated values
-i, --includes = java | json,xml,yaml | etc.
BEAUTIFY
Beautify [minified] JavaScript (.js) files
-b, --beautify
THREADS
Number of parallel threads to run
Default: 30
-th, --threads = 10 | etc.
OUT
Output HTML file
-o, --out = results.html | etc.
DEBUG
Debug output
-dbg, --debug
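Per the usage text, -dir accepts a directory or a single file, and -th sets the number of parallel threads (default 30). A rough sketch of that fan-out with concurrent.futures follows. The names list_files, scrape_file and scrape_all are hypothetical, and the sketch simply reads raw bytes, whereas the tool is described as based on Rabin2 for string extraction.

```python
import os
import re
from concurrent.futures import ThreadPoolExecutor

def list_files(target):
    """Accept a directory (walked recursively) or a single file."""
    if os.path.isfile(target):
        return [target]
    return [os.path.join(root, name) for root, _, names in os.walk(target) for name in names]

def scrape_file(path, regex):
    """Return (path, matches) for one file; non-UTF-8 bytes are ignored."""
    with open(path, "rb") as handle:
        text = handle.read().decode("utf-8", errors="ignore")
    return path, regex.findall(text)

def scrape_all(target, regex, threads=30):
    """Scrape every file using up to `threads` worker threads (30 mirrors the -th default)."""
    with ThreadPoolExecutor(max_workers=threads) as executor:
        futures = [executor.submit(scrape_file, path, regex) for path in list_files(target)]
        return dict(future.result() for future in futures)
```

A call such as scrape_all("decoded", re.compile(r"secret\: [\w\d]+"), threads=10) roughly corresponds to the single-RegEx form of -t combined with -th 10.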
Images
Figure 1 - Interactive Report
Figure 2 - Certificates