SCuWl, Simple custom wordlist generator.
💀 Scuwl 💀
Simple custom wordlist generator
Scuwl (skull) is a Python CLI program that quickly and easily generates a wordlist from a webpage. It was inspired by the program CeWL. Scuwl defaults to a crawling depth of zero, at which most webpages yield a wordlist in less than a second. A crawling depth of one generally takes a few minutes.
Scuwl is fast because it scrapes websites recursively and asynchronously. It keeps its memory footprint small by processing HTML as it goes and accumulating the wordlist in memory as a set. By default Scuwl keeps unique words between 3 and 20 characters long and removes all punctuation.
Note: Crawling depths greater than one remain untested.
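The default filtering described above (unique words, length limits, punctuation removed) can be illustrated with a minimal sketch. This is not Scuwl's actual code: the function name `extract_words` and its parameters are hypothetical, chosen to mirror the CLI options, and tokens are assumed to be split on whitespace.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def extract_words(text, min_length=3, max_length=20,
                  keep_punctuation=False, alpha_only=False):
    """Return the set of unique words in text that pass the filters.

    Hypothetical helper: the defaults mirror the CLI defaults
    (-m 3, -M 20, no -p, no -a), but the logic is an assumption.
    """
    words = set()
    for token in text.split():
        if not keep_punctuation:
            token = token.translate(_STRIP_PUNCT)
        if alpha_only and not token.isalpha():
            continue
        if min_length <= len(token) <= max_length:
            words.add(token)
    return words


# Process text chunk by chunk and merge into one in-memory set,
# so duplicates across chunks are stored only once.
wordlist = set()
for chunk in ("Scuwl (skull) is a Python CLI program.",
              "Scuwl is fast, it is a CLI."):
    wordlist |= extract_words(chunk)
print(sorted(wordlist))
```

Because the wordlist is a set, repeated words cost no extra memory, and each chunk of page text can be discarded as soon as its words have been merged.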
Features
- Fast recursive asynchronous web requests using aiohttp
- CLI options give you control over the generated wordlist
- Simple Python codebase (< 175 lines)
- Low memory usage (~100MB)
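As a rough illustration of depth-limited asynchronous crawling, the sketch below uses only `asyncio` and a hypothetical in-memory site (`FAKE_SITE`) in place of real aiohttp requests. The names `fetch`, `crawl`, and `build_wordlist` are assumptions made for this example and are not taken from Scuwl's source.

```python
import asyncio

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
FAKE_SITE = {
    "/": ("home page words", ["/a", "/b"]),
    "/a": ("alpha page", ["/c"]),
    "/b": ("beta page", []),
    "/c": ("gamma page", []),
}


async def fetch(url):
    """Stand-in for an HTTP GET; a real client would await the network here."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return FAKE_SITE.get(url, ("", []))


async def crawl(url, depth, words, seen):
    """Add the words on url to words, then follow links while depth remains."""
    if url in seen:
        return
    seen.add(url)
    text, links = await fetch(url)
    words.update(text.split())
    if depth > 0:
        # Linked pages are crawled concurrently rather than one by one.
        await asyncio.gather(
            *(crawl(link, depth - 1, words, seen) for link in links)
        )


def build_wordlist(url, depth=0):
    """Run the crawl to completion and return the collected set of words."""
    words, seen = set(), set()
    asyncio.run(crawl(url, depth, words, seen))
    return words


print(sorted(build_wordlist("/", depth=1)))
```

With a depth of zero only the starting page is read; each extra level adds every page linked from the previous level, which is why deeper crawls take much longer.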
Installation
$ python -m pip install scuwl
Usage
$ scuwl -h
usage: scuwl.py [-h] [-a] [-d DEPTH] [-H HEADERS] [-m MIN_LENGTH]
                [-M MAX_LENGTH] [-o OUTFILE] [-P PROXY] [-p] [-t]
                [-u USER_AGENT] [-v]
                url

💀SCuWl💀, Simple custom wordlist generator.

positional arguments:
  url                   url to scrape

options:
  -h, --help            show this help message and exit
  -a, --alpha           extract words with alphabet characters only,
                        default=False
  -d DEPTH, --depth DEPTH
                        depth of search, default=0
  -H HEADERS, --headers HEADERS
                        json headers for client
  -m MIN_LENGTH, --min-length MIN_LENGTH
                        minimum length of words to keep, default=3
  -M MAX_LENGTH, --max-length MAX_LENGTH
                        maximum length of words to keep, default=20
  -o OUTFILE, --outfile OUTFILE
                        outfile for wordlist, default=stdout
  -P PROXY, --proxy PROXY
                        proxy address for client
  -p, --punctuation     retains punctutation in words
  -t, --tables          extract words from tables only, default=False
  -u USER_AGENT, --user-agent USER_AGENT
                        user-agent string for client
  -v, --version         show program's version number and exit
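The `-H` option takes headers as JSON, which is awkward to quote by hand. The sketch below builds a Scuwl command line from Python using only flags listed in the help text. It assumes that `-H` expects a JSON object mapping header names to values (the help text says only "json headers for client"); the helper name `scuwl_command` and the example header are hypothetical.

```python
import json
import shlex


def scuwl_command(url, headers=None, depth=0, outfile=None):
    """Build an argument list for the scuwl CLI (hypothetical helper).

    Assumes -H takes a JSON object of header names to values.
    """
    args = ["scuwl", "-d", str(depth)]
    if headers:
        # json.dumps produces valid JSON with double-quoted strings.
        args += ["-H", json.dumps(headers)]
    if outfile:
        args += ["-o", outfile]
    args.append(url)
    return args


cmd = scuwl_command(
    "https://example.com",
    headers={"Accept-Language": "en"},
    outfile="wordlist.txt",
)
# shlex.join quotes each argument so the line can be pasted into a shell.
print(shlex.join(cmd))
```

The same list can be passed directly to `subprocess.run(cmd)`, which avoids shell quoting altogether.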
Examples
Generate wordlist and send to stdout
$ scuwl https://github.com/petebuffon/scuwl
1000
122
150
2022
20220930
...
Generate wordlist and save as wordlist.txt
$ scuwl -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
309 wordlist.txt
Keep punctuation
$ scuwl -p -o wordlist.txt https://github.com/petebuffon/scuwl
$ head wordlist.txt
(2022-09-30)
(scrapes
(skull)
(~80mb)
--depth
Use a crawl depth of one (also scrapes every page linked from the input webpage)
$ scuwl -d 1 -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
6675 wordlist.txt
File details
Details for the file scuwl-1.2.tar.gz.
File metadata
- Download URL: scuwl-1.2.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | a9a230403a1f15c6c414199a68b9f973b917412fcc8a25dc10cfca5d48e2afc6
MD5 | 011836d41524c31371f77ff694f1e9d0
BLAKE2b-256 | 901c5298123c6c23a669904ffd080f1f9fcffd76d544df3e08b33a37c2b0b6c4
File details
Details for the file scuwl-1.2-py3-none-any.whl.
File metadata
- Download URL: scuwl-1.2-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5e997f75dcebadb3673ff9013d795de4afbb84a46678492ec695cd64ab9e8ffe
MD5 | 948efb36ad5f52be1cd09ca5d7a70f20
BLAKE2b-256 | b03760b0d462335560cf7fc813cdc0007a8daf863d338f61356bbe344d0e9863