Skip to main content

SCuWl, Simple custom wordlist generator.

Project description

💀 Scuwl 💀

Simple custom wordlist generator

Scuwl (skull) is a Python CLI program that quickly and easily generates a wordlist from a webpage. The idea for Scuwl was inspired by the program Cewl. Scuwl defaults to a crawling depth of zero and most webpages return a wordlist in less than a second. Using a crawling depth of one generally takes a few minutes.

Scuwl is fast because it recursively scrapes websites asynchronously. Scuwl minimizes its memory footprint by processing HTML as it goes and updating the wordlist in memory as a set. By default Scuwl keeps unique words, three characters long and over, and removes all punctuation.

Note: Using a crawling depth of over one remains untested.

Features

  • Fast recursive asynchronous web requests using aiohttp
  • CLI options gives you control over the generated wordlist
  • Simple Python codebase (< 175 lines)
  • Low memory usage (~100MB)

Installation

  $ python -m pip install scuwl

Usage

$ scuwl -h
usage: scuwl.py [-h] [-a] [-d DEPTH] [-H HEADERS] [-m MIN_LENGTH]
                [-M MAX_LENGTH] [-o OUTFILE] [-P PROXY] [-p] [-t]
                [-u USER_AGENT] [-v]
                url

💀SCuWl💀, Simple custom wordlist generator.

positional arguments:
  url                   url to scrape

options:
  -h, --help            show this help message and exit
  -a, --alpha           extract words with alphabet characters only,
                        default=False
  -d DEPTH, --depth DEPTH
                        depth of search, default=0
  -H HEADERS, --headers HEADERS
                        json headers for client
  -m MIN_LENGTH, --min-length MIN_LENGTH
                        minimum length of words to keep, default=3
  -M MAX_LENGTH, --max-length MAX_LENGTH
                        maximum length of words to keep, default=20
  -o OUTFILE, --outfile OUTFILE
                        outfile for wordlist, default=stdout
  -P PROXY, --proxy PROXY
                        proxy address for client
  -p, --punctuation     retains punctutation in words
  -t, --tables          extract words from tables only, default=False
  -u USER_AGENT, --user-agent USER_AGENT
                        user-agent string for client
  -v, --version         show program's version number and exit

Examples

Generate wordlist and send to stdout

$ scuwl https://github.com/petebuffon/scuwl
1000
122
150
2022
20220930
...
Generate wordlist and save as wordlist.txt

$ scuwl -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
309 wordlist.txt
Keep punctuation

$ scuwl -p -o wordlist.txt https://github.com/petebuffon/scuwl
$ head wordlist.txt
(2022-09-30)
(scrapes
(skull)
(~80mb)
--depth
Use a crawl depth of one (scrapes all links from input webpage)

$ scuwl -d 1 -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
6675 wordlist.txt

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scuwl-1.2.tar.gz (5.6 kB view hashes)

Uploaded Source

Built Distribution

scuwl-1.2-py3-none-any.whl (6.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page