SCuWl, Simple custom wordlist generator.

💀 Scuwl 💀

Simple custom wordlist generator

Scuwl (pronounced "skull") is a Python CLI program that generates a wordlist from a webpage. It was inspired by CeWL. Scuwl defaults to a crawl depth of zero, at which most webpages yield a wordlist in under a second. A crawl depth of one generally takes a few minutes.

Scuwl is fast because it scrapes websites recursively and asynchronously. It keeps its memory footprint small by processing HTML as each page arrives and storing the wordlist in memory as a set. By default, Scuwl keeps unique words between 3 and 20 characters long and removes all punctuation.
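The default filtering described above can be illustrated with a minimal sketch. This is not Scuwl's actual code: the function name `extract_words` and its parameters are hypothetical, words are assumed to be split on whitespace, and case handling is left out. Only the defaults (minimum length 3, maximum length 20, punctuation removed, unique words kept in a set) come from the description and the help text.

```python
import string

# Hypothetical helper, not part of Scuwl: illustrates the default filtering.
def extract_words(text, min_length=3, max_length=20, keep_punctuation=False):
    """Return the set of unique words in text that pass the length filter."""
    words = set()
    for token in text.split():
        if not keep_punctuation:
            # Drop every ASCII punctuation character from the token.
            token = token.translate(str.maketrans("", "", string.punctuation))
        if min_length <= len(token) <= max_length:
            words.add(token)
    return words

# Pages are processed one at a time and merged into a single set,
# so duplicates across pages are stored only once.
wordlist = set()
for chunk in ["Scuwl (skull) is a Python CLI program.", "A Python program, it is."]:
    wordlist |= extract_words(chunk)

print(sorted(wordlist))
# ['CLI', 'Python', 'Scuwl', 'program', 'skull']
```

Keeping the wordlist as a set means deduplication costs nothing extra, and the raw text of each page can be discarded as soon as its words are merged.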

Note: Crawl depths greater than one remain untested.

Features

  • Fast recursive asynchronous web requests using aiohttp
  • CLI options give you control over the generated wordlist
  • Simple Python codebase (< 175 lines)
  • Low memory usage (~100MB)
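To show what a depth-limited, recursive, asynchronous crawl looks like in general, here is a small sketch using only `asyncio`. It is an illustration under stated assumptions, not Scuwl's implementation: the `PAGES` dictionary is a hypothetical in-memory stand-in for a website, and `fetch`, `crawl`, and `main` are made-up names. A real client such as aiohttp would perform network requests where `fetch` sleeps.

```python
import asyncio

# Hypothetical in-memory "site": URL -> (page text, outgoing links).
PAGES = {
    "/": ("home page words", ["/a", "/b"]),
    "/a": ("alpha page", ["/c"]),
    "/b": ("beta page", []),
    "/c": ("gamma page", []),
}

async def fetch(url):
    """Stand-in for an HTTP GET; a real client would await the network here."""
    await asyncio.sleep(0)
    return PAGES.get(url, ("", []))

async def crawl(url, depth, seen, words):
    """Add the words on url to words, then follow its links while depth > 0."""
    if url in seen:
        return
    seen.add(url)
    text, links = await fetch(url)
    words.update(text.split())
    if depth > 0:
        # Linked pages are fetched concurrently rather than one by one.
        await asyncio.gather(
            *(crawl(link, depth - 1, seen, words) for link in links)
        )

async def main(start, depth):
    words = set()
    await crawl(start, depth, set(), words)
    return words

print(sorted(asyncio.run(main("/", 0))))
# ['home', 'page', 'words']
print(sorted(asyncio.run(main("/", 1))))
# ['alpha', 'beta', 'home', 'page', 'words']
```

With a depth of zero only the start page is read. With a depth of one, every page linked from the start page is read as well, which is why the wordlist grows so quickly.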

Installation

  $ python -m pip install scuwl

Usage

$ scuwl -h
usage: scuwl.py [-h] [-a] [-d DEPTH] [-H HEADERS] [-m MIN_LENGTH]
                [-M MAX_LENGTH] [-o OUTFILE] [-P PROXY] [-p] [-t]
                [-u USER_AGENT] [-v]
                url

💀SCuWl💀, Simple custom wordlist generator.

positional arguments:
  url                   url to scrape

options:
  -h, --help            show this help message and exit
  -a, --alpha           extract words with alphabet characters only,
                        default=False
  -d DEPTH, --depth DEPTH
                        depth of search, default=0
  -H HEADERS, --headers HEADERS
                        json headers for client
  -m MIN_LENGTH, --min-length MIN_LENGTH
                        minimum length of words to keep, default=3
  -M MAX_LENGTH, --max-length MAX_LENGTH
                        maximum length of words to keep, default=20
  -o OUTFILE, --outfile OUTFILE
                        outfile for wordlist, default=stdout
  -P PROXY, --proxy PROXY
                        proxy address for client
  -p, --punctuation     retains punctutation in words
  -t, --tables          extract words from tables only, default=False
  -u USER_AGENT, --user-agent USER_AGENT
                        user-agent string for client
  -v, --version         show program's version number and exit

Examples

Generate wordlist and send to stdout

$ scuwl https://github.com/petebuffon/scuwl
1000
122
150
2022
20220930
...

Generate wordlist and save as wordlist.txt

$ scuwl -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
309 wordlist.txt

Keep punctuation

$ scuwl -p -o wordlist.txt https://github.com/petebuffon/scuwl
$ head wordlist.txt
(2022-09-30)
(scrapes
(skull)
(~80mb)
--depth

Use a crawl depth of one (scrapes all links from input webpage)

$ scuwl -d 1 -o wordlist.txt https://github.com/petebuffon/scuwl
$ wc -l wordlist.txt
6675 wordlist.txt
