Custom Word List generator Redefined

Development Status
- 4 - Beta
Environment
- Console
Framework
- Scrapy
Intended Audience
- Information Technology
License
- Other/Proprietary License
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Internet :: WWW/HTTP :: Indexing/Search
- Security

Project description

CeWLeR - Custom Word List generator Redefined

CeWLeR crawls from a given URL and collects words to create custom wordlists.

It's a great tool for security testers and bug bounty hunters. The lists can be used for password cracking, subdomain enumeration, directory and file brute forcing, API endpoint discovery, etc. It's good to have an additional target-specific wordlist that is different from what everybody else uses.

CeWLeR was sort of originally inspired by the really nice tool CeWL. I had some challenges with CeWL on a site I wanted a wordlist from, but without any Ruby experience I didn't know how to contribute and work around it. So instead I created a custom word list generator in Python to get the job done.

At a glance

Features

Generates custom wordlists by scraping words from web sites
A lot of options:
- Output to screen or file
- Can stay within subdomain, or visit sibling and child subdomains, or visit anything within the same top domain
- Can stay within a certain depth of a website
- Speed can be controlled
- Word length and casing can be configured
- And more
Using the excellent Scrapy framework for scraping and using the beautiful rich library for terminal output

Commands and options

Quick examples

Output to file

Will output to screen unless a file is specified.
cewler --output wordlist.txt https://example.com

Control speed and depth

The rate is specified in requests per second. Please play nicely and don't break any rules.
cewler --output wordlist.txt --rate 5 --depth 2 https://example.com
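
As a rough illustration of what a requests-per-second rate means for a crawler, the sketch below converts a rate into the pause between consecutive requests. The function name `delay_for_rate` is hypothetical and this is not CeWLeR's own code; it only shows the arithmetic.

```python
def delay_for_rate(rate: float) -> float:
    """Return the pause in seconds between requests for a rate in requests per second."""
    if rate <= 0:
        raise ValueError("rate must be a positive number of requests per second")
    return 1.0 / rate


if __name__ == "__main__":
    # Compare the rate used in the example above with the documented default.
    for rate in (5, 20):
        print(f"--rate {rate}: about {delay_for_rate(rate):.2f} s between requests")
```

With `--rate 5`, as in the command above, that works out to roughly one request every 0.2 seconds.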

Change User-Agent header

The default User-Agent is a common browser.
cewler --output wordlist.txt --user-agent "Cewler" https://example.com

Control casing, word length and characters

Unless specified, the words will have mixed case and be at least 5 characters long.
cewler --output wordlist.txt --lowercase --min-word-length 2 --without-numbers https://example.com
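
To make the effect of these three options concrete, here is a minimal sketch of that kind of filtering. It assumes a "word" is a run of letters, digits, and underscores; the function `extract_words` and its parameters are hypothetical and only mirror the option names, so CeWLeR's own parsing may well differ.

```python
import re

# Assumption: a "word" is a run of letters, digits, and underscores.
WORD_PATTERN = re.compile(r"\w+")


def extract_words(text, min_word_length=5, lowercase=False, without_numbers=False):
    """Return the sorted unique words in text that pass the filters."""
    words = set()
    for word in WORD_PATTERN.findall(text):
        if len(word) < min_word_length:
            continue  # too short
        if without_numbers and any(ch.isdigit() for ch in word):
            continue  # is a number or contains digits
        words.add(word.lower() if lowercase else word)
    return sorted(words)


if __name__ == "__main__":
    sample = "Welcome to Example2024 admin portal"
    print(extract_words(sample))
    print(extract_words(sample, min_word_length=2, lowercase=True, without_numbers=True))
```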

Visit all domains - including parent, children and siblings

The default is to just visit exactly the same (sub)domain as specified.
cewler --output wordlist.txt -s all https://example.com

Visit same (sub)domain + any belonging child subdomains

cewler --output wordlist.txt -s children https://example.com
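
The three strategies can be pictured as a host-matching rule. The sketch below is one possible reading of `exact`, `children`, and `all`, not CeWLeR's own logic. The helper names are hypothetical, and `registered_domain` naively assumes the top domain is the last two labels of the host, which is wrong for suffixes such as `co.uk`.

```python
def registered_domain(host):
    """Naive guess at the top domain: the last two labels of the host name."""
    return ".".join(host.lower().split(".")[-2:])


def is_allowed(start_host, candidate_host, strategy="exact"):
    """Decide whether candidate_host may be crawled when starting from start_host."""
    start = start_host.lower()
    candidate = candidate_host.lower()
    if strategy == "exact":
        # Only the very same (sub)domain.
        return candidate == start
    if strategy == "children":
        # The same (sub)domain plus anything below it.
        return candidate == start or candidate.endswith("." + start)
    if strategy == "all":
        # Anything sharing the same top domain: parent, children, and siblings.
        return registered_domain(candidate) == registered_domain(start)
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for strategy in ("exact", "children", "all"):
        print(strategy, is_allowed("www.example.com", "api.example.com", strategy))
```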

Ninja trick 🥷

If it just takes too long to crawl a site, you can press ctrl + c once(!) and wait while the spider finishes the current requests. Whatever words have been found so far are then stored to the output file.
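
The general idea of keeping partial results after an interrupt can be sketched in plain Python. This is a simplified stand-in, not how CeWLeR or Scrapy handle the signal; `collect_words` and its `pages` argument (an iterable yielding the words found on each page) are hypothetical.

```python
def collect_words(pages, output_path):
    """Gather words page by page and save whatever was found, even if interrupted."""
    words = set()
    try:
        for page_words in pages:
            words.update(page_words)
    except KeyboardInterrupt:
        # ctrl + c pressed: stop collecting but keep the words gathered so far.
        pass
    with open(output_path, "w", encoding="utf-8") as handle:
        for word in sorted(words):
            handle.write(word + "\n")
    return words
```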

All options

cewler -h
usage: cewler [-h] [-d DEPTH] [-l] [-m MIN_WORD_LENGTH] [-o OUTPUT] [-r RATE] [-s {all,children,exact}] [--stream] [-u USER_AGENT] [-v] [-w] url

Custom Word List generator Redefined

positional arguments:
  url                   URL to start crawling from

options:
  -h, --help            show this help message and exit
  -d DEPTH, --depth DEPTH
                        directory path depth to crawl, 0 for unlimited (default: 2)
  -l, --lowercase       lowercase all parsed words
  -m MIN_WORD_LENGTH, --min-word-length MIN_WORD_LENGTH
  -o OUTPUT, --output OUTPUT
                        file were to stream and store wordlist instead of screen (default: screen)
  -r RATE, --rate RATE  requests per second (default: 20)
  -s {all,children,exact}, --subdomain_strategy {all,children,exact}
                        allow crawling [all] domains, including children and siblings, only [exact] the same (sub)domain (default), or same domain and any belonging [children]
  --stream              writes to file after each request (may produce duplicates because of threading) (default: false)
  -u USER_AGENT, --user-agent USER_AGENT
                        User-Agent header to send (default: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36)
  -v, --verbose         A bit more detailed output
  -w, --without-numbers
                        ignore words are numbers or contain numbers
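
If you want to drive the tool from a script, one option is to assemble the argument list in Python and hand it to `subprocess.run`. The helper below is a hypothetical convenience wrapper, not part of CeWLeR; it only uses the long option names shown in the help text above.

```python
import shlex


def build_cewler_command(url, output=None, depth=None, rate=None,
                         min_word_length=None, subdomain_strategy=None,
                         user_agent=None, lowercase=False, without_numbers=False):
    """Build the argument list for a cewler run from the documented options."""
    command = ["cewler"]
    if output is not None:
        command += ["--output", output]
    if depth is not None:
        command += ["--depth", str(depth)]
    if rate is not None:
        command += ["--rate", str(rate)]
    if min_word_length is not None:
        command += ["--min-word-length", str(min_word_length)]
    if subdomain_strategy is not None:
        # Note the underscore in this option name, as printed by cewler -h.
        command += ["--subdomain_strategy", subdomain_strategy]
    if user_agent is not None:
        command += ["--user-agent", user_agent]
    if lowercase:
        command.append("--lowercase")
    if without_numbers:
        command.append("--without-numbers")
    command.append(url)
    return command


if __name__ == "__main__":
    cmd = build_cewler_command("https://example.com", output="wordlist.txt", rate=5, depth=2)
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would start the crawl, assuming `cewler` is installed and on your PATH.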

Digging into the code

If you want to do some tweaking yourself, you can probably find what you want in blob/main/src/constants.py and blob/main/src/spider.py

Installation

Alternative 1 - installing from PyPI

python3 -m pip install cewler

Alternative 2 - installing from GitHub

1. Clone repository

git clone https://github.com/roys/cewler.git --depth 1

2. Install dependencies

pip3 install -r requirements.txt

3. Shortcut on Un*x based system (optional)

cd src/cewler
chmod +x cewler.py
ln -s "$(pwd)/cewler.py" /usr/local/bin/cewler
cewler -h

Pronunciation

CeWLeR is pronounced "cooler".

License

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Release history

- 1.2.0: Mar 19, 2024
- 1.1.2.post3: Oct 25, 2023
- 1.1.1: Mar 28, 2023
- 1.1.0: Mar 28, 2023
- 1.0.9: Feb 13, 2023
- 1.0.8: Feb 13, 2023
- 1.0.7: Feb 12, 2023
- 1.0.6 (this version): Feb 9, 2023

Download files

Source Distribution

cewler-1.0.6.tar.gz (17.5 kB), uploaded Feb 9, 2023

Built Distribution

cewler-1.0.6-py3-none-any.whl (16.1 kB, Python 3), uploaded Feb 9, 2023

Hashes for cewler-1.0.6.tar.gz
Algorithm	Hash digest
SHA256	`05ec0810b5b1145f1e3c885cd8e474834ca98db28cadb321728fb689de9802ee`
MD5	`61ac5d949a8b08e3ea1d6cdd734cb15f`
BLAKE2b-256	`e2b817424d8f8b73b93077559f6f034da18d34cfa8c424eb3d1875b2b9fa930e`

Hashes for cewler-1.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89f50554d112360f8d55d4d502669ad0ae28b001ac2e79faf4a6707becfa845c`
MD5	`25b43e7cfb09ce4bae81835902659e58`
BLAKE2b-256	`54397fda6f7e35c3eb4c87ceb041adb03b7c60054107c60bedf6d2b7f19c1ff3`
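
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The sketch below is a generic check, not something shipped with CeWLeR; the function names are hypothetical and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("cewler-1.0.6.tar.gz", "05ec0810b5b1145f1e3c885cd8e474834ca98db28cadb321728fb689de9802ee")` should return `True` for an intact copy of the source distribution.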
