A fast web crawler to satisfy all your needs
A web crawler written in Python that crawls a given website.
- Ability to specify the number of threads used to crawl the given website
- Ability to use proxies to bypass IP restrictions
- Clear summary of all the URLs that were crawled; view the crawled.txt file for the complete list of crawled links
- Ability to specify a delay between each HTTP request
- Stop and resume the crawler whenever you need
- Gather all the URLs with their titles into a CSV file, in case you are planning to build a search engine
- Search for specific text throughout the website
- Clear statistics about how many links ended up as files, timeout errors, or connection errors
- Crawl as deep as you need: you can specify up to what level the crawler should descend
- Random browser user agents are used while crawling
- Gather AWS buckets, emails, phone numbers, etc. (see the sketch after this list)
- Download all images
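Gathering buckets, emails, and phone numbers boils down to running a few regular expressions over each downloaded page. Below is a minimal sketch of that idea; the patterns and the example URL are illustrative assumptions, not the crawler's actual code.

```python
import re

import requests

# Illustrative patterns for emails, phone numbers, and S3 bucket hosts.
# These are assumptions for the sketch, not the crawler's shipped regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
BUCKET_RE = re.compile(r"[\w.-]+\.s3\.amazonaws\.com")

html = requests.get("https://example.com", timeout=10).text
print("Emails:", set(EMAIL_RE.findall(html)))
print("Phones:", set(PHONE_RE.findall(html)))
print("Buckets:", set(BUCKET_RE.findall(html)))
```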
This tool uses a number of open source projects to work properly (a short sketch of how they fit together follows this list):
- BeautifulSoup - parses the HTML response of each request made.
- Requests - makes the GET requests to the URLs.
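The two libraries cover the crawler's core fetch-and-parse step: Requests downloads a page and BeautifulSoup extracts every link from the HTML. Here is a minimal sketch of that step; the URL and the User-Agent string are placeholders.

```python
import requests
from bs4 import BeautifulSoup

# Fetch one page and collect all of its outgoing links.
headers = {"User-Agent": "Mozilla/5.0"}  # placeholder user agent
response = requests.get("https://example.com", headers=headers, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)
```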
If you would like to see the list of supported features, simply run the crawler's help command.
Crawl only 3 levels deep
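Depth-limited crawling is essentially a breadth-first walk that stops expanding links once the requested level is reached. The sketch below shows the idea; the function and variable names are illustrative, not the crawler's own API.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(start_url, max_level=3):
    """Breadth-first crawl that expands links for max_level rounds."""
    seen = {start_url}
    frontier = [start_url]
    for _level in range(max_level):
        next_frontier = []
        for url in frontier:
            try:
                html = requests.get(url, timeout=10).text
            except requests.RequestException:
                continue  # skip pages that fail to download
            soup = BeautifulSoup(html, "html.parser")
            for a in soup.find_all("a", href=True):
                link = urljoin(url, a["href"])  # resolve relative links
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen

print(len(crawl("https://example.com", max_level=3)))
```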
Search for specific text throughout the website
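Text search amounts to stripping the markup from each fetched page and checking the visible text for the needle. A minimal sketch of that check, run here against a single placeholder URL:

```python
import requests
from bs4 import BeautifulSoup

def page_contains(url, needle):
    """Return True if the page's visible text contains the needle."""
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text()
    return needle.lower() in text.lower()

print(page_contains("https://example.com", "contact us"))
```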
Gather all the links along with their titles into a CSV file; the file is written after the crawl completes.
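The CSV export boils down to reading each page's `<title>` and writing one (url, title) row per page with Python's standard csv module. The output filename and the URL list below are assumptions for the example.

```python
import csv

import requests
from bs4 import BeautifulSoup

urls = ["https://example.com", "https://example.org"]  # placeholder list
with open("links.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["url", "title"])
    for url in urls:
        soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
        # Guard against pages with no <title> element.
        title = soup.title.string.strip() if soup.title and soup.title.string else ""
        writer.writerow([url, title])
```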
Use proxies to crawl the site.
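Proxy support maps naturally onto the standard `proxies` argument that Requests accepts, as in the sketch below. The proxy address is a placeholder; substitute your own.

```python
import requests

# Route both HTTP and HTTPS traffic through a local proxy (placeholder).
proxies = {
    "http": "http://127.0.0.1:8080",
    "https": "http://127.0.0.1:8080",
}
response = requests.get("https://example.com", proxies=proxies, timeout=10)
print(response.status_code)
```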
Download the file for your platform.
| Filename | Size | File type | Python version |
| --- | --- | --- | --- |
| pywebcrawler-0.0.1-py3-none-any.whl | 17.4 kB | Wheel | py3 |
| pywebcrawler-0.0.1.tar.gz | 8.2 kB | Source | None |