A utility for crawling web file explorers for content.
Project description
The Crawler
Web crawling utility for downloading files from an exposed filesystem.
Installation
From PyPI
This assumes you have Python 3.10+ installed and that pip3 is on your path:
~$ pip3 install the-crawler
...
~$ the-crawler -h
usage: the-crawler [-h] [--recurse] [--output-directory OUTPUT_DIRECTORY] [--extensions EXTENSIONS [EXTENSIONS ...]] [--max-workers MAX_WORKERS] base_url
Crawls given url for content
positional arguments:
base_url
options:
-h, --help show this help message and exit
--recurse, -r
--output-directory OUTPUT_DIRECTORY, -o OUTPUT_DIRECTORY
--extensions EXTENSIONS [EXTENSIONS ...], -e EXTENSIONS [EXTENSIONS ...]
--max-workers MAX_WORKERS
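The optional flags can be combined in a single invocation. A hypothetical example is shown below; the URL, output directory, extension values, and worker count are placeholders, not values taken from the project:
~$ the-crawler http://192.0.2.10:8080/shared/ --recurse --output-directory ./downloads --extensions pdf mp4 --max-workers 8
Whether extensions should be given with or without a leading dot is not documented in the help output above, so adjust as needed.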
From Source
This assumes you have git, Python 3.10+, and poetry installed already.
~$ git clone git@gitlab.com:woodforsheep/the-crawler.git
...
~$ cd the-crawler
the-crawler$ poetry install
...
the-crawler$ poetry run the-crawler -h
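When running from a source checkout, the same arguments apply; just prefix the command with poetry run. For example (the URL and paths are placeholders):
the-crawler$ poetry run the-crawler http://192.0.2.10:8080/shared/ -r -o ./downloads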
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
the_crawler-0.4.0.tar.gz (3.3 kB)
Built Distribution
the_crawler-0.4.0-py3-none-any.whl
Hashes for the_crawler-0.4.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8ebbfaaaf8cad314f3fd77b095756c678fa758eb06821ef7226162af29b35538
MD5 | 769ba8684814ded49b9c9360210d47cd
BLAKE2b-256 | cf093dabefa68c23ce64d14adddacb4a9b2a9753c41237965bf12c980e40f53b