
The Crawler

Web crawling utility for downloading files from exposed filesystems and web file explorers.

Installation

From PyPI

This assumes you have Python 3.10+ installed and pip3 on your PATH:

~$ pip3 install the-crawler
...
~$ the-crawler -h
usage: the-crawler [-h] [--recurse] [--output-directory OUTPUT_DIRECTORY] [--extensions EXTENSIONS [EXTENSIONS ...]] [--max-workers MAX_WORKERS] base_url

Crawls given url for content

positional arguments:
  base_url

options:
  -h, --help            show this help message and exit
  --recurse, -r
  --output-directory OUTPUT_DIRECTORY, -o OUTPUT_DIRECTORY
  --extensions EXTENSIONS [EXTENSIONS ...], -e EXTENSIONS [EXTENSIONS ...]
  --max-workers MAX_WORKERS
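
As an illustration (the URL, directory name, and extension list here are hypothetical, not taken from the project docs), a recursive crawl that saves only PDF and ZIP files into a local downloads directory would look like:

~$ the-crawler --recurse --output-directory downloads --extensions pdf zip http://example.com/shared/

Judging by its name, --max-workers presumably caps the number of concurrent downloads; the help output does not document a default.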

From Source

This assumes you already have Git, Python 3.10+, and Poetry installed.

~$ git clone git@gitlab.com:woodforsheep/the-crawler.git
...
~$ cd the-crawler
the-crawler$ poetry install
...
the-crawler$ poetry run the-crawler -h
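
The same hypothetical invocation from above then runs inside the Poetry environment, using the short option forms:

the-crawler$ poetry run the-crawler -r -o downloads -e pdf zip http://example.com/shared/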

