Scrapy-based Web Crawler with a UI
Arachnado
Arachnado is a tool to crawl a specific website. It provides a Tornado-based HTTP API and a web UI for a Scrapy-based crawler.
License is MIT.
Install
Arachnado requires Python 2.7. To install Arachnado use pip:
pip install arachnado
To install Arachnado with MongoDB support use this command:
pip install arachnado[mongo]
Run
To start Arachnado, run the arachnado command:
arachnado
and then visit http://0.0.0.0:8888 (or whatever URL is configured).
To see available command-line options use
arachnado --help
Arachnado can be configured using a config file. Put it in one of the common locations (/etc/arachnado.conf, ~/.config/arachnado.conf or ~/.arachnado.conf) or pass the file name as an argument when starting the server:
arachnado --config ./my-config.conf
For available options check https://github.com/TeamHG-Memex/arachnado/blob/master/arachnado/settings/defaults.conf.
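As a sketch, a config file might look like the following. The [arachnado] section name and the specific option names shown here are assumptions; verify them against the defaults.conf file linked above before use:

```ini
; my-config.conf -- minimal sketch; option names are assumptions,
; check arachnado/settings/defaults.conf for the real ones
[arachnado]
; port the Tornado HTTP API and web UI listen on
port = 8888
```

Pass the file when starting the server, as shown above, with arachnado --config ./my-config.conf.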
Development
Source code: https://github.com/TeamHG-Memex/arachnado
Issue tracker: https://github.com/TeamHG-Memex/arachnado/issues
Node.js and npm are required to build Arachnado's static assets. Install the JavaScript dependencies by running the following command from the repo root:
npm install
then rebuild the static files (Webpack is used):
npm run build
or auto-build static files on each change during development:
npm run watch
Changes
0.2 (2015-08-07)
Initial release.