

Project description


ScrapydWeb: Web app for Scrapyd cluster management, with support for Scrapy log analysis & visualization.


Scrapyd | ScrapydWeb | LogParser

Recommended Reading

How to efficiently manage your distributed web scraping projects

How to set up Scrapyd cluster on Heroku

Demo

scrapydweb.herokuapp.com

Features

  • Scrapyd Cluster Management

    • All Scrapyd JSON API endpoints supported
    • Group, filter and select any number of nodes
    • Execute commands on multiple nodes with just a few clicks
  • Scrapy Log Analysis

    • Stats collection
    • Progress visualization
    • Logs categorization
  • Enhancements

    • Auto packaging
    • Integrated with LogParser
    • Timer tasks
    • Monitor & Alert
    • Mobile UI
    • Basic auth for web UI
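The "Logs categorization" feature can be pictured with a small, hypothetical sketch (not ScrapydWeb's or LogParser's actual code) that groups Scrapy log records by level. It assumes logs use Scrapy's default LOG_FORMAT, '%(asctime)s [%(name)s] %(levelname)s: %(message)s', with the default date format; the sample lines are made up, and lines that do not start a record, such as traceback lines, are skipped.

```python
import re
from collections import defaultdict

# Assumes Scrapy's default LOG_FORMAT "%(asctime)s [%(name)s] %(levelname)s: %(message)s"
# and default LOG_DATEFORMAT "%Y-%m-%d %H:%M:%S".
LOG_RECORD = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<logger>[^\]]+)\] "
    r"(?P<level>DEBUG|INFO|WARNING|ERROR|CRITICAL): "
    r"(?P<message>.*)$"
)


def categorize_by_level(lines):
    """Group log messages by level; non-matching lines (e.g. tracebacks) are skipped."""
    groups = defaultdict(list)
    for line in lines:
        match = LOG_RECORD.match(line.rstrip("\n"))
        if match:
            groups[match.group("level")].append(match.group("message"))
    return dict(groups)


# Made-up sample lines for illustration only.
sample = [
    "2024-01-01 12:00:00 [scrapy.core.engine] INFO: Spider opened",
    "2024-01-01 12:00:01 [scrapy.core.scraper] ERROR: Spider error processing <GET https://example.com>",
    "Traceback (most recent call last):",
    "2024-01-01 12:00:02 [scrapy.downloadermiddlewares.retry] WARNING: Retrying <GET https://example.com>",
    "2024-01-01 12:00:03 [scrapy.core.engine] INFO: Closing spider (finished)",
]

if __name__ == "__main__":
    for level, messages in categorize_by_level(sample).items():
        print(level, len(messages))
```

Grouping by level is only one possible category; the same loop could key on the logger name instead.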

Getting Started


Prerequisites

Make sure that Scrapyd has been installed and started on all of your hosts.

Note that for remote access, you have to manually set 'bind_address = 0.0.0.0' in the Scrapyd configuration file and restart Scrapyd, so that it listens on all network interfaces instead of only localhost.
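Before adding hosts to ScrapydWeb, it can help to confirm that every Scrapyd node answers remotely. The sketch below is an illustration, not part of ScrapydWeb: it calls Scrapyd's daemonstatus.json endpoint on several nodes in parallel using only the Python standard library. The host addresses are placeholders, and 6800 is Scrapyd's default port.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def daemon_status(node, timeout=5):
    """Query one Scrapyd node ("host:port") and return (node, status dict or error text)."""
    url = f"http://{node}/daemonstatus.json"
    try:
        with urlopen(url, timeout=timeout) as response:
            return node, json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError) as exc:  # connection errors, timeouts, invalid JSON
        return node, f"unreachable: {exc}"


def check_nodes(nodes, fetch=daemon_status):
    """Query all nodes concurrently; 'fetch' can be swapped out for testing."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        return dict(pool.map(fetch, nodes))


if __name__ == "__main__":
    # Placeholder addresses: replace them with your own Scrapyd hosts.
    for node, status in check_nodes(["127.0.0.1:6800", "192.168.1.2:6800"]).items():
        print(node, status)
```

A node still bound to 127.0.0.1 will typically show up as unreachable when queried from another machine.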

Install

  • Use pip:
pip install scrapydweb

Note that you may need to run python -m pip install --upgrade pip first in order to get the latest version of scrapydweb. Alternatively, download the tar.gz file from https://pypi.org/project/scrapydweb/#files and install it via pip install scrapydweb-x.x.x.tar.gz.

  • Use git:
pip install --upgrade git+https://github.com/my8100/scrapydweb.git

Or:

git clone https://github.com/my8100/scrapydweb.git
cd scrapydweb
python setup.py install
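Whichever method you used, pip show scrapydweb reports the installed version from the shell. The same check can be done from Python with the standard library's importlib.metadata (Python 3.8+). A minimal sketch, assuming the distribution name is 'scrapydweb' as in the commands above; the helper name is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("scrapydweb")
    print(found if found else "scrapydweb is not installed")
```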

Start

  1. Start ScrapydWeb by running the command scrapydweb. (A config file for customizing settings is generated at the first startup.)
  2. Visit http://127.0.0.1:5000 (Google Chrome is recommended for a better experience.)
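The generated config file is where the Scrapyd nodes to manage are listed. In ScrapydWeb's settings file this is the SCRAPYD_SERVERS option, and a node can be written as a string such as 'username:password@host:port#group'; treat that format as an assumption and check the comments in your own generated file, since it may differ between versions. The hypothetical parser below (not ScrapydWeb's own code) shows how such entries break down into credentials, address, and an optional group, which is the information behind grouping and filtering nodes. An omitted port defaults to 6800, Scrapyd's default.

```python
import re
from typing import NamedTuple, Optional


class Node(NamedTuple):
    username: Optional[str]
    password: Optional[str]
    host: str
    port: int
    group: Optional[str]


# Assumed entry format: [username:password@]host[:port][#group]
NODE_ENTRY = re.compile(
    r"^(?:(?P<username>[^:@]+):(?P<password>[^@]*)@)?"
    r"(?P<host>[^:#@]+)"
    r"(?::(?P<port>\d+))?"
    r"(?:#(?P<group>.+))?$"
)


def parse_node(entry, default_port=6800):
    """Split one node entry into its parts; raise ValueError if it does not match."""
    match = NODE_ENTRY.match(entry.strip())
    if not match:
        raise ValueError(f"unrecognized node entry: {entry!r}")
    port = int(match.group("port")) if match.group("port") else default_port
    return Node(match.group("username"), match.group("password"),
                match.group("host"), port, match.group("group"))


def group_nodes(entries):
    """Map each group name (None for ungrouped nodes) to its parsed nodes."""
    groups = {}
    for node in map(parse_node, entries):
        groups.setdefault(node.group, []).append(node)
    return groups


if __name__ == "__main__":
    entries = ["127.0.0.1:6800", "admin:secret@192.168.1.2:6801#crawlers"]
    for group, nodes in group_nodes(entries).items():
        print(group, [f"{n.host}:{n.port}" for n in nodes])
```

This simple pattern does not accept passwords containing '@'; a real configuration may need a more careful parser.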

Browser Support

The latest version of Google Chrome, Firefox, and Safari.

Running the tests

$ git clone https://github.com/my8100/scrapydweb.git
$ cd scrapydweb

# Create an isolated Python environment
$ pip install virtualenv
$ virtualenv venv/scrapydweb
# Or specify your Python interpreter: $ virtualenv -p /usr/local/bin/python3.7 venv/scrapydweb
$ source venv/scrapydweb/bin/activate

# Install dependencies
(scrapydweb) $ python setup.py install
(scrapydweb) $ pip install pytest
(scrapydweb) $ pip install coverage

# Make sure Scrapyd has been installed and started, then update the custom_settings item in tests/conftest.py
(scrapydweb) $ vi tests/conftest.py
(scrapydweb) $ curl http://127.0.0.1:6800

# '-x': stop on first failure
(scrapydweb) $ coverage run --source=scrapydweb -m pytest tests/test_a_factory.py -s -vv -x
(scrapydweb) $ coverage run --source=scrapydweb -m pytest tests -s -vv --disable-warnings
(scrapydweb) $ coverage report
# Create an HTML report, then open htmlcov/index.html
(scrapydweb) $ coverage html
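The coverage and pytest steps above can also be driven from a Python script, for example in a CI job. The sketch below is a hypothetical helper, not part of the repository: it rebuilds the same commands as argument lists, run through 'python -m' with the current interpreter. It assumes it is run from the repository root inside the activated virtualenv, with Scrapyd running and tests/conftest.py already updated.

```python
import subprocess
import sys


def coverage_commands(target="tests", stop_on_first_failure=False, disable_warnings=True):
    """Build the coverage/pytest command lines used in the steps above."""
    pytest_args = [target, "-s", "-vv"]
    if disable_warnings:
        pytest_args.append("--disable-warnings")
    if stop_on_first_failure:
        pytest_args.append("-x")  # stop on first failure
    return [
        [sys.executable, "-m", "coverage", "run", "--source=scrapydweb", "-m", "pytest", *pytest_args],
        [sys.executable, "-m", "coverage", "report"],
        [sys.executable, "-m", "coverage", "html"],  # writes htmlcov/index.html
    ]


if __name__ == "__main__":
    for command in coverage_commands():
        subprocess.run(command, check=True)  # raises CalledProcessError if a step fails
```

Passing lists instead of shell strings avoids quoting issues and makes the commands easy to inspect before running them.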

Built With


Changelog

Detailed changes for each release are documented in HISTORY.md.

Author


my8100

Contributors


Kaisla

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.



Download files


Source Distribution

scrapydwebx-1.5.5.tar.gz (702.1 kB)

Built Distribution

scrapydwebx-1.5.5-py3-none-any.whl (721.7 kB)

File details

Details for the file scrapydwebx-1.5.5.tar.gz.

File metadata

  • Download URL: scrapydwebx-1.5.5.tar.gz
  • Size: 702.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.10

File hashes

Hashes for scrapydwebx-1.5.5.tar.gz

Algorithm     Hash digest
SHA256        dda66d53ed01de444d691460664339d5bbeb64f7d778e32234b5c4ebf693d424
MD5           d2e9b9b8555e9c7c419d751f6eccda93
BLAKE2b-256   506fb3868f538f9d632d0a3e63c6548f305c66a0744a6b7dd15d02f46333d3b4

See pip's documentation on hash-checking mode for more details on using hashes.

File details

Details for the file scrapydwebx-1.5.5-py3-none-any.whl.

File metadata

  • Download URL: scrapydwebx-1.5.5-py3-none-any.whl
  • Size: 721.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.10

File hashes

Hashes for scrapydwebx-1.5.5-py3-none-any.whl

Algorithm     Hash digest
SHA256        b3e4d3525bd2162faa57ed82c62d23362759b48b48f8e946ebec3534933ef802
MD5           a0aea7f1cddd091a0fb7516f2166fe0f
BLAKE2b-256   3368354b3a1643a459380705578cb2a3727bb01294c7ada9a288836aea4dd99b

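To check a downloaded file against the SHA256 digests listed above, pip can enforce hashes itself (for example with '--require-hashes' and '--hash=sha256:...' entries in a requirements file), or the digest can be computed with Python's hashlib. A minimal sketch, assuming the file has already been downloaded into the current directory; the helper names are illustrative:

```python
import hashlib


def digest_chunks(chunks, algorithm="sha256"):
    """Hash an iterable of byte chunks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def file_digest(path, algorithm="sha256", chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large files are not read into memory at once."""
    with open(path, "rb") as f:
        return digest_chunks(iter(lambda: f.read(chunk_size), b""), algorithm)


def digests_match(actual_hex, expected_hex):
    """Compare hex digests case-insensitively, ignoring surrounding whitespace."""
    return actual_hex.strip().lower() == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 digest of scrapydwebx-1.5.5.tar.gz as listed above.
    expected = "dda66d53ed01de444d691460664339d5bbeb64f7d778e32234b5c4ebf693d424"
    actual = file_digest("scrapydwebx-1.5.5.tar.gz")
    print("OK" if digests_match(actual, expected) else f"MISMATCH: {actual}")
```

For the BLAKE2b-256 rows, hashlib.blake2b(digest_size=32) would be needed instead, since hashlib's blake2b defaults to a 64-byte digest.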
