No code required: built on Scrapy, dig-spider crawls websites using simple configuration.

Project description

dig-spider

Overview

Dig-spider is a BSD-licensed, fast, high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

Dig-spider is a code-free crawler: it is built on Scrapy and supports the same command line as Scrapy.

Requirements

  • Python 3.9+
  • Scrapy 2.11+
  • Works on Linux, Windows, macOS, BSD
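The requirements above can be checked from Python before running a crawl. This is a minimal sketch, not part of dig-spider: the helper names `parse_version`, `meets_requirements` and `check_environment` are hypothetical, and the version parsing only compares the leading numeric components of a version string.

```python
import re
import sys
from importlib import metadata
from typing import Tuple

# Minimum versions taken from the Requirements list.
MIN_PYTHON = (3, 9)
MIN_SCRAPY = (2, 11)


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep the leading numeric components: '2.11.2' -> (2, 11, 2), '3.0rc1' -> (3, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # a suffix such as 'rc1' ends the numeric release part
    return tuple(parts)


def meets_requirements(python_version: Tuple[int, ...], scrapy_version: str) -> bool:
    """Compare a Python version tuple and a Scrapy version string against the minimums."""
    return tuple(python_version[:2]) >= MIN_PYTHON and parse_version(scrapy_version) >= MIN_SCRAPY


def check_environment() -> bool:
    """Check the running interpreter and the installed Scrapy distribution."""
    try:
        scrapy_version = metadata.version("Scrapy")
    except metadata.PackageNotFoundError:
        return False  # Scrapy is not installed
    return meets_requirements(tuple(sys.version_info[:2]), scrapy_version)
```

`check_environment()` returns False when Scrapy is missing or older than 2.11, or when the interpreter is older than 3.9. For full PEP 440 comparisons (pre-releases, post-releases), the `packaging` library's `Version` class is the usual choice instead of the simplified parser here.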

Install

The quick way:

pip install dig-spider

Usage

dig-spider gentemplate dst

Generate the template files in the target directory (dst), then edit config_template.yaml and code_template.py.
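The generate-then-edit step can be scripted. This is a minimal sketch that assumes only what the usage line states: `dig-spider gentemplate dst` writes `config_template.yaml` and `code_template.py` into `dst`. The helper names `generate_templates` and `missing_templates` are hypothetical, and the exact layout the tool produces may differ.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# File names taken from the usage notes.
TEMPLATE_FILES = ("config_template.yaml", "code_template.py")


def generate_templates(dst: str) -> None:
    """Run `dig-spider gentemplate <dst>`; raises if the CLI is missing or exits non-zero."""
    if shutil.which("dig-spider") is None:
        raise FileNotFoundError("dig-spider is not on PATH; run `pip install dig-spider` first")
    subprocess.run(["dig-spider", "gentemplate", dst], check=True)


def missing_templates(dst: str) -> List[str]:
    """Return the expected template files that are not present in dst."""
    folder = Path(dst)
    return [name for name in TEMPLATE_FILES if not (folder / name).is_file()]
```

A typical script calls `generate_templates("dst")`, then checks that `missing_templates("dst")` is empty before editing the two files.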

dig-spider crawl website -a config=dst/config_template.yaml

Use dig-spider in place of the scrapy command. website is the default spider, and dst/config_template.yaml holds the rules for parsing the web pages.
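Because the crawl is driven entirely from the command line, it can also be launched from a Python script. This is a minimal sketch assuming the argument layout shown above (`crawl website -a config=...`); `build_crawl_command` and `run_crawl` are hypothetical helpers. The optional output flag assumes that Scrapy's `-o` feed-export option passes through unchanged, which follows from the claim that dig-spider accepts the same command line as Scrapy but has not been verified here.

```python
import subprocess
from typing import List, Optional


def build_crawl_command(config: str, spider: str = "website", output: Optional[str] = None) -> List[str]:
    """Assemble the argument list for `dig-spider crawl`.

    `spider` defaults to "website", the default spider; `config` is passed as a
    spider argument with -a, exactly as in the usage line.
    """
    command = ["dig-spider", "crawl", spider, "-a", f"config={config}"]
    if output is not None:
        # Assumption: Scrapy's -o (write scraped items to a feed file) is supported as-is.
        command += ["-o", output]
    return command


def run_crawl(config: str, output: Optional[str] = None) -> int:
    """Run the crawl and return the process exit code."""
    completed = subprocess.run(build_crawl_command(config, output=output))
    return completed.returncode
```

Passing an argument list rather than a single shell string avoids quoting problems when the config path contains spaces.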

Download files

Download the file for your platform.

Source Distribution

dig_spider-0.1.0.tar.gz (16.1 kB)

Built Distribution

dig_spider-0.1.0-py3-none-any.whl (21.9 kB)

File details

Details for the file dig_spider-0.1.0.tar.gz.

File metadata

  • Download URL: dig_spider-0.1.0.tar.gz
  • Upload date:
  • Size: 16.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for dig_spider-0.1.0.tar.gz

  • SHA256: 43849a1165ed39a8d9947c9bc5c19c69874665bb7231e185d24a976f7a458238
  • MD5: 576afb6d37774c92f097b29e2ff9ac85
  • BLAKE2b-256: 47691f378b44d6a9a5be3c0822447f161e6ab59b14949928e19900470adb026e
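The published digests let you confirm that a downloaded archive is intact. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and the file is assumed to sit in the current directory. pip can perform the same check itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib

# SHA256 digest published for dig_spider-0.1.0.tar.gz.
EXPECTED_SHA256 = "43849a1165ed39a8d9947c9bc5c19c69874665bb7231e185d24a976f7a458238"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

`verify("dig_spider-0.1.0.tar.gz")` checks the source archive; for the wheel, pass the SHA256 listed for dig_spider-0.1.0-py3-none-any.whl as `expected`.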

File details

Details for the file dig_spider-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: dig_spider-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for dig_spider-0.1.0-py3-none-any.whl

  • SHA256: 491d90b226b836bf38268c4f13edeea4e1d13f8e4c8b2b5ff692dee435ecbd51
  • MD5: 0424b96664785bc545305c92ba416d05
  • BLAKE2b-256: 448703097b8e28dfc93bbb9d473cb41f2db22561e1878ba061be7a38f5129845