A heavily modified utility library for personal use

Project description

Derived from

https://github.com/shengchenyang/AyugeSpiderTools/blob/master/docs//docs/intro/install.md

with additional templates for personal use.

Installation

With Python 3.8+, install directly with the following command:

pip install gzspidertools

Optional install 1: install all database-related dependencies:

pip install gzspidertools[database]

Optional install 2: install all dependencies with the following command:

pip install gzspidertools[all]

Note: see the installation guide for detailed installation instructions.

Usage

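# Show the library version
gzcmd version

# Create a project
gzcmd startproject <project_name>

# Enter the project root directory
cd <project_name>

# Replace (overwrite) the .conf file with your real configuration:
# This is only for demonstration convenience; normally you fill in the configuration you need directly in the .conf file under VIT
cp /root/mytemp/.conf DemoSpider/VIT/.conf

# Generate a spider script
gzcmd genspider <spider_name> <example.com>

# Generate a scrapy-redis spider script (install scrapy-redis first, e.g. pip install scrapy_redis-0.7.3-py2.py3-none-any.whl)
gzcmd genspider -t=sr <spider_name> <example.com>

# Run the spider
scrapy crawl <spider_name>
# Note: gzcmd crawl <spider_name> also works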

RedisDB

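RedisDB supports sentinel mode, cluster mode, and the standard single-node mode, and wraps the commonly used Redis operations.

Connection

If the database connection is configured in environment variables or in the settings, no arguments need to be passed.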
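
The fallback described above can be pictured with a minimal sketch. This is not the library's implementation: the helper `resolve_redis_option` and the `DEFAULT_SETTINGS` dictionary are hypothetical, and reusing the setting names (such as `REDISDB_IP_PORTS`) as environment variable names is an assumption.

```python
import os

# Hypothetical defaults standing in for values from the project's settings file.
DEFAULT_SETTINGS = {"REDISDB_IP_PORTS": "localhost:6379", "REDISDB_DB": "0"}


def resolve_redis_option(name, explicit=None, environ=None, settings=None):
    """Pick a connection option: explicit argument, then environment, then settings."""
    environ = os.environ if environ is None else environ
    settings = DEFAULT_SETTINGS if settings is None else settings
    if explicit is not None:
        return explicit  # an argument passed by the caller always wins
    if name in environ:
        return environ[name]  # assumed: env var named like the setting
    return settings.get(name)  # None when nothing is configured
```

With this ordering, a caller who passes nothing still gets a usable value as long as either the environment or the settings define it.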

Standard mode

db = RedisDB(ip_ports="localhost:6379", db=0, user_pass=None)

Connecting with a URL

db = RedisDB.from_url("redis://[[username]:[password]]@[host]:[port]/[db]")
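
To make the URL template concrete, the sketch below splits a sample URL into its parts using only the standard library. It does not call RedisDB; the function `describe_redis_url` and the sample password `secret` are made up for illustration, and defaulting the database index to 0 when the path is empty is an assumption.

```python
from urllib.parse import urlsplit


def describe_redis_url(url):
    """Split a redis:// URL into the parts named in the template above."""
    parts = urlsplit(url)
    # The path carries the database index, e.g. "/0"; assume 0 when it is absent.
    db = int(parts.path.lstrip("/") or 0)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username or None,  # empty username becomes None
        "password": parts.password,
        "db": db,
    }


info = describe_redis_url("redis://:secret@localhost:6379/0")
```

A URL with a password but no username keeps the leading colon before the password, as in the sample above.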

Sentinel mode

db = RedisDB(ip_ports="172.25.21.4:26379,172.25.21.5:26379,172.25.21.6:26379", db=0, user_pass=None, service_name="my_master")

Note: separate multiple addresses with commas; service_name must be passed.

The corresponding entries in the settings file are:

REDISDB_IP_PORTS = "172.25.21.4:26379,172.25.21.5:26379,172.25.21.6:26379"
REDISDB_USER_PASS = ""
REDISDB_DB = 0
REDISDB_SERVICE_NAME = "my_master"
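
Sentinel clients generally expect the addresses as a list of (host, port) pairs rather than one comma-separated string. As a rough illustration of that conversion (the helper `parse_ip_ports` is hypothetical and is not taken from the library), the string format above could be split like this:

```python
def parse_ip_ports(ip_ports):
    """Turn "host:port,host:port" into a list of (host, port) tuples."""
    nodes = []
    for item in ip_ports.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray or trailing commas
        # Split on the last colon so the port is always the final field.
        host, _, port = item.rpartition(":")
        nodes.append((host, int(port)))
    return nodes


sentinels = parse_ip_ports("172.25.21.4:26379,172.25.21.5:26379,172.25.21.6:26379")
```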

Cluster mode

db = RedisDB(ip_ports="172.25.21.4:26379,172.25.21.5:26379,172.25.21.6:26379", db=0, user_pass=None)

Note: separate multiple addresses with commas; do not pass service_name.

The corresponding entries in the settings file are:

REDISDB_IP_PORTS = "172.25.21.4:26379,172.25.21.5:26379,172.25.21.6:26379"
REDISDB_USER_PASS = ""
REDISDB_DB = 0
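
Taken together, the three modes differ only in the number of addresses and in whether service_name is present. The sketch below restates that rule as code; it is an illustration, not the library's logic. The function `infer_redis_mode` is hypothetical, and treating a single address as standard mode even when a service_name is supplied is an assumption.

```python
def infer_redis_mode(ip_ports, service_name=None):
    """Guess the connection mode from the address list and service_name."""
    addresses = [a.strip() for a in ip_ports.split(",") if a.strip()]
    if len(addresses) <= 1:
        return "single"  # one address: standard single-node mode
    # Several addresses: service_name distinguishes sentinel from cluster.
    return "sentinel" if service_name else "cluster"


nodes = "172.25.21.4:26379,172.25.21.5:26379,172.25.21.6:26379"
modes = (
    infer_redis_mode("localhost:6379"),
    infer_redis_mode(nodes, service_name="my_master"),
    infer_redis_mode(nodes),
)
```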
