shenjian rest sdk
=================


Overview
--------

ShenJianShou (神箭手) Python SDK



This version of the SDK depends on the third-party HTTP library `requests <https://github.com/kennethreitz/requests>`_. Please install it following the instructions below.

Requirements
------------

Python 2 and 3


Installation
------------

Install via pip:

.. code-block:: bash

    $ pip install shenjian

Install directly from the source package:

.. code-block:: bash

    $ sudo python setup.py install


Quick start
-----------

.. code-block:: python

    # -*- coding: utf-8 -*-
    import shenjian

    user_key = 'your user_key'
    user_secret = 'your user_secret'

    ######## shenjian.Service ########
    service = shenjian.Service(user_key, user_secret)

    # Get the list of apps
    result = service.get_app_list(page=1, page_size=30)

    # Get the list of crawlers
    result = service.get_crawler_list(page=1, page_size=30)

    # Create a crawler
    result = service.create_crawler(app_name="crawler name", code="crawler code", app_info='')


    ######## shenjian.Crawler ########
    # appID identifies an existing crawler
    crawler = shenjian.Crawler(user_key, user_secret, appID)

    # Edit the crawler's name and info
    result = crawler.edit(app_name="new name", app_info="new info")

    # Use premium personal proxy IPs; see shenjian.proxy_type for the available kinds
    result = crawler.config_proxy(shenjian.proxy_type.PROXY_TYPE_BETTER)

    # Enable cloud file hosting
    result = crawler.config_host(shenjian.host_type.HOST_TYPE_SHENJIANSHOU)

    # Delete the crawler
    result = crawler.delete()

    # Set crawler-specific custom options (each crawler defines its own; pass a dict)
    result = crawler.config_custom({"img": True})

    ######## Starting the crawler ########
    # Start the crawler with 2 nodes
    result = crawler.start(2)

    # On hitting an already-crawled result, stop discovering new links with
    # dup_type='unspawn'; update the existing data with dup_type='change';
    # skip it and keep crawling with dup_type='skip' (the default)
    result = crawler.start(dup_type='unspawn')

    # Overwrite the data in the original result; the default is to keep the
    # original data and insert a new version, change_type='insert'
    result = crawler.start(change_type='update')

    # Start the crawler on a schedule. This example crawls once a day; for more
    # timer settings and parameters see
    # http://docs.shenjian.io/develop/platform/restful/crawler.html#启动爬虫
    result = crawler.start(timer_type='daily', time_start='10:00', time_end='23:00')

    ######## Starting the crawler ########

    # Stop the crawler
    result = crawler.stop()

    # Pause the crawler
    result = crawler.pause()

    # Resume the crawler (and set it to run on 3 nodes)
    result = crawler.resume(3)

    # Get the crawler's status
    result = crawler.get_status()

    # Get the crawler's speed
    result = crawler.get_speed()

    # Add one running node
    result = crawler.add_node(1)

    # Remove one running node
    result = crawler.reduce_node(1)

    # Get the data source info for the crawler
    result = crawler.get_source()

    # Get the crawler's webhook settings
    result = crawler.get_webhook()

    # Delete the crawler's webhook settings
    result = crawler.delete_webhook()

    # Change the crawler's webhook settings (send the webhook for new data,
    # but not for updated data or custom messages)
    result = crawler.set_webhook("http://www.baidu.com", data_new=True, data_updated=False, msg_custom=False)

    # Get the crawler's auto-publish status
    result = crawler.get_publish_status()

    # Start auto-publishing (publish_id identifies the publish target)
    result = crawler.start_publish(publish_id)

    # Stop auto-publishing
    result = crawler.stop_publish()
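The status calls above lend themselves to a simple polling loop, e.g. waiting for a crawler to finish after ``stop()``. A minimal sketch — the ``wait_until`` helper and the field name in the commented usage are assumptions for illustration, not part of the SDK:

```python
import time


def wait_until(check, timeout=60, interval=2):
    """Poll check() until it returns True or timeout (seconds) elapses.

    Returns True if check() succeeded within the timeout, False otherwise.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


# Hypothetical usage with the SDK (the 'status' key is an assumption):
#   finished = wait_until(lambda: crawler.get_status().get('status') == 'stopped')
```

Keeping the polling logic separate from the SDK call makes it easy to reuse the same helper for ``get_status()``, ``get_speed()``, or any other condition.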


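The scheduled ``start()`` call above takes its window as ``'HH:MM'`` strings. A small client-side check can catch malformed values before the request is sent; this helper is illustrative and not part of the SDK:

```python
def is_valid_hhmm(value):
    """Return True if value looks like a 24-hour 'HH:MM' time string."""
    parts = value.split(':')
    if len(parts) != 2:
        return False
    hh, mm = parts
    # Reject non-numeric components such as '10:xx'
    if not (hh.isdigit() and mm.isdigit()):
        return False
    # Hours 0-23, minutes 0-59
    return 0 <= int(hh) <= 23 and 0 <= int(mm) <= 59


# e.g. validate both ends of the window before calling crawler.start(...)
```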


