
A simple, handy web-scraping toolkit that cuts down on repetitive code and redundant files.


easy spider tool

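A collection of simple, practical scraping utilities distilled from day-to-day work, designed to reduce repetitive code and redundant files. I hope they prove as useful to you as they have been to me. If you would like to contribute a good code snippet, please email the code and a short description to me (xinkonghan@gmail.com). The code style follows my own preferences; if you spot any shortcomings, please let me know!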

Installation

pip install easy_spider_tool

Main features

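  • Date and time
    • before_day: yesterday's date (can be used to step backwards through dates)
    • after_day: tomorrow's date (can be used to step forwards through dates)
    • between_day: the range between two dates
    • current_date: the current time
    • timestamp: the current timestamp (millisecond precision supported)
    • date_parse: parse dates in arbitrary formats (supports timezone conversion and keeping only the date or time portion, with configurable defaults)
  • JSON
    • format_json: pretty, human-readable formatted output
    • jsonpath: evaluate any number of JSON paths (supports a default value and selecting the first match)
  • Hashing
    • md5: MD5-encode a string
  • Regular expressions
    • regx_match: match a single pattern (supports a default value and selecting the first match)
    • for_to_regx_match: match several unrelated patterns (supports a default value and selecting the first match; deprecated in the latest version and merged into regx_match)
  • Data cleaning / conversion
    • cookie_to_dic: convert a cookie string into a dictionary (Dict)
    • clear_value: remove specified values from a list (List) or dictionary (Dict), recursing into all nested dicts and lists
  • Validation
    • verify_ip_address: validate an IP address
    • verify_domain_name: validate a domain name
    • verify_port: validate a port number
    • verify_url: validate a URL
  • Notifications
    • None yet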
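The exact signatures of these helpers are not given on this page, so the following is only a stdlib-only sketch of what a few of the listed features might do. Every name here (`md5_hex`, `day_offset`, `cookie_to_dict`, `strip_values`, `is_valid_ip`, `is_valid_port`) and every parameter is hypothetical, chosen for illustration; none of it is easy_spider_tool's own implementation, so check the project's documentation for the real API.

```python
import hashlib
import ipaddress
from datetime import date, timedelta
from typing import Any, Dict, Iterable

# Hypothetical stand-ins for a few listed features; NOT easy_spider_tool's own code.


def md5_hex(text: str, encoding: str = "utf-8") -> str:
    """Hex MD5 digest of a string (compare the md5 feature)."""
    return hashlib.md5(text.encode(encoding)).hexdigest()


def day_offset(base: date, days: int) -> date:
    """Shift a date by `days`; -1 behaves like "yesterday", +1 like "tomorrow"."""
    return base + timedelta(days=days)


def cookie_to_dict(cookie: str) -> Dict[str, str]:
    """Parse a raw "k1=v1; k2=v2" Cookie header into a dict (compare cookie_to_dic)."""
    result: Dict[str, str] = {}
    for pair in cookie.split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed fragments
        key, _, value = pair.strip().partition("=")
        result[key] = value  # partition keeps any later "=" inside the value
    return result


def strip_values(data: Any, targets: Iterable[Any] = (None, "")) -> Any:
    """Recursively drop every item equal to one of `targets` from nested dicts/lists
    (compare clear_value). Containers emptied by the cleanup are kept as-is."""
    targets = tuple(targets)
    if isinstance(data, dict):
        cleaned = {k: strip_values(v, targets) for k, v in data.items()}
        return {k: v for k, v in cleaned.items() if v not in targets}
    if isinstance(data, list):
        cleaned_list = [strip_values(v, targets) for v in data]
        return [v for v in cleaned_list if v not in targets]
    return data


def is_valid_ip(value: str) -> bool:
    """IPv4 or IPv6 syntax check via the stdlib ipaddress module."""
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def is_valid_port(value: Any) -> bool:
    """True for ints or decimal-digit strings in the range 0-65535."""
    if isinstance(value, bool):
        return False  # bool is a subclass of int; reject it explicitly
    if isinstance(value, str) and value.strip().isdecimal():
        value = int(value)
    return isinstance(value, int) and 0 <= value <= 65535
```

Here the IP check delegates to `ipaddress.ip_address`, which accepts both IPv4 and IPv6; whether the real `verify_ip_address` behaves the same way is not stated on this page. Likewise, `strip_values` removing `None` and `""` by default is an assumption; `clear_value` may expect the values to remove to be passed explicitly.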

Quick start

from easy_spider_tool import format_json, jsonpath

data = {
    "code": 200,
    "data": [
        {
            "id": 1,
            "username": "admin",
            "level": "boss"
        },
        {
            "id": 2,
            "username": "user",
            "level": "staff"
        }
    ]
}

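# Username of the entry whose level is "boss" (first=True keeps only the first match)
boss_name = jsonpath(data, '$.data[?(@.level=="boss")].username', first=True)
# Every username in the "data" list
all_user_info = jsonpath(data, '$.data[*].username')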

print(boss_name)
print(format_json(all_user_info))
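For readers new to JSONPath filter syntax, the two queries above can be approximated in plain Python with comprehensions. This sketch uses only the standard library; it assumes that `first=True` yields the first match (with `None` standing in for the default when nothing matches) and uses `json.dumps(..., indent=4)` as a stand-in for `format_json`, whose exact output format is not documented here.

```python
import json

data = {
    "code": 200,
    "data": [
        {"id": 1, "username": "admin", "level": "boss"},
        {"id": 2, "username": "user", "level": "staff"},
    ],
}

# Roughly '$.data[?(@.level=="boss")].username' with first=True:
# the first matching username, or None if nothing matches (assumed default).
boss_name = next(
    (item["username"] for item in data["data"] if item.get("level") == "boss"),
    None,
)

# Roughly '$.data[*].username': every username, in document order.
all_user_info = [item["username"] for item in data["data"] if "username" in item]

print(boss_name)  # admin
# Indented JSON, standing in for format_json's pretty-printing.
print(json.dumps(all_user_info, indent=4, ensure_ascii=False))
```

The JSONPath form earns its keep once paths get deeper or you need many of them at once, which is what `jsonpath` accepting multiple paths is for.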

Links

GitHub: https://github.com/hanxinkong/easy-spider-tool

Online documentation: https://easy-spider-tool.xink.top/

Acknowledgements

This tool builds on work by the author xingcweb, with some additional features added according to my own preferences.


Download files


Source Distribution

easy_spider_tool-1.0.12.tar.gz (11.9 kB)

Built Distribution

easy_spider_tool-1.0.12-py3-none-any.whl (13.6 kB, Python 3)

File details

Details for the file easy_spider_tool-1.0.12.tar.gz.

File metadata

  • Download URL: easy_spider_tool-1.0.12.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.7.6

File hashes

Hashes for easy_spider_tool-1.0.12.tar.gz
Algorithm Hash digest
SHA256 3a2660f1d9ecc1591f047f12629eda2a9e9546a8c0b9b4a2ae387ba25de8c48b
MD5 597734fce0f9081c7112f19a90f28d5f
BLAKE2b-256 465dd7b40bb2f3ab3542d0b0f1c12baac71408d44d72cd7af18e67c56ba20af2

File details

Details for the file easy_spider_tool-1.0.12-py3-none-any.whl.

File metadata

File hashes

Hashes for easy_spider_tool-1.0.12-py3-none-any.whl
Algorithm Hash digest
SHA256 24bbed349873a0f972a1c1d71c85832e4b6172a6d3d1a3256c7cf953f296d91f
MD5 b699e5a2e3aa60382978b79496a5fa7d
BLAKE2b-256 7be5b97ca1bfd046f13413e505458ae34180355ce245be5134367127801771a1
