
A simple crawler for Baidu Baike


baikeS

baidu-baike-spider


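⚠️ This crawler is for learning purposes only. It must not be used for any illegal purpose or to infringe the lawful rights and interests of others ⚠️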

Procuratorial Daily (检察日报): what data scraping must comply with


Installation

pip install baikes

Usage


Module

Fetch the data you want:

from baikes import Baike

baike = Baike("网络爬虫")

print(baike.album)
print(baike.intro)
print(baike.paragraphs)
# ...

A Baike object has the following attributes:

Attribute    Type          Description
name         str           Entry name
title        str           Entry title
album        str           URL of the overview image
intro        str           Introduction
card         OrderedDict   Knowledge card
paragraphs   OrderedDict   Body paragraphs
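
As a rough illustration of how the attributes listed above might be consumed, the sketch below formats an entry as plain text. The helper `render_entry` is hypothetical and not part of baikes. It assumes that the keys and values of `card` and `paragraphs` can be rendered as strings, which the source does not spell out. A stand-in object with made-up values replaces a real `Baike` instance so the example runs offline.

```python
from collections import OrderedDict
from types import SimpleNamespace


def render_entry(entry):
    """Format a Baike-like object as plain text (hypothetical helper).

    Only the documented attributes are read: name, title, intro,
    card and paragraphs.
    """
    lines = [f"{entry.title} ({entry.name})", "", entry.intro, ""]
    # card is documented as an OrderedDict; str-able keys/values are assumed.
    for key, value in entry.card.items():
        lines.append(f"{key}: {value}")
    # paragraphs is documented as an OrderedDict; heading -> text is assumed.
    for heading, text in entry.paragraphs.items():
        lines.extend(["", f"## {heading}", str(text)])
    return "\n".join(lines)


# Stand-in with made-up values so the sketch runs without network access.
# With the real package this would be: sample = Baike("网络爬虫")
sample = SimpleNamespace(
    name="网络爬虫",
    title="网络爬虫",
    album="https://example.com/cover.jpg",
    intro="A web crawler is a program that browses the web automatically.",
    card=OrderedDict([("中文名", "网络爬虫"), ("外文名", "web crawler")]),
    paragraphs=OrderedDict([("定义", "placeholder paragraph text")]),
)

print(render_entry(sample))
```

Because both mappings are ordered, iterating over them should reproduce the order in which the fields were stored.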

Sometimes several entries share the same name. The category parameter restricts the lookup to a given entry category:

from baikes import Baike

baike = Baike("黄蜂", category="动物")
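
The optional `category` argument can be wrapped in a small helper, sketched below. `fetch_intro` and its `baike_cls` parameter are hypothetical names introduced for illustration. The helper only relies on the documented constructor call and the `intro` attribute. What baikes does when no entry matches the category is not stated in the source, so no error handling is attempted here. `FakeBaike` is an offline stand-in, not the real class.

```python
def fetch_intro(name, category=None, baike_cls=None):
    """Return the intro of an entry, optionally restricted to a category."""
    if baike_cls is None:
        # Imported lazily so the helper can be exercised without the package.
        from baikes import Baike as baike_cls
    if category is None:
        entry = baike_cls(name)
    else:
        entry = baike_cls(name, category=category)
    return entry.intro


class FakeBaike:
    """Offline stand-in that only records how it was constructed."""

    def __init__(self, name, category=None):
        self.intro = f"{name} [{category}]"


print(fetch_intro("黄蜂", category="动物", baike_cls=FakeBaike))
```

Leaving `baike_cls` unset makes the helper use the real `Baike` class, which performs a network request.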

Command line

The crawler can also be invoked from the command line.

Examples:

# Fetch everything
python -m baikes -n "网络爬虫"

# Restrict the entry category
python -m baikes -n "黄蜂" -c "动物"

# Fetch the knowledge card
python -m baikes -n "网络爬虫" card
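
The same invocations could be driven from a script, for example with `subprocess`. This is a minimal sketch: `build_command` and `run_baikes` are hypothetical helpers, and only the `-n` and `-c` flags and the trailing `card` argument shown above are taken from the source. Whether other trailing field names are accepted, and what the output format looks like, are not documented here, so the raw standard output is returned unchanged.

```python
import subprocess
import sys


def build_command(name, category=None, field=None):
    """Build the argument list for `python -m baikes` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "baikes", "-n", name]
    if category is not None:
        cmd += ["-c", category]
    if field is not None:
        # Only "card" appears in the examples; other values are untested.
        cmd.append(field)
    return cmd


def run_baikes(name, category=None, field=None):
    """Run the CLI and return its standard output as text."""
    result = subprocess.run(
        build_command(name, category, field),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Show the arguments that would follow the interpreter path.
print(build_command("黄蜂", category="动物")[1:])
```

Passing the arguments as a list avoids shell quoting issues with non-ASCII entry names.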


