A simple crawler for Baidu Baike

Project description

baike-spider

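⚠️ This crawler is for learning purposes only. It must not be used for any illegal purpose or to infringe on the lawful rights and interests of others ⚠️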

Procuratorial Daily (检察日报): data crawling must comply with


Installation

pip install baike-spider

Usage


Module

Parse everything at once: this mode parses all of the data in one pass and stores it in the object's attributes

from baikes import Baike

baike = Baike("网络爬虫")

print(baike.album)
print(baike.intro)
print(baike.paragraphs)
# ...

Partial parsing: when you only need part of the data, this mode avoids some of the parsing overhead

from baikes import Baike

baike = Baike("网络爬虫", once=False)
intro = baike.get_intro()

print(intro)
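
The difference between the two modes can be illustrated with a minimal sketch. It is not the baikes implementation: the class LazyEntry, its raw dictionary, and the parse_calls list are hypothetical names introduced only to show why parsing on demand does less work than parsing everything up front.

```python
from functools import cached_property


class LazyEntry:
    """Hypothetical stand-in for a parsed page; NOT the baikes implementation."""

    def __init__(self, raw: dict, once: bool = True):
        self._raw = raw
        self.parse_calls = []  # records which fields were actually parsed
        if once:
            # Eager mode: touch every field so everything is parsed up front.
            for name in ("intro", "paragraphs"):
                getattr(self, name)

    def _parse(self, field: str):
        # Stands in for the real (expensive) HTML parsing step.
        self.parse_calls.append(field)
        return self._raw[field]

    @cached_property
    def intro(self):
        # Parsed on first access, then cached on the instance.
        return self._parse("intro")

    @cached_property
    def paragraphs(self):
        return self._parse("paragraphs")


raw = {"intro": "short summary", "paragraphs": ["first", "second"]}
lazy = LazyEntry(raw, once=False)
print(lazy.intro)        # only the introduction is parsed here
print(lazy.parse_calls)  # shows which fields were parsed
```

In this sketch the lazy object never parses the paragraphs unless they are requested, which is the kind of saving the once=False mode is described as providing.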

Sometimes several entries share the same name. The category parameter restricts the lookup to one entry category:

from baikes import Baike

baike = Baike("黄蜂", category="动物")
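
To make the idea of disambiguation concrete, here is a small sketch of choosing among same-named entries by category. The function pick_entry, the candidate dictionaries, and the substring matching rule are all assumptions made for illustration; how baikes actually resolves category is not documented here.

```python
from typing import Optional


def pick_entry(candidates: list[dict], category: Optional[str] = None) -> dict:
    """Pick one entry among same-named candidates (hypothetical helper)."""
    if not candidates:
        raise LookupError("no candidate entries")
    if category is None:
        # Without a category, fall back to the first candidate.
        return candidates[0]
    for candidate in candidates:
        # Assumed rule: the category must appear in the entry description.
        if category in candidate["description"]:
            return candidate
    raise LookupError(f"no entry matches category {category!r}")


# Hypothetical candidates for the same title.
candidates = [
    {"title": "黄蜂", "description": "NBA球队"},
    {"title": "黄蜂", "description": "膜翅目动物"},
]
print(pick_entry(candidates, category="动物"))
```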

Command line

The crawler can also be invoked from the command line

Examples:

# Fetch everything
python -m baikes -n "网络爬虫"

# Restrict the entry category
python -m baikes -n "黄蜂" -c "动物"

# Parse everything at once:
# get the encyclopedia card
python -m baikes -n "网络爬虫" card

# Partial parsing:
# get the encyclopedia card, parsed on demand
python -m baikes -n "网络爬虫" -o False get_card
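
When the command line is driven from another Python program, the argument list can be assembled programmatically. The sketch below only reuses the flags that appear in the examples above (-n, -c, -o and a trailing field name). The helper build_command and its parameter names are hypothetical, and the shape of the command's output is not assumed.

```python
import sys
from typing import Optional


def build_command(
    name: str,
    category: Optional[str] = None,
    once: bool = True,
    field: Optional[str] = None,
) -> list[str]:
    """Build an argv list for `python -m baikes` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "baikes", "-n", name]
    if category is not None:
        cmd += ["-c", category]   # restrict the entry category
    if not once:
        cmd += ["-o", "False"]    # partial parsing, as in the example above
    if field is not None:
        cmd.append(field)         # e.g. "card" or "get_card"
    return cmd


print(build_command("黄蜂", category="动物"))
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True), which avoids shell quoting problems with non-ASCII entry names.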

Download files

Source Distribution

baikes-0.1.0.tar.gz (6.5 kB view details)

Uploaded Source

Built Distribution

baikes-0.1.0-py3-none-any.whl (6.6 kB view details)

Uploaded Python 3

File details

Details for the file baikes-0.1.0.tar.gz.

File metadata

  • Download URL: baikes-0.1.0.tar.gz
  • Upload date:
  • Size: 6.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.10

File hashes

Hashes for baikes-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9fa4712fc8c980cb655eb79c799511fa7c8964a8972d6279de9fb836b276652b
MD5 c4ed76c9e994cbd82564736c750cc065
BLAKE2b-256 5f77e8b13059bd6fe9cd18eb4f46f58d2447d11bc046199839f726989cd3d8cf
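
A downloaded archive can be checked against the SHA256 digest listed above with the standard library alone. This is a minimal sketch: the helper names sha256_of and matches are illustrative, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with an expected hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the sdist was saved in the current directory:
# matches(
#     "baikes-0.1.0.tar.gz",
#     "9fa4712fc8c980cb655eb79c799511fa7c8964a8972d6279de9fb836b276652b",
# )
```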

File details

Details for the file baikes-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: baikes-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.10

File hashes

Hashes for baikes-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4f09985fa523e863da82f4d9779489b9cdc8399d8f01e1ac42bf3f237dbeaec7
MD5 ad6e0b0b388716b7c5aa4fcaa5c50c1c
BLAKE2b-256 c2e805d552456ffe1c9d9c004306fdb5620e39f49bf229834e853fb31e57b2cd
